Take the plunge into distributed machine learning training with Spark, PyTorch, and TensorFlow
A wee bit about me: I am an experienced Software Engineer and people manager with technical expertise in Apache Kafka, Flink, Spark, HDFS, AWS, Azure, machine learning, and distributed large-scale systems.
I'm highly motivated and always excited about solving problems and learning, with a curious, positive, can-do attitude. I have driven success and improvement in both distributed machine systems and people systems: optimizing Spark clusters to drive +350% throughput at Akamai scale [billions of events a day, processing 1.3 PB], saving the company money on compute, and shortening complex ML model deployment from months to a 2–3 day cycle, by aligning people-based systems, influencing strategic software integrations, and adopting software best practices.
I have been honored with the Beacon award in the Databricks Ambassadors Program, a testament to my commitment to contributing to data and AI technologies and sharing my expertise with others.
virtual-kubelet-kotlin-spring - how to leverage Virtual Kubelet to manage serverless services from your Kubernetes cluster
build-e2e-ml-bigdata - a full end-to-end application for creating machine learning pipelines on top of Parquet-compressed data, leveraging cloud services.
Author of O’Reilly’s book: Scaling Machine Learning with Spark: Distributed ML with MLlib, TensorFlow, and PyTorch, Adi Polak, 2023.
“Small Files in a Big Data World.” Chapter by Adi Polak, in 97 Things Every Data Engineer Should Know. Edited by Tobias Macey. O’Reilly, 2021: 131-133.
“Three Important Distributed Programming Concepts.” Chapter by Adi Polak, in 97 Things Every Data Engineer Should Know. Edited by Tobias Macey. O’Reilly, 2021: 175-176.
“Deploying Kubernetes in an Enterprise Environment.” in Kubernetes in the Enterprise Trends Report. DZone, 2020.
“Big Data Building Blocks: Selecting Architectures and Open-Source Frameworks.” In DZone 2019 Guide to Big Data. DZone, 2019.
Technical reviewer for Delta Lake: The Definitive Guide. O’Reilly Media and Databricks, upcoming book, 2024.
Technical reviewer for Fundamentals of Data Observability. O’Reilly Media and Andy Petrella, 2023.
Technical reviewer for Introducing MLOps. How to Scale Machine Learning in the Enterprise. O’Reilly Media and Dataiku, 2020.
Committee member at conferences: Scale By the Bay 2021 & 2023, Data & AI/Spark Summit 2021, 2022 & 2023, Voxxed Days Australia 2021.
“Apache Spark ML First Steps. How to Build Your Own Machine Learning Model at Scale.” Presentation for O'Reilly Media, Inc., July 15, 2020.
“Demystifying Scalable Machine Learning with the Spark Ecosystem.” AI Superstream Series: Scaling AI. Course for O'Reilly Media, Inc., September 2021.
“CI/CD for Data Lakes, Managing your data like code.” Presentation for O'Reilly Media, Inc., December 7, 2022.
“Scaling Machine Learning in 3 weeks.” Three-week course for O'Reilly Media, Inc., February 10, 17 & 24, 2023.
FlipCon – co-organizer of a functional programming conference, 2018.
KotlinTLV – co-lead of the KotlinTLV meetup group, 2019.
She Codes – Nationwide Director of Coding Skills, March 2017 to October 2018.
BIPA – Team Lead at Germany - Bavaria Israel Partnership Accelerator, driving innovative solutions to traditional markets from 2016 to 2017.
“Unlock The Full Business Value Of Data With A Better Engineering Process,” Forbes. May 26, 2022.
“COVID-19 and Mining Social Media - Enabling Machine Learning Workloads with Big Data,” InfoQ. October 2, 2022.
“What is Serverless SQL? And How to Use it for Data Exploration,” Towards Data Science. December 1, 2020.
“What is TensorFrames? TensorFlow + Apache Spark,” Microsoft Azure. March 25, 2019.
“Data at Scale: Learn How Predicate Pushdown Will Save You Money,” Microsoft Azure. December 18, 2018.
“Apache Spark — Catalyst Deep Dive,” Microsoft Azure. November 13, 2018.