Spark Performance Tuning

This repository is a guide to advanced Spark performance tuning and optimization, intended for anyone who wants to master these concepts or is preparing for data engineering interviews involving Spark. It also serves as a reference for the code snippets used in my Spark Performance Tuning playlist on YouTube. The playlist and the accompanying snippets aim to make complex Apache Spark concepts easy to understand while building a deeper understanding of how Spark works under the hood.

Concepts Covered

| Concept | YouTube Link | Code |
| --- | --- | --- |
| Spark Query Plans | YouTube | Python |
| Spark DAGs | YouTube | Python |
| Spark Memory Management | YouTube | |
| Spark Executor Tuning | YouTube | |
| Shuffle Partitions | YouTube | |
| Data Partitioning | YouTube | Python |
| Bucketing | YouTube | Python |
| Caching | YouTube | Python |
| Data Skew | YouTube | Python |
| Salting | YouTube | Python |
| AQE & Broadcast Joins | YouTube | Python |
| Dynamic Partition Pruning | YouTube | Python |
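
As a quick taste of two of the topics listed above (Data Skew and Salting), here is a minimal pure-Python sketch of why salting helps. It is not taken from the repository's code. The helper names (`partition_of`, `partition_sizes`, `salt`) are hypothetical, `zlib.crc32` is only a stand-in for Spark's hash partitioner, and the salt is assigned round-robin for reproducibility, whereas a real Spark job would more typically use a random salt column.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed number of shuffle partitions
SALT_BUCKETS = 8    # assumed number of salt values per key


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt(keys, buckets: int = SALT_BUCKETS):
    """Append a salt suffix so one hot key becomes several distinct keys."""
    # Round-robin salt keeps the example reproducible.
    return [f"{k}_{i % buckets}" for i, k in enumerate(keys)]


# Skewed input: one "hot" key holds 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i % 100}" for i in range(1000)]

unsalted = partition_sizes(keys)
salted = partition_sizes(salt(keys))

print("largest partition without salt:", max(unsalted))
print("largest partition with salt:   ", max(salted))
```

Without the salt, every row for the hot key hashes to the same partition, so a single task ends up doing most of the work. With the salt, those rows are spread over several keys and therefore several partitions. In a join, the other side has to be expanded to cover every salt value so that the salted keys still match.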

Contact

For any questions or feedback, feel free to reach out: