-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Ray shuffle experiment #2883
Conversation
CodSpeed Performance ReportMerging #2883 will not alter performanceComparing Summary
|
…a if this works though
6b5a053
to
cef8843
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2883 +/- ##
==========================================
- Coverage 78.71% 78.38% -0.34%
==========================================
Files 597 599 +2
Lines 69735 70131 +396
==========================================
+ Hits 54895 54969 +74
- Misses 14840 15162 +322
Flags with carried forward coverage won't be shown. Click here to find out more.
|
…an variant (#3083) # Summary Refactors our `PhysicalPlan` by adding a new `PhysicalPlan::ShuffleExchange` variant. ## Why? **This PR makes it easier to add new shuffle implementations, by simply extending the `PhysicalPlan::ShuffleExchange` enum** 1. To users, the physical plan looks a lot more legible (even I don't really know what a "Flatten" is...) 2. Easier to add new types of shuffles... Also, there's a nice place to add a layer of heuristics to figure out when to use the new shuffles (the new `ShuffleExchangeBuilder` builder can let us direct clients to use more complex shuffles by consulting environment variables, available cluster resources, number of partitions etc) ## Elaboration Our previous PhysicalPlan was a bit of a leaky abstraction, expressing 2 types of shuffles by invoking an unnecessarily low-level chain of operations to users of the abstraction, as well as users of Daft: 1. `FanoutHash/Range/Random -> Flatten -> ReduceMerge`: this pattern was used to express a "NaiveFullyMaterializingMapReduce" shuffle (materialize all Map tasks, perform a "flatten" which makes no sense to users, then reduce and merge) 3. `Split/Coalesce -> Flatten`: this pattern was used to express a "SplitOrCoalesceToTargetNum" shuffle This leaky abstraction makes it difficult to add new types of shuffles (e.g. the push-based shuffle implemented in #2883) as it involves adding new PhysicalPlan variant(s). Additionally, all of these shuffles share similar characteristics during plan optimization, and the actual implementation of "how" to execute these shuffles should be highly dependent on factors such as available cluster resources, expected complexity of the shuffle and more. --------- Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
@jaychia is PR still relevant after merging in your other one? |
No description provided.