-
Notifications
You must be signed in to change notification settings - Fork 174
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
There's a couple outstanding issues / inefficiencies / ugliness in the swordfish code. I originally intended of breaking these up into smaller PRs, but during the process of trying to split it up, I realized that all the changes are quite intertwined, and it may be easier on the reviewer to just see all of them in one. That being said, I'll try my best to explain all the changes and rationale in detail. ### Problems - The `PipelineChannel` abstraction doesn't play well with streaming sinks, in fact, in only really works for intermediate ops. This is because the `PipelineChannel` assumes sending/receiving is round robin, but streaming sinks can send data from the workers, as well as after the finalize. - Pipeline nodes currently send across both data and probe tables in the same channel, which can be confusing. Furthermore, the logic of dispatching probe tables and data is also different, adding more complexity. Probe tables need to be broadcasted to all workers, while data should only be distributed to a single worker. - If an operator does not require input to be ordered, it shouldn't dispatch work to workers in round robin, it should dispatch to any available worker. - Some operators don't do work in their `execute` or `finalize` methods. These don't need to be spawned on the compute runtime. ### Proposed Changes - Pipeline nodes should just send data across, via a simple `Receiver<Arc<Micropartition>>`. - Probe tables should be sent separately from data, via a `ProbeStateBridge`. - Implement an unordered dispatcher, an optimization for if ordering is not required. This uses `loole` channels, which are multi-producer multi-consumer. For consistency, use `loole` channels across all swordfish. In the future we can think about implementing our custom channels. https://users.rust-lang.org/t/loole-a-safe-async-sync-multi-producer-multi-consumer-channel/113785 - Give the compute runtime to the operators, and let them decide whether they want to spawn. --------- Co-authored-by: Colin Ho <[email protected]> Co-authored-by: Colin Ho <[email protected]>
- Loading branch information
1 parent
5c00dbc
commit ca8f25d
Showing
30 changed files
with
1,478 additions
and
1,078 deletions.
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.