Sharded probe table #3556

Draft
wants to merge 7 commits into
base: main

samster25
Member

No description provided.

@samster25 changed the title from "Shared probe table" to "Sharded probe table" on Dec 12, 2024
colin-ho added a commit that referenced this pull request Dec 17, 2024
Optimize swordfish grouped aggs for high cardinality groups

### Approach

There are three strategies for grouped aggs:
1. Partition each input morsel into `N` partitions, then do a partial
agg (good for high cardinality).
2. Do a partial agg, then partition into `N` partitions (good for low
cardinality). Can be optimized with
#3556
3. Partition only, no partial agg (only for map_groups, which has no
partial agg).
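
The three strategies can be sketched in plain Python. This is only an illustration under stated assumptions, not the swordfish implementation: morsels are modeled as lists of `(key, value)` rows, the aggregation is a sum, and the names `partition`, `partial_agg`, `final_agg`, and the value of `N` are all hypothetical.

```python
from collections import defaultdict

N = 4  # number of partitions (hypothetical value for illustration)


def partition(rows, n=N):
    """Split (key, value) rows into n buckets by key hash."""
    buckets = [[] for _ in range(n)]
    for key, value in rows:
        buckets[hash(key) % n].append((key, value))
    return buckets


def partial_agg(rows):
    """Collapse rows into one (key, partial_sum) row per distinct key."""
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return list(acc.items())


def partition_then_agg(morsel):
    # Strategy 1: partition first, then partially aggregate each bucket.
    return [partial_agg(bucket) for bucket in partition(morsel)]


def agg_then_partition(morsel):
    # Strategy 2: partially aggregate the whole morsel, then partition.
    return partition(partial_agg(morsel))


def partition_only(morsel):
    # Strategy 3: no partial aggregation, just route rows to buckets.
    return partition(morsel)


def final_agg(per_morsel_buckets):
    """Merge bucket i across all morsels and finish the aggregation."""
    result = {}
    for i in range(N):
        merged = [row for buckets in per_morsel_buckets for row in buckets[i]]
        # A key only ever lands in one bucket, so updates never collide.
        result.update(partial_agg(merged))
    return result
```

In this sketch all three strategies produce the same final result for a sum; they differ only in how many rows are hashed versus pre-aggregated per morsel, which is where the cardinality trade-off described above comes from.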

### Notes on alternative approaches
- Distributing partitions across workers (i.e. making each worker
responsible for accumulating only one partition) is much slower for low
cardinality aggs (TPCH Q1 would have been 1.5x slower). This is because
most of the work ends up on only a few workers, reducing
parallelism.
- Simply partitioning the input and aggregating only at the end
works well with higher cardinality, but low cardinality takes a hit
(TPCH Q1 would have been 2.5x slower).
- The probe table approach was much slower because of the many calls to the
multi-table dyn comparator. It was also much more complex to implement.
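
The parallelism loss in the first bullet can be illustrated with a small sketch. It assumes each group is owned by exactly one worker and uses a modulo on an integer group id as a stand-in for the real hash; `busy_workers` and the group and worker counts are hypothetical.

```python
def busy_workers(num_groups, num_workers):
    """Count how many workers receive at least one group.

    Each group is routed to a single worker, so the number of busy
    workers can never exceed the number of distinct groups.
    """
    owners = {group_id % num_workers for group_id in range(num_groups)}
    return len(owners)


low_cardinality = busy_workers(num_groups=4, num_workers=8)
high_cardinality = busy_workers(num_groups=1000, num_workers=8)
```

With only a handful of groups, some workers own no group at all and sit idle, while with many groups every worker has something to accumulate.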

### Benchmarks
[MrPowers Benchmarks](https://github.com/MrPowers/mrpowers-benchmarks)
results (seconds, lower is better).

| Query | this PR | Pyrunner | Current swordfish |
|-------|---------|----------|-------------------|
| q1    | 0.285720| 0.768858 | 0.356499         |
| q2    | 4.780064| 6.122199 | 53.340565        |
| q3    | 2.201079| 3.922857 | 16.935125        |
| q4    | 0.313106| 0.545192 | 0.335541         |
| q5    | 1.618228| 2.889354 | 10.665339        |
| q7    | 2.087872| 3.856998 | 16.072660        |
| q10   | 6.306756| 8.173738 | 53.800501        |
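
To read the table as relative speedups, the timings can be divided by the "this PR" column. The snippet below is just that arithmetic on the numbers above; the `results` layout and the `speedups` helper are illustrative names, not part of the benchmark suite.

```python
# Seconds per query, copied from the table: (this PR, Pyrunner, current swordfish)
results = {
    "q1": (0.285720, 0.768858, 0.356499),
    "q2": (4.780064, 6.122199, 53.340565),
    "q3": (2.201079, 3.922857, 16.935125),
    "q4": (0.313106, 0.545192, 0.335541),
    "q5": (1.618228, 2.889354, 10.665339),
    "q7": (2.087872, 3.856998, 16.072660),
    "q10": (6.306756, 8.173738, 53.800501),
}


def speedups(timings):
    """Return {query: (speedup vs Pyrunner, speedup vs current swordfish)}."""
    return {
        query: (pyrunner / this_pr, current / this_pr)
        for query, (this_pr, pyrunner, current) in timings.items()
    }


for query, (vs_pyrunner, vs_current) in speedups(results).items():
    print(f"{query}: {vs_pyrunner:.2f}x vs Pyrunner, {vs_current:.2f}x vs current swordfish")
```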

---------

Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Colin Ho <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Base automatically changed from colin/swordfish-grouped-aggs to main December 17, 2024 06:02