Enable push partial aggregation though join #23812

raunaqmorarka · 2024-10-17T04:25:40Z

Description

For queries like

select sum(sales) from fact, date_dim where fact.date_id = date_dim.date_id group by date_dim.year

partial aggregation on date_dim.year can be pushed below the join with a grouping key of "date_id", which can greatly reduce the number of rows before the join operator.
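To make the rewrite concrete, here is a minimal in-memory sketch of the idea. It is not Trino code: the class, the `Fact` type and the map-based `date_dim` are hypothetical stand-ins, and it assumes `date_dim.date_id` is unique so the join does not expand rows. It shows that summing by `date_id` below the join and then finishing the sum by `year` above it yields the same result as joining first, while only one row per `date_id` reaches the join.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, not part of Trino.
final class PartialAggPushdownSketch {
    // One row of the fact table: (date_id, sales).
    static final class Fact {
        final int dateId;
        final long sales;

        Fact(int dateId, long sales) {
            this.dateId = dateId;
            this.sales = sales;
        }
    }

    // Baseline plan: join every fact row to date_dim, then sum(sales) grouped by year.
    static Map<Integer, Long> joinThenAggregate(List<Fact> facts, Map<Integer, Integer> yearByDateId) {
        Map<Integer, Long> sumByYear = new HashMap<>();
        for (Fact fact : facts) {
            Integer year = yearByDateId.get(fact.dateId);
            if (year != null) { // inner join: drop fact rows without a matching date_dim row
                sumByYear.merge(year, fact.sales, Long::sum);
            }
        }
        return sumByYear;
    }

    // Pushed-down partial aggregation: sum(sales) grouped by the join key date_id.
    static Map<Integer, Long> partialByDateId(List<Fact> facts) {
        Map<Integer, Long> partial = new HashMap<>();
        for (Fact fact : facts) {
            partial.merge(fact.dateId, fact.sales, Long::sum);
        }
        return partial;
    }

    // Rewritten plan: join the pre-aggregated rows, then finish the sum grouped by year.
    static Map<Integer, Long> aggregateThenJoin(List<Fact> facts, Map<Integer, Integer> yearByDateId) {
        Map<Integer, Long> sumByYear = new HashMap<>();
        for (Map.Entry<Integer, Long> row : partialByDateId(facts).entrySet()) {
            Integer year = yearByDateId.get(row.getKey());
            if (year != null) {
                sumByYear.merge(year, row.getValue(), Long::sum);
            }
        }
        return sumByYear;
    }
}
```

With many fact rows per `date_id`, `partialByDateId` returns far fewer rows than its input, which is where the reduction before the join comes from.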

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## General
* Improve performance of queries with grouping on joins. ({issue}`23812`)

martint · 2024-10-17T16:32:06Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/AggregationNode.java

+    /**
+     * Indicates whether aggregation is potentially reducing rows that are propagated though exchange operator.
+     */
+    private final Optional<Boolean> exchangeInputAggregation;


This seems like an abstraction leakage. The aggregation node shouldn't have any knowledge about whether it feeds into an exchange.

What's the concept this is trying to capture? Let's think of a name that's more descriptive of such concept without tying it to the physicality of an exchange.

The problem that is solved here:
PushPartialAggregationThroughJoin may or may not want to keep an intermediate aggregation above the join. This is governed by:

    // Keep intermediate aggregation below remote exchange to reduce network traffic.
    // Intermediate aggregation can be skipped if pushed aggregation has subset of grouping
    // symbols as join is not expanding.
    if (aggregation.isExchangeInputAggregation() && !ImmutableSet.copyOf(aggregation.getGroupingKeys()).containsAll(pushedAggregation.getGroupingKeys())) {
        result = toIntermediateAggregation(aggregation, result, context);
    }
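The condition above can be read in isolation as a small predicate. The following is a hedged sketch, not the rule's actual code: the class and method names are hypothetical, grouping symbols are modelled as plain strings, and `java.util.HashSet` stands in for Guava's `ImmutableSet`.

```java
import java.util.HashSet;
import java.util.List;

// Illustrative only: hypothetical names, not part of Trino.
final class KeepIntermediateAggregationSketch {
    // Returns true when an intermediate aggregation would be kept above the join:
    // the aggregation feeds an exchange, and the pushed aggregation groups on at least
    // one key that the original aggregation does not group on.
    static boolean keepIntermediate(
            boolean exchangeInputAggregation,
            List<String> aggregationGroupingKeys,
            List<String> pushedGroupingKeys) {
        return exchangeInputAggregation
                && !new HashSet<>(aggregationGroupingKeys).containsAll(pushedGroupingKeys);
    }
}
```

For the `fact`/`date_dim` query, the original aggregation groups on `year` and the pushed one on `date_id`, so the sketch returns true when the flag is set. If the pushed keys are a subset of the original keys, it returns false.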

This is based on empirical observation and tuning:
a. We don't want to always keep intermediate aggregations above the join, as that would lead to multiple intermediate aggregations when many joins are stacked on top of each other. This causes significant regressions.
b. We also want to special-case PA before a data shuffle, because even if PA is pushed below the join, the CBO rule could make a wrong decision, which we want to contain.

@martint do you have some better idea for a name?

I can do another experiment with:

    if (context.getStatsProvider().getStats(aggregation).getOutputRowCount() * 1.1 >= context.getStatsProvider().getStats(pushedAggregation).getOutputRowCount()) {
        return result;
    }
    // if aggregation is reducing data, keep it
    return toIntermediateAggregation(aggregation, result, context);
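Stripped of the stats plumbing, the proposed experiment compares two row-count estimates with a 10% margin. A minimal sketch under that reading follows; the class and method names are hypothetical, and the estimates are passed in as plain doubles rather than read from a stats provider.

```java
// Illustrative only: hypothetical names, not part of Trino.
final class RowCountHeuristicSketch {
    // Returns true when the intermediate aggregation would be kept, i.e. when its
    // estimated output (plus a 10% margin) is still smaller than the pushed
    // aggregation's estimated output.
    static boolean keepIntermediate(double aggregationOutputRows, double pushedAggregationOutputRows) {
        boolean notReducing = aggregationOutputRows * 1.1 >= pushedAggregationOutputRows;
        // In this sketch, a NaN estimate makes the comparison false, so the aggregation is kept.
        return !notReducing;
    }
}
```

For example, estimates of 100 and 1000 rows keep the intermediate aggregation, while 950 and 1000 rows do not, because 950 * 1.1 is already above 1000.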

to see if we can actually improve the rule further and remove the intermediate aggregation before the exchange.

The thing is that for queries like

select sum(sales) from fact, date_dim where fact.date_id = date_dim.date_id group by date_dim.year

PA on date_id will be pushed below the join, but keeping an intermediate aggregation on year still makes sense.

I would still lean toward keeping the intermediate aggregation before the exchange, just in case.

I see. That's brittle. There are no guarantees that the node will remain the input of an exchange if other rules push things below the join afterwards, for instance.

If we can model it solely based on properties of the aggregation (e.g., whether it reduces the size of its input, as you suggested) that would be better.

I've benchmarked "Relax intermediate aggregation constraint", but I've seen some pretty big regressions, like q47 (170%), vs the initial approach. I suggest keeping the initial approach.

We don't want to always keep intermediate aggregations above join as it would lead to multiple intermediate aggregations if there are many joins on top of each other. This is causing significant regressions.

Ideally, shouldn't adaptive partial aggregation avoid big regressions in this case? Is this happening because partial/intermediate aggregation has a significant cost even in the "disabled" state?

How about we rename this to isInputReducingAggregation?

Yeah, that would be aligned with what I was suggesting above, but what needs to change to address the regression @sopel39 described?

Also, what would be the definition of “input reducing aggregation”?

As I understood it, the optimizer might be wrong about the decision to push down partial aggregation through the join, so we want to retain a partial aggregation on top of the join while pushing a partial aggregation through. The simple approach would be to leave an intermediate aggregation on top of every join after pushdown. But when we have multiple joins on top of each other in a fragment, this leads to many intermediate aggregations. Ideally, these should adaptively turn off at runtime, but this is still not free and leads to significant regressions. Hence the extra field in AggregationNode, which retains the intermediate aggregation only on the join before a remote data exchange. There are no regressions when benchmarking with this approach.
I would define isInputReducingAggregation as an auxiliary aggregation step introduced as a hedge against a potentially non-optimal decision to push down partial aggregation more aggressively.

...ain/src/main/java/io/trino/sql/planner/iterative/rule/PushPartialAggregationThroughJoin.java

sopel39 · 2024-10-18T13:20:02Z

fyi: benchmark results:

20-25% gain for sf10000 part iceberg!

sopel39 · 2024-10-22T11:33:25Z

I've benchmarked "Relax intermediate aggregation constraint", but I've seen some pretty big regressions, like q47 (170%), vs the initial approach. I suggest keeping the initial approach.

raunaqmorarka · 2024-10-23T07:46:16Z

Partial agg pushdown iceberg parquet partitioned.pdf

Partial agg pushdown iceberg parquet unpartitioned.pdf

sopel39 · 2024-10-23T10:57:27Z

yeah, q47 regression is pretty big with alternative approach

core/trino-main/src/main/java/io/trino/sql/planner/OptimizerConfig.java

...ain/src/main/java/io/trino/sql/planner/iterative/rule/PushPartialAggregationThroughJoin.java

raunaqmorarka · 2024-11-18T17:40:10Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/AggregationNode.java

+    /**
+     * Indicates whether aggregation is potentially reducing rows that are propagated though exchange operator.
+     */
+    private final Optional<Boolean> exchangeInputAggregation;


We don't want to always keep intermediate aggregations above join as it would lead to multiple intermediate aggregations if there are many joins on top of each other. This is causing significant regressions.

Ideally, shouldn't adaptive partial aggregation avoid big regressions in this case? Is this happening because partial/intermediate aggregation has a significant cost even in the "disabled" state?

raunaqmorarka · 2024-11-18T17:49:09Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/AggregationNode.java

+    /**
+     * Indicates whether aggregation is potentially reducing rows that are propagated though exchange operator.
+     */
+    private final Optional<Boolean> exchangeInputAggregation;


How about we rename this to isInputReducingAggregation?

martint · 2024-11-20T19:46:35Z

core/trino-main/src/main/java/io/trino/sql/planner/plan/AggregationNode.java

+    /**
+     * Indicates whether this is an auxiliary aggregation step introduced as a hedge against
+     * potentially non-optimal decision to push down partial aggregation more aggressively.
+     */


This should be described in terms of what the aggregation is expected to do, or what kind of input would result in this aggregation "reducing input". For instance, one could argue that almost every aggregation reduces inputs, since it collapses multiple rows into a single scalar value.

Make push partial aggregation CBO based. Enable it for cases where pushed aggregation has same grouping keys. Additionally, for queries like select sum(sales) from fact, date_dim where fact.date_id = date_dim.date_id group by date_dim.year partial aggregation on date_dim.year can be pushed below join with grouping key of "date_id", which can greatly reduce number of rows before join operator.

cla-bot bot added the cla-signed label Oct 17, 2024

raunaqmorarka requested review from martint, lukasz-stec, sopel39 and dain October 17, 2024 04:26

sopel39 approved these changes Oct 17, 2024

martint reviewed Oct 17, 2024

raunaqmorarka added the performance label Oct 18, 2024

sopel39 force-pushed the ks/agg-pr branch from 4098063 to 2c543b5 Compare October 21, 2024 12:48

sopel39 force-pushed the ks/agg-pr branch from 484a1aa to 2c543b5 Compare October 23, 2024 10:58

sopel39 force-pushed the ks/agg-pr branch from 2c543b5 to 42f6e5a Compare November 4, 2024 14:59

raunaqmorarka requested a review from martint November 5, 2024 09:53

raunaqmorarka commented Nov 18, 2024

raunaqmorarka force-pushed the ks/agg-pr branch 3 times, most recently from ab1d317 to dc9f171 Compare November 20, 2024 06:55

martint approved these changes Nov 20, 2024

raunaqmorarka force-pushed the ks/agg-pr branch from dc9f171 to 1cc38ba Compare November 21, 2024 08:04

sopel39 added 2 commits November 21, 2024 13:34

Add AggregationNode#isInputReducingAggregation

af08170

raunaqmorarka force-pushed the ks/agg-pr branch from 1cc38ba to b9f0374 Compare November 21, 2024 08:04

raunaqmorarka merged commit ef267fc into trinodb:master Nov 21, 2024
91 checks passed

raunaqmorarka deleted the ks/agg-pr branch November 21, 2024 09:23

github-actions bot added this to the 466 milestone Nov 21, 2024

mosabua mentioned this pull request Nov 25, 2024

Add Trino 466 release notes #24208

Merged
