[GLUTEN-7905][CH] Implement window's topk by aggregation #7976
Run Gluten Clickhouse CI on x86
A benchmark on the following queries:
Low cardinality partition keys
insert overwrite table dump_line
select l_orderkey, l_partkey, l_suppkey, l_linenumber from (
select l_orderkey, l_partkey, l_suppkey, l_linenumber, row_number() over (partition by l_suppkey order by l_orderkey, l_partkey) as r from tpch_pq.lineitem
) where r = 1;
High cardinality partition keys
0: jdbc:hive2://localhost:10000> insert overwrite table dump_line select l_orderkey, l_partkey, l_suppkey, l_linenumber from (select l_orderkey, l_partkey, l_suppkey, l_linenumber, row_number() over (partition by l_suppkey, l_orderkey order by l_partkey) as r from tpch_pq.lineitem) where r = 1;
+---------+
| Result |
+---------+
+---------+
No rows selected (41.441 seconds)
0: jdbc:hive2://localhost:10000> insert overwrite table dump_line select l_orderkey, l_partkey, l_suppkey, l_linenumber from (select l_orderkey, l_partkey, l_suppkey, l_linenumber, row_number() over (partition by l_suppkey, l_orderkey order by l_partkey) as r from tpch_pq.lineitem) where r = 1;
+---------+
| Result |
+---------+
+---------+
No rows selected (50.714 seconds)
For high cardinality partition keys, the query falls back to window. We have the following result:
0: jdbc:hive2://localhost:10000> insert overwrite table dump_line select l_orderkey, l_partkey, l_suppkey, l_linenumber from (select l_orderkey, l_partkey, l_suppkey, l_linenumber, row_number() over (partition by l_suppkey, l_orderkey order by l_partkey) as r from tpch_pq.lineitem) where r = 1;
+---------+
| Result |
+---------+
+---------+
No rows selected (26.549 seconds)
0: jdbc:hive2://localhost:10000> insert overwrite table dump_line select l_orderkey, l_partkey, l_suppkey, l_linenumber from (select l_orderkey, l_partkey, l_suppkey, l_linenumber, row_number() over (partition by l_suppkey, l_orderkey order by l_partkey) as r from tpch_pq.lineitem) where r = 1;
+---------+
| Result |
+---------+
+---------+
No rows selected (25.58 seconds)
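The choice between the two paths can be pictured with a small sketch. This is only an illustration under stated assumptions: the PR text does not say how cardinality is measured, so the sampling approach, the function name `choose_topk_strategy`, and the `max_distinct_ratio` threshold are all hypothetical and are not taken from the Gluten code.

```python
def choose_topk_strategy(sample_rows, partition_key, max_distinct_ratio=0.5):
    """Pick a top-k strategy from a sample of rows.

    Hypothetical heuristic (not Gluten's actual rule): if the share of
    distinct partition keys in the sample is small, aggregation keeps little
    state and is preferred; otherwise fall back to the window operator.
    """
    if not sample_rows:
        # Nothing to estimate from, so take the conservative window path.
        return "window"
    distinct_keys = len({partition_key(row) for row in sample_rows})
    ratio = distinct_keys / len(sample_rows)
    return "aggregation" if ratio <= max_distinct_ratio else "window"


# Rows are (l_orderkey, l_partkey, l_suppkey, l_linenumber) tuples.
sample = [(1, 10, 7, 1), (1, 11, 7, 2), (2, 12, 7, 1), (2, 13, 8, 2)]

# partition by l_suppkey: 2 distinct keys in 4 rows
low_cardinality = choose_topk_strategy(sample, lambda r: r[2])
# partition by (l_suppkey, l_orderkey, l_partkey): 4 distinct keys in 4 rows
high_cardinality = choose_topk_strategy(sample, lambda r: (r[2], r[0], r[1]))
```

With this toy sample the first call selects aggregation and the second falls back to window, which mirrors the behaviour described above without claiming to match the real decision logic.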
What changes were proposed in this pull request?
Fixes: #7905
This PR automatically uses aggregation to compute a window's topk when the partition keys have low cardinality.
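To make the idea concrete, here is a minimal Python sketch of computing `row_number() <= k` per partition with an aggregation-style state instead of a full sort of every partition. It is an illustration only, not the ClickHouse backend implementation: the function names `topk_by_window` and `topk_by_aggregation` are hypothetical, and rows are modelled as `(l_orderkey, l_partkey, l_suppkey, l_linenumber)` tuples.

```python
import bisect
from collections import defaultdict


def topk_by_window(rows, partition_key, order_key, k):
    """Reference path: materialize each partition, sort it, keep the first k rows."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)
    return {key: sorted(part, key=order_key)[:k] for key, part in partitions.items()}


def topk_by_aggregation(rows, partition_key, order_key, k):
    """Aggregation path: stream the rows and keep at most k rows per key."""
    state = defaultdict(list)
    for seq, row in enumerate(rows):
        bucket = state[partition_key(row)]
        # seq breaks ties by arrival order, matching the stable sort above.
        bisect.insort(bucket, (order_key(row), seq, row))
        if len(bucket) > k:
            bucket.pop()  # discard the row that now ranks beyond k
    return {key: [row for _, _, row in bucket] for key, bucket in state.items()}


rows = [(3, 10, 1, 1), (1, 20, 1, 2), (1, 5, 1, 3), (2, 7, 2, 1), (2, 3, 2, 2)]
by_suppkey = lambda r: r[2]             # partition by l_suppkey
by_order_part = lambda r: (r[0], r[1])  # order by l_orderkey, l_partkey

top1 = topk_by_aggregation(rows, by_suppkey, by_order_part, 1)
```

The aggregation state holds at most k rows per distinct key, so it stays small when there are few distinct partition keys. With many distinct keys the state grows with the number of keys, which is consistent with restricting this path to low cardinality partition keys.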
How was this patch tested?
Unit tests.