Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Enable group by keys in aggregation expressions #3399

Merged
merged 4 commits into from
Nov 22, 2024

Conversation

kevinzwang
Copy link
Member

No description provided.

@github-actions github-actions bot added the enhancement New feature or request label Nov 22, 2024
Copy link

codspeed-hq bot commented Nov 22, 2024

CodSpeed Performance Report

Merging #3399 will degrade performances by 13.57%

Comparing kevin/groupby-key-in-agg (3f6cf55) with main (60ae62f)

Summary

⚡ 1 improvements
❌ 1 regressions
✅ 15 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main kevin/groupby-key-in-agg Change
test_iter_rows_first_row[100 Small Files] 388.4 ms 303 ms +28.21%
test_show[100 Small Files] 23.1 ms 26.8 ms -13.57%

Copy link

codecov bot commented Nov 22, 2024

Codecov Report

Attention: Patch coverage is 81.08108% with 7 lines in your changes missing coverage. Please review.

Project coverage is 77.17%. Comparing base (ec39dc0) to head (3f6cf55).
Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-dsl/src/resolve_expr/mod.rs 77.41% 7 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3399      +/-   ##
==========================================
+ Coverage   76.54%   77.17%   +0.63%     
==========================================
  Files         685      685              
  Lines       85269    83696    -1573     
==========================================
- Hits        65266    64593     -673     
+ Misses      20003    19103     -900     
Files with missing lines Coverage Δ
src/daft-logical-plan/src/ops/agg.rs 60.00% <100.00%> (+1.86%) ⬆️
src/daft-logical-plan/src/ops/pivot.rs 71.42% <100.00%> (+0.84%) ⬆️
src/daft-dsl/src/resolve_expr/mod.rs 90.21% <77.41%> (+0.04%) ⬆️

... and 28 files with indirect coverage changes

---- 🚨 Try these New Features:

Copy link

graphite-app bot commented Nov 22, 2024

Graphite Automations

"Request reviewers once CI passes" took an action on this PR • (11/22/24)

1 reviewer was added to this PR based on Andrew Gazelka's automation.

@@ -208,7 +208,7 @@ fn expand_wildcards(
}

/// Checks if an expression used in an aggregation is well formed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
/// Checks if an expression used in an aggregation is well formed.
/// Checks if an expression used in an aggregation is well-formed.

adding new line makes look better in rustdocs

https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph

Comment on lines 227 to 237
fn has_single_agg_layer(expr: &ExprRef, groupby: &HashSet<ExprRef>) -> bool {
groupby.contains(expr)
|| match expr.as_ref() {
Expr::Agg(agg_expr) => !agg_expr.children().iter().any(has_agg),
Expr::Column(_) => false,
Expr::Literal(_) => true,
_ => expr
.children()
.iter()
.all(|e| has_single_agg_layer(e, groupby)),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm I'm not sure this function really tests has_single_agg_layer anymore. Maybe rename or rethink logic? or make into two separate functions?

also potential bug (although unsure it could even occur):

groupby contains expr with multiple aggs -> returns true (as in single agg) even though not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point but I'm not really sure what to name it. I think the logic of the function itself is fairly straightforward though. I'll give it a thought

As for the bug, group bys should never have aggregations in them and will fail in another place in the code if they do.

}
/// - sum(col("a")) + col("b") when "b" is not a group by key
/// - not all branches are aggregations, literals, or group by keys
fn has_single_agg_layer(expr: &ExprRef, groupby: &HashSet<ExprRef>) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we move has_single_agg_layer and validate_expr_in_agg to be a member of ExprResolver. It looks like they're only ever used as part of the expr resolving.

@kevinzwang kevinzwang enabled auto-merge (squash) November 22, 2024 21:43
@kevinzwang kevinzwang merged commit 68248a9 into main Nov 22, 2024
41 of 42 checks passed
@kevinzwang kevinzwang deleted the kevin/groupby-key-in-agg branch November 22, 2024 22:05
/// - sum(col("a")) + col("b") when "b" is not a group by key
/// - not all branches are aggregations, literals, or group by keys
fn is_valid_expr_in_agg(&self, expr: &ExprRef) -> bool {
self.groupby.unwrap().contains(expr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd add a comment about how self.groupby.unwrap() is safe here since it's checked previously on l:247

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops didn't realize you had some other comments. Will make a quick follow-up PR for these

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants