Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Sort expression (Expr::Sort) #12177

Merged
merged 11 commits into from
Aug 29, 2024

Conversation

findepi
Copy link
Member

@findepi findepi commented Aug 26, 2024

Remove sort as an expression, i.e. remove Expr::Sort from Expr enum.
Use expr::Sort directly when sorting.

The sort expression was used in context of ordering (sort, topk, create
table, file sorting). Those places require their sort expression to be
of type Sort anyway and no other expression was allowed, so this change
improves static typing. Sort as an expression was illegal in other
contexts.

For #1468 (comment)
Fixes #12193

@findepi findepi marked this pull request as draft August 26, 2024 15:27
@github-actions github-actions bot added sql SQL Planner logical-expr Logical plan and expressions optimizer Optimizer rules core Core DataFusion crate proto Related to proto crate functions labels Aug 26, 2024
@findepi
Copy link
Member Author

findepi commented Aug 26, 2024

Not ready for review. Just for FYI.

datafusion/core/src/datasource/listing/table.rs Outdated Show resolved Hide resolved
datafusion/core/tests/expr_api/mod.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/eliminate_limit.rs Show resolved Hide resolved
@findepi findepi force-pushed the findepi/sort-is-not-expression branch from c29d147 to cdea34e Compare August 27, 2024 08:50
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Aug 27, 2024
@findepi findepi changed the title Remove Expr::Sort Remove Sort expression (Expr::Sort) Aug 27, 2024
@findepi
Copy link
Member Author

findepi commented Aug 27, 2024

The build is all green (https://github.com/findepi/datafusion/actions/runs/10574858019)!

@findepi findepi force-pushed the findepi/sort-is-not-expression branch from cdea34e to 1c3f194 Compare August 27, 2024 13:28
@findepi findepi marked this pull request as ready for review August 27, 2024 13:31
@findepi
Copy link
Member Author

findepi commented Aug 27, 2024

Rebased, should be ready for initial review pass.

I split the change into couple commits, but bulk of it is still one big commit.
Should i continue along this pattern?
Should i extract some commits into own PRs
let me know what's best from review perspective.

cc @alamb @comphead @jonahgao @jayzhan211 @andygrove @crepererum

Copy link
Contributor

@crepererum crepererum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at all the invalid states (panics and "unimplemented" errors) that are removed in this PR, this is a clear improvement. I also agree with the overall premise that "sorts" are not really a subtype/variant of Expr.

This should probably get a 2nd ACK by another maintainer.

@findepi
Copy link
Member Author

findepi commented Aug 27, 2024

there is a new conflict (perhaps due to #12196), let me rebase.

Part of effort to remove `Expr::Sort`.
Part of effort to remove `Expr::Sort`.
Take `expr::Sort` in `LogicalPlanBuilder.sort`.
Accept any `Expr` in new function, `LogicalPlanBuilder.sort_by` which
apply default sort ordering.

Part of effort to remove `Expr::Sort`.
Part of effort to remove `Expr::Sort`.
Remove sort as an expression, i.e. remove `Expr::Sort` from `Expr` enum.
Use `expr::Sort` directly when sorting.

The sort expression was used in context of ordering (sort, topk, create
table, file sorting).  Those places  require their sort expression to be
of type Sort anyway and no other expression was allowed, so this change
improves static typing.  Sort as an expression was illegal in other
contexts.
@findepi findepi force-pushed the findepi/sort-is-not-expression branch from 1c3f194 to 9840b8f Compare August 27, 2024 20:11
}))
}

pub fn replace_sort_expressions(sorts: Vec<Sort>, new_expr: Vec<Expr>) -> Vec<Sort> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe returns with Result<Vec<Sort>>?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. This is used in two places. In one we know the list of expressions is exact size.
the other is LogicalPlan.with_new_exprs which does assert! on provided new expr length. so it looks we don't need to be permissive here?

datafusion/expr/src/tree_node.rs Outdated Show resolved Hide resolved
datafusion/optimizer/src/eliminate_duplicated_expr.rs Outdated Show resolved Hide resolved
None,
));
let plan = LogicalPlanBuilder::from(table_scan)
.aggregate(vec![col("c")], vec![expr, count_distinct(col("b"))])?
.build()?;
// Do nothing
let expected = "Aggregate: groupBy=[[test.c]], aggr=[[sum(test.a) ORDER BY [test.a], count(DISTINCT test.b)]] [c:UInt32, sum(test.a) ORDER BY [test.a]:UInt64;N, count(DISTINCT test.b):Int64]\
let expected = "Aggregate: groupBy=[[test.c]], aggr=[[sum(test.a) ORDER BY [test.a ASC NULLS LAST], count(DISTINCT test.b)]] [c:UInt32, sum(test.a) ORDER BY [test.a ASC NULLS LAST]:UInt64;N, count(DISTINCT test.b):Int64]\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as previous?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the default ASC/DESC NULLS FIRST/LAST behavior seems to come from

let asc = asc.unwrap_or(true);
expr_vec.push(Expr::Sort(Sort::new(
Box::new(expr),
asc,
// when asc is true, by default nulls last to be consistent with postgres
// postgres rule: https://www.postgresql.org/docs/current/queries-order.html
nulls_first.unwrap_or(!asc),

if i read this correctly, ASC NULLS LAST is the default

@findepi findepi mentioned this pull request Aug 28, 2024
@findepi findepi force-pushed the findepi/sort-is-not-expression branch from 2f0ea05 to 39877e4 Compare August 28, 2024 07:26
@findepi
Copy link
Member Author

findepi commented Aug 28, 2024

thank you @jayzhan211 for your review! would you mind taking another look?

this commit is longer than advised in the review comment, but after
squashing the diff will be smaller
Copy link
Contributor

@jayzhan211 jayzhan211 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb
Copy link
Contributor

alamb commented Aug 28, 2024

EPIC!

@alamb alamb added the api change Changes the API exposed to users of the crate label Aug 28, 2024
@findepi
Copy link
Member Author

findepi commented Aug 29, 2024

thank you @jayzhan211 @crepererum @alamb for all your time spent reviewing this!

@crepererum crepererum merged commit 85adb6c into apache:main Aug 29, 2024
25 checks passed
@findepi findepi deleted the findepi/sort-is-not-expression branch August 29, 2024 15:46
@findepi
Copy link
Member Author

findepi commented Aug 29, 2024

❤️ thank you for the (brave) merge, @crepererum !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate documentation Improvements or additions to documentation functions logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate sql SQL Planner substrait
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Sort from Expr
4 participants