-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(optimizer): convert filter predicate to CNF to push through join #3623
Conversation
@@ -273,7 +273,8 @@ impl PushDownFilter { | |||
let left_cols = HashSet::<_>::from_iter(child_join.left.schema().names()); | |||
let right_cols = HashSet::<_>::from_iter(child_join.right.schema().names()); | |||
|
|||
for predicate in split_conjunction(&filter.predicate) { | |||
// TODO: simplify predicates, since they may be expanded with `to_cnf` | |||
for predicate in split_conjunction(&to_cnf(filter.predicate.clone())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this only applicable to predicates? I'm wondering if this should be part of the SimplifyExprs optimizer pass?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be a general simplification yes. Just put the comment here because this part specifically expands the predicate
CodSpeed Performance ReportMerging #3623 will degrade performances by 33.13%Comparing Summary
Benchmarks breakdown
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #3623 +/- ##
==========================================
- Coverage 77.84% 77.55% -0.29%
==========================================
Files 719 719
Lines 88280 88958 +678
==========================================
+ Hits 68720 68992 +272
- Misses 19560 19966 +406
|
are we worried about expression explosion? like if we have 10 expressions on LHS an 10 on RHS I'd imagine we would sometimes get 100 = |
use crate::boolean::{to_cnf, to_dnf}; | ||
|
||
#[test] | ||
fn dnf_simple() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we have more tests with negations?
assert_eq!(expected, to_dnf(expr)); | ||
} | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in future I suppose could do some rust proptest stuff here to verify they are correct but not needed for this issue imo
I do want to fix this at some point but most filter predicates should not balloon to significantly large sizes. Will work on more expression simplification stuff in the future to address this |
By converting the predicate to Conjunctive Normal Form, we are able to split the predicate into expressions that are isolated to the schemas of an individual side of a join. Then, we can split by conjunctions and push those new expressions through the joins.
For example, consider the schemas:
If our plan was
That filter predicate in CNF is
The first and last expressions in this conjunction only have columns from a single side of the join, so can be pushed down, resulting in this optimized plan: