[PERF] Predicate Pushdown into Scan Operator #1730
samster25 commented Dec 15, 2023
- Implements Predicate Pushdown into the Scan Operator using Native Downloads
- Implements Limit-Limit Folding
- Fixes a bug in Statistics evaluation where bitwise ANDs and ORs were not being performed
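As an illustration of the Limit-Limit Folding item above, here is a minimal sketch of the general rewrite: two directly nested limits collapse into one limit that keeps the smaller bound. The `Plan` enum and `fold_limits` function are hypothetical stand-ins invented for this example. They are not Daft's plan types or its optimizer rule.

```rust
/// Toy logical plan with just enough nodes to show the rewrite.
/// Hypothetical types for illustration; not Daft's actual plan nodes.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Source(String),
    Limit { input: Box<Plan>, limit: u64 },
}

/// Folds directly nested limits:
/// Limit(a, Limit(b, x)) becomes Limit(min(a, b), x).
fn fold_limits(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { input, limit } => match fold_limits(*input) {
            // The child is still a limit after folding, so keep the smaller bound.
            Plan::Limit { input: inner, limit: inner_limit } => Plan::Limit {
                input: inner,
                limit: limit.min(inner_limit),
            },
            // Any other child is left in place under the outer limit.
            other => Plan::Limit { input: Box::new(other), limit },
        },
        other => other,
    }
}
```

Folding bottom-up means a chain of any length ends up as a single limit, which a scan could then take as one pushdown value.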
Codecov Report
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1730   +/-   ##
=======================================
  Coverage   85.02%   85.02%
=======================================
  Files          55       55
  Lines        5515     5517    +2
=======================================
+ Hits         4689     4691    +2
  Misses        826      826
This reverts commit 25b6808.
LGTM, feel free to disregard the nits around the plan reprs, since we have a decent bit of general reworking to do there anyways!
@@ -53,8 +53,7 @@ impl Source {
                 partitioning_keys,
                 pushdowns,
             })) => {
-                res.push("Source:".to_string());
-                res.push(format!("Scan op = {}", scan_op));
+                res.push(format!("Source: Operator = {}", scan_op));
Is this more readable when visualizing the plan? In the past, we've tried to keep the first line of the logical op repr super concise, essentially just a name for the logical op, but it looks like the scan operator repr can be pretty long:
Daft/src/daft-scan/src/anonymous.rs
Line 36 in b6d6669
write!(f, "AnonymousScanOperator: File paths=[{}], Format-specific config = {:?}, Storage config = {:?}", self.files.join(", "), self.file_format_config, self.storage_config)
IMO each string in this returned vec should be pretty atomic/granular, and it should be up to the display mode (e.g. tree plan visualization vs. single-line summary) to condense it as desired.
It does make it more readable imo! But the main reason to do this was to bring it much closer to the Legacy Source repr, which makes the repr-based tests much easier to write. Both should be about the same length.
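To make the "atomic strings, condensed by the display mode" suggestion in this thread concrete, here is a rough sketch. `DisplayMode`, `source_lines`, and `render` are hypothetical names invented for this example. They do not exist in Daft, and the real repr code may be structured quite differently.

```rust
/// Hypothetical display modes; illustrative only.
enum DisplayMode {
    Tree,
    SingleLine,
}

/// Returns one atomic string per fact. The first entry is only the op name.
fn source_lines(scan_op: &str, pushdowns: &str) -> Vec<String> {
    vec![
        "Source".to_string(),
        format!("Scan op = {}", scan_op),
        format!("Pushdowns = {}", pushdowns),
    ]
}

/// The display mode, not the operator, decides how the strings are condensed.
fn render(lines: &[String], mode: DisplayMode) -> String {
    match mode {
        // Tree view: name on the first line, details indented underneath.
        DisplayMode::Tree => lines
            .iter()
            .enumerate()
            .map(|(i, line)| if i == 0 { line.clone() } else { format!("  {}", line) })
            .collect::<Vec<_>>()
            .join("\n"),
        // Summary view: everything on one line.
        DisplayMode::SingleLine => lines.join(", "),
    }
}
```

With this split, a repr-based test can compare against the single-line form while the plan visualizer keeps the first line short.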
predicate.apply(&mut |e: &Expr| {
    match e {
        #[cfg(feature = "python")]
        Expr::Function{func: FunctionExpr::Python(..), .. } => {
            has_udf = true;
            Ok(VisitRecursion::Stop)
        },
        Expr::Function{func: FunctionExpr::Uri(..), .. } => {
            has_udf = true;
            Ok(VisitRecursion::Stop)
        },
        _ => Ok(VisitRecursion::Continue)
    }
})?;
So nice!
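For readers unfamiliar with the visitor pattern in the snippet above, here is a self-contained sketch of the same idea: walk the predicate in pre-order and stop at the first node that must not be pushed into the scan. The `Expr` and `VisitRecursion` types below are simplified stand-ins defined only for this example, with `Expr::Udf` playing the role of the `FunctionExpr::Python` and `FunctionExpr::Uri` cases. They are not Daft's types, and the real `apply` returns a `Result`, which is omitted here.

```rust
/// Toy expression tree; a simplified stand-in, not Daft's `Expr`.
enum Expr {
    Column(String),
    Literal(i64),
    /// Stands in for the Python UDF and URI function cases.
    Udf(Box<Expr>),
    BinaryOp(Box<Expr>, Box<Expr>),
}

#[derive(PartialEq)]
enum VisitRecursion {
    Continue,
    Stop,
}

impl Expr {
    /// Pre-order traversal that halts as soon as the callback returns `Stop`.
    fn apply<F: FnMut(&Expr) -> VisitRecursion>(&self, f: &mut F) -> VisitRecursion {
        if f(self) == VisitRecursion::Stop {
            return VisitRecursion::Stop;
        }
        match self {
            Expr::Column(_) | Expr::Literal(_) => VisitRecursion::Continue,
            Expr::Udf(arg) => arg.apply(f),
            Expr::BinaryOp(left, right) => {
                if left.apply(f) == VisitRecursion::Stop {
                    return VisitRecursion::Stop;
                }
                right.apply(f)
            }
        }
    }
}

/// Returns true if the predicate contains a node that should block pushdown.
fn has_udf(predicate: &Expr) -> bool {
    let mut found = false;
    predicate.apply(&mut |e: &Expr| match e {
        Expr::Udf(_) => {
            found = true;
            VisitRecursion::Stop
        }
        _ => VisitRecursion::Continue,
    });
    found
}
```

Returning `Stop` on the first match avoids visiting the rest of the tree, since a single such node is enough to decide against pushing the predicate down.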
@@ -33,7 +33,7 @@ impl AnonymousScanOperator {

 impl Display for AnonymousScanOperator {
     fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        write!(f, "{:#?}", self)
+        write!(f, "AnonymousScanOperator: File paths=[{}], Format-specific config = {:?}, Storage config = {:?}", self.files.join(", "), self.file_format_config, self.storage_config)
We may want to do the same for GlobScanOperator as well!
Daft/src/daft-scan/src/glob.rs
Line 217 in b6d6669
write!(f, "{:#?}", self)
Yeah was thinking that, can do that in a follow up!
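As a sketch of what that follow-up could look like, the example below swaps a pretty-printed `{:#?}` Display for a hand-written single-line one, mirroring the change made to AnonymousScanOperator in this PR. `ExampleScanOperator` and its fields are hypothetical. GlobScanOperator's actual fields are not shown in this thread, so none are assumed here.

```rust
use std::fmt;

/// Hypothetical operator for illustration; not Daft's `GlobScanOperator`.
#[derive(Debug)]
struct ExampleScanOperator {
    glob_paths: Vec<String>,
    format: String,
}

impl fmt::Display for ExampleScanOperator {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One line, stable field order: easier to assert on in repr-based tests
        // than the multi-line output of `{:#?}`.
        write!(
            f,
            "ExampleScanOperator: Glob paths=[{}], Format = {}",
            self.glob_paths.join(", "),
            self.format
        )
    }
}
```

The derived `Debug` output stays available through `{:?}` and `{:#?}` for debugging, so nothing is lost by making Display concise.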