-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEAT] Filter predicates in SQL join #3371
Conversation
CodSpeed Performance ReportMerging #3371 will not alter performanceComparing Summary
|
let mut rel = self.new_with_context().plan_relation(&first.relation)?; | ||
let mut rel = self.plan_relation(&first.relation)?; | ||
self.table_map.insert(rel.get_name(), rel.clone()); | ||
for tbl in from_iter { | ||
let right = self.new_with_context().plan_relation(&tbl.relation)?; | ||
let right = self.plan_relation(&tbl.relation)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decided revert my new_with_context
here because SQLPlanner.plan_relation
does not mutate the plan nor does it use the alias map. Subquery relations are already handled in plan_relation
with a new_with_context
so this is redundant.
macro_rules! return_non_ident_errors { | ||
($e:expr) => { | ||
if !matches!( | ||
$e, | ||
PlannerError::ColumnNotFound { .. } | PlannerError::TableNotFound { .. } | ||
) { | ||
return Err($e); | ||
} | ||
// only one is fully qualified: `join on x = b.y` | ||
([Ident{value: col_a, ..}], [tbl_b, Ident{value: col_b, ..}]) => { | ||
if tbl_b.value == right_rel.get_name() { | ||
(col_a.clone(), col_b.clone()) | ||
} else if tbl_b.value == left_rel.get_name() { | ||
(col_b.clone(), col_a.clone()) | ||
} else { | ||
unsupported_sql_err!("Could not determine which table the identifiers belong to") | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I use a macro here because a function would have to own the error object to return it, and I would like to use the error in later parts of the code if it is not returned.
@kevinzwang wouldn't this be better to handle directly in the logical plan? That would allow us to support non equi joins in the dataframe api as well. ex df1.join(df2, on=(df1["a"] == df2["a"] & df2["b"] > 0)) |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3371 +/- ##
==========================================
+ Coverage 77.39% 77.45% +0.05%
==========================================
Files 678 678
Lines 83300 83278 -22
==========================================
+ Hits 64469 64501 +32
+ Misses 18831 18777 -54
|
Graphite Automations"Request reviewers once CI passes" took an action on this PR • (11/21/24)1 reviewer was added to this PR based on Andrew Gazelka's automation. |
I agree. However, we would potentially need to have a concept of table-associated columns in our logical plan as well as make changes to our join op struct. I think we should definitely do it in the future, but I didn't want to broaden the scope of this PR. Additionally, I think we would still need to have SQL-specific logic to identify the join keys for each side in a query, so a lot of the work in this PR is still relevant. Let me know what you think! |
that sounds good for now. I think an interesting dataframe case is Additionally, how would you express this using the dsl
I'll open up an issue for dataframe non-equi joins just so we don't lose this! |
DataFusion has a concept of a table reference in a column expression and I think we could something similar if you use |
Adds support for things like:
Enables TPC-H q13