-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PIG domains for planner and plan evaluator #584
PIG domains for planner and plan evaluator #584
Conversation
…CompilerPipeline and future PlannerPipeline
b26029e
to
3c583bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave a cursory review of the PIG domains defined in partiql.ion
. Looks reasonable based off what I remember from some offline discussions. I think @jpschorr and @am357 should also take a look to check the plans align with the reviewed query plan docs.
I also think it would be good to include some of the docs part of those offline discussions in the partiql-docs
repo since the query plan and planner are to be open-sourced and public APIs.
// (struct (struct_field (lit a) (lit 42)) | ||
// Returns: { a: 42 } | ||
// | ||
// Example as a struct-union. Given a global environment `{| foo: { a: 42 }, bar: { b: 43} |}`, then: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do the vbars (i.e. |
) surrounding foo: { a: 42 }, bar: { b: 43}
mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's supposed to denote a bindings environment which is similar to but different than a struct. However, I really don't feel like providing a section describing notation here so I'll change this to be more clear without the notation.
// - `name`: the name of the variable as specified by the query author or determined statically. | ||
(product var_decl name::symbol) | ||
|
||
// The operators of PartiQL's relational algebra. See `$projectDir/docs/dev/RELATIONAL-ALGEBRA.md` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The doc RELATIONAL-ALGEBRA.md
isn't included yet on this branch. Can it be included in this PR? Or possibly ported to partiql-docs/drafts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to port it to to the partiql-docs/drafts
and referencing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That particular document is out of date. I'll remove the reference to it for now. I have a draft of a better version but it needs to be cleaned up and made ready for public consumption, so I'll add this ticket to the current milestone: #599
// Elements: | ||
// - `name`: the unique name of the implementation. Each operator has a different namespace containing its | ||
// default and PartiQL-specific | ||
(product impl name::symbol static_args::(* ion 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: could also add a description for the static_args
element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
// If `<expr>` is not a struct, The field `_n` will be included in the struct, where `n` is the | ||
// ordinal of the field in the final merged struct. If `expr` returns a container that is not a struct, | ||
// field names will be assigned in the format of `_n` where `n` is the ordinal position of the | ||
// field within the struct. If `expr` returns a scalar value, it will be coerced into a singleton bag |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the merged struct
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
// - `name`: the name of the variable as specified by the query author or determined statically. | ||
(product var_decl name::symbol) | ||
|
||
// The operators of PartiQL's relational algebra. See `$projectDir/docs/dev/RELATIONAL-ALGEBRA.md` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to port it to to the partiql-docs/drafts
and referencing it here.
// For `.*` in SELECT list | ||
// If `<expr>` is a struct, the fields of that struct will be part of the merged struct. | ||
// If `<expr>` is not a struct, The field `_n` will be included in the struct, where `n` is the | ||
// ordinal of the field in the final merged struct. If `expr` returns a container that is not a struct, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit.: <expr>
? also could we change the name of the lhs
from expr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm confused as to what you are asking here. lhs
typically refers to the "left hand side" of a binary expression but this isn't a binary expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, indeed, it's confusing. I meant in expr::expr
, changing the property name expr
; or making the reference in the documentation more clear b/c IMO for fresh eyes, it's confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How's part_expr
?
// Basic join operator. Covers cross, inner, left, right and full joins. | ||
// For cross joins, set `join_type` to `(inner)` and the `predicate` to `(lit true)`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this and the other join definition in physical_plan: curious why going w/ this approach for representing cross join? Does going w/ the following make sense?
(sum join_type (cross) (inner) (left) (right) (full))
(join
join_type::join_type
left::bexpr
right::bexpr
predicate::(? expr))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bristled when I first saw this particular modeling of cross joins as well but on deep inspection IMHO it really is the best way to model cross joins. (And this is fundamentally similar to how they are modeled with the AST today.)
To demonstrate why it doesn't make sense to add (cross)
to join_type
: how should the evaluator behave when the predicate
is not null and the join_type
is (cross)
?
- Throw an exception? (That would be an edge case that would have to be handled many places, which would increase the noise level in the implementation.)
- Honor the predicate anyway? But that would be identical to an inner join, making
(cross)
redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could ignore the predicate when the join_type
is cross
. Although still not ideal, having it defined explicitly is better than implicit IMO. Thinking about this further and also looking at the grammar, perhaps we can also go with the following model (or something similar)?
(sum join
(product left left::bexpr predicate::expr)
(product right right::bexpr predicate::expr)
(product inner left::bexpr right::bexpr predicate::expr)
(product cross left::bexpr right::bexpr))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left
,right
should have the same elements asinner
--they each require a left and rightbexpr
and apredicate
.- With this proposal, the code that consumes the generated classes is actually more complex. I.e. to transform all joins within a
VisitorTransform
, we would have to override 4 different methods instead of 1. Pattern matching too has similar concerns. - With
inner
, I can still set the predicate to(lit true)
and get something that's semantically equivalent tocross
). This makescross
kind of redundant. Such redundancies generally increase complexity.
It's funny--I distinctly remember having this exact discussion (but related to ExprNode
) with @almann and @therapon. I had your position back then and they argued strongly for modeling cross
as an inner
with a (lit true)
predicate. It's taken time but I've come around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1
, yes that was a miss from my side. For 2
we could have a wrapper type with having a member property as kind
that references the join sum. I see the 3
orthogonal to choice of having cross
or not as this is still the case without it.
I hope I come around one day, but at this stage, I just can't see having an implicit logic to model rather than a precise model is a better choice. Let me know if you want to discuss offline and we can take it from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From offline discussion we will make predicate
nullable so that if the user did not specify a predicate (as with cross join
), the AST will not lie by saying there was a (lit true)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I approved, but putting this non-blocking comment here: Although we disregard a join w/ predicate as cross join, to be more defensive, I think we could still go w/ having the (cross_join)
as a join type to ensure the absence of predicate is not accidental to infer the cross join.
|
||
(with expr | ||
// At this point, there should be no undefined variables in the plan since the area all rewritten to | ||
// dynamic lookup function call sites (if enabled). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please elaborate, enabled by what? CompilerPipeline
? Also perhaps there is an assume knowledge here that requires expanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep references to implementation specific details such as CompilerPipeline
(actually PlannerPipeline
, introduced in #590) out of this document since it can/should be used to generate the data structures of multiple partiql implementations. However, based on that argument, dynamic lookup function call sites are also an implementation detail. I'll remove that.
|
||
// Global variable reference--typically a table although it can actually be bound to any value. Unlike | ||
// local variables, global variables are not stored in registers. Instead, they are typically stored | ||
// in persistent storage. Evaluating a `global_id` will return a value with an open iterator. There |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to change persistent storage
to non-ephemeral meaning the value persists after query evaluation
(or something along those lines)? This is to cater for in-memory database systems or RDBMSs that can map a table/schema to memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the term "non-ephemeral" wouldn't apply to an in-memory only database, would it? I'll rephrase this to be more inclusive of different types of storage systems.
Co-authored-by: Alan Cai <[email protected]>
Merges the following PRs to `main`: - #583 - #584 - #587 - #588 - #589 - #609 - #590 - #592 Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Merges the following PRs to `main`: - #583 - #584 - #587 - #588 - #589 - #609 - #590 - #592 Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Merges the following PRs to `main`: - #583 - #584 - #587 - #588 - #589 - #609 - #590 - #592 Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Merges the following PRs to `main`: - #583 - #584 - #587 - #588 - #589 - #609 - #590 - #592 Also, this adds a TODO comment (ea40e4a, requested by @am357 in offline chat) and excludes `PlannerPipeline` from the new aggregate tests which were pulled in when merging from `main` and were not part of the other PRs (34025b3). By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Originally part of #582 and based on #586 (which must be merged first) this PR contains all the PIG domain (tree data structure) changes needed for the planner and plan evaluator. Documentation in
partiql.ion
should be relatively self-explanatory, please let me know if not.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.