Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UDFs to catalog; Add extension UDFs for TUPLEUNION/TUPLECONCAT #496

Merged
merged 13 commits into from
Oct 2, 2024

Conversation

jpschorr
Copy link
Contributor

@jpschorr jpschorr commented Sep 17, 2024

Fixes #459 and #495

Extensions and scalar UDFs

Adds ability to create and register scalar User Defined Functions (UDFs)

TUPLEUNION & TUPLECONCAT

Adds TUPLEUNION and TUPLECONCAT functions via an extension crate under the extension crate namespace as scalar UDFs.

TUPLEUNION is as per 6.3.2 of the spec (https://partiql.org/assets/PartiQL-Specification.pdf#subsubsection.6.3.2)

TUPLECONCAT is inspired by description of concatenating binding environments as per section 3.3 of the spec (https://partiql.org/assets/PartiQL-Specification.pdf#subsection.3.3) and as requested in #495.


tupleunion({ 'bob': 1, 'sally': 'error' }, { 'sally': 1 }, { 'sally': 2 }, { 'sally': 3 }, { 'sally': 4 })
->
{ 'bob': 1, 'sally': 'error', 'sally': 1, 'sally': 2, 'sally': 3, 'sally': 4 }


tupleconcat({ 'bob': 1, 'sally': 'error' }, { 'sally': 1 }, { 'sally': 2 }, { 'sally': 3 }, { 'sally': 4 })
->
{ 'sally': 4, 'bob': 1 }


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jpschorr jpschorr requested review from alancai98, am357 and johnedquinn and removed request for alancai98 September 17, 2024 19:20
Copy link

codecov bot commented Sep 17, 2024

Codecov Report

Attention: Patch coverage is 80.64516% with 84 lines in your changes missing coverage. Please review.

Project coverage is 80.64%. Comparing base (70b5ea2) to head (8a54bb1).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
partiql-catalog/src/catalog.rs 78.83% 21 Missing and 8 partials ⚠️
partiql-eval/src/plan.rs 47.22% 19 Missing ⚠️
partiql-catalog/src/scalar_fn.rs 68.96% 9 Missing ⚠️
partiql-logical-planner/src/functions.rs 80.48% 8 Missing ⚠️
...nsion/partiql-extension-value-functions/src/lib.rs 92.98% 4 Missing ⚠️
partiql-eval/src/eval/expr/functions.rs 83.33% 4 Missing ⚠️
...tension/partiql-extension-ion-functions/src/lib.rs 0.00% 2 Missing and 1 partial ⚠️
partiql-catalog/src/call_defs.rs 0.00% 3 Missing ⚠️
partiql-common/src/catalog.rs 87.50% 3 Missing ⚠️
partiql-eval/src/eval/eval_expr_wrapper.rs 96.77% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #496      +/-   ##
==========================================
+ Coverage   80.49%   80.64%   +0.15%     
==========================================
  Files          73       78       +5     
  Lines       18947    19157     +210     
  Branches    18947    19157     +210     
==========================================
+ Hits        15251    15450     +199     
- Misses       3271     3283      +12     
+ Partials      425      424       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Sep 17, 2024

Conformance comparison report

Base (70b5ea2) 3b90fcf +/-
% Passing 90.35% 90.35% 0.00%
✅ Passing 5731 5731 0
❌ Failing 612 612 0
🔶 Ignored 0 0 0
Total Tests 6343 6343 0

Number passing in both: 5731

Number failing in both: 612

Number passing in Base (70b5ea2) but now fail: 0

Number failing in Base (70b5ea2) but now pass: 0

@johnedquinn
Copy link
Member

The main alternative I can think of for naming has to do with SQL's UNION:

  • TUPLEUNION_DISTINCT and TUPLEUNION/TUPLEUNION_ALL -- like SQL's UNION ALL/DISTINCT, though, SQL's UNION is distinct by default.

Though,TUPLEMERGE is quite clear as well.

@johnedquinn
Copy link
Member

Also, looking at Asterix and Couchbase:

  • Asterix has object_merge. It throws errors when duplicate fields are encountered.
  • Couchbase has object_concat which favors later arguments.

@jpschorr jpschorr changed the title Add TUPLE_UNION and TUPLE_MERGE; Add 'vararg' functions Add TUPLEUNION and TUPLEMERGE; Add 'vararg' functions Sep 18, 2024
@jpschorr
Copy link
Contributor Author

Also, looking at Asterix and Couchbase:

* Asterix has [object_merge](https://asterixdb.apache.org/docs/0.9.9/sqlpp/builtins.html#ObjectFunctions). It throws errors when duplicate fields are encountered.

* Couchbase has [object_concat](https://docs.couchbase.com/cloud/n1ql/n1ql-language-reference/objectfun.html#fn-obj-concat) which favors later arguments.

The initial version of this did use TUPLECONCAT as the binding concatenation was the inspiration (as seen in the PR description)... perhaps we should return to that?

partiql/tests/tuple_ops.rs Show resolved Hide resolved
@johnedquinn
Copy link
Member

johnedquinn commented Sep 18, 2024

The initial version of this did use TUPLECONCAT as the binding concatenation was the inspiration (as seen in the PR description)... perhaps we should return to that?

Merriam's definitions:

  • Concatenate (verb): to link together in a series or chain
  • Merge (verb): to cause to combine, unite, or coalesce / to blend gradually by stages that blur distinctions
  • Union (noun): an act or instance of uniting or joining two or more things into one

In hindsight, from the definitions above, TUPLE_UNION should have been TUPLE_CONCAT, and the proposed functionality should be TUPLE_MERGE. In any case, I believe TUPLE_MERGE is an appropriate name for this functionality.

@johnedquinn
Copy link
Member

That being said, TUPLEUNION has, thus far, only existed in the specification. As far as I'm aware, TUPLEUNION is not exposed in any way to end-customers. This may be a good opportunity to clear the air and retroactively adopt a new name that we, as maintainers, wouldn't regret in the future.

I don't want to distract from the intention of this PR, but seeing that this is the first instance of TUPLEUNION being used as an end-customer invokable function, it may be worth at least briefly discussing.

@am357
Copy link
Contributor

am357 commented Sep 18, 2024

That being said, TUPLEUNION has, thus far, only existed in the specification. As far as I'm aware, TUPLEUNION is not exposed in any way to end-customers. This may be a good opportunity to clear the air and retroactively adopt a new name that we, as maintainers, wouldn't regret in the future.

I don't want to distract from the intention of this PR, but seeing that this is the first instance of TUPLEUNION being used as an end-customer invokable function, it may be worth at least briefly discussing.

Good points John. Sharing my thoughts: I am less opinionated in this regard mainly because this is an implementation specific function and we can always add aliased functions later.

@sadderchris
Copy link

This looks like it will be sufficient for our purposes. I don't have any particular preference about the choice of name here.

@jpschorr
Copy link
Contributor Author

jpschorr commented Oct 1, 2024

That being said, TUPLEUNION has, thus far, only existed in the specification. As far as I'm aware, TUPLEUNION is not exposed in any way to end-customers. This may be a good opportunity to clear the air and retroactively adopt a new name that we, as maintainers, wouldn't regret in the future.

I don't want to distract from the intention of this PR, but seeing that this is the first instance of TUPLEUNION being used as an end-customer invokable function, it may be worth at least briefly discussing.

This PR has been reworked to:

  • Add the ability to register scalar UDFs into catalog via extension
  • Move the TUPLEUNION and TUPLECONCAT (renamed from TUPLEMERGE) functions into an extension

@jpschorr jpschorr requested review from am357 and johnedquinn October 1, 2024 20:53
@jpschorr jpschorr force-pushed the feat-tuple-union_merge branch from 169ce4f to 802cebf Compare October 1, 2024 20:58
@jpschorr jpschorr force-pushed the feat-tuple-union_merge branch from 802cebf to 6f30dc9 Compare October 1, 2024 21:13

pub type ScalarFnExprResultValue<'a> = Cow<'a, Value>;
pub type ScalarFnExprResult<'a> = Result<ScalarFnExprResultValue<'a>, ExtensionResultError>;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this and table_fn move to partiql_eval?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can't.
partiql-eval imports partiql-catalog. Moving them to eval would necessitate catalog importing eval which would create a cycle.

For this reason, I tried to keep anything other than Value out of the signature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, how about partiql-commons?

partiql-eval/src/plan.rs Outdated Show resolved Hide resolved
partiql-eval/src/plan.rs Outdated Show resolved Hide resolved
Comment on lines 100 to 103
pub struct ScalarFnCallSpec {
pub input: Vec<CallSpecArg>,
pub output: Box<dyn ScalarFnExpr>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it also need to include isNullCall and isMissingCall as partiql-lang-kotlin implements?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in the fullness of time. I think that will become more apparent (and more testable) when the existing builtins are migrated into the catalolg.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #499

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it stands, all scalar functions using this mechanism have NULL and MISSING propagated as per https://github.com/partiql/partiql-lang-rust/pull/496/files#diff-e9f95c45f5307b665fabb97d0b6f9441210bd70f3934fb9ae806c58e8aa4cf82R42

partiql-catalog/src/call_defs.rs Outdated Show resolved Hide resolved
@jpschorr jpschorr changed the title Add TUPLEUNION and TUPLEMERGE; Add 'vararg' functions Add UDFs to catalog; Add extension UDFs for TUPLEUNION/TUPLECONCAT Oct 2, 2024
@jpschorr jpschorr merged commit 104a7d1 into main Oct 2, 2024
19 checks passed
@jpschorr jpschorr deleted the feat-tuple-union_merge branch October 3, 2024 16:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Feature Request: scalar-valued user-defined functions
4 participants