Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON predicate pushdown for Pinot #23966

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

robertzych
Copy link

@robertzych robertzych commented Oct 30, 2024

Description

These changes allow the Trino-Pinot connector to perform predicate pushdowns for a combination of JSON functions.

For example, the following Trino function calls:

where
    contains(ARRAY['O'], json_extract_scalar(json, '$.orderstatus')) 
    AND (
      json_array_contains(json_extract(json, '$.stringArray'), 'O') 
      OR json_array_contains(json_extract(json, '$.stringArray'), 'F') 
    ) 
    AND (
        json_extract(json, '$.dne') IS NULL 
        OR json_array_contains(json_extract(json, '$.stringArray'), 'X') = false
        OR json_array_contains(json_extract(json, '$.intArray'), 0)
    );

Are translated to the following Pinot function calls:

where (
    JSON_MATCH(json, '"$.orderstatus" in (''O'')') 
    AND (
        JSON_MATCH(json, '"$.stringArray[*]" = ''O''') 
        OR JSON_MATCH(json, '"$.stringArray[*]" = ''F''')
    ) 
    AND (
        JSON_MATCH(json, '"$.dne" IS NULL') 
        OR NOT(JSON_MATCH(json, '"$.stringArray[*]" = ''X'''))
        OR JSON_MATCH(json, '"$.intArray[*]" = 0')
    )
)

The new session property (pinot.json_predicate_pushdown_enabled) defaults to false because JSON_MATCH can fail when a JSON index hasn't been configured in Pinot.

Additional context and related issues

PinotMetadata.applyFilter calls buildConstraintPQL which recursively walks through the call tree and returns the converted SQL if the entire call tree is supported.

Release notes

TODO

( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

Copy link

cla-bot bot commented Oct 30, 2024

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Robert Zych.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. check if your git client is configured with an email to sign commits git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

@robertzych
Copy link
Author

@cla-bot check

@robertzych
Copy link
Author

I just signed and submitted the individual CLA.

Copy link

cla-bot bot commented Oct 31, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch from 475e169 to 5fcc0fd Compare October 31, 2024 20:36
Copy link

cla-bot bot commented Oct 31, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@robertzych robertzych changed the title JSON predicate pushdown for Pinot (draft) [WIP] JSON predicate pushdown for Pinot Oct 31, 2024
Copy link

cla-bot bot commented Nov 1, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

6 similar comments
Copy link

cla-bot bot commented Nov 1, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Nov 6, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Nov 6, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Nov 7, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Nov 7, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Copy link

cla-bot bot commented Nov 7, 2024

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch 3 times, most recently from 0747378 to 54fc345 Compare November 10, 2024 16:23
@robertzych robertzych marked this pull request as ready for review November 10, 2024 16:24
@robertzych robertzych changed the title [WIP] JSON predicate pushdown for Pinot JSON predicate pushdown for Pinot Nov 10, 2024
@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch from 54fc345 to 62e1dc6 Compare November 13, 2024 17:06
@robertzych
Copy link
Author

@cla-bot check

@cla-bot cla-bot bot added the cla-signed label Nov 14, 2024
@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch from 62e1dc6 to 5ae5480 Compare November 14, 2024 16:09
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add query based tests to TestPinotConnectorTest or TestPinotConnectorSmokeTest. assertThat(query("...")) provides isFullyPushedDown() method.

@@ -229,6 +235,14 @@ public PinotConfig setCountDistinctPushdownEnabled(boolean countDistinctPushdown
return this;
}

@Config("pinot.json-predicate-pushdown.enabled")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this toggle?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new session property (pinot.json_predicate_pushdown_enabled) defaults to false because JSON_MATCH can fail when a JSON index hasn't been configured in Pinot.

Is it possible to check the index config during analyze?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a way to determine if the JSON index has been completely applied. Retrieving the table config from the Pinot Controller API (/tables/{tableName}) and inspecting tableIndexConfig.jsonIndexColumns would only confirm that it has been configured and the query could still fail if the JSON index was recently added and the table's segments haven't been reloaded.

Copy link
Author

@robertzych robertzych Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr Assuming a missing JSON index rarely happens (or only during development), I propose we change the default value of this config to true. Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we definitely need this toogle. If we can switch it to true by default is a bit interesting. @robertzych mentioned in the Trino Contributor Call https://github.com/trinodb/trino/wiki/Contributor-meetings#trino-contributor-call-13-dec-2024 that this could lead to a scenario where queries that used to work fail after an upgrade to a Trino version with this pushdown support.

So I think it depends a bit on what we did in similar situations for other connectors. Given that it can be breaking queries for users we seem to need the toggle.

@ebyhr ebyhr requested a review from elonazoulay November 15, 2024 04:08
@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch 2 times, most recently from 9545484 to 815af61 Compare November 19, 2024 00:54
@robertzych robertzych force-pushed the pinot-json-predicate-pushdown branch from 815af61 to 359f735 Compare November 19, 2024 01:43
@robertzych robertzych requested a review from ebyhr November 19, 2024 02:17
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
@trinodb trinodb deleted a comment from cla-bot bot Dec 13, 2024
Copy link
Member

@mosabua mosabua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am hoping that @ebyhr or @elonazoulay or maybe others can help review the code. At a minimum we need to document the new flag in the properties docs section for the connector and add some info about this pushdown in the performance section.

@github-actions github-actions bot added the docs label Dec 17, 2024
@robertzych robertzych requested a review from mosabua December 17, 2024 23:06
Copy link
Member

@elonazoulay elonazoulay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great feature! I'll take another pass, left some nits.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants