Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: order_by for compile_sql now works as expected #49

Merged
merged 5 commits into from
Oct 16, 2024

Conversation

serramatutu
Copy link
Collaborator

@serramatutu serramatutu commented Oct 1, 2024

Summary

This PR changes the internal representation of QueryParameters to add stricter validation and fix the ambiguity of order_by parameters when sent via GraphQL. It also exposes new OrderByMetric and OrderByGroupBy top-level objects so that users can specify their clauses more precisely if strings don't suffice.

I recommend you review this commit by commit :)

Rationale of the changes

Improving the internal representation of query parameters

To facilitate typing across different classes, we created a QueryParameters class that is a TypedDict. While this is convenient for static typing of kwargs, TypedDicts don't do any validation. To fix that, I created two dataclasses: AdhocQueryParametersStrict and SavedQueryQueryParametersStrict, which represent "strictly" validated parameters that are ready to be sent to the API. Then, I improved the validate_query_parameters method so that it returns one of these, and raises errors in case there was any validation error of the parameters dict.

Remove ambiguity of order_by

The GraphQL API forces us to specify whether an order_by clause is a metric or a dimension. To remove ambiguity from the representation, the "strict" parameter dataclasses only support OrderByMetric and OrderByGroupBy, and it is the job of the validator to convert strings like -my_metric into an object like OrderByMetric(name="my_metric", descending=True). Now, the protocol implementations don't need to worry about what is what since that's solved in the validation layer, before calling the ADBC or GraphQL protocols.

Making protocols use the new representation

Finally, I changed the GraphQL and ADBC protocols to use the return value of validate_query_parameters when building a query request.

Breaking changes

Since we don't have the list of known metrics/dimensions at query time when querying through a saved query, you now have to specifically provide a OrderByMetric or OrderByGroupBy in that case, otherwise the SDK will raise an error. For adhoc queries, you can still provide a string and the SDK will parse the spec objects from that.

The tests for `query` and `compile_sql` did not test all allowed
parameters, which made the `order_by` bug go unnoticed. This commit
fixes that by adding all the remaining parameters to the tests.

Note that this commit is in a broken state since the fix hasn't been
applied yet. That will come in a future patch.
@serramatutu serramatutu force-pushed the serramatutu/order-by-fix branch 2 times, most recently from d056afa to 55b4ea6 Compare October 1, 2024 14:38
"""

name: str
grain: Optional[TimeGranularity]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may need to update this (along with whereever else uses TimeGranularity), since I believe with custom granularity, this can be an arbitrary string. cc: @courtneyholcomb to confirm

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's correct!

Copy link
Collaborator Author

@serramatutu serramatutu Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep! I wanted to do that in a followup PR to avoid mixing things but I'm aware :)

In the folowup PR, I'll do something along the lines of

Grain = Union[TimeGranularity, str]

Then update all the call sites that take TimeGranularity and make them use Grain instead.

dbtsl/api/shared/query_params.py Outdated Show resolved Hide resolved
dbtsl/api/shared/query_params.py Outdated Show resolved Hide resolved
@@ -24,27 +52,30 @@ def test_serialize_query_params_complete_query() -> None:
"metrics": ["a", "b"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any tests against object syntax through adbc? Something like this should work

{{ semantic_layer.query(metrics=[Metric("orders")]) }}

Copy link
Collaborator Author

@serramatutu serramatutu Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh the SDK only generates object syntax for order by. For metrics and group by we just use regular string arrays.

Is there a reason we would need that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For order by there are tests tho

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For extra context: there's no way for a user to submit a raw string to be sent via ADBC. They have to do like:

with client.session():
    table = client.query(metrics=["my_metric"], group_by=["my_dim"])
    print(table)

Under the hood, the SDK will generate a SQL statement like the following and send it via ADBC.

SELECT * FROM {{ semantic_layer.query(metrics=["my_metric"], group_by=["my_dim"]) }}

@serramatutu serramatutu requested a review from WilliamDee October 2, 2024 13:54
@serramatutu serramatutu force-pushed the serramatutu/order-by-fix branch 3 times, most recently from d954309 to 3a107d2 Compare October 16, 2024 14:00
This commit improves our validation and representation of query
parameters, and fixes the bug with `order_by`.

We're still in an inconsistent state: we gotta propagate the changes and
use them in other classes. Will do that in the following patch.
This commit makes the ADBC and GraphQL protocol implementations use the
new stricter representation for query params. It updates the tests
accordingly.
This commit updates the public `.pyi` files to ensure we use the new
order by spec
Added changelog entries related to the order by changes.
@serramatutu serramatutu force-pushed the serramatutu/order-by-fix branch from 3a107d2 to 72dbfa5 Compare October 16, 2024 14:04
@serramatutu serramatutu merged commit 9896a1c into main Oct 16, 2024
4 checks passed
@serramatutu serramatutu deleted the serramatutu/order-by-fix branch October 16, 2024 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants