fix: `order_by` for `compile_sql` now works as expected #49

serramatutu · 2024-10-01T14:10:06Z

Summary

This PR changes the internal representation of QueryParameters to add stricter validation and fix the ambiguity of order_by parameters when sent via GraphQL. It also exposes new OrderByMetric and OrderByGroupBy top-level objects so that users can specify their clauses more precisely if strings don't suffice.

I recommend you review this commit by commit :)

Rationale of the changes

Improving the internal representation of query parameters

To facilitate typing across different classes, we created a QueryParameters class that is a TypedDict. While this is convenient for static typing of kwargs, TypedDicts don't do any validation at runtime. To fix that, I created two dataclasses: AdhocQueryParametersStrict and SavedQueryQueryParametersStrict, which represent "strictly" validated parameters that are ready to be sent to the API. Then, I improved the validate_query_parameters method so that it returns one of these, and raises an error if the parameters dict fails validation.
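As a rough illustration of the TypedDict-to-dataclass idea (not the SDK's actual code), a minimal sketch could look like the following. The class and function names come from this PR, but the field set and the specific validation rules (for example, treating `saved_query` and `metrics`/`group_by` as mutually exclusive, and raising `ValueError`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, TypedDict, Union


class QueryParameters(TypedDict, total=False):
    """Loose kwargs shape: useful for static typing, never checked at runtime."""

    metrics: List[str]
    group_by: List[str]
    limit: int
    saved_query: str


@dataclass(frozen=True)
class AdhocQueryParametersStrict:
    """Validated parameters for an ad hoc query (simplified, assumed fields)."""

    metrics: List[str]
    group_by: List[str]
    limit: Optional[int] = None


@dataclass(frozen=True)
class SavedQueryQueryParametersStrict:
    """Validated parameters for a saved query (simplified, assumed fields)."""

    saved_query: str
    limit: Optional[int] = None


def validate_query_parameters(
    params: QueryParameters,
) -> Union[AdhocQueryParametersStrict, SavedQueryQueryParametersStrict]:
    """Return a strict dataclass, or raise ValueError if the dict is inconsistent."""
    has_adhoc = bool(params.get("metrics")) or bool(params.get("group_by"))
    saved_query = params.get("saved_query")

    # Assumed rules: exactly one of the two query styles must be used.
    if saved_query is not None and has_adhoc:
        raise ValueError("Pass either `saved_query` or `metrics`/`group_by`, not both.")
    if saved_query is None and not has_adhoc:
        raise ValueError("Pass `saved_query` or at least one of `metrics`/`group_by`.")

    limit = params.get("limit")
    if saved_query is not None:
        return SavedQueryQueryParametersStrict(saved_query=saved_query, limit=limit)
    return AdhocQueryParametersStrict(
        metrics=list(params.get("metrics", [])),
        group_by=list(params.get("group_by", [])),
        limit=limit,
    )
```

The point of the split is that everything downstream of the validator can rely on the dataclass instead of re-checking a loosely typed dict.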

Remove ambiguity of `order_by`

The GraphQL API forces us to specify whether an order_by clause is a metric or a dimension. To remove ambiguity from the representation, the "strict" parameter dataclasses only support OrderByMetric and OrderByGroupBy, and it is the job of the validator to convert strings like -my_metric into an object like OrderByMetric(name="my_metric", descending=True). Now, the protocol implementations don't need to worry about what is what since that's solved in the validation layer, before calling the ADBC or GraphQL protocols.
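To make the string-to-object conversion concrete, here is a small sketch of how a validator might resolve an `order_by` string against the metrics and group-bys of an ad hoc query. `OrderByMetric` and `OrderByGroupBy` are the names from this PR; the `parse_order_by` helper, the `descending` field on `OrderByGroupBy`, the plain-string `grain`, and the `ValueError` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Union


@dataclass(frozen=True)
class OrderByMetric:
    name: str
    descending: bool = False


@dataclass(frozen=True)
class OrderByGroupBy:
    name: str
    grain: Optional[str] = None  # stand-in for Optional[TimeGranularity]
    descending: bool = False  # assumed to mirror OrderByMetric


OrderBySpec = Union[OrderByMetric, OrderByGroupBy]


def parse_order_by(
    clause: str, metrics: Sequence[str], group_by: Sequence[str]
) -> OrderBySpec:
    """Hypothetical helper: turn "-name" / "name" into an unambiguous spec."""
    # A leading "-" means descending order; strip it to get the bare name.
    descending = clause.startswith("-")
    name = clause[1:] if descending else clause

    # Disambiguate by looking the name up in the query's own parameters.
    if name in metrics:
        return OrderByMetric(name=name, descending=descending)
    if name in group_by:
        return OrderByGroupBy(name=name, descending=descending)
    raise ValueError(f"order_by {clause!r} is neither a metric nor a group_by of this query")
```

For example, `parse_order_by("-my_metric", ["my_metric"], ["my_dim"])` yields `OrderByMetric(name="my_metric", descending=True)`, matching the conversion described above.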

Making protocols use the new representation

Finally, I changed the GraphQL and ADBC protocols to use the return value of validate_query_parameters when building a query request.

Breaking changes

Since we don't have the list of known metrics/dimensions at query time when querying through a saved query, you now have to explicitly provide an OrderByMetric or OrderByGroupBy in that case, otherwise the SDK will raise an error. For adhoc queries, you can still provide a string and the SDK will parse the spec objects from that.
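The saved-query restriction can be pictured with the sketch below: with no metric or dimension names to look a string up against, only explicit spec objects are accepted. The `validate_saved_query_order_by` helper, the simplified dataclass fields, and the choice of `ValueError` are hypothetical; the PR only states that the SDK raises an error.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass(frozen=True)
class OrderByMetric:
    name: str
    descending: bool = False


@dataclass(frozen=True)
class OrderByGroupBy:
    name: str
    grain: Optional[str] = None  # stand-in for Optional[TimeGranularity]
    descending: bool = False


OrderBySpec = Union[OrderByMetric, OrderByGroupBy]


def validate_saved_query_order_by(
    order_by: Sequence[Union[OrderBySpec, str]],
) -> List[OrderBySpec]:
    """Hypothetical check: saved queries accept spec objects only."""
    strict: List[OrderBySpec] = []
    for clause in order_by:
        if isinstance(clause, (OrderByMetric, OrderByGroupBy)):
            strict.append(clause)
        else:
            # A bare string cannot be classified without the saved query's contents.
            raise ValueError(
                f"Cannot resolve order_by {clause!r} for a saved query; "
                "pass OrderByMetric or OrderByGroupBy instead."
            )
    return strict
```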

The tests for `query` and `compile_sql` did not test all allowed parameters, which made the `order_by` bug go unnoticed. This commit fixes that by adding all the remaining parameters to the tests. Note that this commit is in a broken state since the fix hasn't been applied yet. That will come in a future patch.

WilliamDee · 2024-10-01T16:46:54Z

dbtsl/api/shared/query_params.py

+    """
+
+    name: str
+    grain: Optional[TimeGranularity]


may need to update this (along with wherever else uses TimeGranularity), since I believe with custom granularity, this can be an arbitrary string. cc: @courtneyholcomb to confirm

that's correct!

yep! I wanted to do that in a follow-up PR to avoid mixing things but I'm aware :)

In the follow-up PR, I'll do something along the lines of

Grain = Union[TimeGranularity, str]

Then update all the call sites that take TimeGranularity and make them use Grain instead.
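A sketch of what such a call site might look like once it accepts `Grain`. The `TimeGranularity` enum below is a two-member stand-in and `grain_to_str` is a hypothetical helper; neither is taken from the SDK.

```python
from enum import Enum
from typing import Union


class TimeGranularity(Enum):
    """Stand-in enum with only two members, for illustration."""

    DAY = "DAY"
    MONTH = "MONTH"


# Standard grains stay typed; custom granularities are arbitrary strings.
Grain = Union[TimeGranularity, str]


def grain_to_str(grain: Grain) -> str:
    """Hypothetical call site: accept either form, emit a plain string."""
    if isinstance(grain, TimeGranularity):
        return grain.value
    return grain
```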

WilliamDee · 2024-10-01T16:56:13Z

tests/api/adbc/test_protocol.py

@@ -24,27 +52,30 @@ def test_serialize_query_params_complete_query() -> None:
            "metrics": ["a", "b"],


Do we have any tests against object syntax through adbc? Something like this should work

{{ semantic_layer.query(metrics=[Metric("orders")]) }}

Oh the SDK only generates object syntax for order by. For metrics and group by we just use regular string arrays.

Is there a reason we would need that?

For order by there are tests tho

For extra context: there's no way for a user to submit a raw string to be sent via ADBC. They have to do like:

with client.session():
    table = client.query(metrics=["my_metric"], group_by=["my_dim"])
    print(table)

Under the hood, the SDK will generate a SQL statement like the following and send it via ADBC.

SELECT * FROM {{ semantic_layer.query(metrics=["my_metric"], group_by=["my_dim"]) }}
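For readers unfamiliar with that step, the string building can be sketched roughly as follows. `build_adbc_query` is a hypothetical helper covering only `metrics` and `group_by` as plain string arrays; it is not the SDK's serializer, and it leaves out `order_by` and every other parameter.

```python
import json
from typing import Sequence


def build_adbc_query(metrics: Sequence[str], group_by: Sequence[str]) -> str:
    """Hypothetical helper: render query parameters into the templated SQL."""
    args = []
    # json.dumps renders a list of strings with double quotes, e.g. ["my_metric"].
    if metrics:
        args.append(f"metrics={json.dumps(list(metrics))}")
    if group_by:
        args.append(f"group_by={json.dumps(list(group_by))}")
    return "SELECT * FROM {{ semantic_layer.query(" + ", ".join(args) + ") }}"


print(build_adbc_query(["my_metric"], ["my_dim"]))
```

With the inputs shown, this prints the same statement as the example above.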

This commit improves our validation and representation of query parameters, and fixes the bug with `order_by`. We're still in an inconsistent state: we gotta propagate the changes and use them in other classes. Will do that in the following patch.

serramatutu requested review from DevonFulcher, courtneyholcomb and WilliamDee October 1, 2024 14:10

serramatutu force-pushed the serramatutu/order-by-fix branch 2 times, most recently from d056afa to 55b4ea6 Compare October 1, 2024 14:38

WilliamDee reviewed Oct 1, 2024

serramatutu requested a review from WilliamDee October 2, 2024 13:54

WilliamDee approved these changes Oct 15, 2024

serramatutu force-pushed the serramatutu/order-by-fix branch 3 times, most recently from d954309 to 3a107d2 Compare October 16, 2024 14:00

serramatutu added 4 commits October 16, 2024 16:02

refactor: make protocols use the new validations

d3ab417

This commit makes the ADBC and GraphQL protocol implementations use the new stricter representation for query params. It updates the tests accordingly.

refactor: typing public interfaces with order by

b74b061

This commit updates the public `.pyi` files to ensure we use the new order by spec

docs: changelog

72dbfa5

Added changelog entries related to the order by changes.

serramatutu force-pushed the serramatutu/order-by-fix branch from 3a107d2 to 72dbfa5 Compare October 16, 2024 14:04

serramatutu mentioned this pull request Oct 16, 2024

[Bug] compile_sql with order_by failing #53

Closed

3 tasks

serramatutu merged commit 9896a1c into main Oct 16, 2024
4 checks passed

serramatutu deleted the serramatutu/order-by-fix branch October 16, 2024 14:09
