Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add New SavedQuery Protocol #148

Merged
merged 8 commits into from
Sep 19, 2023
Merged

Add New SavedQuery Protocol #148

merged 8 commits into from
Sep 19, 2023

Conversation

plypaul
Copy link
Contributor

@plypaul plypaul commented Sep 14, 2023

Resolves #144

Description

Please see the associated issue for details.

Checklist

@cla-bot cla-bot bot added the cla:yes label Sep 14, 2023
@plypaul plypaul force-pushed the plypaul--49--saved-queries2 branch from 9d81690 to 57df3a2 Compare September 14, 2023 01:07
@plypaul plypaul marked this pull request as ready for review September 14, 2023 01:09
@plypaul plypaul requested a review from QMalcolm September 14, 2023 01:09
Copy link
Collaborator

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic! Some small questions that we should figure out before approving and merging. Just want to have things really nailed down before starting the core work.


@property
@abstractmethod
def group_by_item_names(self) -> Sequence[str]: # noqa: D
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the idea is that group_by_item_names aren't required, but should be represented as an empty set when there are none? Was there a conversation somewhere about whether not we should mark them as optional (Optional[Sequence[str]])? I'd just like to understand the logic so I can keep it in mind when reviewing other parts of the protocol and ensuring consistency.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, collections should not be typed as optional unless there's a meaningful difference between "no collection" and "empty collection".

So we should not make these optional return types, but on implementation we can default them to an empty Sequence type so users are not required to pass in a value.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, that works for me 🙂

dbt_semantic_interfaces/protocols/semantic_manifest.py Outdated Show resolved Hide resolved
Comment on lines +27 to +31
* Check if metric names exist in the manifest.
* Check that the where filter is valid using the same logic as WhereFiltersAreParsable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check group_by_item_names to check that they reference existing things? Additionally, what are valid group_by_item_names, is it just categorical dimensions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QMalcolm Yeah, that would be good to check, but I didn't see similar checks being done for filters. Is that right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right group_bys take the form of Dimension(...). Then we need a validation similar to the WhereFilter validations that checks that they're parsable by the jinja templater, because as with WhereFilters the jinja template parser is the only thing can enforce the protocol structure of these structured strings.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this look good!

@tlento tlento self-requested a review September 14, 2023 20:31

@property
@abstractmethod
def group_by_item_names(self) -> Sequence[str]: # noqa: D
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a general rule, collections should not be typed as optional unless there's a meaningful difference between "no collection" and "empty collection".

So we should not make these optional return types, but on implementation we can default them to an empty Sequence type so users are not required to pass in a value.


name: str
metrics: List[str]
group_bys: List[str]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QMalcolm does it matter that this is not part of the protocol? It shouldn't, but I wanted to make sure this won't cause havoc with the core parsing setup.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It would make things weird. To stick to the protocol we'd have to serialize group_by_item_names, but then MetricFlow uses these pydantic objects for deserialization, and since their is no setter for group_by_item_names on this object, they'd get dropped 😬 In a separate comment you brought up a question about why do we have both group_bys and group_by_item_names, and I think that's actually the better question. I think we just want group_bys? I'm not seeing what group_by_item_names provides.

dbt_semantic_interfaces/protocols/semantic_manifest.py Outdated Show resolved Hide resolved
dbt_semantic_interfaces/protocols/saved_query.py Outdated Show resolved Hide resolved
@plypaul plypaul requested a review from QMalcolm September 15, 2023 18:12
Copy link
Collaborator

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're looking really good, we just need to do a bit more on handing of group_bys I believe.

issues: List[ValidationIssue] = []

for group_by_item_name in saved_query.group_bys:
structured_name = DunderedNameFormatter.parse_name(group_by_item_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already talked about it in a call, but documenting it here. This assumes that the group_bys will be strings of the form '<primary_enitity>__<dimension>'. Although eventually we want to support that, the strings right now will be of the form 'Dimension("<primary_entity>__<dimension>")'. We should probably have a GroupByParser, similar to the WhereFilterParser for parsing group_bys, and use that for validating that the group_bys have valid structuring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-used the WhereFilterParser for now - there's some work going on right now to support Dimension(...).grain(...), so I'll wait until that's in place to switch out the implementation.

@plypaul plypaul force-pushed the plypaul--49--saved-queries2 branch 2 times, most recently from 5c3593c to 00a0821 Compare September 16, 2023 00:38
@plypaul plypaul requested a review from QMalcolm September 16, 2023 00:56
@plypaul plypaul force-pushed the plypaul--49--saved-queries2 branch from 00a0821 to 5e75b16 Compare September 16, 2023 00:59
Copy link
Collaborator

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes, left, but the overall implementation looks great!

dbt_semantic_interfaces/protocols/semantic_manifest.py Outdated Show resolved Hide resolved
Comment on lines +27 to +31
* Check if metric names exist in the manifest.
* Check that the where filter is valid using the same logic as WhereFiltersAreParsable
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to this look good!

dbt_semantic_interfaces/implementations/saved_query.py Outdated Show resolved Hide resolved
"items": {"type": "string"},
},
},
"additionalProperties": False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I don't think we have to, but we should probably mark the properties name and metrics to be required. I think parsing will error out either way if its not specified, as the creation of the object will fail. But specifying them in the jsonschema spec will give better errors I believe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@plypaul plypaul requested a review from QMalcolm September 18, 2023 22:05
Copy link
Collaborator

@QMalcolm QMalcolm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Sorry that there was so much back and forth on this one 🙃

@plypaul plypaul force-pushed the plypaul--49--saved-queries2 branch from 961e4ab to 37c96f2 Compare September 19, 2023 18:26
@plypaul plypaul merged commit 3d02807 into main Sep 19, 2023
9 checks passed
@plypaul plypaul deleted the plypaul--49--saved-queries2 branch September 19, 2023 18:34
QMalcolm added a commit that referenced this pull request Sep 26, 2023
This probably should have been part of #148, but we forgot. I'm adding
these tests here because I plan to add `label` to the `SavedQuery`
protocol in the coming commits. With that, I'll want to updated the
parsing tests to check it. To do that, the tests need to exist.
@QMalcolm QMalcolm mentioned this pull request Oct 23, 2023
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add new SavedQuery protocol
3 participants