-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Saved Queries #794
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a reasonable first cut.
How were you planning to address the parameterizable where filter? What we really want out of a saved query is something where the filter is more like:
saved_queries:
- name: date_filter_example
filter: metric_time BETWEEN <start_date> AND <end_date>
Then on invocation we want people to be able to do something like:
mf query --saved-query date_filter_example --params "start_date='2021-01-01',end_date='2021-01-05'"
metricflow/cli/utils.py
Outdated
@@ -46,7 +46,8 @@ def query_options(function: Callable) -> Callable: | |||
)(function) | |||
function = click.option( | |||
"--metrics", | |||
type=click_custom.SequenceParamType(min_length=1), | |||
# Validity checks for this parameter was moved to the MetricFlowEngine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: for now, this is better:
# Validity checks for this parameter was moved to the MetricFlowEngine. | |
# Validation is handled in the MetricFlowEngine |
Longer term I'd like to build out a formal API that manages all of this stuff and just make the CLI interface an example caller of that API. The MetricFlowEngine is a little too much of a mixture between internal classes and public-facing things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -84,6 +85,8 @@ class MetricFlowQueryType(Enum): | |||
class MetricFlowQueryRequest: | |||
"""Encapsulates the parameters for a metric query. | |||
|
|||
TODO: This has turned into a bag of parameters that make it difficult to use without a bunch of conditionals. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Of course it has, that's what always happens! Where's the lolsob emoji when I need it....
assert_exactly_one_arg_set(metric_names=metric_names, metrics=metrics) | ||
assert not ( | ||
group_by_names and group_by | ||
), "Both group_by_names and group_by were set, but if a group by is specified you should only use one of these!" | ||
assert not ( | ||
order_by_names and order_by | ||
), "Both order_by_names and order_by were set, but if an order by is specified you should only use one of these!" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we removing these? Is this a redundant validation? If it isn't we should leave them in, because we're in this ugly transitional state where we accept both type inputs via different parameters instead of either merging to a Union or pushing that to an outer interface where everything can be normalized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a redundant validation that's handled in MetricFlowQueryParser
.
metricflow/query/query_parser.py
Outdated
""" | ||
saved_query = self._get_saved_query(saved_query_parameter) | ||
|
||
# This logic could be encapsulated in the WhereFilter through a merge interface. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, and even more so now that the object is going to hold a list of WhereFilters.
I haven't added it yet, but it might be something we should add in to the base protocol. If we don't think that's worth doing we can re-type everything internally to take an MFWhereFilter that just extends the WhereFilter protocol with merge or combine or whatever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we could add Mergeable
to the base class, but we can visit that once Mergeable
is in.
metricflow/query/query_parser.py
Outdated
where_conditions_with_parenthesis = tuple(f"({where_condition})" for where_condition in where_conditions) | ||
combined_where_filter = PydanticWhereFilter( | ||
where_sql_template=" AND ".join(where_conditions_with_parenthesis) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eeeewwwwww......
I was thinking about eventually moving this to the renderers, since we won't always render the combined filter. The container class I created should help with this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The container class helped, but I didn't see a place to render it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was thinking eventually we'd just use the container and pass it all the way through but maybe that's not the right way to do it. What you did in #809 seems reasonable for now.
metricflow/specs/python_object.py
Outdated
handles operations related to the object-builder naming scheme. | ||
|
||
Additional issues: | ||
* The call parameter sets in DSI does not support date part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also not support date part in saved queries, it's a partially supported feature built solely for use in interactive sessions through the Tableau connector. As we get more experience with how the date_part mechanics are used in practice we'll push more robust support down to MetricFlow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine by me - updated comment.
from metricflow.specs.query_param_implementations import DimensionOrEntityParameter, TimeDimensionParameter | ||
|
||
|
||
def parse_object_builder_naming_scheme(group_by_item_name: str) -> GroupByParameter: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how much of this still applies and what's been improved. Agreed we should check in with @DevonFulcher, probably worth doing that before merge since you'll need to update some stuff due to me mucking about with WhereFilter interfaces in dbt-semantic-interfaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, sorry I am just catching up here. In general, I agree that query interface objects should share more code. Now that they all share an interface, this should be easier. I'm happy to chat about that more about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted with @DevonFulcher - using this for now until an improved setup is ready.
""" | ||
try: | ||
call_parameter_sets = PydanticWhereFilter( | ||
where_sql_template="{{ " + group_by_item_name + " }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for group_by_item_name
to contain malicious code? We should consider validating the string before calling this constructor. That is probably a concern of MFS though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mainly for the CLI, and verified with @DevonFulcher that it's handled on the MFS side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this should all go away when we finish cleaning up the parameter parsing stuff, right?
402e286
to
a800cb9
Compare
a800cb9
to
56eaf0c
Compare
56eaf0c
to
1361f70
Compare
In this initial version, there is support for additional where filters when running a saved query, so users would have to pass in an appropriately constructed where filter to do something like - run the saved query for a different set of dates. We haven't discussed nor designed the parameterization interface, so that will have to be handled later. |
metrics: | ||
- bookings | ||
- instant_bookings | ||
group_bys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plypaul should this be group_by?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there a spec change? This is following #765
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a typo in the spec. We use group_by
everywhere else, so it should be group_by
for saved queries.
We used to call them group_bys
but it was horribly confusing for everybody so we switched to group_by
.
group_bys: | ||
- TimeDimension('metric_time', 'day') | ||
- Dimension('listing__capacity_latest') | ||
where: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plypaul I think this should be filter
to keep it consistent with metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jstein77 we decided that where
was what we'd use for saved queries because they're queries, and where
matches the query interfaces for the CLI, JDBC and GraphQL.
filter
is what we use for metrics and other non-query objects in the spec.
I'd prefer to pick one and use it everywhere, but alas that ship has sailed. Or we can go with filter
here and commit to changing the other query interfaces.
One argument in favor of where
- we may need to introduce a post-aggregation filter (a having
input) for query time specification and that can be a natural distinction for people accustomed to SQL. That said, I don't like calling anything having
because it's a goofy name for a filter expression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think this is good to merge at this point, but it seems like we need a dbt-semantic-interfaces 0.3.1 bugfix to roll out for the group_by
thing.
metricflow/cli/utils.py
Outdated
@@ -46,7 +46,8 @@ def query_options(function: Callable) -> Callable: | |||
)(function) | |||
function = click.option( | |||
"--metrics", | |||
type=click_custom.SequenceParamType(min_length=1), | |||
# Validation is handled in the MetricFlowEngine |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this comment, especially since we're removing this requirement anyway.
""" | ||
try: | ||
call_parameter_sets = PydanticWhereFilter( | ||
where_sql_template="{{ " + group_by_item_name + " }}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this should all go away when we finish cleaning up the parameter parsing stuff, right?
metrics: | ||
- bookings | ||
- instant_bookings | ||
group_bys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a typo in the spec. We use group_by
everywhere else, so it should be group_by
for saved queries.
We used to call them group_bys
but it was horribly confusing for everybody so we switched to group_by
.
group_bys: | ||
- TimeDimension('metric_time', 'day') | ||
- Dimension('listing__capacity_latest') | ||
where: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Jstein77 we decided that where
was what we'd use for saved queries because they're queries, and where
matches the query interfaces for the CLI, JDBC and GraphQL.
filter
is what we use for metrics and other non-query objects in the spec.
I'd prefer to pick one and use it everywhere, but alas that ship has sailed. Or we can go with filter
here and commit to changing the other query interfaces.
One argument in favor of where
- we may need to introduce a post-aggregation filter (a having
input) for query time specification and that can be a natural distinction for people accustomed to SQL. That said, I don't like calling anything having
because it's a goofy name for a filter expression.
1361f70
to
9daf141
Compare
9daf141
to
41679b9
Compare
This broke the engine tests. Also Redshift is no longer connecting, so I can't just fix them. |
Resolves #765
Description
Please see the linked issue for more details. As this PR depends on
dbt-semantic-interfaces~=0.3.0a2
which includes the interface definition for saved queries, and it's currently not possible to specify that dependency due to a conflict withdbt-core~=1.6.0
as required by this package. Oncedbt-core
is updated, this PR will be updated so that CI checks can pass. Locally, a custompyproject.yaml
was used to test this feature.