Support Saved Queries #794

plypaul · 2023-10-05T05:33:23Z

Resolves #765

Description

Please see the linked issue for more details. This PR depends on dbt-semantic-interfaces~=0.3.0a2, which includes the interface definition for saved queries, but it is currently not possible to specify that dependency because it conflicts with dbt-core~=1.6.0, which this package requires. Once dbt-core is updated, this PR will be updated so that CI checks can pass. Locally, a custom pyproject.toml was used to test this feature.

tlento

This seems like a reasonable first cut.

How were you planning to address the parameterizable where filter? What we really want out of a saved query is something where the filter is more like:

saved_queries:
  - name: date_filter_example
    filter: metric_time BETWEEN <start_date> AND <end_date>

Then on invocation we want people to be able to do something like:

mf query --saved-query date_filter_example --params "start_date='2021-01-01',end_date='2021-01-05'"
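To make the proposal above concrete, here is a minimal sketch of how the placeholder substitution could work. Everything in it is an assumption for illustration: the `<name>` placeholder syntax and the `--params` string format are taken from the proposal, and `parse_params` and `render_filter` are hypothetical helpers, not part of MetricFlow.

```python
import re
from typing import Dict

# Hypothetical: the <name> placeholder syntax comes from the proposal above
# and is not an existing MetricFlow feature.
_PLACEHOLDER = re.compile(r"<(\w+)>")


def parse_params(raw: str) -> Dict[str, str]:
    """Parse "k1=v1,k2=v2" into a dict. Values containing commas are not handled."""
    params: Dict[str, str] = {}
    for pair in raw.split(","):
        key, _, value = pair.partition("=")
        params[key.strip()] = value.strip()
    return params


def render_filter(template: str, params: Dict[str, str]) -> str:
    """Replace each <name> placeholder, failing loudly when a value is missing."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"Missing value for filter parameter: {name}")
        return params[name]

    return _PLACEHOLDER.sub(substitute, template)


rendered = render_filter(
    "metric_time BETWEEN <start_date> AND <end_date>",
    parse_params("start_date='2021-01-01',end_date='2021-01-05'"),
)
print(rendered)
```

Plain string substitution like this passes values straight into the filter text, so a real design would also need to validate or escape the parameter values.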

tlento · 2023-10-06T18:11:28Z

metricflow/cli/utils.py

@@ -46,7 +46,8 @@ def query_options(function: Callable) -> Callable:
    )(function)
    function = click.option(
        "--metrics",
-        type=click_custom.SequenceParamType(min_length=1),
+        # Validity checks for this parameter was moved to the MetricFlowEngine.


nit: for now, this is better:

Suggested change

# Validity checks for this parameter was moved to the MetricFlowEngine.

# Validation is handled in the MetricFlowEngine

Longer term I'd like to build out a formal API that manages all of this stuff and just make the CLI interface an example caller of that API. The MetricFlowEngine is a little too much of a mixture between internal classes and public-facing things.

tlento · 2023-10-06T18:13:07Z

metricflow/engine/metricflow_engine.py

@@ -84,6 +85,8 @@ class MetricFlowQueryType(Enum):
 class MetricFlowQueryRequest:
    """Encapsulates the parameters for a metric query.

+    TODO: This has turned into a bag of parameters that make it difficult to use without a bunch of conditionals.


Of course it has, that's what always happens! Where's the lolsob emoji when I need it....

tlento · 2023-10-06T18:13:53Z

metricflow/engine/metricflow_engine.py

-        assert_exactly_one_arg_set(metric_names=metric_names, metrics=metrics)
-        assert not (
-            group_by_names and group_by
-        ), "Both group_by_names and group_by were set, but if a group by is specified you should only use one of these!"
-        assert not (
-            order_by_names and order_by
-        ), "Both order_by_names and order_by were set, but if an order by is specified you should only use one of these!"


Why are we removing these? Is this a redundant validation? If it isn't we should leave them in, because we're in this ugly transitional state where we accept both type inputs via different parameters instead of either merging to a Union or pushing that to an outer interface where everything can be normalized.

Yeah, it's a redundant validation that's handled in MetricFlowQueryParser.
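For readers following along, the kind of check being discussed can be sketched as below. This is an illustration only: the helper names are hypothetical and this is not the code in MetricFlowQueryParser, it just shows the "name-based or object-based input, but not both" rule that the removed assertions expressed.

```python
from typing import Optional, Sequence


def assert_at_most_one_set(**kwargs: Optional[Sequence[object]]) -> None:
    """Raise if more than one keyword argument has a non-empty value."""
    set_names = sorted(name for name, value in kwargs.items() if value)
    if len(set_names) > 1:
        raise ValueError(f"Only one of these may be set, but got: {set_names}")


def assert_exactly_one_set(**kwargs: Optional[Sequence[object]]) -> None:
    """Raise unless exactly one keyword argument has a non-empty value."""
    assert_at_most_one_set(**kwargs)
    if not any(kwargs.values()):
        raise ValueError(f"One of these must be set: {sorted(kwargs)}")


# Metrics must come from exactly one source; group-by inputs are optional.
assert_exactly_one_set(metric_names=["bookings"], metrics=None)
assert_at_most_one_set(group_by_names=None, group_by=None)
```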

tlento · 2023-10-10T04:35:40Z

metricflow/query/query_parser.py

+        """
+        saved_query = self._get_saved_query(saved_query_parameter)
+
+        # This logic could be encapsulated in the WhereFilter through a merge interface.


Yes, and even more so now that the object is going to hold a list of WhereFilters.

I haven't added it yet, but it might be something we should add in to the base protocol. If we don't think that's worth doing we can re-type everything internally to take an MFWhereFilter that just extends the WhereFilter protocol with merge or combine or whatever.

Yeah, we could add Mergeable to the base class, but we can visit that once Mergeable is in.

tlento · 2023-10-10T04:36:59Z

metricflow/query/query_parser.py

+            where_conditions_with_parenthesis = tuple(f"({where_condition})" for where_condition in where_conditions)
+            combined_where_filter = PydanticWhereFilter(
+                where_sql_template=" AND ".join(where_conditions_with_parenthesis)
+            )


eeeewwwwww......

I was thinking about eventually moving this to the renderers, since we won't always render the combined filter. The container class I created should help with this.

The container class helped, but I didn't see a place to render it?

It's currently here: https://github.com/dbt-labs/metricflow/pull/809/files#diff-14f344e3a9bd66bc0d46537a8017023c16cd242d65ace611485b8277e0719b74

Yeah I was thinking eventually we'd just use the container and pass it all the way through but maybe that's not the right way to do it. What you did in #809 seems reasonable for now.
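As a side note on why the diff wraps each condition in parentheses before joining: without them, an OR inside one filter would bind more loosely than the AND used to combine filters. A standalone sketch, using plain strings instead of PydanticWhereFilter and made-up conditions (`is_instant`, `is_featured`):

```python
from typing import Sequence


def combine_where_conditions(where_conditions: Sequence[str]) -> str:
    """AND together SQL conditions, wrapping each one in parentheses first."""
    return " AND ".join(f"({condition})" for condition in where_conditions)


# Illustrative inputs: one filter stored with the saved query, one passed at query time.
saved_query_where = "is_instant OR is_featured"
query_time_where = "metric_time >= '2021-01-01'"

combined = combine_where_conditions([saved_query_where, query_time_where])
print(combined)

# Joining without parentheses changes the meaning, since SQL evaluates AND
# before OR: is_instant OR (is_featured AND metric_time >= '2021-01-01')
naive = " AND ".join([saved_query_where, query_time_where])
print(naive)
```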

tlento · 2023-10-10T04:40:34Z

metricflow/specs/python_object.py

+    handles operations related to the object-builder naming scheme.
+
+    Additional issues:
+    * The call parameter sets in DSI does not support date part.


Let's also not support date part in saved queries, it's a partially supported feature built solely for use in interactive sessions through the Tableau connector. As we get more experience with how the date_part mechanics are used in practice we'll push more robust support down to MetricFlow.

Fine by me - updated comment.

tlento · 2023-10-10T04:41:54Z

metricflow/specs/python_object.py

+from metricflow.specs.query_param_implementations import DimensionOrEntityParameter, TimeDimensionParameter
+
+
+def parse_object_builder_naming_scheme(group_by_item_name: str) -> GroupByParameter:


I don't know how much of this still applies and what's been improved. Agreed we should check in with @DevonFulcher, probably worth doing that before merge since you'll need to update some stuff due to me mucking about with WhereFilter interfaces in dbt-semantic-interfaces.

Hey, sorry, I am just catching up here. In general, I agree that query interface objects should share more code. Now that they all share an interface, this should be easier. I'm happy to chat more about that.

Chatted with @DevonFulcher - using this for now until an improved setup is ready.

DevonFulcher · 2023-10-12T14:25:09Z

metricflow/specs/python_object.py

+    """
+    try:
+        call_parameter_sets = PydanticWhereFilter(
+            where_sql_template="{{ " + group_by_item_name + " }}"


Is it possible for group_by_item_name to contain malicious code? We should consider validating the string before calling this constructor. That is probably a concern of MFS, though.

This is mainly for the CLI, and I verified with @DevonFulcher that it's handled on the MFS side.

Also, this should all go away when we finish cleaning up the parameter parsing stuff, right?
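If validation were ever wanted on the CLI side, one option is an allowlist check before the name is embedded in the template. The sketch below is an assumption-laden illustration: the helper name is hypothetical, and the pattern deliberately accepts only the simple forms seen in this PR's test fixture (for example `TimeDimension('metric_time', 'day')`), not the full object-builder syntax.

```python
import re

# Deliberately narrow, assumed pattern: a known builder name followed by one or
# two single-quoted identifiers. The real object-builder syntax allows more.
_IDENTIFIER = r"'[A-Za-z_][A-Za-z0-9_]*'"
_GROUP_BY_ITEM = re.compile(
    rf"(Dimension|TimeDimension|Entity)\(\s*{_IDENTIFIER}(\s*,\s*{_IDENTIFIER})?\s*\)"
)


def is_safe_group_by_item_name(group_by_item_name: str) -> bool:
    """Return True only if the whole string matches the allowlisted form."""
    return _GROUP_BY_ITEM.fullmatch(group_by_item_name) is not None


print(is_safe_group_by_item_name("TimeDimension('metric_time', 'day')"))
print(is_safe_group_by_item_name("Dimension('listing__capacity_latest')"))
# An attempt to close the template expression early is rejected.
print(is_safe_group_by_item_name("Dimension('x') }}{{ something_else"))
```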

plypaul · 2023-10-14T06:00:19Z

How were you planning to address the parameterizable where filter?

In this initial version, a saved query can be run with additional where filters, so to do something like run the saved query for a different set of dates, users have to pass in an appropriately constructed where filter. We have neither discussed nor designed the parameterization interface, so that will have to be handled later.

Jstein77 · 2023-10-17T06:44:26Z

metricflow/test/fixtures/semantic_manifest_yamls/simple_manifest/saved_queries.yaml

+  metrics:
+    - bookings
+    - instant_bookings
+  group_bys:


@plypaul should this be group_by?

Was there a spec change? This is following #765

That's a typo in the spec. We use group_by everywhere else, so it should be group_by for saved queries.

We used to call them group_bys but it was horribly confusing for everybody so we switched to group_by.

Jstein77 · 2023-10-17T06:45:57Z

metricflow/test/fixtures/semantic_manifest_yamls/simple_manifest/saved_queries.yaml

+  group_bys:
+    - TimeDimension('metric_time', 'day')
+    - Dimension('listing__capacity_latest')
+  where:


@plypaul I think this should be filter to keep it consistent with metrics.

@Jstein77 we decided that where was what we'd use for saved queries because they're queries, and where matches the query interfaces for the CLI, JDBC and GraphQL.

filter is what we use for metrics and other non-query objects in the spec.

I'd prefer to pick one and use it everywhere, but alas that ship has sailed. Or we can go with filter here and commit to changing the other query interfaces.

One argument in favor of where - we may need to introduce a post-aggregation filter (a having input) for query time specification and that can be a natural distinction for people accustomed to SQL. That said, I don't like calling anything having because it's a goofy name for a filter expression.

tlento

So I think this is good to merge at this point, but it seems like we need a dbt-semantic-interfaces 0.3.1 bugfix to roll out for the group_by thing.

tlento · 2023-10-23T23:51:25Z

metricflow/cli/utils.py

@@ -46,7 +46,8 @@ def query_options(function: Callable) -> Callable:
    )(function)
    function = click.option(
        "--metrics",
-        type=click_custom.SequenceParamType(min_length=1),
+        # Validation is handled in the MetricFlowEngine


I don't think we need this comment, especially since we're removing this requirement anyway.

tlento · 2023-10-25T15:17:52Z

This broke the engine tests. Also Redshift is no longer connecting, so I can't just fix them.

plypaul requested a review from tlento October 5, 2023 05:39

tlento reviewed Oct 10, 2023

DevonFulcher reviewed Oct 12, 2023

plypaul force-pushed the plypaul--57--saved-query4 branch from 402e286 to a800cb9 Compare October 12, 2023 21:43

cla-bot bot added the cla:yes label Oct 12, 2023

plypaul changed the base branch from main to plypaul--59--update-dsi October 12, 2023 21:44

plypaul force-pushed the plypaul--57--saved-query4 branch from a800cb9 to 56eaf0c Compare October 12, 2023 22:07

plypaul marked this pull request as ready for review October 12, 2023 22:10

plypaul force-pushed the plypaul--57--saved-query4 branch from 56eaf0c to 1361f70 Compare October 12, 2023 22:15

Base automatically changed from plypaul--59--update-dsi to main October 13, 2023 00:27

plypaul requested a review from tlento October 16, 2023 17:58

Jstein77 reviewed Oct 17, 2023

tlento approved these changes Oct 24, 2023

plypaul force-pushed the plypaul--57--saved-query4 branch from 1361f70 to 9daf141 Compare October 24, 2023 05:41

plypaul added 3 commits October 23, 2023 22:48

Allow the use of saved queries in the engine / CLI.

485c06f

Add test cases for saved queries in the CLI.

633e56b

Add change log for #765.

41679b9

plypaul force-pushed the plypaul--57--saved-query4 branch from 9daf141 to 41679b9 Compare October 24, 2023 05:48

plypaul merged commit e7ccd73 into main Oct 24, 2023
6 checks passed

plypaul deleted the plypaul--57--saved-query4 branch October 24, 2023 05:56
