Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querying with DATE PART #772

Merged
merged 14 commits into from
Sep 19, 2023
Merged

Querying with DATE PART #772

merged 14 commits into from
Sep 19, 2023

Conversation

courtneyholcomb
Copy link
Contributor

Description

Enable aggregating time dimensions by their date part (e.g., EXTRACT(month FROM '2020-01-01') -> 1).
Completes SL-817.

@linear
Copy link

linear bot commented Sep 13, 2023

@cla-bot cla-bot bot added the cla:yes label Sep 13, 2023
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS September 13, 2023 19:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS September 13, 2023 19:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS September 13, 2023 19:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS September 13, 2023 19:13 — with GitHub Actions Inactive
@courtneyholcomb courtneyholcomb temporarily deployed to DW_INTEGRATION_TESTS September 13, 2023 19:13 — with GitHub Actions Inactive
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this seems reasonable although it's hard for me to see what issues might crop up.

I've left a bunch of small things inline, but my main concerns are with validation vs overrides for mismatched grain/extract parameters, and with the way qualified_name gets used.

For the latter, looking at the codebase, I think we're safe for now, but I'm wondering if there's some scenario where we update some code that accidentally tries to parse a qualified name with a date_part appended and then weird stuff happens. I'll think about this a bit more but if we can't come up with a reasonable safeguard (maybe we have a different property for "select names" or something?) I guess we'll have to fall back on documentation for qualified_name.

metricflow/sql/render/big_query.py Show resolved Hide resolved
metricflow/naming/linkable_spec_name.py Show resolved Hide resolved
metricflow/engine/metricflow_engine.py Show resolved Hide resolved
metricflow/query/query_parser.py Outdated Show resolved Hide resolved
metricflow/query/query_parser.py Outdated Show resolved Hide resolved
metricflow/query/query_parser.py Show resolved Hide resolved
metricflow/query/query_parser.py Show resolved Hide resolved
metricflow/specs/where_filter_time_dimension.py Outdated Show resolved Hide resolved
metricflow/test/integration/test_cases/itest_metrics.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good! The one big change we discuss inline about moving from granularity overrides to requiring default granularity. I think your plan makes the most sense.

If you'd prefer to merge this as-is and follow up with that change that's fine with me, just request review with a note. I don't have strong preferences but I'd like to make sure I get to read the change just so I understand how it works.

Thanks!

metricflow/naming/linkable_spec_name.py Show resolved Hide resolved
metricflow/query/query_parser.py Show resolved Hide resolved
metricflow/query/query_parser.py Show resolved Hide resolved
metricflow/query/query_parser.py Show resolved Hide resolved
metricflow/sql/render/big_query.py Show resolved Hide resolved
metricflow/test/integration/test_cases/itest_metrics.yaml Outdated Show resolved Hide resolved
metricflow/naming/linkable_spec_name.py Show resolved Hide resolved
Copy link
Contributor

@tlento tlento left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve this because after looking at the update I think it's appropriately restrictive.

With this set of changes, we currently allow date_part queries against default granularity, but the default granularity effectively aggregates to date_part. What I mean is, if a metric is DAILY and we request YEAR for date_part, we'll just add the whole thing up across the year and return a single row. The granularity effectively gets overridden, and the user gets one row per date_part value.

From an end user perspective, they can't alter this behavior. So they can't say "I want distinct users, which is day grain, but computed at month grain, grouped by date_part YEAR" because there's no reasonable way for us to do that - we'd end up returning 12 rows per YEAR but with the same YEAR grain. The way for them to get what they want is to group by both date_part YEAR and date_part MONTH.

Therefore, we only allow them to set a granularity equal to the base granularity of the metric, and we only allow that because certain interfaces require it. That's a rough edge we can clean up later, but it does feel like any granularity adjustments should be made via the date_part values, and we should simply block everything else.

metricflow/query/query_parser.py Outdated Show resolved Hide resolved
requested_dimension_structured_name: StructuredLinkableSpecName,
partial_time_dimension_spec: PartialTimeDimensionSpec,
metric_references: Sequence[MetricReference],
) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost forgot, can we add a docblock explaining why this is here? Something like:

"""
Enforce that any granularity value associated with a date part query is the minimum.

By default, we will always ensure that a date_part query request uses the minimum granularity.
However, there are some interfaces where the user must pass in a granularity, so we need a check to
ensure that the correct value was passed in.
"""

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some interfaces where the user must pass in a granularity

Which interface(s) require this ?

@courtneyholcomb courtneyholcomb merged commit b566ef9 into main Sep 19, 2023
8 checks passed
@courtneyholcomb courtneyholcomb deleted the court/date-part branch September 19, 2023 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants