-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Querying with DATE PART #772
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this seems reasonable although it's hard for me to see what issues might crop up.
I've left a bunch of small things inline, but my main concerns are with validation vs overrides for mismatched grain/extract parameters, and with the way qualified_name gets used.
For the latter, looking at the codebase, I think we're safe for now, but I'm wondering if there's some scenario where we update some code that accidentally tries to parse a qualified name with a date_part appended and then weird stuff happens. I'll think about this a bit more but if we can't come up with a reasonable safeguard (maybe we have a different property for "select names" or something?) I guess we'll have to fall back on documentation for qualified_name.
ca8ab1d
to
8a63473
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! The one big change we discuss inline about moving from granularity overrides to requiring default granularity. I think your plan makes the most sense.
If you'd prefer to merge this as-is and follow up with that change that's fine with me, just request review with a note. I don't have strong preferences but I'd like to make sure I get to read the change just so I understand how it works.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to approve this because after looking at the update I think it's appropriately restrictive.
With this set of changes, we currently allow date_part queries against default granularity, but the default granularity effectively aggregates to date_part. What I mean is, if a metric is DAILY and we request YEAR for date_part, we'll just add the whole thing up across the year and return a single row. The granularity effectively gets overridden, and the user gets one row per date_part value.
From an end user perspective, they can't alter this behavior. So they can't say "I want distinct users, which is day grain, but computed at month grain, grouped by date_part YEAR" because there's no reasonable way for us to do that - we'd end up returning 12 rows per YEAR but with the same YEAR grain. The way for them to get what they want is to group by both date_part YEAR and date_part MONTH.
Therefore, we only allow them to set a granularity equal to the base granularity of the metric, and we only allow that because certain interfaces require it. That's a rough edge we can clean up later, but it does feel like any granularity adjustments should be made via the date_part values, and we should simply block everything else.
requested_dimension_structured_name: StructuredLinkableSpecName, | ||
partial_time_dimension_spec: PartialTimeDimensionSpec, | ||
metric_references: Sequence[MetricReference], | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost forgot, can we add a docblock explaining why this is here? Something like:
"""
Enforce that any granularity value associated with a date part query is the minimum.
By default, we will always ensure that a date_part query request uses the minimum granularity.
However, there are some interfaces where the user must pass in a granularity, so we need a check to
ensure that the correct value was passed in.
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some interfaces where the user must pass in a granularity
Which interface(s) require this ?
Description
Enable aggregating time dimensions by their date part (e.g.,
EXTRACT(month FROM '2020-01-01')
->1
).Completes SL-817.