-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Include Time Spine Nodes in Dataflow Plan #1548
Conversation
87ef573
to
107dcb7
Compare
107dcb7
to
d1921c9
Compare
c0b009e
to
fe9a711
Compare
84dffb9
to
2bb872b
Compare
fe9a711
to
5515aa1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but left some minor comments.
metricflow/dataset/sql_dataset.py
Outdated
@@ -144,7 +145,7 @@ def instances_for_time_dimensions( | |||
return instances_to_return | |||
|
|||
def instance_for_time_dimension(self, time_dimension_spec: TimeDimensionSpec) -> TimeDimensionInstance: | |||
"""Given the name of the time dimension, return the instance associated with it in the data set.""" | |||
"""Given ta time dimension spec, return the instance associated with it in the data set.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: typo
metricflow/dataset/sql_dataset.py
Outdated
for instance in instances: | ||
if instance.spec == spec: | ||
return instance | ||
raise RuntimeError(f"Did not find instance matching spec in dataset. Spec: {spec}\nInstances: {instances}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better formatting, you can do:
# Easier autocomplete.
raise RuntimeError(str(LazyFormat("Did not find instance matching spec in dataset.", spec=spec, instances=instances))
# Or call formatting function.
raise RuntimeError(mf_pformat_dict("Did not find instance matching spec in dataset.", {"spec": spec, "instances": instances))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good call! I need to change this habit.
class AliasSpecsNode(DataflowPlanNode, ABC): | ||
"""Change the columns matching the key specs to match the value specs.""" | ||
|
||
change_specs: Sequence[Tuple[InstanceSpec, InstanceSpec]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this class is frozen, it's preferable to keep the field types immutable as well. i.e.
change_specs: Tuple[...]
Also helpful to have a dataclass for Tuple[InstanceSpec, InstanceSpec]
so that you don't have to remember what index is what.
… for time spines The metric time nodes are used for resolving metric_time without metrics. The read SQL nodes will be used for time spine joins.
This will allow us to stop building time spine nodes on the fly in the dataflow to SQL logic.
Also fixes a small issue where we were prematurely filtering out columns in that node. This is what you see in the snapshot changes. It should not impact optimized queries.
d74a4c4
to
9be730c
Compare
Previously, time spine nodes were built on the fly in the dataflow to SQL logic. This came with several limitations, including:
This PR adds logic to build time spine nodes in the dataflow plan, instead.