Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to FULL OUTER JOIN for derived & ratio metrics #842

Merged
merged 8 commits into from
Nov 4, 2023
8 changes: 3 additions & 5 deletions metricflow/plan_conversion/dataflow_to_sql.py
Original file line number Diff line number Diff line change
Expand Up @@ -906,12 +906,10 @@ def visit_combine_metrics_node(self, node: CombineMetricsNode) -> SqlDataSet:
"""Join computed metric datasets together to return a single dataset containing all metrics.

This node may exist in one of two situations: when metrics need to be combined in order to produce a single
dataset with all required inputs for a derived metric (in which case the join type is INNER), or when
metrics need to be combined in order to produce a single dataset of output for downstream consumption by
the end user, in which case we will use FULL OUTER JOIN.
dataset with all required inputs for a derived metric, or when metrics need to be combined in order to produce
a single dataset of output for downstream consumption by the end user.

In the case of a multi-data-source FULL OUTER JOIN the join key will be a coalesced set of all previously
seen dimension values. For example:
The join key will be a coalesced set of all previously seen dimension values. For example:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to do this here, but if there's logic in this function for rendering the INNER JOIN case maybe we can remove the join type parameter AND that logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain the INNER JOIN case is not used at all! I can remove the join_type param before merging!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...I take it back, it's more complicated than I thought 😅 so will deal with that in a follow up!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was just on the wrong branch. Easy to add here after all! Removed the param and all it changes is the displayed properties in the dataflow plans 👍

FROM (
...
) subq_9
Expand Down