
Updates to reflect a single input measure per metric #843

Merged (3 commits merged into main on Nov 6, 2023)

Conversation

@courtneyholcomb (Contributor) commented Nov 3, 2023

Description

We no longer support having multiple input measures in one metric. Instead, you might have multiple input metrics. This PR simplifies some code to reflect that change:

  • Removes the now-unused JoinAggregatedMeasuresByGroupByColumnsNode.

@cla-bot added the cla:yes label Nov 3, 2023
@courtneyholcomb marked this pull request as ready for review Nov 3, 2023
@courtneyholcomb changed the title from "Remove unused JoinAggregatedMeasuresByGroupByColumnsNode" to "Updates to reflect a single input measure per metric" Nov 4, 2023
@tlento (Contributor) left a comment


Love the cleanup, and I like the direction even if we do end up handling multiple measures.

However, before doing a detailed review we might want to check in with @WilliamDee on how he's planning to handle conversion metrics. The spec for them is based on measures, rather than metrics, and there will be two measure aggregation inputs (a base event measure and a conversion event measure).

We have a couple of options for managing this:

  1. Keep the codebase as-is in trunk and let conversion metrics use the generalized "0 or more measures" approach we currently use
  2. Merge this change and then have conversion metrics call build_aggregated_measure twice, and add a separate node to merge those built measures together in the most appropriate way (either via a single query or separate aggregations + join)
  3. Merge this change and base conversion metrics on internal "metric" type objects rather than measures. Essentially, we pretend conversion metrics are just like derived metrics, even though they aren't that way in the user-facing config spec.

There are probably more options I'm not thinking of as well.

Of the three listed above, I'm leaning towards option 2 at this point (roughly sketched below) because it seems like the right balance. Where we are today, we have all of this logic for dealing with the "0 or more" scenario when the vast majority of the time we will have exactly one measure.
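Roughly, option 2 might look something like the sketch below inside the dataflow plan builder; the method and attribute names here are simplified assumptions, not MetricFlow's actual API:

# Hypothetical sketch of option 2; names and signatures are simplified
# assumptions, not MetricFlow's actual builder API.
def _build_conversion_metric_output_node(self, metric_spec, queried_linkable_specs):
    # Aggregate the base-event measure and the conversion-event measure
    # independently, via the single-measure build path.
    base_node = self.build_aggregated_measure(
        metric_input_measure_spec=metric_spec.base_measure,
        queried_linkable_specs=queried_linkable_specs,
    )
    conversion_node = self.build_aggregated_measure(
        metric_input_measure_spec=metric_spec.conversion_measure,
        queried_linkable_specs=queried_linkable_specs,
    )
    # Merge the two aggregated outputs on the queried group-by columns,
    # via a dedicated join node (or a conversion-specific equivalent).
    return JoinAggregatedMeasuresByGroupByColumnsNode(
        parent_nodes=(base_node, conversion_node),
    )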

@WilliamDee (Contributor) commented

@tlento @courtneyholcomb

have conversion metrics call build_aggregated_measure twice, and add a separate node to merge those built measures together in the most appropriate way (either via a single query or separate aggregations + join)

Actually, I'm already doing that (code). However, I am using JoinAggregatedMeasuresByGroupByColumnsNode to combine that together.

@courtneyholcomb (Contributor, Author)

I am using JoinAggregatedMeasuresByGroupByColumnsNode to combine that together.

@WilliamDee oooo so I should just leave that node definition? Is there anything else from this PR you need us to keep?

@WilliamDee (Contributor) commented Nov 6, 2023

oooo so I should just leave that node definition? Is there anything else from this PR you need us to keep?

Yeah, let's leave it since I'm using that. I'm also using the metric_lookup to get the input measures too 😅, so that kind of kills this entire PR. But I don't want to keep stuff just for the sake of adding back my old PR, which may well be doing things the wrong way given the current state of MF. This whole PR rewriting has me feeling like I've been dropped in the middle of the ocean, LOL. And I didn't fully like how I was using measures_for_metric; it was kind of terrible. I basically use it to return the two measures (base/conversion), then wrote a local function that matches on the name to get the base and conversion measures specifically:

from typing import Tuple

# MeasureReference / MetricInputMeasureSpec are MetricFlow spec types.
def _get_matching_measure(
    measure_to_match: MeasureReference, measure_specs: Tuple[MetricInputMeasureSpec, ...]
) -> MetricInputMeasureSpec:
    # Return the input measure spec whose measure reference matches,
    # failing loudly if it isn't present.
    matched_measure = next(
        filter(
            lambda x: measure_to_match == x.measure_spec.as_reference,
            measure_specs,
        ),
        None,
    )
    assert matched_measure, f"Unable to find {measure_to_match} in {measure_specs}."
    return matched_measure
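For context, that helper then gets called once per side, along these lines (the measure references below are placeholders for whatever the conversion metric's config provides):

# Hypothetical usage; measure_specs would come from measures_for_metric(...),
# and the two references are whatever the conversion metric defines.
base_spec = _get_matching_measure(base_measure_reference, measure_specs)
conversion_spec = _get_matching_measure(conversion_measure_reference, measure_specs)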

So one way around it with your change might be to add MetricLookup.get_base_measure and MetricLookup.get_conversion_measure? Not sure how well received that would be, though, given it's very much specific to conversion metrics.
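Very roughly, something like the following (the method names, and the idea that the measures hang off the metric's conversion type params, are assumptions rather than existing MetricLookup API):

# Hypothetical additions to MetricLookup; not part of the current API.
def get_base_measure(self, metric_reference: MetricReference) -> MetricInputMeasureSpec:
    metric = self.get_metric(metric_reference)
    # Assumes conversion metrics expose their base measure on the type params.
    return self._measure_spec_for(metric.type_params.conversion_type_params.base_measure)

def get_conversion_measure(self, metric_reference: MetricReference) -> MetricInputMeasureSpec:
    metric = self.get_metric(metric_reference)
    # Assumes conversion metrics expose their conversion measure on the type params.
    return self._measure_spec_for(metric.type_params.conversion_type_params.conversion_measure)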

Where we are today, we have all of this logic for dealing with the "0 or more" scenario when the vast majority of the time we will have exactly one measure.

@tlento Following on from this: we don't expect multiple input measures in the current state of MF, but would that hold moving forward? In an alternate universe where we had only ever handled one measure from the start and then decided to support conversion metrics, which require two measures, would there be a world where we say "oh, maybe we should start handling 0-or-more scenarios," or would we look at alternatives (i.e., using internal metrics) as the better option?

@tlento (Contributor) left a comment


Yay! Thank you for the cleanup! Here's a gif of a raccoon sweeping the floor....

Comment on lines +257 to +259
assert (
    len(metric_input_measure_specs) == 1
), "Simple and cumulative metrics must have one input measure."
Contributor:

I was so happy to see this go away...

I have some vague ideas for re-adding this bit of cleanup as we do conversion metrics; I'll check in with @WilliamDee about it. Not sure if they'll work with how he ends up building conversion metrics.

@@ -641,7 +642,7 @@ def build_computed_metrics_node(

def build_aggregated_measures(
Contributor:

Shall we rename this, too?

Contributor (Author):

Good call!

cumulative=cumulative,
cumulative_window=cumulative_window,
cumulative_grain_to_date=cumulative_grain_to_date,
)

def _build_aggregated_measures_from_measure_source_node(
Contributor:

And this? _build_aggregated_measure_from.....

@tlento (Contributor) commented Nov 6, 2023

@tlento Following on from this: we don't expect multiple input measures in the current state of MF, but would that hold moving forward? In an alternate universe where we had only ever handled one measure from the start and then decided to support conversion metrics, which require two measures, would there be a world where we say "oh, maybe we should start handling 0-or-more scenarios," or would we look at alternatives (i.e., using internal metrics) as the better option?

From Slack: great question. Personally, I'd just handle that one scenario for now.

Historically, I believe the reason this was here is that we used to have an expr-type metric that took in an arbitrary number of measures, but that metric type has been shifted to derived, so I think this cleanup is generally the direction we should go.

@courtneyholcomb added the Run Tests With Other SQL Engines label (runs the test suite against the SQL engines in our target environment) Nov 6, 2023
@github-actions (bot) removed the Run Tests With Other SQL Engines label Nov 6, 2023
@courtneyholcomb merged commit 043bbbb into main Nov 6, 2023
19 checks passed
@courtneyholcomb deleted the court/kill-join-agg-measures-node branch Nov 6, 2023