
Updates to reflect a single input measure per metric #843

Merged (3 commits merged into main on Nov 6, 2023)

Conversation

@courtneyholcomb (Contributor) commented Nov 3, 2023

Description

We no longer support having multiple input measures in one metric. Instead, you might have multiple input metrics. This PR simplifies some code to reflect that change:

  • Removes the now-unused JoinAggregatedMeasuresByGroupByColumnsNode.

@cla-bot added the cla:yes label Nov 3, 2023
@courtneyholcomb marked this pull request as ready for review Nov 3, 2023
@courtneyholcomb changed the title from "Remove unused JoinAggregatedMeasuresByGroupByColumnsNode" to "Updates to reflect a single input measure per metric" Nov 4, 2023
@tlento (Contributor) left a comment


Love the cleanup, and I like the direction even if we do end up handling multiple measures.

However, before doing a detailed review we might want to check in with @WilliamDee on how he's planning to handle conversion metrics. The spec for them is based on measures, rather than metrics, and there will be two measure aggregation inputs (a base event measure and a conversion event measure).

We have a couple of options for managing this:

  1. Keep the codebase as-is in trunk and let conversion metrics use the generalized "0 or more measures" approach we currently use
  2. Merge this change and then have conversion metrics call build_aggregated_measure twice, and add a separate node to merge those built measures together in the most appropriate way (either via a single query or separate aggregations + join)
  3. Merge this change and base conversion metrics on internal "metric" type objects rather than measures. Essentially, we pretend conversion metrics are just like derived metrics, even though they aren't that way in the user-facing config spec.

There are probably more options I'm not thinking of as well.

Of the three listed above, I'm leaning towards option 2 at this point (roughly sketched below) because it seems like the right balance. Where we are today, we have all of this logic for dealing with the "0 or more" scenario when the vast majority of the time we will have exactly one measure.
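Roughly, option 2 might look something like the sketch below inside the dataflow plan builder; the method and attribute names here are simplified assumptions, not MetricFlow's actual API:

# Hypothetical sketch of option 2; names and signatures are simplified
# assumptions, not MetricFlow's actual builder API.
def _build_conversion_metric_output_node(self, metric_spec, queried_linkable_specs):
    # Aggregate the base-event measure and the conversion-event measure
    # independently, via the single-measure build path.
    base_node = self.build_aggregated_measure(
        metric_input_measure_spec=metric_spec.base_measure,
        queried_linkable_specs=queried_linkable_specs,
    )
    conversion_node = self.build_aggregated_measure(
        metric_input_measure_spec=metric_spec.conversion_measure,
        queried_linkable_specs=queried_linkable_specs,
    )
    # Merge the two aggregated outputs on the queried group-by columns,
    # via a dedicated join node (or a conversion-specific equivalent).
    return JoinAggregatedMeasuresByGroupByColumnsNode(
        parent_nodes=(base_node, conversion_node),
    )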

@WilliamDee (Contributor) commented

@tlento @courtneyholcomb

have conversion metrics call build_aggregated_measure twice, and add a separate node to merge those built measures together in the most appropriate way (either via a single query or separate aggregations + join)

Actually, I'm already doing that (code). However, I am using JoinAggregatedMeasuresByGroupByColumnsNode to combine that together.

@courtneyholcomb (Contributor, Author)

I am using JoinAggregatedMeasuresByGroupByColumnsNode to combine that together.

@WilliamDee oooo so I should just leave that node definition? Is there anything else from this PR you need us to keep?

@WilliamDee (Contributor) commented Nov 6, 2023

oooo so I should just leave that node definition? Is there anything else from this PR you need us to keep?

Yeah, let's leave it since I'm using that. I'm also using the metric_lookup to get the input measures too 😅, so that kind of kills this entire PR. But I don't want to keep stuff just for the sake of adding back my old PR, which may well be doing things the wrong way given the current state of MF. This whole PR rewriting has me feeling like I've been dropped in the middle of the ocean, LOL. And I didn't fully like how I was using measures_for_metric; it was kind of terrible. I basically use it to return the two measures (base/conversion), then wrote a local function that matches on the name to get the base and conversion measures specifically:

from typing import Tuple

# MeasureReference / MetricInputMeasureSpec are MetricFlow spec types.
def _get_matching_measure(
    measure_to_match: MeasureReference, measure_specs: Tuple[MetricInputMeasureSpec, ...]
) -> MetricInputMeasureSpec:
    # Return the input measure spec whose measure reference matches,
    # failing loudly if it isn't present.
    matched_measure = next(
        filter(
            lambda x: measure_to_match == x.measure_spec.as_reference,
            measure_specs,
        ),
        None,
    )
    assert matched_measure, f"Unable to find {measure_to_match} in {measure_specs}."
    return matched_measure
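For context, that helper then gets called once per side, along these lines (the measure references below are placeholders for whatever the conversion metric's config provides):

# Hypothetical usage; measure_specs would come from measures_for_metric(...),
# and the two references are whatever the conversion metric defines.
base_spec = _get_matching_measure(base_measure_reference, measure_specs)
conversion_spec = _get_matching_measure(conversion_measure_reference, measure_specs)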

So one way around it with your change might be to add MetricLookup.get_base_measure and MetricLookup.get_conversion_measure? Not sure how well received that would be, though, given it's very much specific to conversion metrics.
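Very roughly, something like the following (the method names, and the idea that the measures hang off the metric's conversion type params, are assumptions rather than existing MetricLookup API):

# Hypothetical additions to MetricLookup; not part of the current API.
def get_base_measure(self, metric_reference: MetricReference) -> MetricInputMeasureSpec:
    metric = self.get_metric(metric_reference)
    # Assumes conversion metrics expose their base measure on the type params.
    return self._measure_spec_for(metric.type_params.conversion_type_params.base_measure)

def get_conversion_measure(self, metric_reference: MetricReference) -> MetricInputMeasureSpec:
    metric = self.get_metric(metric_reference)
    # Assumes conversion metrics expose their conversion measure on the type params.
    return self._measure_spec_for(metric.type_params.conversion_type_params.conversion_measure)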

Where we are today, we have all of this logic for dealing with the "0 or more" scenario when the vast majority of the time we will have exactly one measure.

@tlento Following on from this: we don't expect multiple input measures in the current state of MF, but would that hold moving forward? In an alternate universe where we had only ever handled one measure from the start and then decided to support conversion metrics, which require two measures, would there be a world where we say "oh, maybe we should start handling 0-or-more scenarios," or would we look at alternatives (i.e., using internal metrics) as the better option?

@tlento (Contributor) left a comment


Yay! Thank you for the cleanup! Here's a gif of a raccoon sweeping the floor....

Comment on lines +257 to +259
assert (
    len(metric_input_measure_specs) == 1
), "Simple and cumulative metrics must have one input measure."
Contributor:

I was so happy to see this go away...

I have some vague ideas for re-adding this bit of cleanup as we do conversion metrics; I'll check in with @WilliamDee about it. Not sure if they'll work with how he ends up building conversion metrics.

@@ -641,7 +642,7 @@ def build_computed_metrics_node(

def build_aggregated_measures(
Contributor:

Shall we rename this, too?

Contributor (Author):

Good call!

cumulative=cumulative,
cumulative_window=cumulative_window,
cumulative_grain_to_date=cumulative_grain_to_date,
)

def _build_aggregated_measures_from_measure_source_node(
Contributor:

And this? _build_aggregated_measure_from.....

@tlento (Contributor) commented Nov 6, 2023

@tlento Following on from this: we don't expect multiple input measures in the current state of MF, but would that hold moving forward? In an alternate universe where we had only ever handled one measure from the start and then decided to support conversion metrics, which require two measures, would there be a world where we say "oh, maybe we should start handling 0-or-more scenarios," or would we look at alternatives (i.e., using internal metrics) as the better option?

From Slack: great question. Personally, I'd just handle that one scenario for now.

Historically, I believe the reason this was here is that we used to have an expr-type metric that took in an arbitrary number of measures, but that metric type has been shifted to derived, so I think this cleanup is generally the direction we should go.

@courtneyholcomb added the Run Tests With Other SQL Engines label (runs the test suite against the SQL engines in our target environment) Nov 6, 2023
@github-actions (bot) removed the Run Tests With Other SQL Engines label Nov 6, 2023
@courtneyholcomb merged commit 043bbbb into main Nov 6, 2023
19 checks passed
@courtneyholcomb deleted the court/kill-join-agg-measures-node branch Nov 6, 2023