-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Mergeable
to Simplify Merging Collections
#790
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
from __future__ import annotations | ||
|
||
import functools | ||
from abc import ABC, abstractmethod | ||
from typing import Iterable, Type, TypeVar | ||
|
||
from typing_extensions import Self | ||
|
||
MergeableT = TypeVar("MergeableT", bound="Mergeable") | ||
|
||
|
||
class Mergeable(ABC): | ||
"""Objects that can be merged together to form a superset object of the same type. | ||
|
||
Merging objects are frequently needed in MetricFlow as there are several recursive operations where the output is | ||
the superset of the result of the recursive calls. | ||
|
||
e.g. | ||
* The validation issue set of a derived metric includes the issues of the parent metrics. | ||
* The output of a node in the dataflow plan can include the outputs of the parent nodes. | ||
* The query-time where filter is useful to combine with the metric-defined where filter. | ||
|
||
Having a common interface also gives a consistent name to this operation so that we don't end up with multiple names | ||
to describe this operation (e.g. combine, add, concat). | ||
|
||
This is used to streamline the following case that occurs in the codebase: | ||
|
||
items_to_merge: List[ItemType] = [] | ||
... | ||
if ... | ||
items_to_merge.append(...) | ||
... | ||
if ... | ||
items_to_merge.append(...) | ||
... | ||
if ... | ||
... | ||
items_to_merge.append(...) | ||
return merge_items(items_to_merge) | ||
... | ||
return merge_items(items_to_merge) | ||
|
||
This centralizes the definition of the merge_items() call. | ||
""" | ||
|
||
@abstractmethod | ||
def merge(self: Self, other: Self) -> Self: | ||
"""Return a new object that is the result of merging self with other.""" | ||
raise NotImplementedError | ||
|
||
@classmethod | ||
@abstractmethod | ||
def empty_instance(cls: Type[MergeableT]) -> MergeableT: | ||
"""Create an empty instance to handle merging of empty sequences of items. | ||
|
||
As merge_iterable() returns an empty instance for an empty iterable, there needs to be a way of creating one. | ||
""" | ||
raise NotImplementedError | ||
|
||
@classmethod | ||
def merge_iterable(cls: Type[MergeableT], items: Iterable[MergeableT]) -> MergeableT: | ||
"""Merge all items into a single instance. | ||
|
||
If an empty iterable has been passed in, this returns an empty instance. | ||
""" | ||
return functools.reduce(cls.merge, items, cls.empty_instance()) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Empty file.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
from __future__ import annotations | ||
|
||
from dataclasses import dataclass, field | ||
from typing import List, Tuple | ||
|
||
from typing_extensions import override | ||
|
||
from metricflow.collections.merger import Mergeable | ||
|
||
|
||
@dataclass(frozen=True) | ||
class NumberTuple(Mergeable): # noqa: D | ||
numbers: Tuple[int, ...] = field(default_factory=tuple) | ||
|
||
@override | ||
def merge(self: NumberTuple, other: NumberTuple) -> NumberTuple: | ||
return NumberTuple(self.numbers + other.numbers) | ||
|
||
@override | ||
@classmethod | ||
def empty_instance(cls) -> NumberTuple: | ||
return NumberTuple() | ||
|
||
|
||
def test_merger() -> None: # noqa: D | ||
items_to_merge: List[NumberTuple] = [ | ||
NumberTuple(()), | ||
NumberTuple((1,)), | ||
NumberTuple((2, 3)), | ||
] | ||
|
||
assert NumberTuple.merge_iterable(items_to_merge) == NumberTuple((1, 2, 3)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we separating these? Is there ever a case where we want duplicates in this output?
If not, this feels like we're adding an old bug factory back to the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferring to make
dedupe()
a separate method as it makes theMergeable
interface more flexible. I disagree onadding an old bug factory back to the codebase
. If there is always de-duplication behavior, it could hide bugs in testing. e.g. the output of a merged result is checked in a test, and it's not expected for the sources of the to produce duplicate items. However, the de-duplication of the merged result hides that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first two words (
if not
) in that bug factory sentence are extremely important. If we never want duplicates to come out of a merge, then any case where we have them is a bug, and requiring developers to manually call a dedupe method can only lead to bugs, while encapsulating that behavior can only prevent them. If there is an upstream bug that the encapsulation hides that's a separate problem, and it's one that having developers blindly calldedupe()
on their merge outputs (which is a thing we should expect to happen if our contributor base grows) will also mask.As it stands right now every merge call either explicitly requests deduped results or implicitly resolves to them, which suggests that we shouldn't force developers to call it in order to get the behavior they need.
Ultimately, this isn't a big deal, but it's interesting to me that the only extant cases where we don't call
dedupe()
are still cases where we need deduplicated results.