-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Mergeable
to Simplify Merging Collections
#790
Conversation
Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide. |
Mergeable
/ MergeBuilder
.Mergeable
/ MergeBuilder
to Simplify Merging Collections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a vacuum where all implementation choices are reasonable and correct this all seems fine. However, we don't operate in such a vacuum, and there are a lot of "but WHY is it like this?" implementation choices in here. Every single one of them is preceded by a #noqa: D
override on docstrings, and every commit message is completely devoid of the crucial "why" for each change.
Please write docstrings explaining the context behind these decisions, and then I'll be able to evaluate this better. I'm sure it's fine, but I also know that I'm going to screw up using this and when I go through the code (and commit history) I'll be quite irritated to find that nothing has been explained anywhere.
metricflow/specs/merge_builder.py
Outdated
from abc import ABC, abstractmethod | ||
from typing import Generic, Iterable, TypeVar | ||
|
||
SelfTypeT = TypeVar("SelfTypeT", bound="Mergeable") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing_extensions import Self
is a thing. Let's use that instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize that was a thing - updated.
metricflow/specs/merge_builder.py
Outdated
self._accumulator = initial_item | ||
|
||
def add(self, other: MergeableT) -> None: # noqa: D | ||
if self._accumulator is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this? Can the generic actually be Optional[MergeableThing]
? I don't think it can, with a bound of "Mergeable" here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran into some issues there, but this is no longer needed as this class was removed.
metricflow/specs/merge_builder.py
Outdated
|
||
self._accumulator = self._accumulator.merge(other) | ||
|
||
def add_all(self, *others: MergeableT) -> None: # noqa: D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I realize this is my not-Python developer side showing, but I prefer others: Sequence[...]
to *others
.
Given add_iterable is a thing, do we really need this method? If they can't be combined, some docstrings with explanations providing that context would be nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer relevant as this class was removed.
metricflow/specs/merge_builder.py
Outdated
return builder.build_result | ||
|
||
@property | ||
def build_result(self) -> MergeableT: # noqa: D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like an action rather than a property. Maybe merged_result
or merged_output
would be better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No longer relevant as this class was removed.
|
||
|
||
def test_merge_iterable() -> None: # noqa: D | ||
assert MergeBuilder.merge_iterable(NumberList(), [NumberList([1, 2, 3])]) == NumberList([1, 2, 3]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want a test that merges a non-empty thing as well, because return other_iterable.next()
passes this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test with a non-empty item, but not sure what you mean by return other_iterable.next()
entity_specs=self.entity_specs + other.entity_specs, | ||
) | ||
|
||
def dedupe(self) -> LinkableSpecSet: # noqa: D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we separating these? Is there ever a case where we want duplicates in this output?
If not, this feels like we're adding an old bug factory back to the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Preferring to make dedupe()
a separate method as it makes the Mergeable
interface more flexible. I disagree on adding an old bug factory back to the codebase
. If there is always de-duplication behavior, it could hide bugs in testing. e.g. the output of a merged result is checked in a test, and it's not expected for the sources of the to produce duplicate items. However, the de-duplication of the merged result hides that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first two words (if not
) in that bug factory sentence are extremely important. If we never want duplicates to come out of a merge, then any case where we have them is a bug, and requiring developers to manually call a dedupe method can only lead to bugs, while encapsulating that behavior can only prevent them. If there is an upstream bug that the encapsulation hides that's a separate problem, and it's one that having developers blindly call dedupe()
on their merge outputs (which is a thing we should expect to happen if our contributor base grows) will also mask.
As it stands right now every merge call either explicitly requests deduped results or implicitly resolves to them, which suggests that we shouldn't force developers to call it in order to get the behavior they need.
Ultimately, this isn't a big deal, but it's interesting to me that the only extant cases where we don't call dedupe()
are still cases where we need deduplicated results.
metricflow/specs/specs.py
Outdated
@@ -594,6 +594,37 @@ def linkable_specs(self) -> LinkableSpecSet: # noqa: D | |||
entity_specs=self.entity_specs, | |||
) | |||
|
|||
@override | |||
def merge(self, other: MetricFlowQuerySpec) -> MetricFlowQuerySpec: # noqa: D | |||
# Can't merge if both properties are set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, why can we merge when these don't match at all?
Can you please add a docstring explaining this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Context on the original reason for this change was that queries for derived metrics could be thought of as a composition of queries on separate metrics during query resolution. However, this is no longer needed due to a different approach that was taken in a later revision.
26d1c26
to
7352a38
Compare
Mergeable
/ MergeBuilder
to Simplify Merging CollectionsMergeable
to Simplify Merging Collections
7352a38
to
2799e2f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the commit messages and docstrings, and found a fix for type issues to allow removal of the MergeBuilder
and just have the Mergeable
interface.
2799e2f
to
809c5c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM apart from that last commit and the hanging If in the docstring.
You built the thing so you get to choose whether to opt for more flexibility here.
entity_specs=self.entity_specs + other.entity_specs, | ||
) | ||
|
||
def dedupe(self) -> LinkableSpecSet: # noqa: D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first two words (if not
) in that bug factory sentence are extremely important. If we never want duplicates to come out of a merge, then any case where we have them is a bug, and requiring developers to manually call a dedupe method can only lead to bugs, while encapsulating that behavior can only prevent them. If there is an upstream bug that the encapsulation hides that's a separate problem, and it's one that having developers blindly call dedupe()
on their merge outputs (which is a thing we should expect to happen if our contributor base grows) will also mask.
As it stands right now every merge call either explicitly requests deduped results or implicitly resolves to them, which suggests that we shouldn't force developers to call it in order to get the behavior they need.
Ultimately, this isn't a big deal, but it's interesting to me that the only extant cases where we don't call dedupe()
are still cases where we need deduplicated results.
mypy.ini
Outdated
|
||
# Added to resolve an issue with mypy in CI: | ||
# ... python3.8/site-packages/referencing/jsonschema.py:426: error: Invalid "type: ignore" comment [syntax] | ||
[mypy-referencing.jsonschema] | ||
follow_imports = skip |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was fixed by updating mypy, please revert this commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
metricflow/collections/merger.py
Outdated
def merge_iterable(cls: Type[MergeableT], items: Iterable[MergeableT]) -> MergeableT: | ||
"""Merge all items into a single instance. | ||
|
||
If |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If what? 😛
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Merging objects are frequently needed in MetricFlow as there are several recursive operations where the output is the superset of the result of the recursive calls. Please see the docstring for Mergeable.
InstanceSpecSet already has a merge() call, so this updates the class to imlement the Mergeable interface for consistency.
LinkableSpecSet already has a merge() call, so this updates the class to imlement the Mergeable interface for consistency.
809c5c6
to
086cee4
Compare
Description
This PR adds the utility class
MergeBuilder
that allows you to combine collections that implementsMergeable
. This is useful as there are a few spec-related classes that are collections, and there are upcoming cases where they need to be merged. This replaces the collection of merge-related methods in the spec set classes.This PR also includes a minor update to the signature of
InstanceSpec
to make it more flexible.