Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revisiting is.Equal and the role of reflect.DeepEqual #53

Open
willbicks opened this issue Mar 11, 2023 · 11 comments
Open

Revisiting is.Equal and the role of reflect.DeepEqual #53

willbicks opened this issue Mar 11, 2023 · 11 comments

Comments

@willbicks
Copy link

willbicks commented Mar 11, 2023

I started switching from google/go-cmp to is in pursuit of the latter's more pleasant and succinct syntax, but unfortunately have run into issues with how times are compared by is.

In the code being tested, a time is created using time.Now(), stored to a database, and then later retrieved. When comparing the two times (or the structs containing time values), is.Equal reports that the times are different because one has a monotonic counter and the other does not. This differs from how the standard library time.Equal() functions, which is recommend by the Go team instead of directly comparing time values.

For example, consider the following excerpt where time1 has a monotonic counter, and time2 is the same time without a monotonic counter:

time1 := time.Now()
time2 := time1.Round(0) // returns time1 stripped of any monotonic clock reading but otherwise unchanged (see https://pkg.go.dev/time#Time.Round)
fmt.Printf("time1.Equal(time2) = %v\n", time1.Equal(time2))
is.Equal(time1, time2)

which results in the following:

time1.Equal(time2) = true
test.go:25: 2023-03-10 19:18:46.3692787 -0500 EST m=+0.083328801 != 2023-03-10 19:18:46.3692787 -0500 EST

Is there a better way to handle these comparisons in is? Some options I can imagine include:

  1. Switch to using time.Equal() if the values being compared are times.
  2. Adopting a model similar to google/go-cmp where all "Types that have an Equal method may use that method to determine equality".
  3. Use the tried-and-true google/go-cmp library itself to perform underlying equality comparisons.
@breml
Copy link
Contributor

breml commented Mar 12, 2023

#49 is somewhat related to this issue.

@flowchartsman
Copy link
Collaborator

flowchartsman commented Mar 12, 2023

Ah, yes, good ol' monotonic time. One of my least favorite stdlib hacks. As you've probably already discovered from your research, time equality is potentially tricker than it might at first appear, and other testing libraries have struggled with this and potential fixes and their implications.

Taking a look at the options:

  1. Switch to using time.Equal() if the values being compared are times.

Easy enough at the value level, but would break again when comparing structs that contain a time.Time, since we're kind of tied to reflect.DeepEqual as the ultimate fallback in v1.

  1. Adopting a model similar to google/go-cmp where all "Types that have an Equal method may use that method to determine equality".

This seems like a good compromise approach until you consider that it would essentially involve recreating reflect.DeepEqual in order to keep the behavior as similar as possible to the current system, and I'm not terribly keen on maintaining that. Also, depending on how conservative you are with the definition of "breaking change", even adding an exception for time.Time would likely break existing tests that have come to expect the status quo (i.e. the DarkPAN problem). Which means a version bump unless we very narrowly restrict it to time.Time and any other stdlib types that might have a similar problem. This would ideally only break tests that are expected to fail, which might be acceptable, but would still involve reimplementing portions of reflect.DeepEqual.

Personally, I'd lean towards a version bump even for this, as I am very averse to breaking even the smallest number of tests in the field, so at that point you might as well swing for the fences and consider:

  1. Use the tried-and-true google/go-cmp library itself to perform underlying equality comparisons.

This is probably the most robust, future-proof approach, since it leverages the work on go-cmp, while still providing the ergonomics of the is API. But this would be even more of a breaking change, since reflect.Deepequal (the current approach) inspects unexported fields, while go-cmp explicitly does not, and will panic when encountering unexported fields unless you:

a. ignore them outright, or
b. explicitly list each type that is allowed to be compared in this way.

Of these two, 3.a has the least impact to the API, since it can simply be set as the default and Bob's your uncle. Unfortunately, it's also less useful for testing, since, presumably, you are going to want to do deep equality checks on your own types, and aren't going to want to clutter up your exported API with a bunch of comparator methods just to satisfy unit tests 1.

Which forces us to consider how to handle AllowUnexported in some ergonomic way within the is API, since the "pleasant and succinct syntax" that brought you here is the raison d'être of the package in the first place. At this point, we might as well just start talking in earnest about what github.com/matryer/[email protected] looks like, and so we should probably open up a Github Discussion on this, since it pulls in other issues like #52.

I don't have the perms for this, but I'll ping @matryer, and we can talk about getting that ball rolling.

To provide @willbicks with a bit of direction in the meantime,time.Round(0) is the canonical solution, and probably a worthy, if sadly unnecessary pre-test step in the meantime.

Footnotes

  1. We can also take the stance that reflect.DeepEqual is a bad idea in the first place and that explicit Equality checks are the way to go, which is certainly something a package can do, but this is again a larger discussion, and would be an explicitly v2 stance.

@flowchartsman
Copy link
Collaborator

Of course, if there's another short-term approach I've overlooked, feel free to chime in with suggestions, but we probably still want a V2 discussion started.

@flowchartsman flowchartsman changed the title Comparing times fails with one monotonic counter Revisiting is.Equal and the role of reflect.DeepEqual Mar 12, 2023
@flowchartsman
Copy link
Collaborator

#49 (mentioned by @breml above) is also worth discussing, though it's basically the same as option 2 above, and I'd be more inclined to adopt the Equal() convention rather than adding a new method... I'll leave them both open for now until we've had a bit more discussion.

@breml
Copy link
Contributor

breml commented Mar 12, 2023

Maybe the implementation for the equality check could be made configurable. reflect.DeepEqual would stay the default in this case and those it would not break anything for existing users of is. But if there are special circumstances (e.g. with time.Time), the user is free to provide it's own equality check function.

This equality check function would then be called instead of areEqual.

I guess, the signature would be the one of reflect.DeepEqual, that is func(x, y any) bool. For the use with google/go-cmp, one would need to use a small wrapper like

equal := func(x, y any) bool {
	return cmp.Equal(x, y)
}

since the signatures are not the same.

In regards to #49, I am more than happy to follow the Equal() convention.

@matryer
Copy link
Owner

matryer commented Mar 13, 2023

This is a very interesting thread, so firstly, thanks for that :)

A very simple solution might just be to format the time and compare the strings. Then you (the engineer) can be explicit about what's being compared.

is.Equal(t1.Format(time.RFC3999), t2.Format(time.RFC3999))

How do you feel about that?

@flowchartsman
Copy link
Collaborator

flowchartsman commented Mar 13, 2023

A very simple solution might just be to format the time and compare the strings. Then you (the engineer) can be explicit about what's being compared.

This could potentially run afoul of timezone issues, but aside from that I think it's conceptually the same as the "just use time.Equal()" option above in that it would work for explicit time value comparison, but wouldn't handle the case of types that contain a time.Time, retaining the current edge case of using is.Equal() on structs.

@flowchartsman
Copy link
Collaborator

flowchartsman commented Mar 13, 2023

This is one of those cases where API simplicity comes at the cost of hidden complexity. Saying is.Equal(testStruct, actualStruct) is very straightforward and (usually) works well enough, but it's deceptively declarative, since it glosses over the complexities of dealing with fiddly little details like monotonic time, which upon test failure, might be frustrating for users since "it's just a simple assertion". Except when it's not.

The package documentation doesn't make any guarantees about deep equality, nor do the tests exercise it on structs, so it's not exactly like the package fails a design goal, but it also doesn't mention any potential issues, either, which makes struct equality undefined behavior. So, it's either a documentation issue solved by saying "you can do this, but sometimes it will break in thus-and-such case" (which is certainly valid, but feels unsatisfying), or it's a more invasive change.

One alternative to bumping the version and going all-in on go-cmp might be to see if there is a less painful way to detect just those edge cases which occur in the the stdlib, and trying and work around those before calling DeepEqual I can only really think of the time.Time problem offhand, so it might not be too bad to implement, but it would still involve crawling the struct and performing some normalizing side-effect like rounding time values, which might have its own issues.

@breml
Copy link
Contributor

breml commented Mar 13, 2023

One alternative to bumping the version and going all-in on go-cmp might be to see if there is a less painful way to detect just those edge cases which occur in the the stdlib, and trying and work around those before calling DeepEqual I can only really think of the time.Time problem offhand, so it might not be too bad to implement, but it would still involve crawling the struct and performing some normalizing side-effect like rounding time values, which might have its own issues.

To me, this sounds like a rabbit hole is should not jump into. So far, the implementation of is has been quite simple (by heavily relying on reflect.DeepEqual) and those stable over the course of many years. If handling of special cases sneak into is, the complexity will only raise and in the end it might still not fit all the special cases, that appear in the wild.

To me, this issue as well as #49 (and with a different flavor #52) indicate, that there is a need for some better control of equality. So far, we have seen the following options to address this:

  1. Post-process the returned values before the assertion. This is not always possible, e.g. because a private fields needs to be considered.
  2. Delegate the equality check to the type that is check (basically what is proposed in Proposal: value matcher interface in is.Equal  #49).
  3. Bring-your-own equal function (proposed in Revisiting is.Equal and the role of reflect.DeepEqual #53 (comment)), maybe combined with functional options, see 4.
  4. Integrate is with reflect.DeepEqual and go-cmp side-by-side (e.g. by introducing additional New functions, e.g. is.NewWithGoCmp(t) and is.NewRelaxedWithGoCmp(t)). An other approach would be to embrace functional options and add NewWith(t *testing.T, opts ...func(*I)). This approach would allow to extend NewWith without breaking backwards compatibility.

In regards to the V2 / version bump discussion, I just would like to raise the question, what would happen to the V1 in this case. Would these two version co-exist for a longer (possibly infinite) time or would V1 be declared deprecated?
While I see the point, that changing such a fundamental part of the implementation (changing how is.Equal works) justifies such a version bump, I would not do this easily and especially, I would not go to V2 with mainly this change. I feel, that for V2, other topics like "how does is work in a generic world" (see #52) would also needed to be addressed.

@matryer
Copy link
Owner

matryer commented May 3, 2023

@flowchartsman how are you feeling about this one now?

@flowchartsman
Copy link
Collaborator

flowchartsman commented May 9, 2023

I have thoughts, but no firm conclusions yet:

To me, this sounds like a rabbit hole is should not jump into. So far, the implementation of is has been quite simple (by heavily relying on reflect.DeepEqual)...

I guess I'd argue we're kind of in the rabbit hole already, if you're using "simple" and reflect.DeepEqual in the same sentence. :)

This is to say that "simple" and "ergonomic" often come with a tradeoff in hidden complexity. In the case of this library, we obscure the complexity of our own reflective code behind the idiom of is.Equal, which asserts that a and b are equal. But, of course that's not entirely true, which is why this issue exists.

What this really comes down to is how we address deep equality, and what this library wants to be. To me, at least, the appeal has always been right there in bullet point number 2

Beautifully simple API with everything you need: is.Equal, is.True, is.NoErr, and is.Fail

To me, at least "beautifully simple" means that is.Equal does what it says on the tin, and all I need is is.Equal, which will mostly do the right thing.

  1. Delegate the equality check to the type that is check[ed]...

This wouldn't solve the issue at hand, since time.Time has an Equal method that refers to itself. For your own types, you'd be able to write an Equals(v any), method to deal with fields, but that pushes type assertion onto the user, which is kind of unsatisfying. It also won't help you with unexported fields in types from other packages, which takes you right back to reflect.DeepEqual.

  1. Bring-your-own equal function...

Is either a big type switch on the part of the user or it might as well be option 4.

  1. Integrate is with reflect.DeepEqual and go-cmp side-by-side...

My gut feeling on this one is that it is not beautifully simple, and that additional functional options only clutter the API surface. If we want to use go-cmp, maybe we just use it as a fallback by default.

Option: Use go-cmp as a fallback to reflect.DeepEqual

This could be as simple as falling back to it if reflect.DeepEqual fails. It would be slower and still potentially breaks old code, but would be most likely to break things that are explicit failure tests. It might be more likely that people have put workarounds in place to avoid using is.Equal and these would be unaffected.

Option: Use reflection to look for self-referential Equal methods.

Another option is hinted at with by the following generic code1:

type Equaler[T any] interface {
	Equal(T) bool
}

func areEqual[T Equaler[T]](a T, b T) bool {
	return a.Equal(b)
}

For the reasons discussed in #52, this won't work as-is, but it could be done with reflection. This is essentially what go-cmp does, and would entail a modified deepEqual.

Option: A few special cases.

... If handling of special cases sneak into is, the complexity will only raise and in the end it might still not fit all the special cases, that appear in the wild.

I think we could get away with a small number of special cases, perhaps limited only to stdlib types that have caused pain with testing frameworks, like time.Time

Option: Do nothing for now.

It's also worth noting that there's an active proposal to include go-cmp in the standard library. As of just last week, it's a likely decline, but this isn't necessarily a bad thing, since a big part of the reason it hasn't gone anywhere is the consensus that it needs some simplification (which is also a case against using it ourselves). We might yet see a better reflect.Deepequal in the standard library, depending on how motivated people are.

Summary

I agree we can probably do better, and I'm leaning towards a conservative approach. I'll probably try out an implementation of either the special-case or the reflection-based T.Equal(T) method check. Ideally, I'd like to back this up with some data from repos that use the package, so I'll probably take a look at ways I can automate that to see if anything breaks..

Regarding V2

I agree with the points @breml made: any V2 should probably be written around generics, not a change to equality. I've already mentioned some of my thoughts on this in #52 (comment), but given the amount of change that's still ongoing in this space, I think we're probably still a ways off (though I do have a couple of sketches). I think we should enable Github Discussions for the repo and make a topic there as a place to share ideas, since that seems like the spot for it.

Footnotes

  1. full playground link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants