
reductions: fix text for associative binary ops #12

Closed

Conversation

@davidozog (Owner) commented Aug 30, 2024

Summary of changes

This attempts to correct the language around "associativity" of binary operations in OpenSHMEM reductions.

The proposed changelog entry is here:
kwaters4#8

I believe this is an erratum, so that's included there as well.

Proposal Checklist

  • Link to issue(s)
  • Changelog entry
  • Reviewed for changes to front matter
  • Reviewed for changes to back matter

@maawad (Collaborator) left a comment

Minor typo, but otherwise looks good.

content/shmem_reductions.tex: 2 review comments (outdated, resolved)
davidozog and others added 2 commits August 30, 2024 15:35
@isubsmith (Collaborator) left a comment

I don't have as much familiarity with our stance on associativity. I think this text change is basically saying the ops are not necessarily associative, because they really can't be with floating point. I think we want to make sure run-to-run results are the same, though, right? Even if those results are technically "wrong", at least they are the same. Maybe we should add explicit text to that effect if that's the case.

Otherwise, one minor fix: "an binary" -> "a binary" in two places.

@davidozog (Owner, Author) commented

> I don't have as much familiarity with our stance on associativity. I think this text change is basically saying the ops are not necessarily associative, because they really can't be with floating point. I think we want to make sure run-to-run results are the same, though, right? Even if those results are technically "wrong", at least they are the same. Maybe we should add explicit text to that effect if that's the case.

I think this change is saying that results of FP reductions might have small rounding errors, so they may never be guaranteed to be exactly the same run-to-run. But as an example, users could verify/validate FP reduction results by checking whether the difference exceeds some relevant threshold (say, something like 1e-6 for float, or an even tighter "machine epsilon"). Maybe we need more clarity regarding this...

Looping in @jdinan and @BKP since they were interested and might have thoughts on how to explain it better.

@maawad (Collaborator) commented Aug 30, 2024

> I don't have as much familiarity with our stance on associativity. I think this text change is basically saying the ops are not necessarily associative, because they really can't be with floating point. I think we want to make sure run-to-run results are the same, though, right? Even if those results are technically "wrong", at least they are the same. Maybe we should add explicit text to that effect if that's the case.
>
> Otherwise, one minor fix: "an binary" -> "a binary" in two places.

FWIW: The C++ Standard text equivalent to this is:

[Note 1: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]

https://eel.is/c++draft/reduce#9

@isubsmith (Collaborator) commented

> I think this change is saying that results of FP reductions might have small rounding errors, so they may never be guaranteed to be exactly the same run-to-run. But as an example, users could verify/validate FP reduction results by checking whether the difference exceeds some relevant threshold (say, something like 1e-6 for float, or an even tighter "machine epsilon"). Maybe we need more clarity regarding this...

I think you can get repeatable results as long as you do something to guarantee the operations are always done in the same order. Maybe that's not possible to enforce in shmem reductions, though, hence the language here. I'm certainly fine with whatever the group's thoughts are; I just know we got bit by this at IBM long ago. We had a broadcast-based allreduce algorithm that was really fast, but it failed Livermore verification checks because the operations were done in the random order the packets arrived. Results would differ by something like 10^-15, but the standard allreduce that enforced a given order would give the exact same results, to whatever precision they were looking for, every time.

@isubsmith (Collaborator) commented

> FWIW: The C++ Standard text equivalent to this is:
>
> [Note 1: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]
>
> https://eel.is/c++draft/reduce#9

So is that saying that accumulate is repeatable then?

@davidozog (Owner, Author) commented

> FWIW: The C++ Standard text equivalent to this is:
>
> [Note 1: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]
>
> https://eel.is/c++draft/reduce#9

I like that - I didn't really say anything like "[reductions apply binary ops] in an unspecified order", which I think is also applicable to OpenSHMEM.

Fun side question: Can we re-use this text? The ISO C++ standard doesn't have a clear re-use/derived-works allowed license to my eye, so I'm guessing not?

@maawad (Collaborator) commented Aug 30, 2024

> > FWIW: The C++ Standard text equivalent to this is:
> >
> > [Note 1: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]
> >
> > eel.is/c++draft/reduce#9
>
> So is that saying that accumulate is repeatable then?

Yes, it's like what you said in your previous comment:
https://eel.is/c++draft/accumulate#2

@davidozog (Owner, Author) commented

> > > FWIW: The C++ Standard text equivalent to this is:
> > >
> > > [Note 1: The difference between reduce and accumulate is that reduce applies binary_op in an unspecified order, which yields a nondeterministic result for non-associative or non-commutative binary_op such as floating-point addition. — end note]
> > >
> > > eel.is/c++draft/reduce#9
> >
> > So is that saying that accumulate is repeatable then?
>
> Yes, it's like what you said in your previous comment. https://eel.is/c++draft/accumulate#2

Huh - accumulate defines a specific ordering ("first to last"), whereas reduce does not. I don't think OpenSHMEM can get away with a well-defined ordering for reductions myself... but maybe?

> Results would differ by something like 10^-15

My hunch is that 1e-15 is pretty "large" for an epsilon - 64-bit machine epsilon is ~2.2e-16. I don't have enough experience to know what the "worst case" is, though, say with wildly varying element magnitudes and orderings... Maybe we should look into this before committing to the correction.

I'm curious to what extent this affects your use-cases @kwaters4?

@kwaters4 (Collaborator) commented Aug 30, 2024

The biggest issue that we have run across is how to deal with NaNs, similar to issue #467. Currently, some implementations do not handle this uniformly: some PEs will get NaN while others do not. Other implementations handle it well, so all PEs get the same value at the end of the operation. Defining the epsilon is not much of a concern; NaN is the only factor. If this were defined in the specification, it would force implementers to define the behavior around NaNs.

Today we have a higher-level API that aborts a job whenever NaN is detected on any PE following a reduction operation. This is not the best scenario, but it is the pragmatic approach we take today.

I should have followed up on this issue earlier, but was unable to. I would like to push it into the upcoming specification, but need some help articulating the behavior.

content/shmem_reductions.tex: review comment (outdated, resolved)
@davidozog davidozog force-pushed the pr/associativity_of_reductions branch from b6cfcc6 to 3d53054 Compare September 17, 2024 15:04
> identical to the \dest{} arrays produced by all other \acp{PE} in the team.
>
> The result of any given \openshmem reduction is reproducible across job
> execution environments.
@davidozog (Owner, Author) commented on the diff

Can we improve/clarify this? "job execution environments"

@davidozog (Owner, Author) commented

Closing this in favor of a PR targeting RC2

@davidozog davidozog closed this Oct 3, 2024
@davidozog davidozog mentioned this pull request Oct 3, 2024
6 participants