Reductions and NaN values #467

Open
nspark opened this issue May 6, 2021 · 4 comments

@nspark
Contributor

nspark commented May 6, 2021

Description

Currently, the Specification does not specify the handling of NaN values in reductions over floating types.

Per C18 §7.12.14-1, a NaN value is unordered with respect to a numeric value or another NaN. As a result, it is not clear what shmem_double_max_reduce or shmem_double_max_to_all should return when the inputs include NaN values.

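In C, NaN handling can be unintuitive at first. For example, a comparison-based MAX macro gives a different result depending on argument order:

#include <math.h>
#define MAX(a, b) ((a) > (b) ? (a) : (b))

MAX(1.0, NAN);  /* yields NaN */
MAX(NAN, 1.0);  /* yields 1.0 */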

C provides fmax and fmin to handle these situations gracefully:

#include <math.h>
fmax(1.0, NAN);  /* yields 1.0 */
fmax(NAN, 1.0);  /* yields 1.0 */
fmax(NAN, NAN);  /* yields NaN */

In the tests I've performed on OpenSHMEM implementations readily accessible to me (certainly not all that exist), none handle min/max reductions correctly for NaN values. They did seem to handle sum reductions correctly.

Suggestions

  • MAX and MIN reductions should be semantically equivalent to C's fmax and fmin functions.
    • Thus, such reductions will effectively drop NaN values unless all input values are NaN, in which case the result should be NaN.
  • SUM and PROD reductions should produce a NaN value if any input value is NaN.

Considerations

  • Some OpenSHMEM implementations leverage hardware-accelerated reductions. I am not aware of the state of NaN handling for such hardware.
@nspark
Contributor Author

nspark commented May 6, 2021

I expected MPI might have plenty to say about handling NaN values, but all I see is the following:

According to IEEE specifications, the “NaN” (not a number) is system dependent. It should not be interpreted within MPI as anything other than “NaN.”

Advice to implementors. The MPI treatment of “NaN” is similar to the approach used in XDR (see ftp://ds.internic.net/rfc/rfc1832.txt). (End of advice to implementors.)

@nspark
Contributor Author

nspark commented May 24, 2021

To implementors/vendors: Are there performance concerns if the result (e.g., of a MAX reduction where some entries are NaN values) is implementation defined but required to be single-valued? That is, all PEs would be expected to return the same value. (This result is not currently the case on all implementations.)

@jdinan
Collaborator

jdinan commented Jun 15, 2021

Not for MAX, but for an arithmetic operation like SUM, there could be an associativity requirement in order to ensure that all PEs get identical results.
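
To illustrate the associativity point, a minimal sketch in plain C: floating-point addition is not associative, so two PEs that combine the same values in different groupings can end up with different sums. sum_left and sum_right are hypothetical helper names, and the sketch assumes IEEE 754 double arithmetic without value-changing compiler options such as -ffast-math.

```c
/* Hypothetical helpers: the same three-element sum with two groupings. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;   /* combine a and b first */
}

double sum_right(double a, double b, double c)
{
    return a + (b + c);   /* combine b and c first */
}
```

With a = 1e30, b = -1e30, and c = 1.0, sum_left gives 1.0 while sum_right gives 0.0, because 1.0 is lost when added to -1e30.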

@nspark
Contributor Author

nspark commented Jun 25, 2021

One concern that has been raised (off issue, obviously) is that "proper" NaN behavior may require an additional collective (to test for NaNs), which is not desirable.

Again, in my (limited) testing, implementations handled the sum reduction properly in the face of NaN values. The max (and, presumably, min) reductions were the ones that did not necessarily return the same value on all PEs.
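
One plausible cause, which I have not verified against any implementation, is a comparison-based max applied in a different order on different PEs. The plain C sketch below shows the effect locally; NAIVE_MAX, fold_max_forward, and fold_max_backward are hypothetical names, and both folds assume n >= 1.

```c
#include <math.h>
#include <stddef.h>

#define NAIVE_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Hypothetical helper: fold left to right with the comparison-based max. */
double fold_max_forward(const double *vals, size_t n)
{
    double acc = vals[0];
    for (size_t i = 1; i < n; i++)
        acc = NAIVE_MAX(acc, vals[i]);
    return acc;
}

/* Same fold, right to left, as another PE might combine the values. */
double fold_max_backward(const double *vals, size_t n)
{
    double acc = vals[n - 1];
    for (size_t i = n - 1; i-- > 0; )
        acc = NAIVE_MAX(acc, vals[i]);
    return acc;
}
```

For the inputs {1.0, NAN}, fold_max_forward returns NaN and fold_max_backward returns 1.0, so the two orders disagree on the same data.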
