Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify that clip() behavior is undefined when min or max is outside the bounds of x #814

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

asmeurer
Copy link
Member

As discussed in today's consortium meeting.

See the discussion at numpy/numpy#24976.

…the bounds of x

As discussed in today's consortium meeting.

See the discussion at numpy/numpy#24976.
@kgryte
Copy link
Contributor

kgryte commented Sep 19, 2024

@asmeurer You opened this PR, but, based on your comments in numpy/numpy#24976 (comment), it is not clear whether you still support making this change. Are you okay with this PR moving forward, assuming that the current standardized behavior remains the same (i.e., no type promotion)?

@kgryte kgryte added this to the v2024 milestone Sep 19, 2024
@kgryte kgryte added Narrative Content Narrative documentation content. topic: Type Promotion Type promotion. Needs Discussion Needs further discussion. labels Sep 19, 2024
@asmeurer
Copy link
Member Author

Yes, I think this text is especially important given that there is no promotion. If there were promotion this text would serve no purpose, because x would just promote to the common data type and there would be no "out-of-bounds" values.

@fancidev
Copy link

fancidev commented Oct 22, 2024

I find another corner case that seems not elaborated by the spec. Suppose x is float32, min and max are float64, min < max, but min and max are so close together that they are between two adjacent float32 numbers. Then there is no float32 number whose value is between min and max. The spec doesn’t specify the behavior in this case.

Maybe this and the case min > max could be combined into one rule:

”If there exists no element in the data type of x whose value is greater than or equal to min and less than or equal to max, the behavior is undefined.”


I’d also like to take the chance to try clarify the precise meaning of “clamp … x to the range [min, max]”, which seems a bit vague to me. Something like the following might be helpful:

”If the value of min and max are exactly representable by the data type of x and min <= max, then clip(x, min, max) has the same value as maximum(minimum(x, max), min) and the same type as x.”

If this definition is deemed undesirable because some implementation imposed a stricter requirement, the following more strict version might help:

”If min and max have the same data type as x and min <= max, then clip(x, min, max) is equivalent to maximum(minimum(x, max), min). Otherwise, the behavior is implementation defined.”


Btw, the parameter doc of min and max seems to contain a typo: x1 should be x if I read correctly.

@asmeurer
Copy link
Member Author

The more general issue with a min/max float64 is that rounding might work against you when downcasting to float32. This does not require min and max to be near each other, but rather just near x. See numpy/numpy#24976 (comment). (this is why I personally believe the non-promoting rule in clip was a bad idea. This sort of thing is exactly why type promotion behavior exists)

”If min and max have the same data type as x and min <= max, then clip(x, min, max) is equivalent to maximum(minimum(x, max), min). Otherwise, the behavior is implementation defined.”

Only defining clip for the case where x, min, and max have the same data type would be a significant change from the existing text, and probably not what people would want.

@fancidev
Copy link

fancidev commented Oct 22, 2024

The referenced numpy issue is an interesting one. It shows that Array API conformant semantics cannot be implemented by promotion followed by downcasting.

A general definition of clip that is consistent with the currently proposed semantics could be the following:

Let T be the data type of x. If there exists a value in T that is greater than or equal to min and less than or equal to max, then clip(x, min, max) returns (1) the smallest element in T that is greater than or equal to min if x < min, (2) the largest element in T that is less than or equal to max if x > max, or (3) x otherwise.

In math notation, this becomes

$$ \mathrm{clip}(x,a,b):= {\arg\min}_{ t \in T\cap [ a , b]} | t-x | $$

This definition consistently accounts for any mixture of data type. It is certainly implementable, but since some implementations already impose a stricter requirement on the input types, it may not be suitable for a standard. That’s why I proposed two more restrictive versions in the previous comment, one assuming min and max are representable in T, and one assuming min and max have the type T.

@kgryte
Copy link
Contributor

kgryte commented Oct 31, 2024

this is why I personally believe the non-promoting rule in clip was a bad idea. This sort of thing is exactly why type promotion behavior exists

As a general principle, yes, but clip doesn't fall neatly into the same category as, e.g., add, mul, etc, where type promotion makes more intuitive sense.

If we wanted to allow type promotion wiggle room for some libraries while also allowing other libraries to make the reasonable choice of raising an exception for undefined behavior, we could limit portable clip behavior to only the scenario in which x, min, and max all have the same data type. We'd need to tweak the language of the spec a bit to open the door for implementing libraries to implement non-portable behavior without running afoul of the standard.

Were we to circumscribe portable behavior, this would require downstream libraries building on the Array API to be explicit about what behavior they want (e.g., by wrapping x in astype or min and max in astype). E.g., if they always want type promotion semantics, then use result_type to determine the desired output array dtype and then use astype on x, min, and max. If they want min and max to always be casted to the dtype of x (either upcast or downcast), then use astype on min and max. Et cetera.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Narrative Content Narrative documentation content. Needs Discussion Needs further discussion. topic: Type Promotion Type promotion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants