The upcoming Intel AVX10.2 instruction set (described in the specification at https://www.intel.com/content/www/us/en/content-details/828965/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html) adds the following operations:

IEEE 754-2019 Min/Max for BF16/F16/F32/F64 vectors (a portable sketch of the intended NaN handling appears after this list)
BF16/F16/F32/F64 MinMagnitude (equivalent to IfThenElse(Lt(Abs(a), Abs(b)), a, b) if both a[i] and b[i] are non-NaN)
BF16/F16/F32/F64 MaxMagnitude (equivalent to IfThenElse(Lt(Abs(a), Abs(b)), b, a) if both a[i] and b[i] are non-NaN)
F16/BF16/F32->I8/U8 DemoteTo (there is already a use case for F16->I8/U8 DemoteTo in the implementation of I8/U8 Div on AVX3_SPR/AVX10_2/NEON_BF16)
F32->F16 OrderedDemote2To
New floating-point to integer PromoteTo/ConvertTo/DemoteTo instructions that saturate out-of-range non-NaN values to the limits of the target integer type and convert NaNs to 0
F16->F32 WidenMulPairwiseAdd
U16xU16->U32 WidenMulPairwiseAdd/SatWidenMulPairwiseAccumulate/ReorderWidenMulAccumulate (originally introduced in AVX-VNNI-INT16, and extended on AVX10.2 to 512-bit vectors on CPUs that support them)
I8xI8->I32 and U8xU8->I32 SumOfMulQuadAccumulate (originally introduced in AVX-VNNI-INT8, and extended on AVX10.2 to 512-bit vectors on CPUs that support them)
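As a rough illustration of the Min semantics above, here is a minimal sketch of how MinNumber-style NaN handling and MinMagnitude could be emulated today with existing Highway ops. The names MinNumberFallback and MinMagnitudeFallback are hypothetical and only for illustration, not existing Highway APIs, and this assumes the usual static-dispatch usage pattern:

```cpp
#include "hwy/highway.h"

namespace hn = hwy::HWY_NAMESPACE;

// IEEE 754-2019 minimumNumber: if exactly one lane is NaN, return the other lane.
// Hypothetical helper, not an existing Highway op.
template <class V>
HWY_ATTR V MinNumberFallback(V a, V b) {
  const V min = hn::Min(a, b);  // NaN propagation of Min is target-dependent
  return hn::IfThenElse(hn::IsNaN(a), b, hn::IfThenElse(hn::IsNaN(b), a, min));
}

// MinMagnitude for lanes where both inputs are non-NaN, per the description above.
template <class V>
HWY_ATTR V MinMagnitudeFallback(V a, V b) {
  return hn::IfThenElse(hn::Lt(hn::Abs(a), hn::Abs(b)), a, b);
}
```

A native AVX10.2 implementation would replace these multi-instruction sequences with single instructions.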
GCC 15 and Clang 20, both currently under development and scheduled for release in spring 2025, will support the new AVX10.2 intrinsics.
The new _mm*_cvttsp[h,s,d]_epi* intrinsics available on AVX10.2 should also fix the undefined behavior that GCC currently exhibits when converting out-of-range floating-point vectors to integer vectors (described in #2183).
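For reference, here is a scalar sketch of the saturating semantics described above; SaturatingF32ToI32 is a hypothetical name used only for illustration, and the actual instructions of course operate on whole vectors:

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar reference: a plain cast of an out-of-range float to
// int32_t is undefined behavior in C++, which is what the new instructions
// avoid by saturating in hardware and mapping NaN to 0.
int32_t SaturatingF32ToI32(float f) {
  if (std::isnan(f)) return 0;  // NaN -> 0
  if (f <= -2147483648.0f) return std::numeric_limits<int32_t>::min();  // <= -2^31
  if (f >= 2147483648.0f) return std::numeric_limits<int32_t>::max();   // >= 2^31
  return static_cast<int32_t>(f);  // in range: truncate toward zero
}
```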
Some of the ops for 256-bit or smaller vectors that are currently implemented in the hwy/ops/x86_512-inl.h header on AVX3 targets also need to move into a separate header, since support for 512-bit vectors is optional on AVX10.2.
Thanks for starting the discussion! Looks like GNR has also just been introduced/launched, but that supports 10.1, I think.
Min/MaxNumber (Min with proper NaN handling per IEEE754:2019) and Min/MaxMagnitude look useful, as does F16 WidenMulPairwiseAdd. Would be very happy to see those added :)
I don't see a burning need for bf16 ops. This target is AFAIK the only platform that has them, and just about the only demand I see for bf16 is mul/add, which is mostly covered by the existing WidenMul.
I agree we'd want to split the "AVX3" and "512-bit" aspects of x86_512-inl.h.
How about I make a TODO for around 2025-03 to lay the groundwork by creating the HWY_AVX10_2 (or HWY_AVX102?) target/boilerplate? Would you later like to add some of its functionality?