Proposal: `simd_broadcast` intrinsic #2031

cshenton · 2022-09-07T01:58:13Z

Problem

It would be useful to have a compiler intrinsic for creating #simd vectors from scalars and other vector values in a way that guarantees code gen similar to the equivalent Intel/ARM intrinsics.

For example, initialising a 4-wide single precision float vector with a scalar float value has the following C intrinsics:

// Intel
__m128 _mm_set1_ps (float a)

// ARM
float32x4_t vdupq_n_f32 (float32_t a)

A second example is broadcasting the lower two elements of a 4-wide float vector to an 8-wide float vector:

// Intel
__m256 _mm256_broadcast_f32x2 (__m128 a)

// ARM
// No equivalent

Potential solution

Odin currently has a swizzle intrinsic, but it only supports creating vectors of the same length or shorter. The following calls do not compile:

small_arr := #simd[2]f64{1.0, 2.0}
scalar_arr := #simd[1]f64{1.0}
scalar := 2.0

x: #simd[4]f64 = swizzle(small_arr, 0, 1, 0, 1)
y: #simd[4]f64 = swizzle(scalar_arr, 0, 0, 0, 0)
z: #simd[4]f64 = swizzle(scalar, 0, 0, 0, 0)

This largely makes sense and matches up with swizzle behaviour in shader langs.

I propose a new intrinsic, simd_broadcast, for creating simd vectors larger than their inputs, which provides guarantees about generating code similar to the Intel/ARM intrinsics in C. To be complementary with swizzle, it should fail to compile unless the output vector is larger than the input.

// With annotated types
x : #simd[4]f64 = simd_broadcast(small_arr , 0, 1, 0, 1)
y : #simd[4]f64 = simd_broadcast(scalar_arr, 0, 0, 0, 0)
z : #simd[4]f64 = simd_broadcast(scalar, 0, 0, 0, 0)

// Without annotated types
x := simd_broadcast(small_arr , 0, 1, 0, 1)
y := simd_broadcast(scalar_arr, 0, 0, 0, 0)
z := simd_broadcast(scalar, 0, 0, 0, 0)

// These calls should error
simd_broadcast(scalar, 0)
simd_broadcast(scalar_arr, 0)
simd_broadcast(small_arr, 1, 0)
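
To make the intended lane-selection rule concrete, here is a minimal scalar model in portable C. It is only a sketch of the semantics described above, not the proposed compiler implementation: the name broadcast_f64 is hypothetical, the checks happen at run time rather than compile time, and rejecting out-of-range indices is an assumption that the proposal does not spell out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar model of the proposed simd_broadcast semantics:
 * out[i] = src[idx[i]] for every output lane.
 * Returns false for inputs the proposal says should be rejected. */
bool broadcast_f64(const double *src, size_t src_len,
                   const size_t *idx, size_t out_len,
                   double *out)
{
    /* The output must be strictly wider than the input. */
    if (out_len <= src_len) {
        return false;
    }

    /* Assumption: every index must name an existing input lane. */
    for (size_t i = 0; i < out_len; i++) {
        if (idx[i] >= src_len) {
            return false;
        }
    }

    /* Copy the selected input lane into each output lane. */
    for (size_t i = 0; i < out_len; i++) {
        out[i] = src[idx[i]];
    }
    return true;
}
```

Under this model, a 2-lane input {1.0, 2.0} with indices 0, 1, 0, 1 fills a 4-lane output with {1.0, 2.0, 1.0, 2.0}, while a call whose output is no wider than its input returns false, mirroring the calls that should error.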

The intrinsic could simply be named broadcast, but I'm not proposing that this operation work on or produce regular arrays the way swizzle does. To avoid any ambiguity, I'm proposing the simd_ prefix, with an alias in core:simd as simd.broadcast.

import "core:simd"

// ...

// Using alias
x := simd.broadcast(small_arr , 0, 1, 0, 1)
y := simd.broadcast(scalar_arr, 0, 0, 0, 0)
z := simd.broadcast(scalar, 0, 0, 0, 0)

Problems with solution

It might be preferable to just write #simd[4]T{s, s, s, s} and document the code gen guarantees for scalar values.
Broadcast intrinsics vary from platform to platform. Including this intrinsic might suggest to the user that certain code gen will happen when it may not.
simd_broadcast(x, 0, 0, 0, 0) is more characters than _mm_set1_ps(x); if being more terse than C is a goal, this might need rethinking.

cshenton · 2022-09-07T02:09:47Z

An alternative, to address just the scalar broadcast case, could be to simply extend the existing simd intrinsic operations to accept scalar inputs and broadcast them for you. So the following would be allowed:

import "core:simd"

a := 2.5
b := 3.5
vec := #simd[4]f64{1.0, 2.0, 3.0, 4.0}

x := a * vec // {2.5, 5, 7.5, 10}
y := simd.min(a, vec) // {1, 2, 2.5, 2.5}
z := simd.clamp(vec, a, b) // {2.5, 2.5, 3.0, 3.5}
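
The lane values in the comments above can be checked with a small scalar model in portable C. This is only a sketch: the type vec4f64 and the helpers splat, vec_mul, vec_min and vec_clamp are hypothetical names, not part of core:simd, and the model uses plain loops rather than real SIMD instructions.

```c
#include <stddef.h>

#define LANES 4

/* Hypothetical 4-lane f64 vector, modelled as a plain array. */
typedef struct {
    double lane[LANES];
} vec4f64;

/* Broadcast one scalar into every lane. */
vec4f64 splat(double s)
{
    vec4f64 r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = s;
    }
    return r;
}

/* Lane-wise multiply. */
vec4f64 vec_mul(vec4f64 a, vec4f64 b)
{
    vec4f64 r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = a.lane[i] * b.lane[i];
    }
    return r;
}

/* Lane-wise minimum. */
vec4f64 vec_min(vec4f64 a, vec4f64 b)
{
    vec4f64 r;
    for (size_t i = 0; i < LANES; i++) {
        r.lane[i] = a.lane[i] < b.lane[i] ? a.lane[i] : b.lane[i];
    }
    return r;
}

/* Lane-wise clamp of v into [lo, hi]. */
vec4f64 vec_clamp(vec4f64 v, vec4f64 lo, vec4f64 hi)
{
    vec4f64 r;
    for (size_t i = 0; i < LANES; i++) {
        double x = v.lane[i];
        if (x < lo.lane[i]) {
            x = lo.lane[i];
        }
        if (x > hi.lane[i]) {
            x = hi.lane[i];
        }
        r.lane[i] = x;
    }
    return r;
}
```

In this model the caller broadcasts a and b once with splat and passes the results to all three operations. With implicit scalar arguments, that reuse would be left to the compiler.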

1 reply

cshenton Sep 7, 2022
Author

This has the downside of putting a higher requirement on the compiler to de-duplicate those broadcast operations, since we'd only want to broadcast a and b to simd vectors once in the above code, not at each call site.

cshenton · 2022-09-08T00:19:09Z

Bill has kindly pointed out to me on the Discord a shorthand for scalar broadcast that is already available in the language:

v4f32 :: #simd[4]f32
x := v4f32{0..<4 = 3}
y := v4f32(3)

I'd add that, since core:simd declares type aliases for most small vector simd types, using those is also an option:

import "core:simd"

y := simd.f32x4(3)

Funnily enough, that largely covers my needs and is preferable to my suggested broadcast syntax.
