This repository has been archived by the owner on May 27, 2024. It is now read-only.
Since this is a minor feature request, I don't believe it requires an RFC. If the team thinks a formal RFC would be best, just let me know and I'll do the "pull request" dance.
Summary
Add "DS_Consume" and "DS_Append" intrinsics to HCC.
Motivation
DS_Consume and DS_Append can be used to implement highly efficient, compact queues to LDS within a wavefront. Support for these functions seems to exist as early as GCN 1.0.
Detailed design
According to the GCN ISA, DS_Append increments an LDS variable by the population count (popcount) of the execution mask. For example, if 40 threads are active, DS_Append would add 40 to the location. DS_Consume is the inverse: it decrements the location by the popcount of the execution mask.
HCC already implements a number of intrinsics, such as __amdgcn_ds_bpermute. Following the convention, the functions would look something like this:
int __amdgcn_ds_append(tile_static int& val);
int __amdgcn_ds_consume(tile_static int& val);
The return value is the pre-operation value, as per the ISA.
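To make the arithmetic above concrete, here is a minimal host-side model in plain C++. It is only a sketch of the semantics described in this request, not the HCC intrinsic itself: `popcount64`, `model_ds_append`, `model_ds_consume`, and `lane_slot` are hypothetical names, and the execution mask is passed explicitly as a 64-bit integer. The `lane_slot` helper reflects my assumption about how a compact queue would use the returned base (each active lane takes the base plus the number of active lanes below it); the ISA text quoted above does not prescribe that.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model only. None of these names exist in HCC; they are
// hypothetical helpers that mirror the arithmetic described above.

// Count the set bits in a 64-bit execution mask (one bit per lane).
inline int popcount64(std::uint64_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Model of DS_Append: add the number of active lanes to `counter`
// and return the value it held before the operation.
inline int model_ds_append(int& counter, std::uint64_t exec_mask) {
    const int old = counter;
    counter += popcount64(exec_mask);
    return old;
}

// Model of DS_Consume: subtract the number of active lanes from
// `counter` and return the value it held before the operation.
inline int model_ds_consume(int& counter, std::uint64_t exec_mask) {
    const int old = counter;
    counter -= popcount64(exec_mask);
    return old;
}

// Assumed queue usage (not from the ISA text): an active lane writes to
// slot base + (number of active lanes with a lower lane index).
inline int lane_slot(int base, std::uint64_t exec_mask, int lane) {
    assert(lane >= 0 && lane < 64);
    const std::uint64_t below = exec_mask & ((std::uint64_t{1} << lane) - 1);
    return base + popcount64(below);
}
```

With 40 active lanes and a counter that starts at 100, the modeled append returns 100 and leaves the counter at 140, matching the example in the text.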
Drawbacks
DS_Consume and DS_Append are somewhat obscure functions of the hardware. I'm not sure if many people would be aware of how to use the functions.
Alternatives
The functions could take a pointer instead, like this:
int __amdgcn_ds_append(tile_static int* val);
The pointer form is more C-like, while the reference form is more idiomatic C++.
Unresolved questions
These functions also can be used with GDS memory, but I don't know how GDS memory works.
Support was just committed to LLVM trunk. The committed clang builtins are:
int __builtin_amdgcn_ds_append(__local int *v);
int __builtin_amdgcn_ds_consume(__local int *v);
(__local here means address space 3.)
I think a tile_static int * (or __shared__ int *) argument will work, but an explicit cast to (__attribute__((address_space(3))) int *) is possible.
It might be best to consider this function similar to a relaxed atomic add. However, one significant difference from atomic add is that for all active lanes in the wavefront executing the call, the argument must be the same. If this is known not to be true, then explicit logic is necessary, e.g.
if (this_lane_is_accessing_var_1)
    ... __builtin_amdgcn_ds_append(var_1_ptr) ...
else
    ... __builtin_amdgcn_ds_append(var_2_ptr) ...
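As a rough illustration of why the branch helps, the following host-side C++ sketch models what the if/else does to the execution mask: each side of the branch runs with only its own lanes active, so each counter is incremented by the popcount of its own sub-mask. This is a model under stated assumptions, not the builtin: `popcount64`, `AppendBases`, and `model_divergent_append` are hypothetical names, and the split of the mask is passed in explicitly.

```cpp
#include <cstdint>

// Host-side model only; all names below are hypothetical.

// Count the set bits in a 64-bit execution mask (one bit per lane).
inline int popcount64(std::uint64_t mask) {
    int n = 0;
    while (mask != 0) {
        mask &= mask - 1;  // clear the lowest set bit
        ++n;
    }
    return n;
}

// Pre-operation values seen by each side of the branch.
struct AppendBases {
    int base_1;  // value of var_1 before the append in the "if" side
    int base_2;  // value of var_2 before the append in the "else" side
};

// Model of the if/else pattern: lanes whose bit is set in
// `lanes_using_var_1` take the "if" side, the remaining active lanes
// take the "else" side. Each side appends with its own sub-mask, so the
// pointer argument is uniform across the lanes that execute each call.
inline AppendBases model_divergent_append(int& var_1, int& var_2,
                                          std::uint64_t exec_mask,
                                          std::uint64_t lanes_using_var_1) {
    const std::uint64_t mask_1 = exec_mask & lanes_using_var_1;
    const std::uint64_t mask_2 = exec_mask & ~lanes_using_var_1;

    AppendBases bases{var_1, var_2};
    var_1 += popcount64(mask_1);  // "if" side: only mask_1 lanes active
    var_2 += popcount64(mask_2);  // "else" side: only mask_2 lanes active
    return bases;
}
```

For example, with eight active lanes of which four target var_1, the model adds 4 to each counter, whereas a single call with mixed pointers would have no well-defined target.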
Finally, please note that while this has been checked in, it is not guaranteed to appear in the next release. It should be appearing relatively soon though.
Those restrictions make sense, given how the assembly statement works. It doesn't seem like HCC has a way to represent ISPC "uniform" variables. So just checking for that at compile time / assembly time is the best bet for now.
In ISPC, the type would probably be __local int * uniform: a uniform pointer to an int in the __local address space.
Anyway, I'll await the release of the feature. Thanks for accepting my request! In the meantime, I can use inline-assembly to access the functionality.