hcc::__activelaneid_u32() doesn't seem to work in HCC 2.0 #1011
Comments
FYI, there is __lane_id().
Not quite. __lane_id() has -1 as the mask to mbcnt; active_laneid() has exec_lo and exec_hi as the masks.
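In code, the difference is just the mask fed to the mbcnt pair. A minimal sketch, assuming clang's AMDGPU builtins:

```cpp
// mbcnt returns the count of set mask bits strictly below this lane's
// position, so an all-ones mask yields the physical lane id (0..63):
unsigned lane = __builtin_amdgcn_mbcnt_hi(~0u,
                __builtin_amdgcn_mbcnt_lo(~0u, 0u));
// active_laneid() is the same sequence with exec_lo/exec_hi as the masks,
// so only the currently active lanes below this one are counted.
```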
The active[] array and the laneid[] array come out differently. In effect, laneid[] always returns the same numbers regardless of the execution mask, while active_laneid() depends on the execution mask, so it can be used for reading from and writing to a queue without atomics (once combined with the __builtin_amdgcn_ds_append functionality I requested).
Code along those lines (once __builtin_amdgcn_ds_append is finally added to HCC) should add data to the tile_static queue efficiently.
Right, my mistake. I didn't notice the difference in the mbcnt masks. However, I'm not sure that your code with __builtin_amdgcn_ds_append is quite right.
So it should be like this:

```cpp
tile_static int queue_head = 0; // Yeah, I know that's not how it works. But for simplicity's sake...
tile_static int queue[5000];

// ... barrier ...

if (someComplexCondition()) { // We don't know the lane access pattern
    // Each active lane claims a slot: p is the counter value before the
    // append, offset by this lane's rank among the active lanes.
    const int p = __builtin_amdgcn_ds_append(&queue_head);
    queue[p + active_laneid()] = fooBar();
}
```
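Assuming ds_append maps to the GCN DS_APPEND instruction as documented, that is, the counter is incremented by the number of active lanes and the pre-increment value is returned uniformly to every lane, then p is the same across the wave and p + active_laneid() gives each active lane its own distinct slot, with no atomics needed.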
Not quite yet. Just pointing out why I'm interested in __activelaneid_u32(). I haven't tested (or written) any code like that yet; it's just conceptual. Overall, I'm trying to figure out a cheap and easy pattern for load balancing (or work stealing? work sharing?) on a GPU. This queue thing seems like it could be a useful pattern to build on.
Thanks. We're looking into this.
PR #1018 was merged last week. Closing this issue.
This was originally reported in the ROCm issue tracker (ROCm/ROCm#688). They asked me to report it here instead.
Summary of hardware: Threadripper 1950X + Vega 64
PCIe Information: PCIe 3.0.
Here's my test code.
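The code block itself was lost in transit; what follows is a minimal sketch of the kind of reproducer being described, where the hc.hpp usage and the forced-divergence pattern are assumptions rather than the reporter's actual code:

```cpp
// Hypothetical reproducer (not the original code): compare __lane_id()
// against __activelaneid_u32() under forced divergence. With only the
// even lanes active in the branch, __activelaneid_u32() should yield
// 0..31 while __lane_id() still reports the physical ids 0..63.
#include <hc.hpp>
#include <cstdio>
#include <vector>

int main() {
    std::vector<unsigned> lane(64, 0), active(64, 0);
    hc::array_view<unsigned, 1> laneV(64, lane), activeV(64, active);
    hc::parallel_for_each(hc::extent<1>(64), [=](hc::index<1> i) [[hc]] {
        laneV[i] = __lane_id();
        if (i[0] % 2 == 0)              // force a divergent branch
            activeV[i] = __activelaneid_u32();
    });
    laneV.synchronize();
    activeV.synchronize();
    for (int i = 0; i < 64; ++i)
        std::printf("lane %2d: __lane_id=%2u __activelaneid_u32=%2u\n",
                    i, lane[i], active[i]);
}
```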
When I compile...
Version information:
For now, my code is using the following as a workaround.
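The workaround snippet was likewise lost. Given the mask discussion earlier in the thread, a plausible stand-in (an assumption, not the reporter's code) computes the active lane id by hand from the exec mask:

```cpp
#include <cstdint>

// Hypothetical workaround (assumed): feed mbcnt the exec mask instead of
// -1, so only the active lanes below this one are counted.
static inline unsigned my_activelaneid() [[hc]] {
    const uint64_t exec = __builtin_amdgcn_read_exec();
    const unsigned lo = __builtin_amdgcn_mbcnt_lo((unsigned)exec, 0u);
    return __builtin_amdgcn_mbcnt_hi((unsigned)(exec >> 32), lo);
}
```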