[Feature Request]: atomicAdd() to support half2 #3573

ZJLi2013 · 2024-08-14T01:40:17Z

Suggestion Description

hi, hip team,

here is cuda version,

void atomic_add_gmem_h2(half2* addr, half2 in) {
	atomicAdd(addr, in);
}

looks there's non hip alternative yet, if built with hipcc, it gives:

/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:216:5: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'int *' for 1st argument
  216 | int atomicAdd(int* address, int val) {
      |     ^         ~~~~~~~~~~~~
/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:228:14: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'unsigned int *' for 1st argument
  228 | unsigned int atomicAdd(unsigned int* address, unsigned int val) {
      |              ^         ~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:240:15: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'unsigned long *' for 1st argument
  240 | unsigned long atomicAdd(unsigned long* address, unsigned long val) {
      |               ^         ~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:252:20: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'unsigned long long *' for 1st argument
  252 | unsigned long long atomicAdd(unsigned long long* address, unsigned long long val) {
      |                    ^         ~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:264:7: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'float *' for 1st argument
  264 | float atomicAdd(float* address, float val) {
      |       ^         ~~~~~~~~~~~~~~
/opt/rocm-6.1.3/include/hip/amd_detail/amd_hip_atomic.h:290:8: note: candidate function not viable: no known conversion from 'half2 *' (aka '__half2 *') to 'double *' for 1st argument
  290 | double atomicAdd(double* address, double val) {

Operating System

Ubuntu 22.04

GPU

mi300

ROCm Component

6.1.3 + rocblas + rocwmma

cjatin · 2024-08-14T07:47:25Z

For half, we have unsafeAtomicAdd instead of atomicAdd.

https://github.com/ROCm/clr/blob/aa6d07518fdb211c49fd617ee9f69408f1acddfd/hipamd/include/hip/amd_detail/amd_hip_fp16.h#L1511

ZJLi2013 · 2024-08-15T12:10:12Z

For half, we have unsafeAtomicAdd instead of atomicAdd.

https://github.com/ROCm/clr/blob/aa6d07518fdb211c49fd617ee9f69408f1acddfd/hipamd/include/hip/amd_detail/amd_hip_fp16.h#L1511

is there any risk concern for unsafeAtomicAdd, just wonder in which way it's unsafe

b-sumner · 2024-08-15T13:54:52Z

Its unsafe because it causes the fast HW instruction to be generated, but those instructions don't work if they act on memory that is not cached, e.g. across a PCIe bus. The developer needs to assert that they are willing to take that risk.

jinz2014 · 2024-09-05T02:38:35Z

Does ROCm 6.2 support it ?

/opt/rocm-6.2.0/lib/llvm/bin/../../../include/hip/amd_detail/amd_hip_fp16.h does not contain the function.

jinz2014 · 2024-09-05T02:41:08Z

Do you think it is better to have two types of atomic add functions than a single function in CUDA ?

tcgu-amd · 2024-10-02T20:21:24Z

Hi @jinz2014, the function is recently added part of this. You can see it on staging but it hasn't made to 6.2.2 release yet, but please keep an eye out for it since it should be out soon. Meanwhile if you are interested, you can try to compile and install clr staging to see how it works.

Do you think it is better to have two types of atomic add functions than a single function in CUDA ?

In our case we wanted to ensure developers assert that they are using a unsafe function. However, I see your point in having a uniform interface to access these functions. We can will bring it up internally for discussion for sure. Thanks!

tcgu-amd · 2024-10-03T18:32:19Z

Hi @jinz2014, to follow up on the discussion, in our opinion, it is best to have two versions of the atomic functions because atomic* and unsafeAtomic* APIs behave differently, hence the distinction to avoid confusion. We think this aspect outweigh the syntax discrepancy with CUDA, and should be explicitly noted by developers that use these APIs. Thanks!

jinz2014 · 2024-10-03T18:57:34Z

Hi @tcgu-amd Thanks for your answers.

tcgu-amd · 2024-10-03T19:17:00Z

@jinz2014 No problem! Is there anything else we can help you with? If not I will close this issue for now. Thanks!

jinz2014 · 2024-10-04T00:01:55Z

I will try the atomicAdd() for half data types. I am not sure if @ZJLi2013 has more questions.

ZJLi2013 · 2024-10-24T01:48:55Z

hi, @tcgu-amd , I tried rocm/torch images with rocm6.2.3, rocm6.2.4, neither has unsafeAtomicAdd yet.

anther thing may need you clarify, looks in /opt/rocm/include, there is no clr headers, only found: /opt/rocm/include/./hip/amd_detail/amd_hip_bf16.h , supposing clr headers has merged to hip header dir?

will wait on a official release then

Thanks again

tcgu-amd · 2024-10-24T14:06:32Z

anther thing may need you clarify, looks in /opt/rocm/include, there is no clr headers, only found: /opt/rocm/include/./hip/amd_detail/amd_hip_bf16.h , supposing clr headers has merged to hip header dir?

Yes, that's correct. There is no clr directory under /opt/rocm, since the repository mainly hosts runtimes for hip and OpenCL. The amd_detail headers from clr are indeed under /opt/rocm/include/hip/amd_detail.

will wait on a official release then

Sounds good.

Thanks!

tcgu-amd · 2024-11-11T15:20:13Z

This issue will be closed since there is no further actionable item/activity. Please feel free to re-open for follow ups and further inquires requiring the release status. Thanks!

jinz2014 · 2024-11-11T20:41:11Z

@tcgu-amd
Could you close the issue after the feature is available in the release ?

tcgu-amd · 2024-11-11T21:12:06Z

@jinz2014 sure! Just a note that we would prefer closing issues with no further actionable items to help better track which ones still require active attention. Since the feature requested specifically for this issue is already in staging, we can keep it open a little longer. However, in general, feature requests that are on internal roadmaps will be closed, but we do encourage users to poll for progress by continue to ask follow up questions. Hope this makes sense. Thanks!

ZJLi2013 changed the title ~~atomicAdd() to support half2~~ [Feature Request]: atomicAdd() to support half2 Aug 14, 2024

ZJLi2013 closed this as completed Aug 15, 2024

ZJLi2013 reopened this Aug 15, 2024

ppanchad-amd added the Under Investigation label Oct 1, 2024

tcgu-amd added the feature request label Oct 15, 2024

tcgu-amd closed this as completed Nov 11, 2024

tcgu-amd reopened this Nov 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature Request]: atomicAdd() to support half2 #3573

[Feature Request]: atomicAdd() to support half2 #3573

ZJLi2013 commented Aug 14, 2024

cjatin commented Aug 14, 2024

ZJLi2013 commented Aug 15, 2024

b-sumner commented Aug 15, 2024

jinz2014 commented Sep 5, 2024

jinz2014 commented Sep 5, 2024

tcgu-amd commented Oct 2, 2024

tcgu-amd commented Oct 3, 2024

jinz2014 commented Oct 3, 2024

tcgu-amd commented Oct 3, 2024

jinz2014 commented Oct 4, 2024

ZJLi2013 commented Oct 24, 2024

tcgu-amd commented Oct 24, 2024

tcgu-amd commented Nov 11, 2024

jinz2014 commented Nov 11, 2024

tcgu-amd commented Nov 11, 2024

[Feature Request]: atomicAdd() to support half2 #3573

[Feature Request]: atomicAdd() to support half2 #3573

Comments

ZJLi2013 commented Aug 14, 2024

Suggestion Description

Operating System

GPU

ROCm Component

cjatin commented Aug 14, 2024

ZJLi2013 commented Aug 15, 2024

b-sumner commented Aug 15, 2024

jinz2014 commented Sep 5, 2024

jinz2014 commented Sep 5, 2024

tcgu-amd commented Oct 2, 2024

tcgu-amd commented Oct 3, 2024

jinz2014 commented Oct 3, 2024

tcgu-amd commented Oct 3, 2024

jinz2014 commented Oct 4, 2024

ZJLi2013 commented Oct 24, 2024

tcgu-amd commented Oct 24, 2024

tcgu-amd commented Nov 11, 2024

jinz2014 commented Nov 11, 2024

tcgu-amd commented Nov 11, 2024