[ROCm 6.0][offline GPU compiler] DEV tests failing test_check_numerics_test test_reduce_test test_find_db #2617

Closed
junliume opened this issue Dec 18, 2023 · 9 comments

Comments

@junliume
Collaborator

junliume commented Dec 18, 2023

2617.log
[Steps to Reproduce]:

CXX=/opt/rocm/llvm/bin/clang++ CXXFLAGS='-Werror'  cmake -DMIOPEN_TEST_FLAGS=' --disable-verification-cache  --verbose ' -DCMAKE_BUILD_TYPE=debug -DCMAKE_CXX_FLAGS_DEBUG='-g -fno-omit-frame-pointer -fsanitize=undefined -fno-sanitize-recover=undefined -Wno-option-ignored ' -DBUILD_DEV=On -DMIOPEN_USE_MLIR=ON -DMIOPEN_GPU_SYNC=Off  -DMIOPEN_USE_COMGR=Off -DCMAKE_PREFIX_PATH="/root/MIOpen/install_dir;/opt/rocm"    ..

[Observations]:

./hip_f8_impl.hpp:31:22: error: unknown type name 'bfloat16'; did you mean 'float1'?
using hip_bfloat16 = bfloat16;
                    ^~~~~~~~
                    float1
MIOpenCheckNumerics.cpp:126:14: error: no viable overloaded '='
        minV = min(minV, val);
        ~~~~ ^ ~~~~~~~~~~~~~~
MIOpenCheckNumerics.cpp:188:5: note: in instantiation of function template specialization 'check_numerics<miopen_f8::hip_f8<miopen_f8::hip_f8_type::fp8>, float>' requested here
    check_numerics<miopen_f8::hip_f8<miopen_f8::hip_f8_type::fp8>, float>(
    ^
./hip_float8.hpp:259:43: note: candidate function not viable: call to __host__ function from __device__ function
    inline MIOPEN_HIP_HOST_DEVICE hip_f8& operator=(const hip_f8& rhs)
                                          ^
MIOpenCheckNumerics.cpp:127:14: error: no viable overloaded '='
        maxV = max(maxV, val);
        ~~~~ ^ ~~~~~~~~~~~~~~
./hip_float8.hpp:259:43: note: candidate function not viable: call to __host__ function from __device__ function
    inline MIOPEN_HIP_HOST_DEVICE hip_f8& operator=(const hip_f8& rhs)
                                          ^
MIOpenCheckNumerics.cpp:126:14: error: no viable overloaded '='
        minV = min(minV, val);
        ~~~~ ^ ~~~~~~~~~~~~~~
MIOpenCheckNumerics.cpp:200:5: note: in instantiation of function template specialization 'check_numerics<miopen_f8::hip_f8<miopen_f8::hip_f8_type::bf8>, float>' requested here
    check_numerics<miopen_f8::hip_f8<miopen_f8::hip_f8_type::bf8>, float>(
    ^
./hip_float8.hpp:259:43: note: candidate function not viable: call to __host__ function from __device__ function
    inline MIOPEN_HIP_HOST_DEVICE hip_f8& operator=(const hip_f8& rhs)
                                          ^

@atamazov
Contributor

@junliume In the provided log, ls /opt reports rocm-5.7.1 while HIP version is 6.0.23494. Is rocm/miopen:ci_94f0fe a 6.0 RC docker?

@junliume
Collaborator Author

> @junliume In the provided log, ls /opt reports rocm-5.7.1 while HIP version is 6.0.23494. Is rocm/miopen:ci_94f0fe a 6.0 RC docker?

@atamazov I think the first reports the base OS ROCm version. You can use the docker I sent the other day with the hipRTC changes to reproduce this issue. This issue is blocking the CI docker upgrade, so the majority of CI dockers are not yet on 6.0.

@atamazov
Contributor

@junliume Thanks, my Q was silly ;) Can you please rename this to "[ROCm 6.0][offline GPU compiler] ... "? It might also be worth assigning urgency_blocker as this blocks CI upgrade.

@junliume changed the title from "[ROCm 6.0][No-COMgr-hipRTC] DEV tests failing test_check_numerics_test test_reduce_test test_find_db" to "[ROCm 6.0][offline GPU compiler] DEV tests failing test_check_numerics_test test_reduce_test test_find_db" on Dec 18, 2023

@junliume
Collaborator Author

> @junliume Thanks, my Q was silly ;) Can you please rename this to "[ROCm 6.0][offline GPU compiler] ... "? It might also be worth assigning urgency_blocker as this blocks CI upgrade.

I'm afraid that hipcc got rid of some necessary headers, which makes this harder to fix. I've asked the team to take a look.
Meanwhile, if you have some time, could you focus on the hipRTC issue?
#2615 fixed one thing or two, but we are still having the issues mentioned in #2615 (comment)

hipcc and hipRTC, one thing at a time, and I think you are more familiar with the second one? :)

@atamazov
Contributor

@junliume I am more familiar with both ;)

@junliume
Collaborator Author

> @junliume I am more familiar with both ;)

The good thing is that both problems are reproducible from the docker :)

@atamazov
Contributor

@junliume Well, this seems to be a compiler regression in 6.0. ROCm 5.7.1 does not have these problems. Do we need a workaround for this issue?

@junliume
Collaborator Author

> @junliume Well, this seems a compiler regression in 6.0. ROCm 5.7.1 does not have these problems. Do we need a workaround for this issue?

@atamazov yes. Compiler patches come along slowly, but I will create a ticket for them. In parallel, we should work around it if at all possible to unblock the CI upgrade.

@junliume
Collaborator Author

@atamazov FYI #2519 is updated with the changes requested by the compiler team.

https://gerrit-git.amd.com/c/compute/ec/clr/+/955022 is the commit that caused __HIP_PLATFORM_HCC__ to be removed.
MIOpen should stop using __HIP_PLATFORM_HCC__ and use __HIP_PLATFORM_AMD__ instead.
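
For illustration, a minimal sketch of what such a migration could look like in a header that has to build against both older and newer ROCm releases. The names `EXAMPLE_PLATFORM_IS_AMD` and `example_is_amd_platform` are hypothetical and do not come from MIOpen; only `__HIP_PLATFORM_AMD__` and `__HIP_PLATFORM_HCC__` are taken from the comment above.

```cpp
// Hypothetical project-local switch (an assumption, not MIOpen's actual code).
// Prefer the current macro and accept the removed one only as a fallback,
// so the same header still works with toolchains that define the old name.
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIP_PLATFORM_HCC__)
#define EXAMPLE_PLATFORM_IS_AMD 1
#else
#define EXAMPLE_PLATFORM_IS_AMD 0
#endif

// Hypothetical helper: lets ordinary code branch on the platform without
// repeating the preprocessor test at every use site.
constexpr bool example_is_amd_platform()
{
    return EXAMPLE_PLATFORM_IS_AMD != 0;
}
```

Routing every platform check through one project-local macro means a future rename in the toolchain needs only a one-line change, instead of edits spread across many `#if` blocks.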
