[E2E][HIP] Several E2E tests failed on HIP in SYCL Nightly testing #12997
Comments
The memcpy2d issue is a ROCm bug that was fixed and then apparently broken again in a later ROCm version. A reminder that it would be a good idea to stop using a card for CI that is officially unsupported by ROCm, especially an RDNA2 one, which has no matrix cores and no good double-precision floating-point support, and hence is only useful for GPGPU in a very limited set of applications, like Blender, that use single-precision floats. AMD is therefore rightly not going to prioritize maintaining support for it or fixing bugs for it in new ROCm versions. If you want a more economical card, a small RDNA3 or later card (which has matrix cores) would be smarter. The CDNA family cards (CDNA2/3 are currently the most relevant) are the most relevant for GPGPU, but AMD doesn't have economy variants of these cards. |
@bader @stdale-intel can we upgrade the AMD CI machine to use the latest GPU card? |
#12955 marks the 2dmem tests that fail due to the ROCm driver bug, even on gfx90a, as XFAIL. |
Thanks! |
Looks like we still have these failures in SYCL Nightly: https://github.com/intel/llvm/actions/runs/8516817039/job/23352774112 |
Hi @uditagarwal97. oneapi-src/unified-runtime#1455 (tested with #13059) should fix all of the following:
There will probably be a separate future fix for the joint_matrix failure, which could probably be XFAIL'd for now. |
It seems it may be possible to merge unified-runtime/pull/1455 very soon, since it affects the next release. I'm not sure about the exact timeframe, but it is marked as needed ASAP and the UR team is aware. |
@uditagarwal97 is it OK to wait for this fix? Once it is merged, if anything is still failing, let me know and I can update the XFAIL PR. |
I think we can wait for the UR fix. |
Hi @uditagarwal97. unified-runtime/pull/1455 was merged yesterday, so these tests should now pass. Anything that still fails should get XFAILed by #12955. |
Thanks! I no longer see the failures in SYCL Nightly: https://github.com/intel/llvm/actions/runs/8640903467 |
Describe the bug
The following E2E tests failed on HIP during SYCL Nightly testing:
https://github.com/intel/llvm/actions/runs/8242960746/job/22543076923
Basic/large-range.cpp test failed with the following error message:
USM/* and syclcompat/* tests failed with the following error message:
To reproduce
intel/llvm commit id: ad6085c
Environment
sycl-ls --verbose output:
Additional context
No response