[GPU_X] Unit tests failing with "cudaErrorInvalidDeviceFunction: invalid device function" #46864
Comments
assign heterogeneous |
cms-bot internal usage |
A new Issue was created by @iarspider. @Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
On what machines are the tests running? |
Grid node with an NVIDIA GPU:
|
Could you also run |
FWIW, the test has succeeded in 14_2_X (at least between 11-27-2300 and 12-03-2300). |
|
What's very curious is that the alpaka-based tests all pass in the IBs 🤔
|
Is there a way to log in interactively on a node where the test fails? |
@fwyzard, you can do the following to log in to the grid GPU node (where a dummy job is running to hold the node).
The node is available for the next 20 hours. Once you log out of this node, it will be deallocated automatically. |
Mhm, it didn't like me, I got kicked out immediately:
Can I request a similar slot myself? |
Yes, just use condor to request a GPU resource. |
Add the following to the condor job to get a GPU:
|
OK, I can reproduce the problem. |
Two unit tests, HeterogeneousTest/CUDAKernel/testCudaDeviceAdditionKernel and HeterogeneousTest/CUDAWrapper/testCudaDeviceAdditionWrapper, have been failing in the GPU_X IBs since at least CMSSW_15_0_GPU_X_2024-11-27-2300:
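For anyone trying to reproduce this outside CMSSW: the CUDA runtime documentation describes cudaErrorInvalidDeviceFunction as meaning that the requested device function does not exist or is not compiled for the proper device architecture. The thread does not establish which of the two applies here. A minimal standalone probe along the following lines might help separate a node or driver problem from a build problem. It is only a sketch, not part of the CMSSW tests: the names probeKernel and probeKernelLaunch are hypothetical, and it assumes a working nvcc and CUDA runtime on the node.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical trivial kernel: writes one value so the launch does real work.
__global__ void probeKernel(int* out) { *out = 42; }

// Launch probeKernel on the current device and return the first error seen.
cudaError_t probeKernelLaunch() {
  int* d_out = nullptr;
  cudaError_t err = cudaMalloc(&d_out, sizeof(int));
  if (err != cudaSuccess) return err;

  probeKernel<<<1, 1>>>(d_out);
  // Launch-time failures (for example an invalid device function)
  // are reported by cudaGetLastError() right after the launch.
  err = cudaGetLastError();
  // Failures during execution are reported by the synchronisation.
  if (err == cudaSuccess) err = cudaDeviceSynchronize();

  cudaFree(d_out);
  return err;
}

int main() {
  int count = 0;
  cudaError_t err = cudaGetDeviceCount(&count);
  if (err != cudaSuccess) {
    std::printf("cudaGetDeviceCount: %s: %s\n",
                cudaGetErrorName(err), cudaGetErrorString(err));
    return 1;
  }

  for (int dev = 0; dev < count; ++dev) {
    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
      std::printf("device %d: %s\n", dev, cudaGetErrorName(err));
      continue;
    }
    // The compute capability can be compared with the architectures
    // the failing binaries were built for.
    std::printf("device %d: %s, compute capability %d.%d\n",
                dev, prop.name, prop.major, prop.minor);

    cudaSetDevice(dev);
    err = probeKernelLaunch();
    std::printf("  probe launch: %s: %s\n",
                cudaGetErrorName(err), cudaGetErrorString(err));
  }
  return 0;
}
```

If this probe succeeds on the node where the unit tests fail, the device and driver can at least run a kernel built locally, which would point towards how the test libraries are built or linked.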