[DFT][MKLGPU] Tests can fail on PVC #601

Open
Rbiessy opened this issue Oct 21, 2024 · 0 comments
Labels
bug A request to fix an issue

Rbiessy commented Oct 21, 2024

Summary

The MKLGPU backend tests can fail on PVC.

Version

Using the tip of develop as of today (6923d40).

Environment

Running on PVC (Intel(R) Data Center GPU Max 1100 1.3) with the oneAPI Base Toolkit 2024.2.0. The OS is Ubuntu 22.04.
apt level-zero package versions:

  • level-zero: 1.16.15-881~22.04
  • level-zero-dev: 1.16.15-881~22.04
  • intel-level-zero-gpu: 1.3.30049.10-950~22.04

Steps to reproduce

cmake -Bbuild-pvc -GNinja .
cd build-pvc
ninja
ctest --output-on-failure

Observed behavior

Full log: log_pvc.txt
All of the failing tests appear to be 2D.
Short extract:

[ RUN      ] ComputeTestSuite/ComputeTests_in_place_COMPLEX.COMPLEX_SINGLE_in_place_buffer/sizes_4x4_fwd_strides_0_7_1_bwd_strides_0_5_1_batches_2_Intel_R__Data_Center_GPU_Max_1100
Mismatching results: actual = (2.32784,-0.862237) vs. reference = (-0.0695089,0.350374)
 relative error = 7.52116 absolute error = 2.68658 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 0, 0
 at indices 10, 8
Mismatching results: actual = (1.28088,-0.619282) vs. reference = (2.32784,-0.862237)
 relative error = 0.432961 absolute error = 1.07478 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 1, 0
 at indices 11, 9
Mismatching results: actual = (0.626577,1.75821) vs. reference = (1.28088,-0.619282)
 relative error = 1.7332 absolute error = 2.46588 relative bound = 9.53674e-05 absolute bound = 1.55578e-05
 at position 2, 2, 0
 at indices 12, 10
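
As a sanity check on the extract above, here is a minimal Python sketch that recomputes the error figures from the logged values. It assumes (this is not confirmed from the test sources) that the absolute error is the complex magnitude of `actual - reference` and that the relative error is that magnitude divided by the magnitude of the reference; under that assumption the numbers agree with the log up to rounding. The names `mismatches` and `error_metrics` are made up for illustration. The sketch also checks a pattern visible in the extract: each reference value equals the actual value reported for the previous position, which might hint at data shifted by one element, although the log alone does not establish the cause.

```python
# Values copied from the log extract: (actual, reference) for each mismatch.
mismatches = [
    (complex(2.32784, -0.862237), complex(-0.0695089, 0.350374)),
    (complex(1.28088, -0.619282), complex(2.32784, -0.862237)),
    (complex(0.626577, 1.75821), complex(1.28088, -0.619282)),
]


def error_metrics(actual, reference):
    """Return (absolute error, relative error).

    Assumed definitions, not taken from the test sources:
    absolute = |actual - reference|, relative = absolute / |reference|.
    """
    abs_err = abs(actual - reference)
    rel_err = abs_err / abs(reference)
    return abs_err, rel_err


for actual, reference in mismatches:
    abs_err, rel_err = error_metrics(actual, reference)
    print(f"relative error = {rel_err:.6g} absolute error = {abs_err:.6g}")

# Does each reference value equal the actual value of the previous mismatch?
shifted = all(
    mismatches[i][0] == mismatches[i + 1][1]
    for i in range(len(mismatches) - 1)
)
print("reference[i + 1] == actual[i] for all logged mismatches:", shifted)
```

Running this against the full log (log_pvc.txt) rather than the three-line extract would show whether the shift pattern holds for every reported position.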

Note that the BLAS failures are reported in a separate issue: #600

Expected behavior

The tests should pass.

@Rbiessy Rbiessy added the bug A request to fix an issue label Oct 21, 2024