Rocm jaxlib v0.4.30 qa cleanup #35
Conversation
Main changes include:
- Added support for fp8 matmul with fp8 and bf16 output data types.
- Added buffer comparators for fp8e4m3fnuz and fp8e5m2fnuz.
@draganmladjenovic take a look at the failures for the MLIR tests at 79b3692. Most of them fail because AMD GPUs have a larger thread size and smaller block size than NVIDIA GPUs. Some tests run into an infinite loop on MI200, so I have commented out RunAndCompare.
I think we're OK on CublasDot at 0.4.28, but it fails on 0.4.30?
I'm wondering whether we have a ticket to track it.
It needs more investigation. It depends on the autotune choice: https://github.com/ROCm/frameworks-internal/issues/9088
  switch (algorithm) {
    case PrecisionConfig::ALG_DOT_ANY_F8_ANY_F8_F32:
    case PrecisionConfig::ALG_DOT_ANY_F8_ANY_F8_F32_FAST_ACCUM:
      // Other F8 types are actually not supported by NVIDIA GPUs.
-     return is_cuda_ge_ada &&
+     return (is_cuda_ge_ada || is_rocm_mi100_and_above) &&
I am not sure this is really correct. I believe FP8 support begins with the MI300 architecture? But I see that we have the same check upstream as well.
I have reviewed the gemm-related changes and buffer_comparator, leaving the remaining LLVM changes to Dragan and Chao.
Is this PR merged into QA-31 too? @hsharsha
No description provided.