CUDA: add BF16 support #11093
Conversation
@JohannesGaessler: Addition, multiplication, division, etc.
In this commit it stops working for me: I get many warnings when compiling, and running fails as well.
It compiles and works on my system with commit b56f079.
Okay, I don't at all understand why this is happening. The problem is that, for whatever reason, your 3090 is not detected during compilation, so the code is instead compiled for compute capability 5.2 and you later get an error when you try to run it. The only thing that this PR changes that would maybe have any effect is the inclusion of
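As a quick sanity check (a minimal sketch, not part of this PR), you can print the compute capability that the CUDA runtime reports for each visible device and compare it against the architectures the binary was actually built for:

```cuda
// Diagnostic sketch: list the compute capability reported for each visible GPU.
// A 3090 should report 8.6; if the binary was only built for 5.2, kernels fail at runtime.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int device_count = 0;
    const cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < device_count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("device %d: %s, compute capability %d.%d\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```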
Could we separate the refactoring from the inclusion of the
* CUDA: add BF16 support
This PR adds BF16 support for CUDA/HIP. For large batch sizes the BF16 data is converted to FP32, then FP32 cuBLAS GEMM is used; it seems that cuBLAS unfortunately does not support BF16 tensor cores. For batch size 1 I added a template parameter to `mul_mat_vec` to specify the input type as either FP16 or BF16. The calculations are done using FP32 arithmetic, since BF16 hardware support for operations other than matrix multiplication is only available with compute capability 9.0 and the highest that I own is 8.9. I will purchase a Blackwell GPU in a few weeks when they come out and revisit this.
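For illustration, here is a minimal sketch of the batch-size-1 path described above (the kernel name, signature, and launch configuration are assumptions for the example, not the actual code in this PR): a matrix-vector kernel templated on the input type that converts each element to FP32 and accumulates in FP32, so the same code path serves both FP16 and BF16 inputs.

```cuda
#include <cuda_fp16.h>
#include <cuda_bf16.h>

// Convert a single input element to FP32; specialized for FP16 and BF16.
template <typename T> __device__ __forceinline__ float to_float(T x);
template <> __device__ __forceinline__ float to_float<half>(half x) { return __half2float(x); }
template <> __device__ __forceinline__ float to_float<__nv_bfloat16>(__nv_bfloat16 x) { return __bfloat162float(x); }

// One block per output row; the 32 threads of the block stride over the columns
// and the partial sums are combined with a warp-level reduction.
template <typename T>
__global__ void mul_mat_vec_sketch(const T * __restrict__ x, const float * __restrict__ y,
                                   float * __restrict__ dst, const int ncols) {
    const int row = blockIdx.x;
    float sum = 0.0f;
    for (int col = threadIdx.x; col < ncols; col += blockDim.x) {
        // FP32 arithmetic regardless of whether the input is FP16 or BF16.
        sum += to_float(x[(size_t) row*ncols + col]) * y[col];
    }
    // Warp-level reduction; assumes the block is launched with exactly 32 threads.
    for (int offset = 16; offset > 0; offset >>= 1) {
        sum += __shfl_down_sync(0xffffffff, sum, offset);
    }
    if (threadIdx.x == 0) {
        dst[row] = sum;
    }
}

// Example launch for a BF16 weight matrix with nrows rows:
//     mul_mat_vec_sketch<__nv_bfloat16><<<nrows, 32>>>(x, y, dst, ncols);
```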
Performance: