speedup nnedi3 with cooperative matrix multiplication #59

Open
bjin opened this issue Aug 11, 2023 · 1 comment

Comments

@bjin
Owner

bjin commented Aug 11, 2023

Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could possibly be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd should be enough. According to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.

But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe radv will get there first?).
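
For reference, here is a rough CPU-side sketch of what a tiled multiply-add with 16x8x8 tiles computes. It is plain Python and only illustrates the arithmetic, not the shader or the extension API. The names `mul_add` and `tiled_matmul` are made up for this sketch, and it assumes the usual M x N x K convention (A is M x K, B is K x N, the accumulator is M x N). How nnedi3's weights and pixel windows would actually be laid out as A and B is not decided here.

```python
def mul_add(a, b, c):
    """Return A*B + C for a single tile.

    a is M x K, b is K x N, c is M x N (lists of rows).
    This mirrors the math of one cooperative multiply-add step.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def tiled_matmul(a, b, tm=16, tn=8, tk=8):
    """Multiply a (M x K) by b (K x N) using tm x tn x tk tiles.

    Assumes every dimension is a multiple of its tile size;
    a real kernel would have to pad or handle the remainder.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert m % tm == 0 and n % tn == 0 and k % tk == 0
    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            # One accumulator tile per output tile, reused across the K loop.
            acc = [[0.0] * tn for _ in range(tm)]
            for k0 in range(0, k, tk):
                a_tile = [row[k0:k0 + tk] for row in a[i0:i0 + tm]]
                b_tile = [row[j0:j0 + tn] for row in b[k0:k0 + tk]]
                acc = mul_add(a_tile, b_tile, acc)
            for i in range(tm):
                out[i0 + i][j0:j0 + tn] = acc[i]
    return out
```

Calling `tiled_matmul(a, b)` with, say, a 32x16 `a` and a 16x8 `b` runs two K-steps for each of the two output tiles. On the GPU each `mul_add` call would correspond to one cooperative multiply-add executed by a whole subgroup.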

@bjin
Owner Author

bjin commented Sep 12, 2023

radv(amd): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683
anv(intel): https://gitlab.freedesktop.org/mesa/mesa/-/issues/9250

I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV merge request above, the supported cooperative matrix multiply shape is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on either Intel or NVIDIA cards.
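
Since the supported shape differs per driver, the shader generator would probably need to pick a configuration from whatever the device reports. A minimal sketch of that selection logic, in plain Python: `CoopMatConfig` and `pick_config` are hypothetical names, the fields are only loosely modeled on what `vkGetPhysicalDeviceCooperativeMatrixPropertiesKHR` reports, and the 16x8x8 entry with an fp16 result type is an assumption for illustration. Only the 16x16x16 f16 to f32 entry comes from the RADV merge request.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CoopMatConfig:
    """One matrix shape/type combination (hypothetical record for this sketch)."""
    m: int
    n: int
    k: int
    input_type: str   # element type of A and B
    result_type: str  # element type of the accumulator/result


def pick_config(
    supported: Sequence[CoopMatConfig],
    preferred: Sequence[CoopMatConfig],
) -> Optional[CoopMatConfig]:
    """Return the first preferred config the device supports, or None."""
    for want in preferred:
        if want in supported:
            return want
    return None


# Shape taken from the RADV merge request: 16x16x16, f16 inputs, f32 result.
radv_gfx11 = [CoopMatConfig(16, 16, 16, "f16", "f32")]

# Preference order is an assumption: try the small 16x8x8 tile first,
# then fall back to the 16x16x16 shape.
preferred = [
    CoopMatConfig(16, 8, 8, "f16", "f16"),
    CoopMatConfig(16, 16, 16, "f16", "f32"),
]

chosen = pick_config(radv_gfx11, preferred)
```

If `pick_config` returns `None`, the generator would fall back to the existing non-cooperative-matrix shader path.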
