speedup nnedi3 with cooperative matrix multiplication #59

bjin · 2023-08-11T16:49:01Z

Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could possibly be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd should be enough. According to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.

But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe radv will get there and implement it first?).

bjin · 2023-09-12T04:25:37Z

radv(amd): https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24683
anv(intel): https://gitlab.freedesktop.org/mesa/mesa/-/issues/9250

I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV merge request above, the supported cooperative matrix shape is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on Intel or NVIDIA cards.
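
To make the tile shapes concrete, here is a minimal CPU-side sketch in plain Python of how a larger matrix product decomposes into fixed-size multiply-add steps of the form D = A*B + C, which is the operation a cooperative matrix instruction performs on one tile. This is not shader code and not how nnedi3 is implemented here: the names `tile_mul_add` and `tiled_matmul` are hypothetical, the shape is read as M x N x K (an assumption about the notation above), and Python floats stand in for the fp16 inputs and fp32 accumulator of the real hardware.

```python
def tile_mul_add(a, b, c):
    """Return A*B + C for one tile: A is M x K, B is K x N, C is M x N."""
    m, k, n = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def tiled_matmul(a, b, tm=16, tn=16, tk=16):
    """Multiply a (rows x inner) by b (inner x cols) using tm x tn x tk tiles."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if rows % tm or cols % tn or inner % tk:
        # Real code would pad the operands; this sketch just rejects them.
        raise ValueError("matrix dimensions must be multiples of the tile shape")

    out = [[0.0] * cols for _ in range(rows)]
    for i0 in range(0, rows, tm):
        for j0 in range(0, cols, tn):
            # The accumulator tile stays live across the whole K loop.
            acc = [[0.0] * tn for _ in range(tm)]
            for k0 in range(0, inner, tk):
                a_tile = [row[k0:k0 + tk] for row in a[i0:i0 + tm]]
                b_tile = [row[j0:j0 + tn] for row in b[k0:k0 + tk]]
                acc = tile_mul_add(a_tile, b_tile, acc)
            for i in range(tm):
                out[i0 + i][j0:j0 + tn] = acc[i]
    return out
```

Calling `tiled_matmul(a, b, 16, 8, 8)` and `tiled_matmul(a, b, 16, 16, 16)` on the same inputs gives the same product; only the number and shape of the tile steps differ. That is why a shader written for one shape may need a separate code path, or padding, on hardware that only exposes another.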
