Vulkan 1.3.255 was released with a new vendor-neutral extension, VK_KHR_cooperative_matrix, for tensor-core-like fast matrix multiplication, which could possibly be used to speed up nnedi3. A basic 16x8x8 fp16 coopMatMulAdd is enough. According to some perf stats I found elsewhere, a 2x to 3x speedup could be expected.
But first, this has to be put on hold until AMD implements this extension in their Linux driver (or maybe RADV will get there first?).
I only have an AMD RDNA3 (GFX11+) GPU for testing, and according to the RADV PR above, the supported cooperative matrix shape is 16x16x16 (opcode: v_wmma_f32_16x16x16_f16) with a subgroup size of 64. These settings probably won't work on either Intel or NVIDIA cards.
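To make the shape question concrete, here is a rough CPU-side sketch in Python of what an MxNxK multiply-add computes and how a larger product splits into such tiles. It is only a reference model, not shader code: `tile_mul_add` and `tiled_matmul` are hypothetical names, and fp16 rounding, subgroup scope and memory layout are all ignored.

```python
def tile_mul_add(a, b, c):
    """Reference for one cooperative-matrix step: return a @ b + c.

    a is M x K, b is K x N, c is M x N, all row-major lists of lists.
    """
    m, k, n = len(a), len(b), len(b[0])
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


def tiled_matmul(a, b, tile=(16, 16, 16)):
    """Multiply a (m x k) by b (k x n) using fixed (M, N, K) tiles.

    Assumption for this sketch: every dimension must be a multiple of the
    tile size; a real kernel would pad or handle the edges separately.
    """
    tm, tn, tk = tile
    m, k, n = len(a), len(b), len(b[0])
    if m % tm or n % tn or k % tk:
        raise ValueError("matrix dimensions must be multiples of the tile shape")

    out = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tm):
        for j0 in range(0, n, tn):
            # The accumulator tile stays resident while we walk along K.
            acc = [[0.0] * tn for _ in range(tm)]
            for k0 in range(0, k, tk):
                a_tile = [row[k0:k0 + tk] for row in a[i0:i0 + tm]]
                b_tile = [row[j0:j0 + tn] for row in b[k0:k0 + tk]]
                acc = tile_mul_add(a_tile, b_tile, acc)
            # Write the finished M x N tile back into the result.
            for i in range(tm):
                out[i0 + i][j0:j0 + tn] = acc[i]
    return out
```

Calling `tiled_matmul(a, b, (16, 8, 8))` and `tiled_matmul(a, b, (16, 16, 16))` on the same inputs gives the same result whenever the dimensions divide evenly, so the tile shape only changes how the work is partitioned, which is why the shader would need to pick whatever shape the driver actually reports.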