
Invoke micro kernels from CUTLASS #596

Merged (9 commits, Mar 15, 2024)
Conversation

roastduck (Owner) commented on Jan 21, 2024

Now we can use a CUTLASS micro kernel for matmul in GPU shared memory. Currently it only works on A100 with float64.

Still TODO:

  • Support other data types, which requires further memory layout manipulation.
  • Support other GPU architectures.
  • Also map memory operations to memory micro kernels.
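To make the PR's scope concrete, the micro kernel being delegated to CUTLASS computes a tile-level GEMM, D = alpha * A * B + beta * C, over operands staged in shared memory. The sketch below is only an illustration of that semantics in plain C++; the tile sizes and the function name are hypothetical and are not the PR's actual API or CUTLASS's.

```cpp
#include <cassert>
#include <vector>

// Illustrative semantics of a tile-level matmul micro kernel:
// D = alpha * A * B + beta * C over one (hypothetical) 4x4x4 tile,
// stored row-major, as a CUTLASS micro kernel would compute it on a
// shared-memory tile. Names and sizes are assumptions for illustration.
constexpr int kTileM = 4, kTileN = 4, kTileK = 4;

void matmul_tile(const double *A, const double *B, const double *C,
                 double *D, double alpha, double beta) {
    for (int i = 0; i < kTileM; ++i) {
        for (int j = 0; j < kTileN; ++j) {
            double acc = 0.0;
            for (int k = 0; k < kTileK; ++k)
                acc += A[i * kTileK + k] * B[k * kTileN + j];
            D[i * kTileN + j] = alpha * acc + beta * C[i * kTileN + j];
        }
    }
}
```

The real kernel distributes the i/j/k loops across the threads of a block, which is why the synchronization workaround below becomes necessary.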

Fixes:
- Check `beta=0` or `beta=1` in micro kernel.
- Fix the testing program.

Workarounds:
- Temporarily add `__syncthreads()` in micro kernels.
- Temporarily set data types to float64 to work around CUTLASS's lack of RowMajor layout support for float16.
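The `beta=0` / `beta=1` check in the fixes list matters for correctness, not just speed: when `beta == 0`, C must not be read at all, because it may be uninitialized and `0 * NaN` is `NaN`, not `0`. A minimal sketch of such an epilogue special case (function name and structure are illustrative, not the PR's code):

```cpp
// Hedged sketch: why the epilogue must branch on beta instead of
// always computing alpha * acc + beta * *c.
//   beta == 0: skip reading C entirely (C may be uninitialized; 0 * NaN == NaN)
//   beta == 1: skip the multiply by beta
double epilogue(double acc, double alpha, double beta, const double *c) {
    if (beta == 0.0)
        return alpha * acc;        // never dereference c
    if (beta == 1.0)
        return alpha * acc + *c;   // no scaling of C needed
    return alpha * acc + beta * *c;
}
```

Passing a null or uninitialized C pointer with `beta=0` is safe under this scheme, which is exactly the case a non-accumulating matmul relies on.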
@roastduck roastduck marked this pull request as ready for review March 15, 2024 06:31
@roastduck roastduck changed the title [WIP] Invoke micro kernels from CUTLASS Invoke micro kernels from CUTLASS Mar 15, 2024
@roastduck roastduck added enhancement New feature or request functionality Support new types of user programs labels Mar 15, 2024
@roastduck roastduck merged commit 50909cc into master Mar 15, 2024
9 checks passed
@roastduck roastduck deleted the cutlass branch March 15, 2024 10:14