
Invoke micro kernels from CUTLASS #596

Merged (9 commits, Mar 15, 2024)
Conversation

roastduck (Owner) commented on Jan 21, 2024

Now we can use a CUTLASS micro kernel for matmul in GPU shared memory. Currently it only works on A100 with float64.

Still TODO:

  • Support other data types, which requires further memory layout manipulation.
  • Support other GPU architectures.
  • Also map memory operations to memory micro kernels.
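To make the PR's scope concrete, the micro kernel being delegated to CUTLASS computes a tile-level GEMM, D = alpha * A * B + beta * C, over operands staged in shared memory. The sketch below is only an illustration of that semantics in plain C++; the tile sizes and the function name are hypothetical and are not the PR's actual API or CUTLASS's.

```cpp
#include <cassert>
#include <vector>

// Illustrative semantics of a tile-level matmul micro kernel:
// D = alpha * A * B + beta * C over one (hypothetical) 4x4x4 tile,
// stored row-major, as a CUTLASS micro kernel would compute it on a
// shared-memory tile. Names and sizes are assumptions for illustration.
constexpr int kTileM = 4, kTileN = 4, kTileK = 4;

void matmul_tile(const double *A, const double *B, const double *C,
                 double *D, double alpha, double beta) {
    for (int i = 0; i < kTileM; ++i) {
        for (int j = 0; j < kTileN; ++j) {
            double acc = 0.0;
            for (int k = 0; k < kTileK; ++k)
                acc += A[i * kTileK + k] * B[k * kTileN + j];
            D[i * kTileN + j] = alpha * acc + beta * C[i * kTileN + j];
        }
    }
}
```

The real kernel distributes the i/j/k loops across the threads of a block, which is why the synchronization workaround below becomes necessary.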

Fixes:
- Check `beta=0` or `beta=1` in micro kernel.
- Fix the testing program.

Workarounds:
- Temporarily add `__syncthreads()` in micro kernels.
- Temporarily set data types to float64 to work around CUTLASS's lack of RowMajor layout support for float16.
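The `beta=0` / `beta=1` check in the fixes list matters for correctness, not just speed: when `beta == 0`, C must not be read at all, because it may be uninitialized and `0 * NaN` is `NaN`, not `0`. A minimal sketch of such an epilogue special case (function name and structure are illustrative, not the PR's code):

```cpp
// Hedged sketch: why the epilogue must branch on beta instead of
// always computing alpha * acc + beta * *c.
//   beta == 0: skip reading C entirely (C may be uninitialized; 0 * NaN == NaN)
//   beta == 1: skip the multiply by beta
double epilogue(double acc, double alpha, double beta, const double *c) {
    if (beta == 0.0)
        return alpha * acc;        // never dereference c
    if (beta == 1.0)
        return alpha * acc + *c;   // no scaling of C needed
    return alpha * acc + beta * *c;
}
```

Passing a null or uninitialized C pointer with `beta=0` is safe under this scheme, which is exactly the case a non-accumulating matmul relies on.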
@roastduck roastduck marked this pull request as ready for review March 15, 2024 06:31
@roastduck roastduck changed the title [WIP] Invoke micro kernels from CUTLASS Invoke micro kernels from CUTLASS Mar 15, 2024
@roastduck roastduck added enhancement New feature or request functionality Support new types of user programs labels Mar 15, 2024
@roastduck roastduck merged commit 50909cc into master Mar 15, 2024
9 checks passed
@roastduck roastduck deleted the cutlass branch March 15, 2024 10:14