[QST] Is there a Cutlass GEMM example to read inputs with custom padding? #1922
Comments
@thakkarV Is there a CUTLASS template example for Hopper that computes a GEMM with a custom pre-computation applied to each input? (i.e. a fused …
If you are applying this modification to the input tensors, it is usually much better to fuse it into the epilogue of the previous layer; for this reason we do not have a mainloop fusion layer.
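The suggestion above can be sketched numerically: let the *producer* GEMM apply the transformation (here, zero-padding) in its epilogue, so the consumer GEMM later reads already-padded data with a plain, TMA-friendly layout. This is a minimal NumPy sketch of the idea, not CUTLASS code; the function and epilogue names are assumptions for illustration.

```python
import numpy as np

def producer_gemm_with_epilogue(a, b, epilogue):
    """Sketch (hypothetical names): the previous layer's GEMM applies
    the custom transform in its epilogue, so the next GEMM's mainloop
    stays a plain load + MMA with no per-tile fixups."""
    acc = a @ b          # mainloop: ordinary GEMM accumulation
    return epilogue(acc)  # epilogue: the custom op is fused here

def pad_epilogue(acc, pad_rows=2, pad_cols=2):
    # Write the result into a zero-initialized, padded output buffer.
    out = np.zeros((acc.shape[0] + pad_rows, acc.shape[1] + pad_cols))
    out[:acc.shape[0], :acc.shape[1]] = acc
    return out

a = np.random.rand(4, 8)
b = np.random.rand(8, 4)
padded = producer_gemm_with_epilogue(a, b, pad_epilogue)  # shape (6, 6)
```

The design point is that the padded buffer is materialized in global memory once, so the downstream GEMM needs no mainloop changes at all.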
The padding you are describing above is something the framework or graph-compiler layer would take care of, rather than something CUTLASS does.
Sure, but the problem seems to be that TMA's interface isn't flexible enough to be compatible with every fusion requirement. Before Hopper, fusion could be handled by the compiler while loading from gmem to smem (e.g. …). I can disable TMA and perform the GEMM in the pre-Hopper style, which would let me apply custom padding between gmem and smem, but the sad part is that a Hopper GEMM won't be efficient without TMA, which defeats the purpose of fusing the padding for speed.
For the F8_scaled_GEMM(x, y, scale_x, scale_y) that CUTLASS does support, the fusion requirement seems easier and in-place, so it can …
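A sketch of why the scaled-GEMM fusion is the easy case, under the assumption that `scale_x` and `scale_y` are per-tensor scalars folded into the accumulator (the exact CUTLASS kernel may apply scaling differently, e.g. per-block): the scales change only the values, not the layout, so no mainloop or TMA changes are needed.

```python
import numpy as np

def f8_scaled_gemm(x, y, scale_x, scale_y):
    """Assumed semantics of a per-tensor scaled GEMM: the scalar
    scales commute with the matmul, so they can be applied once to
    the accumulator (an in-place epilogue multiply) instead of to
    every loaded tile."""
    return (scale_x * scale_y) * (x @ y)
```

Contrast this with padding, which changes the shape and address pattern of the operands and therefore cannot be expressed as a post-accumulation fixup.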
What is your question?
This is the computation requirement, expressed in a naive, unfused way, to execute a GEMM with custom input:
Is this fused "pad + GEMM" supported by any CUTLASS template? Which example would be a good reference?
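Since the original computation the question refers to is not shown, here is an assumed unfused reference for "pad + GEMM": materialize the padded operands in memory first, then run a plain GEMM. The padding rule (rounding each dimension up to a multiple of `k` with zeros, an alignment-friendly shape) is a hypothetical stand-in for the custom padding in the question.

```python
import numpy as np

def custom_pad(t, k=16):
    """Hypothetical padding: round each dimension up to a multiple
    of k, filling with zeros (a TMA/alignment-friendly shape)."""
    m, n = t.shape
    out = np.zeros((-(-m // k) * k, -(-n // k) * k), dtype=t.dtype)
    out[:m, :n] = t
    return out

def pad_plus_gemm(a, b):
    # Unfused reference: pad in (global) memory first, then a plain GEMM.
    return custom_pad(a) @ custom_pad(b)

a = np.random.rand(10, 20)
b = np.random.rand(20, 12)
c = pad_plus_gemm(a, b)  # padded shapes (16, 32) @ (32, 16) -> (16, 16)
```

Because the padding is zeros, the top-left `(10, 12)` block of the result equals the unpadded `a @ b`, which is what makes this pre-padding approach numerically safe.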