Among the listed techniques, I saw that GroupNorm is one of the operations targeted for a speedup. However, when I compared the Triton and torch implementations, Triton was slower than torch. The script is the same as https://github.com/chengzeyi/stable-fast/blob/main/src/sfast/triton/ops/group_norm.py , and the result is shown below.
Is the Triton GroupNorm actually faster than the torch one? Can I replace torch.nn.GroupNorm with TritonGroupNorm directly to accelerate a Stable Diffusion model?
My environment: A100 GPU, torch 2.1, triton 2.1, no xformers, diffusers 0.21.2
@chenly15 Our implementation may not be very efficient yet. I currently use other methods to speed up GroupNorm computation, but they have not been open-sourced so far.
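For anyone reproducing the comparison, GPU kernels launch asynchronously, so a timing loop that does not synchronize can misreport which implementation is faster. Below is a minimal timing sketch, not the script from the repository. The `benchmark` and `compare_group_norm` helpers, the `candidate` parameter, and the input shape `(2, 320, 64, 64)` with 32 groups are all assumptions made for illustration; `candidate` stands for whatever callable wraps the Triton implementation.

```python
import statistics
import time


def benchmark(fn, warmup=10, iters=100, sync=None):
    """Return the median wall-clock time of fn() in milliseconds.

    `sync` is an optional callable (for example torch.cuda.synchronize)
    that blocks until queued device work has finished.
    """
    # Warm-up runs absorb one-off costs such as Triton JIT compilation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()

    times_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the kernel, not just the launch
        times_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.median(times_ms)


def compare_group_norm(candidate=None, shape=(2, 320, 64, 64), num_groups=32):
    """Time torch.nn.GroupNorm and, optionally, a hypothetical `candidate`.

    The shape and group count are illustrative assumptions, not values
    taken from the issue.
    """
    import torch  # imported lazily so `benchmark` works without torch

    device = "cuda" if torch.cuda.is_available() else "cpu"
    sync = torch.cuda.synchronize if device == "cuda" else None

    x = torch.randn(*shape, device=device)
    ref = torch.nn.GroupNorm(num_groups, shape[1]).to(device)

    results = {}
    with torch.no_grad():
        results["torch"] = benchmark(lambda: ref(x), sync=sync)
        if candidate is not None:
            # Check numerical agreement before comparing speed.
            torch.testing.assert_close(candidate(x), ref(x), rtol=1e-3, atol=1e-3)
            results["candidate"] = benchmark(lambda: candidate(x), sync=sync)
    return results
```

Calling `compare_group_norm()` times only the torch baseline. Passing a callable as `candidate` first checks that it matches the baseline output and then times it the same way. Results can differ with input shape, dtype, and memory format, so it is worth measuring with the shapes the model actually uses.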