Have some instructions for using Xformers on Win64 with the normally disabled 2:4 sparse tensor routines and Triton (Triton should also enable some missing items in torch) #2870
NeedsMoar started this conversation in Show and tell
Replies: 1 comment
After some testing, the main thing I've noticed is that very large batch sizes now seem to run at the same speed as smaller ones, depending on the resolution. For example, the "optimal" batch size for 512x512 images on the 4090 used to be around 20 when averaged per image; now a batch of 32x512x512 runs at the same total speed as 16x512x512, roughly 2s/it with AnimateDiff at both sizes, while non-power-of-two values in the 20s were slightly slower. I don't know exactly how that works, but I'll take it. Unless something has changed in Comfy or AnimateDiff that would explain it? I can't keep up with the huge number of changelists. :-)
The wheel for xformers on Windows includes flash attention now, but while checking the enabled-features list I was sad to see that the "disabled" section had gotten bigger. This saddens me because the whole point of bothering with a disaster of a scripting language like Python in the first place is keeping your project cross-platform without the usual messes. Unfortunately, most things are too slow to do in native Python, so practically all major functionality is implemented in C++ / CUDA / etc., and people who use Linux absolutely hate making things cross-platform, then use weird GCC extensions that make porting even harder. For the record, I did an internal port of LLVM and clang plus our product integration to Windows in under a week, before LLVM would even build for Windows out of the box; this was mainly possible because a huge number of ISO C++ members were working on it like crazy, presumably so they'd never have to look at the abomination that is the GCC codebase again. /rant
These instructions are for Win64, Torch 2.2, xformers 0.0.24 and Python 3.10 or 3.11.
The sparse tensor portion just requires two steps:
Now you're done with the hard part, apart from figuring out how to use the API in practice.
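As a starting point for "using the API in practice", here's a minimal sketch using the 2:4 semi-structured sparsity entry point that ships in Torch 2.2 (`torch.sparse.to_sparse_semi_structured`); the private `_FORCE_CUTLASS` flag is a guess about sidestepping the cuSPARSELt backend, which may not be present in a Windows build, not something from the steps above:

```python
import torch
from torch.sparse import to_sparse_semi_structured, SparseSemiStructuredTensor

# Assumption: force the CUTLASS kernels in case the cuSPARSELt backend
# isn't available on this platform.
SparseSemiStructuredTensor._FORCE_CUTLASS = True

# 2:4 semi-structured sparsity: in every contiguous group of four values
# along a row, at least two must be zero. Build a weight matrix that
# already satisfies that pattern.
mask = torch.Tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()  # 128x128
w = torch.rand(128, 128).half().cuda() * mask
x = torch.rand(128, 128).half().cuda()

w_sparse = to_sparse_semi_structured(w)  # compress into the 2:4 format
y = torch.mm(w_sparse, x)                # dispatches to the sparse kernels
assert torch.allclose(y, torch.mm(w, x), atol=1e-2)
```

If the compressed matmul runs without raising, the sparse kernels are actually being hit; on builds where they're still disabled, `to_sparse_semi_structured` should error out instead.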
There's no point in adding conditional code for macOS unless you have a machine old enough to take a CUDA card that's still useful for running Stable Diffusion models, or maybe one of the more recent Intel Mac Pros running Windows, assuming Apple didn't block NVIDIA hardware from working.
The triton portion is easier thanks to the wheel builds someone made:
Just download the Triton artifacts from here, extract the wheel for your version of Python, run pip install on it, and you're set.
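To sanity-check the wheel before moving on, the standard vector-add kernel from the Triton tutorials (nothing specific to these builds) makes a decent smoke test:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

n = 4096
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
add_kernel[(triton.cdiv(n, 1024),)](x, y, out, n, BLOCK_SIZE=1024)
assert torch.allclose(out, x + y)  # passing means Triton can JIT and launch kernels
```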
After all this is done you can check that xformers sees everything by running
python -m xformers.info
which should output something like this:
I believe the triton flash attention functions are unavailable when the flash-attention-2 versions are in use, so that's normal, and the sequence_parallel stuff requires an old NVIDIA library that has never been supported on Windows and only benefits multi-GPU setups, so I think that's everything you can get. I haven't explored what this lets torch use, but I think the lack of triton was the reason inductor and possibly torch.compile didn't work right, so I'll leave that to somebody who likes messing around in Python to play with.
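For anyone who does want to poke at that, the experiment is short. A sketch, assuming the stock Torch 2.2 torch.compile API (whose default inductor backend generates GPU kernels through Triton; whether that fully works on Windows with these wheels is exactly the open question):

```python
import torch

def f(x):
    # Something trivially fusable, so inductor has real work to do.
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

compiled = torch.compile(f)  # default backend is inductor
x = torch.rand(1 << 20, device="cuda")
print(torch.allclose(compiled(x), f(x)))  # True if the generated kernel ran correctly
```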