
Running your inference.py demo does NOT produce anything close to a good result. #36

Closed
aifartist opened this issue Nov 27, 2024 · 4 comments
@aifartist

Running your inference.py demo does NOT produce anything close to a good result.
Also, inference.py simply will not run as-is on my 4090: it OOMs, so I have to call enable_model_cpu_offload() and make other changes just to get it to run at all, roughly as sketched below.

At a minimum there should be a standalone Python demo that runs in 24 GB of VRAM and produces results like the ones being shown.
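For reference, the workaround amounts to something like the following. This is a minimal sketch assuming the demo's pipeline can be loaded through a diffusers-style interface; the checkpoint id, pipeline class, and call parameters here are assumptions, not necessarily what inference.py actually uses.

```python
import torch
from diffusers import DiffusionPipeline

# Assumption: the model loads through the generic DiffusionPipeline interface;
# substitute whatever pipeline inference.py actually constructs.
pipe = DiffusionPipeline.from_pretrained(
    "Lightricks/LTX-Video",        # assumed checkpoint id
    torch_dtype=torch.bfloat16,    # half-precision weights to reduce VRAM
)

# Move each sub-model to the GPU only while it is needed, then back to CPU.
# Slower per step, but keeps peak usage within a 24 GB card.
pipe.enable_model_cpu_offload()

# Parameter names below are illustrative; match them to the pipeline's signature.
result = pipe(
    prompt="a wave crashing against a rocky shoreline at sunset",
    width=768,
    height=512,
    num_frames=121,
    num_inference_steps=40,
)
```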

@jpgallegoar

"At a minimum" no... don't even...

@gjnave

gjnave commented Nov 30, 2024

I'm getting an output, but it's just noise. Wondering if there is a VAE or something I'm missing.

@eoffermann

eoffermann commented Dec 5, 2024

It takes about 37.6 GB on my A6000 to run a 768x512, 24 fps, 121-frame, 40-step video. Reducing resolution helps some, but going to 384x256 only knocked that down to a bit over 31 GB. That's a lot, but not unusual for running modern, leading-edge models.

I tested https://github.com/KT313/LTX_Video_better_vram/tree/test, which converts the UNet model to bfloat16; that dropped GPU RAM consumption to 22.2 GB, so you should be able to get it running on your 4090. (I don't notice any meaningful quality difference.)
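The core of that change is essentially a dtype cast after the weights are loaded. A sketch only; the attribute name (`transformer` vs. `unet`) depends on how the pipeline in inference.py is built:

```python
import torch
from torch import nn

def cast_to_bf16(model: nn.Module) -> nn.Module:
    """Cast a loaded denoiser (UNet / DiT) to bfloat16 to roughly halve its VRAM use."""
    return model.to(dtype=torch.bfloat16)

# Usage (names are placeholders for whatever object inference.py constructs):
# pipe.transformer = cast_to_bf16(pipe.transformer)
```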

If you need it to run in a lot less, that's another research project.

This works great for me, though. I haven't had any issues getting it to generate results that match the prompt descriptions, like the published examples. It took no time to add a Gradio app to it.

You can try running mine, with the Gradio app and text-encoder unloading, by pulling down this fork/branch: https://github.com/eoffermann/LTX-Video/tree/gradio. But if it's not working for you at all (or is producing really poor results), you may have other problems.
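If you'd rather bolt a UI onto your own setup, the Gradio side is only a few lines. A minimal sketch, assuming you already have some `run_inference(prompt, steps)` wrapper around the repo's pipeline that returns the path to the rendered video; that helper name is made up here:

```python
import gradio as gr

def run_inference(prompt: str, steps: int) -> str:
    """Placeholder: call the repo's inference code here and return the output .mp4 path."""
    raise NotImplementedError

demo = gr.Interface(
    fn=lambda prompt, steps: run_inference(prompt, int(steps)),
    inputs=[
        gr.Textbox(label="Prompt"),
        gr.Slider(10, 50, value=40, step=1, label="Inference steps"),
    ],
    outputs=gr.Video(label="Generated video"),
    title="LTX-Video",
)
demo.launch()
```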

@able2608

able2608 commented Dec 5, 2024

I don't think you need specs that high to run the model anyway, especially for 768x512 generation. The DiT and VAE can run in under 6 GB of VRAM for 97 frames; the VRAM requirement is mostly due to the T5 text encoder. However, quantized T5 encoders already exist (the T5 this model uses is the same one used by Flux, and there is already a slew of GGUF quantizations ready to use on Hugging Face). I'm not sure whether this particular repo will be updated to use those models, but if you would like to try them out, ComfyUI is the current way to go. Some nuts and bolts can be found in my other comment here: #4 (comment)
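Outside ComfyUI, one alternative to GGUF for shrinking the T5 footprint in plain Python is 8-bit loading via bitsandbytes. A sketch under stated assumptions: the checkpoint id and `text_encoder` subfolder below are illustrative, and the pipeline attribute name may differ.

```python
from transformers import T5EncoderModel, BitsAndBytesConfig

# Illustrative: load the T5 encoder in 8-bit to cut its VRAM use.
# Point the repo id / subfolder at whichever T5-XXL checkpoint your pipeline expects.
text_encoder = T5EncoderModel.from_pretrained(
    "Lightricks/LTX-Video",        # assumed checkpoint id
    subfolder="text_encoder",      # assumed layout
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)

# Then hand this encoder to the pipeline in place of the full-precision one,
# e.g. pipe.text_encoder = text_encoder (attribute name depends on the pipeline).
```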
