Saving disk and download time (plus VRAM) #19
How is the FP8 version? Does it run faster?
Not sure if it's faster, but it saves disk space and download time.
Updated, try the new version.
Hi @1038lab! The main problem I see here is the strategy of downloading upstream code, which doesn't implement a good memory strategy. You must incorporate it and patch the code to do the proper things. Also: loading the FP8 file won't solve memory issues. PyTorch loads it using the current default dtype, so it gets expanded once loaded; it's just small on disk. To get quantization working you must patch the nn.Linear layers. BTW: please don't use print, use logging.debug. If you use print, the messages go to the console, but the GUI can't catch them.
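The logging point can be sketched with the standard library alone. A GUI frontend would attach its own handler to the logger; here a StringIO buffer stands in for the GUI, and the logger name `omnigen_node` is a hypothetical placeholder, not the node's actual name.

```python
import io
import logging

# Hypothetical module-level logger; a GUI can attach a handler to it,
# whereas print() output goes only to the console.
logger = logging.getLogger("omnigen_node")

# Stand-in for a GUI handler: capture records into a string buffer.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.DEBUG)

logger.debug("loading model.safetensors")

print(buffer.getvalue().strip())
# -> DEBUG:omnigen_node:loading model.safetensors
```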
Applying quantization is a good approach. I'll make an effort to update it when I have the time. |
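The expansion-on-load point can be illustrated with plain int8 quantization (FP8 storage follows the same principle): the file shrinks because each weight takes one byte, but once dequantized for compute each weight occupies a full float again. A real fix, as suggested above, would wrap `torch.nn.Linear` so dequantization happens per layer at forward time; this pure-Python sketch only shows the storage arithmetic and is not the actual implementation.

```python
import struct

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: 1 byte per weight
    plus a single float scale, vs. 4 bytes per float32 weight."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # At compute time the weights are expanded back to floats,
    # which is why a small file does not mean small memory use.
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)

# On disk: 1 byte per weight instead of 4.
packed = struct.pack(f"{len(q)}b", *q)
assert len(packed) == 4

restored = dequantize(q, scale)
```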
I manually downloaded the model from here:
https://huggingface.co/silveroxides/OmniGen-V1/tree/main
Renamed the fp8 file to model.safetensors and got it working.
The FP8 model is just 3.7 GB