Convert: generated K quants are useless for SD3.5 Large models #446

Open
stduhpf opened this issue Oct 24, 2024 · 1 comment

stduhpf commented Oct 24, 2024

Quantizing Stable Diffusion 3.5 models to any kind of k-quant results in large files made up of mostly fp16 weights. That's because many of the tensors have width 2432 or 7296, neither of which is a multiple of the k-quant block size (256), so those tensors are skipped and left in fp16.

https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1761
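
A minimal sketch of the kind of compatibility check involved (simplified; the helper name is mine, not the actual model.cpp code):

```cpp
#include <cstdint>

// k-quants pack weights into super-blocks of 256 values, so the row length
// (tensor width) must be a multiple of 256 for the type to be usable.
constexpr int64_t QK_K = 256;

bool can_use_k_quant(int64_t ne0 /* row length, i.e. tensor width */) {
    return ne0 % QK_K == 0;
}

// SD3.5 Large: 2432 % 256 == 128 and 7296 % 256 == 128,
// so those tensors fail the check and stay in fp16.
```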

For example, a q3_k quant of SD3.5 Large is 13,842 MB (vs. 16,460 MB for the fp16 model). Assuming 16 bits per fp16 weight and 3.4375 bits per q3_k weight, a fully quantized file would be about 3,540 MB, so the 2,618 MB actually saved corresponds to only roughly 20% of the weights getting quantized.

I'm not sure what could be done to fix that. Maybe fall back to the next larger quant type that fits instead of skipping the tensor altogether?
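
One possible shape for that fallback, as a hypothetical sketch (the helper and the exact type mapping are my own, not existing converter code): when the requested k-quant's block size doesn't divide the row length, substitute a legacy quant with 32-element blocks of roughly comparable size instead of keeping fp16.

```cpp
#include "ggml.h"

// Hypothetical fallback policy (sketch only): if the requested k-quant can't be
// used because ne0 isn't a multiple of its block size, pick a legacy quant type
// with 32-element blocks instead of leaving the tensor in f16.
static enum ggml_type fallback_type(enum ggml_type requested, int64_t ne0) {
    if (ne0 % ggml_blck_size(requested) == 0) {
        return requested;                            // requested type fits as-is
    }
    switch (requested) {
        case GGML_TYPE_Q2_K:
        case GGML_TYPE_Q3_K:
        case GGML_TYPE_Q4_K: return GGML_TYPE_Q4_0;  // 32-element blocks
        case GGML_TYPE_Q5_K: return GGML_TYPE_Q5_0;
        case GGML_TYPE_Q6_K: return GGML_TYPE_Q8_0;
        default:             return GGML_TYPE_F16;   // last resort: keep f16
    }
}
```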

Edit: I found this PR that addresses a similar issue in llama.cpp: ggerganov/llama.cpp#2001

stduhpf commented Oct 24, 2024

By mixing q2_k with q4_0 as a fallback for the tensors that can't be k-quantized, I can just barely fit SD3.5 Large with a 960x960 image compute buffer on my 8 GB VRAM GPU. Quality is a bit degraded though, even though the mix is mostly q4_0.

Edit: just q4_0 works too if I close my browser and all my Electron apps to free up VRAM, and the quality is better. So I guess the best I can use is a mix of q4_k and q4_0.
