Quantizing Stable Diffusion 3.5 models to any kind of k-quants results in large files made up mostly of fp16 weights. That's because a lot of tensors have a width of 2432 or 7296, which is not a multiple of the k-quant block size (256), so they get skipped here:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1761
For example, a q3_k quant of SD3.5 Large is 13,842 MB versus 16,460 MB for the fp16 model, which works out to only about 20% of the weights actually getting quantized (assuming 3.4375 bits per quantized weight; rough math below).
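Rough math behind that estimate, treating the whole 16,460 MB as fp16 weights and the quantized part as exactly 3.4375 bits per weight, with f the fraction of weights that got quantized:

$$
13842 \approx 16460\,(1-f) + 16460\,f\cdot\frac{3.4375}{16}
\quad\Rightarrow\quad
f \approx \frac{16460 - 13842}{16460\left(1 - \frac{3.4375}{16}\right)} \approx 0.20
$$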
I'm not sure what could be done to fix that. Maybe fall back to the next bigger quant type that does fit, instead of skipping the tensor altogether? See the sketch below.

Edit: I found this PR that addresses a similar issue in llama.cpp: ggerganov/llama.cpp#2001
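To make the fallback idea concrete, something like this (just a sketch, the helper `choose_fallback_type` is made up and not existing stable-diffusion.cpp code; it only relies on ggml's `ggml_blck_size()` and `ggml_type_size()`):

```cpp
// Made-up helper, not existing stable-diffusion.cpp code: when the requested
// k-quant's 256-wide block doesn't divide the tensor row width, fall back to
// the smallest legacy quant (32-wide blocks) that fits and is at least as
// big per weight as the requested type, instead of silently keeping fp16.
#include "ggml.h"

static double bits_per_weight(ggml_type t) {
    return 8.0 * ggml_type_size(t) / ggml_blck_size(t);
}

static ggml_type choose_fallback_type(ggml_type requested, int64_t ne0) {
    if (ne0 % ggml_blck_size(requested) == 0) {
        return requested; // the requested type fits, nothing to do
    }
    // Legacy quants, ordered from smallest to largest bits per weight.
    const ggml_type candidates[] = {GGML_TYPE_Q4_0, GGML_TYPE_Q5_0, GGML_TYPE_Q8_0};
    for (ggml_type t : candidates) {
        if (ne0 % ggml_blck_size(t) == 0 && bits_per_weight(t) >= bits_per_weight(requested)) {
            return t; // e.g. q3_k on a 2432-wide tensor would land on q4_0
        }
    }
    return GGML_TYPE_F16; // nothing fits, leave the tensor unquantized
}
```

For the 2432- and 7296-wide tensors this would pick q4_0 for anything from q2_k up to q4_k, since both widths are multiples of 32.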
By mixing q2_k with q4_0 as a fallback for the tensors that can't be k-quantized, I can just barely fit SD3.5 Large with a 960x960 image compute buffer on my 8 GB VRAM GPU. Quality is a bit degraded though, even though it's mostly q4_0.

Edit: just q4_0 works too if I close my browser and all my Electron apps to free up VRAM, and the quality is better. So I guess the best I can use is a mix of q4_k and q4_0.
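For reference, this is roughly how I estimate what each tensor costs under a given type when picking a mix (just a sketch; it assumes a recent ggml that provides `ggml_row_size()`, and the 2432x2432 shape is only an example, not an actual SD3.5 tensor):

```cpp
// Quick size estimate for one tensor under different quant types, to gauge
// whether a given q4_k / q4_0 mix fits next to the compute buffer in 8 GB
// of VRAM. ggml_row_size() already includes the per-block scales.
#include "ggml.h"
#include <cstdio>

int main() {
    const int64_t ne0 = 2432; // row width (the dimension that must match the block size)
    const int64_t ne1 = 2432; // number of rows, arbitrary for this example
    const ggml_type types[] = {GGML_TYPE_F16, GGML_TYPE_Q4_K, GGML_TYPE_Q4_0};
    for (ggml_type t : types) {
        if (ne0 % ggml_blck_size(t) != 0) {
            // 2432 % 256 != 0, so q4_k is rejected here and needs the q4_0 fallback
            printf("%-5s: row width %lld does not fit the block size\n",
                   ggml_type_name(t), (long long) ne0);
            continue;
        }
        const double mib = ggml_row_size(t, ne0) * ne1 / (1024.0 * 1024.0);
        printf("%-5s: %.1f MiB\n", ggml_type_name(t), mib);
    }
    return 0;
}
```

(q4_k and q4_0 both come out to 4.5 bits per weight, so a q4_k + q4_0 mix should be about the same size as plain q4_0, just with better quality on the tensors where q4_k applies.)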