Quantizing Stable Diffusion 3.5 models to any kind of k-quants results in large files made up mostly of fp16 weights. That's because a lot of tensors have a width of 2432 or 7296, which is not a multiple of the k-quant block size (256), so they get skipped here:
https://github.com/leejet/stable-diffusion.cpp/blob/master/model.cpp#L1761
For example, a q3_k quant of SD3.5 Large is 13,842 MB versus 16,460 MB for the fp16 model, which works out to only about 20% of the weights actually getting quantized (assuming 3.4375 bits per quantized weight; rough math below).
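Rough math behind that estimate, treating the whole 16,460 MB as fp16 weights and the quantized part as exactly 3.4375 bits per weight, with f the fraction of weights that got quantized:

$$
13842 \approx 16460\,(1-f) + 16460\,f\cdot\frac{3.4375}{16}
\quad\Rightarrow\quad
f \approx \frac{16460 - 13842}{16460\left(1 - \frac{3.4375}{16}\right)} \approx 0.20
$$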
I'm not sure what could be done to fix that. Maybe fall back to the next bigger quant type that does fit, instead of skipping the tensor altogether? See the sketch below.

Edit: I found this PR that addresses a similar issue in llama.cpp: ggerganov/llama.cpp#2001
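To make the fallback idea concrete, something like this (just a sketch, the helper `choose_fallback_type` is made up and not existing stable-diffusion.cpp code; it only relies on ggml's `ggml_blck_size()` and `ggml_type_size()`):

```cpp
// Made-up helper, not existing stable-diffusion.cpp code: when the requested
// k-quant's 256-wide block doesn't divide the tensor row width, fall back to
// the smallest legacy quant (32-wide blocks) that fits and is at least as
// big per weight as the requested type, instead of silently keeping fp16.
#include "ggml.h"

static double bits_per_weight(ggml_type t) {
    return 8.0 * ggml_type_size(t) / ggml_blck_size(t);
}

static ggml_type choose_fallback_type(ggml_type requested, int64_t ne0) {
    if (ne0 % ggml_blck_size(requested) == 0) {
        return requested; // the requested type fits, nothing to do
    }
    // Legacy quants, ordered from smallest to largest bits per weight.
    const ggml_type candidates[] = {GGML_TYPE_Q4_0, GGML_TYPE_Q5_0, GGML_TYPE_Q8_0};
    for (ggml_type t : candidates) {
        if (ne0 % ggml_blck_size(t) == 0 && bits_per_weight(t) >= bits_per_weight(requested)) {
            return t; // e.g. q3_k on a 2432-wide tensor would land on q4_0
        }
    }
    return GGML_TYPE_F16; // nothing fits, leave the tensor unquantized
}
```

For the 2432- and 7296-wide tensors this would pick q4_0 for anything from q2_k up to q4_k, since both widths are multiples of 32.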
By mixing q2_k with q4_0 as a fallback for the tensors that can't be k-quantized, I can just barely fit SD3.5 Large with a 960x960 image compute buffer on my 8 GB VRAM GPU. Quality is a bit degraded though, even though it's mostly q4_0.

Edit: just q4_0 works too if I close my browser and all my Electron apps to free up VRAM, and the quality is better. So I guess the best I can use is a mix of q4_k and q4_0.
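For reference, this is roughly how I estimate what each tensor costs under a given type when picking a mix (just a sketch; it assumes a recent ggml that provides `ggml_row_size()`, and the 2432x2432 shape is only an example, not an actual SD3.5 tensor):

```cpp
// Quick size estimate for one tensor under different quant types, to gauge
// whether a given q4_k / q4_0 mix fits next to the compute buffer in 8 GB
// of VRAM. ggml_row_size() already includes the per-block scales.
#include "ggml.h"
#include <cstdio>

int main() {
    const int64_t ne0 = 2432; // row width (the dimension that must match the block size)
    const int64_t ne1 = 2432; // number of rows, arbitrary for this example
    const ggml_type types[] = {GGML_TYPE_F16, GGML_TYPE_Q4_K, GGML_TYPE_Q4_0};
    for (ggml_type t : types) {
        if (ne0 % ggml_blck_size(t) != 0) {
            // 2432 % 256 != 0, so q4_k is rejected here and needs the q4_0 fallback
            printf("%-5s: row width %lld does not fit the block size\n",
                   ggml_type_name(t), (long long) ne0);
            continue;
        }
        const double mib = ggml_row_size(t, ne0) * ne1 / (1024.0 * 1024.0);
        printf("%-5s: %.1f MiB\n", ggml_type_name(t), mib);
    }
    return 0;
}
```

(q4_k and q4_0 both come out to 4.5 bits per weight, so a q4_k + q4_0 mix should be about the same size as plain q4_0, just with better quality on the tensors where q4_k applies.)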