
Question about model choices #22

Open
TC72 opened this issue Oct 24, 2023 · 2 comments

Comments

@TC72

TC72 commented Oct 24, 2023

I'm still learning about running models locally. Could I ask how you decide which version of each model to run?
I see different versions like Q5_K_S and Q4_K_M. I understand the main driver is memory when choosing between 7B, 13B, 34B, etc., but how do you decide which quantization is right?

I'm on a 32GB M2 Max MacBook Pro.
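For the size side of it, I've been using a back-of-the-envelope estimate like this. The bits-per-weight numbers are approximate figures I've seen quoted for llama.cpp quant types, so treat them as ballpark values, not exact file sizes:

```python
# Rough memory estimate for a quantized GGUF model.
# Bits-per-weight values are approximate (ballpark figures for
# llama.cpp quant types); actual file sizes vary by model.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,    # legacy quantization
    "Q4_K_M": 4.85,
    "Q5_K_S": 5.54,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def estimate_gb(n_params_billion: float, quant: str) -> float:
    """Size in GB ~= params * bits-per-weight / 8. Ignores the KV
    cache and runtime overhead, so budget another 1-2 GB on top."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"13B {quant}: ~{estimate_gb(13, quant):.1f} GB")
```

By that estimate a 13B Q5_K_M is around 9 GB, which should leave plenty of headroom on 32 GB.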

@fletchgqc

I don't really know, but I think people normally choose the biggest one that works on their computer. How did it work out for you?

@TC72
Author

TC72 commented Oct 30, 2023

If you look at tools like LM Studio, they mark some quantizations as recommended.
They say anything ending in _0, like codellama-13b-instruct.Q4_0.gguf, uses a legacy quantization method.
For _K_S they don't give an opinion, but _K_M variants do tend to be shown as recommended.

They also mention Q2 and Q3 models having a noticeable loss of quality.

So for me the sweet spot seems to be Q4_K_M and Q5_K_M.
I might try writing some kind of evaluation script to compare those based on quality of response and time taken.
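Something like this minimal first cut, using the llama-cpp-python bindings. The model paths and prompts are just placeholders for whatever is on disk, and I'm assuming n_gpu_layers=-1 offloads everything to Metal on Apple Silicon:

```python
# First cut at comparing quantizations: time the same prompts across
# two GGUF files and eyeball the outputs side by side.
# Requires: pip install llama-cpp-python
import time
from llama_cpp import Llama

MODELS = {
    "Q4_K_M": "models/codellama-13b-instruct.Q4_K_M.gguf",  # placeholder path
    "Q5_K_M": "models/codellama-13b-instruct.Q5_K_M.gguf",  # placeholder path
}
PROMPTS = [
    "Write a Python function that reverses a linked list.",
    "Explain the difference between a mutex and a semaphore.",
]

for name, path in MODELS.items():
    # n_gpu_layers=-1 should offload all layers to Metal on a Mac
    llm = Llama(model_path=path, n_ctx=2048, n_gpu_layers=-1, verbose=False)
    for prompt in PROMPTS:
        start = time.perf_counter()
        out = llm(prompt, max_tokens=256)
        elapsed = time.perf_counter() - start
        n_tokens = out["usage"]["completion_tokens"]
        print(f"{name}: {elapsed:.1f}s, {n_tokens / elapsed:.1f} tok/s")
        print(out["choices"][0]["text"][:200], "\n---")
```

Judging response quality would still be manual at this point; the script only makes the speed comparison objective.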
