
Question about model choices #22

Open
TC72 opened this issue Oct 24, 2023 · 2 comments

Comments

@TC72

TC72 commented Oct 24, 2023

I'm still learning about running models locally. Could I ask how you decide which version of each model to run?
I see different versions like Q5_K_S and Q4_K_M. I understand the main driver is memory when choosing between 7B, 13B, 34B, etc., but how do you decide which quantization is right?

I'm on a 32GB M2 Max MacBook Pro.
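For the size side of it, I've been using a back-of-the-envelope estimate like this. The bits-per-weight numbers are approximate figures I've seen quoted for llama.cpp quant types, so treat them as ballpark values, not exact file sizes:

```python
# Rough memory estimate for a quantized GGUF model.
# Bits-per-weight values are approximate (ballpark figures for
# llama.cpp quant types); actual file sizes vary by model.
BITS_PER_WEIGHT = {
    "Q4_0": 4.5,    # legacy quantization
    "Q4_K_M": 4.85,
    "Q5_K_S": 5.54,
    "Q5_K_M": 5.69,
    "Q8_0": 8.5,
}

def estimate_gb(n_params_billion: float, quant: str) -> float:
    """Size in GB ~= params * bits-per-weight / 8. Ignores the KV
    cache and runtime overhead, so budget another 1-2 GB on top."""
    bits = BITS_PER_WEIGHT[quant]
    return n_params_billion * 1e9 * bits / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"13B {quant}: ~{estimate_gb(13, quant):.1f} GB")
```

By that estimate a 13B Q5_K_M is around 9 GB, which should leave plenty of headroom on 32 GB.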

@fletchgqc

I don't really know, but I think people normally choose the biggest one that works on their computer. How did it work out for you?

@TC72
Author

TC72 commented Oct 30, 2023

If you look at tools like LM Studio, they mark some quantizations as recommended.
They say anything ending in _0, like codellama-13b-instruct.Q4_0.gguf, uses a legacy quantization method.
For _K_S they don't give an opinion, but _K_M variants do tend to be shown as recommended.

They also mention Q2 and Q3 models having a noticeable loss of quality.

So for me the sweet spot seems to be Q4_K_M and Q5_K_M.
I might try writing some kind of evaluation script to compare those based on quality of response and time taken.
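Something like this minimal first cut, using the llama-cpp-python bindings. The model paths and prompts are just placeholders for whatever is on disk, and I'm assuming n_gpu_layers=-1 offloads everything to Metal on Apple Silicon:

```python
# First cut at comparing quantizations: time the same prompts across
# two GGUF files and eyeball the outputs side by side.
# Requires: pip install llama-cpp-python
import time
from llama_cpp import Llama

MODELS = {
    "Q4_K_M": "models/codellama-13b-instruct.Q4_K_M.gguf",  # placeholder path
    "Q5_K_M": "models/codellama-13b-instruct.Q5_K_M.gguf",  # placeholder path
}
PROMPTS = [
    "Write a Python function that reverses a linked list.",
    "Explain the difference between a mutex and a semaphore.",
]

for name, path in MODELS.items():
    # n_gpu_layers=-1 should offload all layers to Metal on a Mac
    llm = Llama(model_path=path, n_ctx=2048, n_gpu_layers=-1, verbose=False)
    for prompt in PROMPTS:
        start = time.perf_counter()
        out = llm(prompt, max_tokens=256)
        elapsed = time.perf_counter() - start
        n_tokens = out["usage"]["completion_tokens"]
        print(f"{name}: {elapsed:.1f}s, {n_tokens / elapsed:.1f} tok/s")
        print(out["choices"][0]["text"][:200], "\n---")
```

Judging response quality would still be manual at this point; the script only makes the speed comparison objective.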
