+1 Exactly that: just being able to choose one smaller model would already be great, and being able to pick from a model zoo would be even better, so people can trade off performance against hardware requirements. Personally, I would like to run this on my laptop RTX with 8 GB of VRAM, where I know some small LLMs already perform surprisingly well.
Hi
Thanks for this great work.
Is there a way to add a quantized LLM so it can run on a GPU with under 10 GB of VRAM, making it accessible to more users?
Note:
I am already running llama3 via the ollama tool on my laptop, so once this option is available in this repo I can test it there directly.
Thank you
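
For reference, one possible route (assuming the project loads models through Hugging Face transformers, which may not be the case) is 4-bit quantization via bitsandbytes. A minimal sketch, with an illustrative model id and generation settings:

```python
# Sketch: loading a 4-bit quantized Llama 3 8B with transformers + bitsandbytes.
# The model id and generation settings are illustrative; the repo's actual
# loading path may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint, swap for any HF model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights: roughly 5-6 GB VRAM for an 8B model
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/accuracy
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers on the available GPU
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With NF4 weights, an 8B model typically fits in roughly 5-6 GB of VRAM, which should leave headroom on an 8-10 GB card. For local testing, the llama3 builds that ollama ships are already quantized (GGUF, 4-bit by default), so memory behaviour should be in the same ballpark.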