+1 Exactly that: just being able to choose one smaller model would already be great, and being able to pick from a model zoo would be even better, so people can trade off performance against hardware requirements. Personally, I would like to run this on my laptop RTX with 8 GB of VRAM, where I know some small LLMs already perform surprisingly well.
Hi
Thanks for this great work.
Is there a way to add a quantized LLM so it can run on a GPU with under 10 GB of VRAM, making it accessible to more users?
Note:
I am already running llama3 via the ollama tool on my laptop, so once this option is available in this repo I can test it there directly.
Thank you
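
For reference, one possible route (assuming the project loads models through Hugging Face transformers, which may not be the case) is 4-bit quantization via bitsandbytes. A minimal sketch, with an illustrative model id and generation settings:

```python
# Sketch: loading a 4-bit quantized Llama 3 8B with transformers + bitsandbytes.
# The model id and generation settings are illustrative; the repo's actual
# loading path may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint, swap for any HF model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weights: roughly 5-6 GB VRAM for an 8B model
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/accuracy
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # place layers on the available GPU
)

prompt = "Explain quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

With NF4 weights, an 8B model typically fits in roughly 5-6 GB of VRAM, which should leave headroom on an 8-10 GB card. For local testing, the llama3 builds that ollama ships are already quantized (GGUF, 4-bit by default), so memory behaviour should be in the same ballpark.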