I can only load q4_0 and q4_1 models. The newer q4_2, q5_0 and q5_1 models don't work. Since I recently upgraded my RAM to 64 GB to run LLMs on my machine, I'd like to be able to use the newer models.
For context, I use the latest release. Since it was last updated a month ago, I don't know whether more recent commits have already added support for 5-bit quantisation.
I have been using some q5_1 models with no problems after compiling llama.cpp and putting the resulting main.exe in place of Alpaca Electron's chat.exe. You can follow "(OPTIONAL) Building llama.cpp from source" in the README here, although note that the second cmake command didn't work for me and should be `cmake --build . --config Release`, per the llama.cpp README.
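Roughly, the steps I followed look like this (a sketch based on the llama.cpp README; the exact output folder for main.exe may differ between llama.cpp versions):

```
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
mkdir build
cd build
cmake ..
# this is the step where the command from the Alpaca Electron README failed for me;
# use --build with a Release config instead, as the llama.cpp README says
cmake --build . --config Release
# the compiled binary lands under build\Release\ or build\bin\Release\
# depending on the llama.cpp version; copy main.exe over Alpaca Electron's chat.exe
```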
Hi, I don't know much about AI, but I've seen a lot of models popping up on HuggingFace recently advertising 5-bit quantisation. Here is an example: https://huggingface.co/TheBloke/OpenAssistant-SFT-7-Llama-30B-GGML