Replies: 2 comments 6 replies
-
For point 2.
-
Hello, I want to know if you have found the answer to this question. I am also confused: different loading methods seem to give different results, and I am curious whether the results differ because of the different models.
-
I have a two-part question and discussion.
1. If we take any model, let's say Wizard Vicuna 13B, and quantize it into five different final models, I was told that ranked by accuracy (or perplexity, whatever you want to call it) they come out something like this:
--Best--
GGML Wizard Vicuna 13B 5_1
GGML Wizard Vicuna 13B 5_0
GPTQ Wizard Vicuna 13B 4bit
GGML Wizard Vicuna 13B 4_1
GGML Wizard Vicuna 13B 4_0
--Worst--
Of course there are other variants, but let's just use those five.
Is there any truth to this? Have you heard of or seen anything confirming or refuting it?
The difference may be slim to none, or marginally noticeable, depending on the model, prompt, etc.
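(For anyone comparing these rankings: perplexity is just the exponential of the mean per-token negative log-likelihood on a test set, so lower is better. A minimal sketch, using made-up NLL values rather than measurements from any of the models above:)

```python
import math

def perplexity(nlls):
    """Perplexity = exp of the mean per-token negative log-likelihood."""
    return math.exp(sum(nlls) / len(nlls))

# Made-up per-token NLLs for illustration only. A less lossy quantization
# assigns higher probability to the reference text, which means lower
# NLLs and therefore lower perplexity.
print(perplexity([1.2, 0.8, 1.0]))
```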
2. Now let's focus only on GPTQ Wizard Vicuna 13B 4bit.
Does anyone know if there is a difference in perplexity/accuracy between AutoGPTQ, GPTQ-for-LLaMa, and ExLlama?
What have you heard? What does the data show? How sure are you? Sharing sources would be awesome.
For example, I've only heard rumours. I can confirm that certain backends or models are faster or slower, of course, but I have not personally checked accuracy, nor read anywhere, whether AutoGPTQ is better or worse in accuracy than GPTQ-for-LLaMa. I did hear a few people say that GGML 4_0 is generally worse than GPTQ, and that GGML 5_0 is generally better than GPTQ, but I have no real proof to back this up, which is why I'm asking for opinions, facts, and theories. If there is a difference between back-ends like these three, is it virtually nothing, like 0.001%? If you have anything else relevant to add, even if it's not directly related, let us know. Thanks.
Just to be clear, I am talking about comparing the exact same model: same model name and parameters. I understand that in some cases it might actually depend on the model too, but as a rule of thumb, what is the expected conclusion?
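(One way to sanity-check whether a backend difference is "virtually nothing" when numbers do surface: compute the relative perplexity gap between the two measurements. A sketch with hypothetical values, not real benchmark results for any of these backends:)

```python
def relative_ppl_gap(ppl_a, ppl_b):
    # Percentage gap between two perplexity measurements of the same
    # model evaluated on the same test set under two backends.
    return abs(ppl_a - ppl_b) / min(ppl_a, ppl_b) * 100.0

# Hypothetical numbers for illustration only, not measured results:
gap = relative_ppl_gap(5.91, 5.93)
print(f"{gap:.2f}% difference")
```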