-
Notifications
You must be signed in to change notification settings - Fork 10.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Try to fix Baichuan2 models by using vocab size in config.json #3299
Conversation
Use local GGUF package when possible in Baichuan converter
Seems reasonable to me - same thing we did in #2914 |
This pull does fix the vocab issue, but unfortunately it's not enough to get reasonable results from the 13B model. Also Anyway, this pull is better than the status quo but may not be the best approach to solving the issue. I still don't know what the issue with Baichuan2 13B is, I suspect it may be something like variations in the ALiBi operation it wants. |
Is Baichuan2 13B different than the Baichuan 13B that we added support for some time ago? |
Well, there's this: https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#migrating-inference-optimizations-from-baichuan-1-to-baichuan-2 So one difference is
You mean Baichuan1 13B? I haven't tried any Baichuan1 models so far. I'll try to check that later today. |
Yes, support for Baichuan 13B was added in #3009 and allegedly it was working, though I haven't tried it. |
@ggerganov Sorry it's a bit late, but I got a chance to test Baichuan1 13B. Unfortunately, it seems like neither Baichuan1 13B or Baichuan2 13B work at all currently. It just immediately hits an assert and dies:
Seems to happen when prompt ingestion starts. Exactly the same error for both Baichuan1 and Baichuan2. Probably an issue with how the Baichuan graph is set up in I didn't test with the 7B models again, I'd have to redownload and convert it. It's likely this particular issue would affect them though. Previously the 7B Baichuan2 seemed to work perfectly. (Note: I converted the Baichuan1 model using the conversion script in |
@KerfuffleV2 Try to just delete the assert on ggml.c:12913 and see if it works. It was deleted in #3329 as well and the alibi seems to be working |
It seems to run with that change. The output (like Baichuan2 13B) is very repetitive though:
Or with Chinese:
It's basically just repeating "My weight increases.", "My size increases", "My weight is increasing", "My size is increasing". Baichuan2 13B is worse, same prompt it just outputs "Once upon a time there was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who" or "从前有一只小狐狸,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他". (I don't think this is worse than before, I just mean it seems worse than Baichuan1 13B.) If Baichuan1 13B behaved the same before the recent changes and people were actually using it to good effect... Well all I can say is they appear to know something I don't! |
I think the repetition is normal for |
I guess it's fine. Baichuan1 13B Q4_K_M:
Baichuan2 13B Q6_K:
I've never seen models that weren't broken just repeat the same word over and over but I also haven't messed with small models in a while. Hmm, I'm not sure if Q6_K model had the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for helping out with this. Think we can merge this
No problem. I tested without the normalizing stuff. Seems to work fine. Should diff --git a/ggml.c b/ggml.c
index bf1426d..b24f7c3 100644
--- a/ggml.c
+++ b/ggml.c
@@ -12905,7 +12905,6 @@ static void ggml_compute_forward_alibi_f32(
//const int nb3 = src0->nb[3];
GGML_ASSERT(nb0 == sizeof(float));
- GGML_ASSERT(ne1 + n_past == ne0);
GGML_ASSERT(n_head == ne2);
// add alibi to src0 (KQ_scaled) also be included here since it's necessary to actually use the model after conversion? (If not, we can go ahead and merge since I don't have any other changes planned.) |
I have added this change through the #3329 PR |
…example * 'master' of github.com:ggerganov/llama.cpp: (24 commits) convert : fix Baichuan2 models by using vocab size in config.json (ggerganov#3299) readme : add project status link ggml : fix build after ggerganov#3329 llm : add Refact model (ggerganov#3329) sync : ggml (conv 1d + 2d updates, UB fixes) (ggerganov#3468) finetune : readme fix typo (ggerganov#3465) ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (ggerganov#3453) main : consistent prefix/suffix coloring (ggerganov#3425) llama : fix session saving/loading (ggerganov#3400) llama : expose model's rope_freq_scale in the API (ggerganov#3418) metal : alibi for arbitrary number of heads (ggerganov#3426) cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (ggerganov#3273) Work on the BPE tokenizer (ggerganov#3252) convert : fix vocab size when not defined in hparams (ggerganov#3421) cmake : increase minimum version for add_link_options (ggerganov#3444) CLBlast: Add broadcast support for matrix multiplication (ggerganov#3402) gguf : add BERT, MPT, and GPT-J arch info (ggerganov#3408) gguf : general usability improvements (ggerganov#3409) cmake : make CUDA flags more similar to the Makefile (ggerganov#3420) finetune : fix ggerganov#3404 (ggerganov#3437) ...
…erganov#3299) Use local GGUF package when possible in Baichuan converter
Use local GGUF package when possible in Baichuan converter
This basically just uses the same approach as #2914
I tested converting https://huggingface.co/baichuan-inc/Baichuan2-7B-Base - seems to work fine now. I don't know if this breaks Baichuan1. One would assume one can't go wrong using the vocab size in the config, but who knows?
While I was in the neighborhood I updated the
gguf
import to look for the local version like the other scripts.Hopefully fixes #3270