Try to fix Baichuan2 models by using vocab size in config.json #3299

Merged: 1 commit, Oct 4, 2023


KerfuffleV2
Collaborator

Use local GGUF package when possible in Baichuan converter


This basically just uses the same approach as #2914.

I tested converting https://huggingface.co/baichuan-inc/Baichuan2-7B-Base, and it seems to work fine now. I don't know if this breaks Baichuan1. One would assume you can't go wrong using the vocab size from the config, but who knows?
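The core of the fix can be sketched as follows: treat `vocab_size` from `config.json` as the authoritative size of the embedding table and pad the tokenizer's piece list up to it, so the number of tokens written out matches the tensor shapes. This is a minimal illustration, not the converter's actual code; `build_padded_vocab`, the plain list standing in for the SentencePiece tokenizer, and the `[PAD{i}]` placeholder names are all hypothetical.

```python
import json
from typing import Dict, List


def build_padded_vocab(config_json: str, pieces: List[str]) -> List[str]:
    """Return a token list whose length equals the model's vocab_size.

    Hypothetical helper: `pieces` stands in for the tokenizer's id -> piece
    table, and "[PAD{i}]" is an assumed placeholder naming scheme.
    """
    hparams: Dict = json.loads(config_json)
    # Prefer the size recorded in config.json; fall back to the tokenizer.
    vocab_size = hparams.get("vocab_size", len(pieces))
    if vocab_size < len(pieces):
        raise ValueError(
            f"config vocab_size {vocab_size} is smaller than "
            f"tokenizer size {len(pieces)}"
        )
    tokens = list(pieces)
    # Pad so every row of the embedding matrix has a matching token id.
    for i in range(len(pieces), vocab_size):
        tokens.append(f"[PAD{i}]")
    return tokens
```

For example, `build_padded_vocab('{"vocab_size": 5}', ["<unk>", "<s>", "</s>"])` returns `['<unk>', '<s>', '</s>', '[PAD3]', '[PAD4]']`. The sketch refuses a config that reports fewer tokens than the tokenizer has rather than silently truncating.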

While I was in the neighborhood I updated the gguf import to look for the local version like the other scripts.
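Looking for the local version amounts to putting the repository's bundled `gguf` package ahead of any pip-installed copy on `sys.path`. A minimal sketch, assuming the `gguf-py/gguf` directory layout and the `NO_LOCAL_GGUF` opt-out environment variable that the other conversion scripts appeared to use at the time; `prefer_local_gguf` is a hypothetical name.

```python
import os
import sys
from pathlib import Path


def prefer_local_gguf(script_dir: Path) -> bool:
    """Put the repository's bundled gguf package ahead of a pip-installed one.

    Returns True if the local path is now on sys.path, False if the
    (assumed) NO_LOCAL_GGUF opt-out variable is set.
    """
    if "NO_LOCAL_GGUF" in os.environ:
        return False
    # Assumed layout: <repo>/gguf-py/gguf next to the converter script.
    local_path = str(script_dir / "gguf-py" / "gguf")
    if local_path not in sys.path:
        # Index 1 keeps sys.path[0] (the script's own directory) first.
        sys.path.insert(1, local_path)
    return True
```

A converter would call `prefer_local_gguf(Path(__file__).parent)` and only then `import gguf`, so edits to the in-tree package take effect without reinstalling it.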

Hopefully fixes #3270

@akawrykow
Contributor

Seems reasonable to me; it's the same thing we did in #2914.

@KerfuffleV2
Collaborator Author

This pull does fix the vocab issue, but unfortunately it's not enough to get reasonable results from the 13B model. Also, convert.py can already convert it, except that it doesn't set the architecture or look up the correct context-length key. So it may make more sense to update convert.py than to fix the Baichuan-specific conversion script (which could then just be removed).
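Looking up the correct context-length key mostly means checking several candidate keys in `config.json`, since different model releases record the trained context length under different names. A sketch of that lookup; the key names and their priority order below are assumptions for illustration, not taken from convert.py.

```python
import json
from typing import Optional, Sequence

# Assumed candidate keys, checked in order; adjust for the model at hand.
CTX_KEYS: Sequence[str] = (
    "max_sequence_length",
    "max_position_embeddings",
    "model_max_length",
)


def find_context_length(config_json: str,
                        keys: Sequence[str] = CTX_KEYS) -> Optional[int]:
    """Return the first positive context-length value found, or None."""
    hparams = json.loads(config_json)
    for key in keys:
        value = hparams.get(key)
        if isinstance(value, int) and value > 0:
            return value
    return None
```

For example, `find_context_length('{"model_max_length": 4096}')` returns `4096`. Returning `None` when no key matches lets the caller fail loudly instead of guessing a context length.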

Anyway, this pull is better than the status quo but may not be the best approach to solving the issue. I still don't know what the problem with Baichuan2 13B is; I suspect it may be something like a variation in the ALiBi operation it expects.
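One place ALiBi implementations can differ is the per-head slope. Each head h gets a fixed slope, and the recipe from the ALiBi paper only yields a simple geometric sequence when the head count is a power of two. Other head counts (the 13B models reportedly use 40 heads; treat that figure as an assumption) interleave slopes taken from the next power of two, so an implementation that handles that case differently would produce different biases. The sketch below follows the paper's recipe with the commonly used maximum bias of 8; it is an illustration, not the code of either llama.cpp or Baichuan.

```python
import math
from typing import List


def alibi_slopes(n_heads: int, max_bias: float = 8.0) -> List[float]:
    """Per-head ALiBi slopes following the ALiBi paper's recipe."""
    # Largest power of two not exceeding n_heads.
    n_floor = 2 ** int(math.floor(math.log2(n_heads)))
    # First n_floor heads: geometric sequence m0^1, m0^2, ..., m0^n_floor.
    m0 = 2.0 ** (-max_bias / n_floor)
    slopes = [m0 ** (h + 1) for h in range(n_floor)]
    # Remaining heads: odd powers of a finer base, m1^1, m1^3, m1^5, ...
    m1 = 2.0 ** (-max_bias / (2 * n_floor))
    slopes += [m1 ** (2 * i + 1) for i in range(n_heads - n_floor)]
    return slopes
```

With 8 heads this gives 1/2, 1/4, ..., 1/256; with 40 heads the first 32 slopes use base 2^(-1/4) and the last 8 use odd powers of 2^(-1/8).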

@ggerganov
Owner

ggerganov commented Sep 30, 2023

Is Baichuan2 13B different than the Baichuan 13B that we added support for some time ago?
Also, have you tried running the Baichuan 13B that was initially supported, after merging #3228?
At first, I thought that #3228 would break support, but now I think it should actually still work correctly, and I'm looking to verify that.

@KerfuffleV2
Collaborator Author

Is Baichuan2 13B different than the Baichuan 13B that we added support for some time ago?

Well, there's this: https://github.com/baichuan-inc/Baichuan2/blob/main/README_EN.md#migrating-inference-optimizations-from-baichuan-1-to-baichuan-2

So one difference is that lm_head isn't already normalized. Normalizing it didn't seem to make a difference for the issues I mentioned.
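The lm_head difference described in the linked migration notes comes down to normalizing each row of the output-projection weight (one row per vocabulary token) to unit L2 norm, which the stored Baichuan2 weights don't already have. A dependency-free sketch under that assumption; the weight is a plain list of rows here, and the `eps` clamp mirrors the small epsilon that `torch.nn.functional.normalize` uses by default.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_rows(weight: Matrix, eps: float = 1e-12) -> Matrix:
    """L2-normalize each row of an lm_head-style weight [vocab][hidden]."""
    out: Matrix = []
    for row in weight:
        norm = math.sqrt(sum(x * x for x in row))
        # Clamp the norm so an all-zero row stays zero instead of dividing by 0.
        denom = max(norm, eps)
        out.append([x / denom for x in row])
    return out
```

For example, `normalize_rows([[3.0, 4.0], [0.0, 0.0]])` returns `[[0.6, 0.8], [0.0, 0.0]]`. In a converter this would be applied once, when the lm_head tensor is written, so inference code needs no extra step.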

Also, have you tried running the Baichuan 13B that was initially supported, after merging #3228?

You mean Baichuan1 13B? I haven't tried any Baichuan1 models so far. I'll try to check that later today.

@ggerganov
Owner

You mean Baichuan1 13B?

Yes, support for Baichuan 13B was added in #3009 and allegedly it was working, though I haven't tried it.

@KerfuffleV2
Collaborator Author

@ggerganov Sorry it's a bit late, but I got a chance to test Baichuan1 13B. Unfortunately, it seems like neither Baichuan1 13B nor Baichuan2 13B works at all currently. It just immediately hits an assert and dies:

llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size  =  400.00 MB
llama_new_context_with_model: compute buffer total size = 262.38 MB
llama_new_context_with_model: VRAM scratch buffer: 256.50 MB
llama_new_context_with_model: total VRAM used: 6006.66 MB (model: 5750.16 MB, context: 256.50 MB)
GGML_ASSERT: ggml.c:12913: ne1 + n_past == ne0

Seems to happen when prompt ingestion starts. Exactly the same error for both Baichuan1 and Baichuan2. Probably an issue with how the Baichuan graph is set up in llama.cpp?

I didn't test the 7B models again, since I'd have to redownload and convert them. It's likely this particular issue would affect them too, though. Previously, Baichuan2 7B seemed to work perfectly.

(Note: I converted the Baichuan1 model using the conversion script in master; I don't think this is a conversion issue, though.)

@ggerganov
Owner

@KerfuffleV2 Try just deleting the assert at ggml.c:12913 and see if it works. It was deleted in #3329 as well, and ALiBi seems to be working.
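Why deleting that assert can be harmless: when the ALiBi bias is added as `slope * key_index`, it depends only on the key (column) position and the head's slope, not on how many query rows there are or where they start. The removed check required the key dimension (`ne0`) to equal the number of query rows (`ne1`) plus `n_past`; if the KV view has a different width (plausibly what changed after #3228, though that is an inference), the bias itself is still well defined. The sketch below is an illustrative reimplementation with assumed shapes, not ggml's code. It shows that, after softmax, the key-index form matches the paper's relative form `-slope * (query_pos - key_pos)`, because the two differ only by a per-row constant.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def add_alibi_by_key_index(scores: Matrix, slope: float) -> Matrix:
    """Add slope * j to column j of every row; needs no n_past at all."""
    return [[s + slope * j for j, s in enumerate(row)] for row in scores]


def add_alibi_relative(scores: Matrix, slope: float, n_past: int) -> Matrix:
    """Paper-style bias -slope * (i - j), query row r sitting at i = n_past + r."""
    return [[s - slope * ((n_past + r) - j) for j, s in enumerate(row)]
            for r, row in enumerate(scores)]


if __name__ == "__main__":
    # 2 new query tokens over 5 keys (n_past = 3); causal masking omitted.
    scores = [[0.1, 0.4, -0.2, 0.3, 0.0],
              [0.2, -0.1, 0.5, 0.0, 0.7]]
    a = [softmax(r) for r in add_alibi_by_key_index(scores, 0.25)]
    b = [softmax(r) for r in add_alibi_relative(scores, 0.25, n_past=3)]
    # Softmax ignores a constant shift per row, so both forms agree.
    print(all(abs(x - y) < 1e-9 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))
```

Running it prints `True`. Only the relative form needs `n_past`, which is why a shape check tying the key count to `n_past` is not required for the bias computation itself.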

@KerfuffleV2
Collaborator Author

Try to just delete the assert on ggml.c:12913 and see if it works.

It seems to run with that change. The output (like Baichuan2 13B's) is very repetitive, though:

$ ./main -m /blah/baichuan1-13b.gguf -p 'Once upon a time there was a little fox' -ngl 18 --ignore-eos --temp 0.0
[...]
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.000000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 Once upon a time there was a little fox.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt think that he could be as good at being clever or wise like themselves but when it came to running fast across fields in order to catch rabbits then there was no doubt.
The Fox said, "I am the best of all my kind!" The other animals were not so sure about this and they didnt

Or with Chinese:

$ ./main -m /blah/baichuan1-13b.gguf -p '从前有一只小狐狸,他' -ngl 18 --ignore-eos --temp 0.0
[...]
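 从前有一只小狐狸,他长大了,长高了。我长高了,变大了。我的身体变得更大更重。我的体重和身高都在增加。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。我的体重在增加,我的身高也在增长。
(Translation: "Once upon a time there was a little fox; he grew up and grew taller. I grew taller and got bigger. My body became bigger and heavier. My weight and height are both increasing. My weight is increasing, and my height is growing too." The last sentence is repeated four times.)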

It's basically just repeating "My weight increases.", "My size increases", "My weight is increasing", "My size is increasing". Baichuan2 13B is worse: with the same prompt it just outputs "Once upon a time there was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who was a little fox who" or "从前有一只小狐狸,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他,他" ("Once upon a time there was a little fox, he, he, he, ..."). (I don't think this is worse than before; I just mean it seems worse than Baichuan1 13B.)

If Baichuan1 13B behaved the same before the recent changes and people were actually using it to good effect... Well, all I can say is they appear to know something I don't!

@ggerganov
Owner

I think the repetition is normal for --temp 0.0 and such a short prompt. The best thing would be to just run perplexity on WikiText and make sure it is a reasonable number, e.g. less than 10.
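With `--temp 0.0`, sampling is effectively greedy: the highest-scoring token is always chosen, so once the model falls into a loop nothing random can break it, and in that configuration the main counterweight is the repetition penalty (`repeat_last_n = 64`, `repeat_penalty = 1.1` in the log above). A sketch of that interaction, assuming a CTRL-style penalty (divide positive logits by the penalty, multiply negative ones) over the last `repeat_last_n` tokens; the function names are hypothetical and this is not llama.cpp's sampler.

```python
from typing import Dict, List


def apply_repeat_penalty(logits: Dict[int, float], history: List[int],
                         repeat_last_n: int = 64,
                         penalty: float = 1.1) -> Dict[int, float]:
    """CTRL-style penalty on tokens seen in the last `repeat_last_n` steps."""
    # Guard n == 0: history[-0:] would select the whole list.
    recent = set(history[-repeat_last_n:]) if repeat_last_n > 0 else set()
    out = dict(logits)
    for tok in recent:
        if tok in out:
            # Shrink positive logits; push negative logits further down.
            out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def greedy_pick(logits: Dict[int, float]) -> int:
    """temp = 0: always take the arg-max token, so decoding is deterministic."""
    return max(logits, key=logits.__getitem__)
```

With logits `{1: 5.0, 2: 4.8}` and token 1 in the recent history, the penalty lowers token 1 to about 4.55 and token 2 wins; with `{1: 8.0, 2: 4.0}` the penalized 7.27 still wins, which is how a mild penalty can fail to stop a strong loop.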

@KerfuffleV2
Collaborator Author

I guess it's fine.

Baichuan1 13B Q4_K_M:

[1]7.8807,[2]10.3365,[3]11.3285,[4]12.7045,[5]13.3817,[6]13.1125,[7]13.6777,[8]13.8214,[9]14.4119,[10]14.7038,[11]15.0939,[12]15.0929,[13]14.8181,[14]14.8787,[15]15.5579

Baichuan2 13B Q6_K:

[1]6.9463,[2]8.8442,[3]11.5516,[4]12.2726,[5]10.9208,[6]11.5958,[7]11.9222,[8]11.2917,[9]11.9582,[10]12.1829,[11]11.9845,[12]11.9964,[13]11.7915
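For reference, perplexity is the exponential of the mean negative log-likelihood the model assigns to each actual next token, so lower is better. The `[n]value` pairs above read like running estimates over the first n chunks of the test text, which is why they drift as more text is included. A sketch of that bookkeeping, assuming per-token natural-log probabilities are already available for each chunk (the input format is hypothetical):

```python
import math
from typing import List


def running_perplexity(chunk_logprobs: List[List[float]]) -> List[float]:
    """Cumulative perplexity after each chunk.

    chunk_logprobs[c] holds natural-log probabilities of the scored
    tokens in chunk c.
    """
    total_nll = 0.0
    count = 0
    results = []
    for chunk in chunk_logprobs:
        total_nll -= sum(chunk)
        count += len(chunk)
        # Perplexity so far = exp(mean negative log-likelihood).
        results.append(math.exp(total_nll / count))
    return results
```

For example, a chunk where every token gets probability 0.5 followed by an equally long chunk at 0.25 yields roughly `[2.0, 2.83]`: the second value is 2^1.5 because it averages over both chunks.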

I've never seen models that weren't broken just repeat the same word over and over, but I also haven't messed with small models in a while.

Hmm, I'm not sure if the Q6_K model had the lm_head normalization applied to it, though. I will have to mess around and reconvert it, but unfortunately I probably won't get a chance to do that today.

Owner

@ggerganov ggerganov left a comment


Ok, thanks for helping out with this. I think we can merge this.

@KerfuffleV2
Collaborator Author

Ok, thanks for helping out with this.

No problem.

I tested without the normalizing stuff. Seems to work fine.

Should

diff --git a/ggml.c b/ggml.c
index bf1426d..b24f7c3 100644
--- a/ggml.c
+++ b/ggml.c
@@ -12905,7 +12905,6 @@ static void ggml_compute_forward_alibi_f32(
     //const int nb3 = src0->nb[3];
 
     GGML_ASSERT(nb0 == sizeof(float));
-    GGML_ASSERT(ne1 + n_past == ne0);
     GGML_ASSERT(n_head == ne2);
 
     // add alibi to src0 (KQ_scaled)

also be included here since it's necessary to actually use the model after conversion? (If not, we can go ahead and merge since I don't have any other changes planned.)

@ggerganov ggerganov merged commit 019ba1d into ggerganov:master Oct 4, 2023
@ggerganov
Owner

I have added this change via PR #3329.

joelkuiper added a commit to vortext/llama.cpp that referenced this pull request Oct 5, 2023
…example

* 'master' of github.com:ggerganov/llama.cpp: (24 commits)
  convert : fix Baichuan2 models by using vocab size in config.json (ggerganov#3299)
  readme : add project status link
  ggml : fix build after ggerganov#3329
  llm : add Refact model (ggerganov#3329)
  sync : ggml (conv 1d + 2d updates, UB fixes) (ggerganov#3468)
  finetune : readme fix typo (ggerganov#3465)
  ggml : add RISC-V Vector Support for K-Quants and improved the existing intrinsics (ggerganov#3453)
  main : consistent prefix/suffix coloring (ggerganov#3425)
  llama : fix session saving/loading (ggerganov#3400)
  llama : expose model's rope_freq_scale in the API (ggerganov#3418)
  metal : alibi for arbitrary number of heads (ggerganov#3426)
  cmake : make LLAMA_NATIVE flag actually use the instructions supported by the processor (ggerganov#3273)
  Work on the BPE tokenizer (ggerganov#3252)
  convert : fix vocab size when not defined in hparams (ggerganov#3421)
  cmake : increase minimum version for add_link_options (ggerganov#3444)
  CLBlast: Add broadcast support for matrix multiplication (ggerganov#3402)
  gguf : add BERT, MPT, and GPT-J arch info (ggerganov#3408)
  gguf : general usability improvements (ggerganov#3409)
  cmake : make CUDA flags more similar to the Makefile (ggerganov#3420)
  finetune : fix ggerganov#3404 (ggerganov#3437)
  ...
yusiwen pushed a commit to yusiwen/llama.cpp that referenced this pull request Oct 7, 2023
…erganov#3299)

Use local GGUF package when possible in Baichuan converter
@KerfuffleV2 KerfuffleV2 deleted the fix-baichuan2 branch November 17, 2023 03:13

Successfully merging this pull request may close these issues.

when will baichuan2 be supported?
3 participants