Qwen2.5 14B models are ... sometimes? ... having their token vocabulary truncated down to 'actual'? #425
Comments
I can look at adding an option for padding the size up to the nearest multiple of 32 if that's causing an issue.
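For reference, a minimal sketch of the round-up arithmetic (plain Python; the function name is just illustrative):

```python
import math

def pad_vocab_size(actual_vocab: int, multiple: int) -> int:
    """Round a vocabulary size up to the nearest multiple."""
    return math.ceil(actual_vocab / multiple) * multiple

# Qwen2.5's real tokenizer size is 151665. Padding to a multiple of 32
# gives 151680, while a multiple of 512 lands exactly on the 152064
# that ships in config.json.
print(pad_vocab_size(151665, 32))   # 151680
print(pad_vocab_size(151665, 512))  # 152064
```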
That would be a helpful option -- it's causing some downstream effects in other tooling (like tripping unsloth patching that isn't fully calibrated to the model type, for some reason) and preventing merges with other Qwen 2.5 models.
I've added this option in #465.
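For a merged checkpoint that has already been produced at the truncated size, one workaround sketch uses the Hugging Face transformers API directly (the path is a placeholder, and this is independent of whatever the option in #465 is called):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/merged-model"  # hypothetical local path to the merged checkpoint
model = AutoModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

# resize_token_embeddings pads the embedding matrix (and the lm_head, when
# present) up to the requested multiple; 512 takes 151665 back to 152064.
model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=512)

model.save_pretrained(path + "-padded")
tokenizer.save_pretrained(path + "-padded")
```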
Early indications are that it's working! Merging two models that were at the truncated size brought it back up to 152064, and the result evaluates well. If those extra rows were just padding in the first place, it should be fine.
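One way to sanity-check the "just padding" assumption is to inspect the embedding rows beyond the tokenizer's real vocabulary (a sketch, again with a placeholder path):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "path/to/merged-model"  # hypothetical
model = AutoModelForCausalLM.from_pretrained(path)
tokenizer = AutoTokenizer.from_pretrained(path)

emb = model.get_input_embeddings().weight
extra = emb[len(tokenizer):]          # rows past the real vocabulary
print(emb.shape[0], len(tokenizer))   # e.g. 152064 vs 151665
print(extra.abs().max().item())       # near zero if the rows really are padding
```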
Have you merged any glm4 models?
I have not tried merging any glm4 models. It looks like they have a `padded_vocab_size` rather than a `vocab_size`?
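A quick way to see which field a given config actually exposes; the repo id here is only an assumed example, and GLM-4 checkpoints ship custom modeling code, hence `trust_remote_code`:

```python
from transformers import AutoConfig

# "THUDM/glm-4-9b-chat" is used purely as an assumed example repo.
cfg = AutoConfig.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)
print(getattr(cfg, "padded_vocab_size", None))  # present on GLM-style configs?
print(getattr(cfg, "vocab_size", None))         # present on most other configs
```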
Actual example of a merge that produced this issue:
Additional relevant information is that if I get the tokenizer vocab size with `tokenizer_vocab_size = len(tokenizer)` from ... any Qwen 2.5 14B model, I get the `151665` number rather than the `152064` number that's in the config.json.

I don't fully understand why it's trimming the vocabulary size and embedding layer down in this merge method but none of the others, but it's annoying for compatibility, and specifying the `tokenizer_source` doesn't seem to address the issue (presumably because the tokenizer doesn't actually have 152064 worth of vocabulary).
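A minimal reproduction of the size mismatch, assuming the hub id Qwen/Qwen2.5-14B-Instruct stands in for "any Qwen 2.5 14B model":

```python
from transformers import AutoConfig, AutoTokenizer

repo = "Qwen/Qwen2.5-14B-Instruct"  # assumed example checkpoint
cfg = AutoConfig.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

print(cfg.vocab_size)   # 152064 -- padded embedding-row count from config.json
print(len(tokenizer))   # 151665 -- tokens the tokenizer actually defines
```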