Broken tokenizer in Yi-34B merge #428
Comments
Hi! What do you mean by a broken tokenizer? I'm not sure whether my tokenizer is broken. After merging the model, I get tokens like "<|unused115|>" and "<|unused026|>" in the text of my messages.
I was unable to convert the model to GGUF and quantize it because of an error about token ids being out of range. There were tokens numbered 64000 and 64001 when the max was 63999. I too see a lot of unused tokens in the config, but I don't know if that's anything to worry about. So far, I haven't seen these show up in generated text.
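For anyone who wants to check for the same out-of-range condition before attempting a conversion, here is a minimal sketch. It assumes you already have the merged tokenizer's vocabulary as a plain token-to-id dict (with a Hugging Face tokenizer, `tokenizer.get_vocab()` should give you one) and the model's vocabulary size (the 64000 below is taken from the ids quoted above). The function name and the example token names are hypothetical, made up for illustration only.

```python
def find_out_of_range_tokens(vocab, vocab_size):
    """Return (token, id) pairs whose id falls outside [0, vocab_size)."""
    return sorted(
        ((tok, idx) for tok, idx in vocab.items() if not 0 <= idx < vocab_size),
        key=lambda pair: pair[1],
    )


# Hypothetical vocabulary: two entries sit past the last valid id (63999).
example_vocab = {
    "<unk>": 0,
    "hello": 63999,
    "<extra_a>": 64000,  # made-up token name
    "<extra_b>": 64001,  # made-up token name
}

bad = find_out_of_range_tokens(example_vocab, vocab_size=64000)
print(bad)
```

If the returned list is non-empty, the tokenizer defines more ids than the model's embedding table has rows, which matches the kind of error described above.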
After merging in #430 I'm able to merge the config you posted and successfully quantize the output model. Please do let me know if this recurs or you run into any similar problems!
I've been trying to merge two Yi-34B based builds using Arcee's hosted mergekit. The merge seems to be successful, with no errors shown, but no matter what tokenizer source I use, the result seems broken and I'm unable to convert to GGUF. I know there used to be a bug related to this, but I thought it was fixed.
This is the most recent YAML I used: