Greetings. I'm a novice to mergekit, so I still don't understand most of the technical terms.
I created a merge with the following config:
The tokenizer of one of the parents, Unslop-Nemo-v4.1, is broken too. So I instead used the tokenizer from the previous version, Unslop-Nemo-v2. I also tried simply using the other model's tokenizer, and it breaks as well. Oobabooga's text-generation-webui simply can't run the merge and fails while trying to initialize the tokenizer. I've merged other models too, and their tokenizers are all broken in the same way.
Tokenizers that work are around 9 MB, while broken ones are around 17 MB.
This Reddit thread describes the same problem and suggests it might be related to Transformers: https://www.reddit.com/r/LocalLLaMA/comments/1gwyuyg/beware_of_broken_tokenizers_learned_of_this_while/
I noticed `pad_to_multiple_of:` in the new tokenizer configuration. Could this fix it, maybe? Or is it something completely different?
This isn't actually an issue with mergekit: in huggingface/tokenizers#909 the serialization format for merges was changed. If you upgrade tokenizers and transformers in your webui environment, it should be able to load these new-format tokenizers.
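If you want to check which format a given tokenizer uses, here's a minimal sketch. It assumes the new format stores each merge as a two-element pair rather than a single space-separated string, and that you point it at the merged model's tokenizer.json (the path is a placeholder, not anything mergekit-specific):

```python
import json

# Minimal sketch: check whether a tokenizer.json stores BPE merges in the
# old format ("token1 token2" strings) or the new one (["token1", "token2"] pairs).
# "tokenizer.json" is an assumed path; replace it with the merged model's file.
with open("tokenizer.json", encoding="utf-8") as f:
    tok = json.load(f)

first_merge = tok["model"]["merges"][0]
if isinstance(first_merge, str):
    print("old merges format, e.g.", repr(first_merge))
else:
    print("new merges format, e.g.", first_merge)
```

This also explains the size difference you saw: the pair representation is more verbose, so new-format tokenizer.json files come out noticeably larger without being broken.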
Alternatively, you can downgrade transformers to 4.44.2 and tokenizers to 0.19.1 in your mergekit environment. This will make mergekit output tokenizers in the old format, though it also means you won't be able to merge models that already use the new tokenizer format. There's kind of no perfect option until the entire ecosystem supports the new one, unfortunately.
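If you go the downgrade route, the pin would look something like this (assuming a pip-managed environment):

```
pip install "transformers==4.44.2" "tokenizers==0.19.1"
```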