Include eos_token and bos_token from tokenizer_config.json for chat templating #5040
Comments
So if I understand correctly, the issue is that
Okay, so the documentation is not exactly clear on this subject. Some models have a clear mapping; others, such as phi-2, do not. I guess that means some models are doomed to have this shortcoming because of the way they were configured.
The problem in this issue is partly because of the logic in the GGUF writer; see llama.cpp/gguf-py/gguf/vocab.py, lines 151 to 174 at 381ee19.
For this issue it would help if the conditional were skipped.
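For readers who don't want to open vocab.py, here is a paraphrased Python sketch of the kind of guard being described; the function and structure are illustrative assumptions, not the real llama.cpp writer code.

```python
# Paraphrased sketch only -- not the actual gguf-py/gguf/vocab.py code.
# It illustrates the kind of guard that can drop a chat eos_token which is
# defined in tokenizer_config.json but not flagged as special in tokenizer.json.
import json
from pathlib import Path

def collect_special_token_ids(model_dir: Path) -> dict[str, int]:
    tokenizer = json.loads((model_dir / "tokenizer.json").read_text(encoding="utf-8"))
    config = json.loads((model_dir / "tokenizer_config.json").read_text(encoding="utf-8"))

    # Tokens the tokenizer itself flags as special (the added_tokens section).
    special_ids = {t["content"]: t["id"] for t in tokenizer.get("added_tokens", []) if t.get("special")}

    ids: dict[str, int] = {}
    for name in ("bos_token", "eos_token", "unk_token", "pad_token"):
        entry = config.get(name)
        content = entry.get("content") if isinstance(entry, dict) else entry
        if content is None:
            continue
        token_id = special_ids.get(content)
        if token_id is None:
            # The guard being discussed: the token named in tokenizer_config.json
            # is not in the special list, so it is silently skipped. Relaxing this
            # conditional (e.g. falling back to the full vocab) would keep it.
            continue
        ids[name] = token_id
    return ids

# Usage (hypothetical path): collect_special_token_ids(Path("path/to/model"))
```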
Hi all, I'm the developer at Hugging Face who designed the original spec and implementation for our chat templates - I got linked here from abetlen/llama-cpp-python#1096. Just to clarify the situation with special tokens like eos_token and bos_token: chat template handling all happens inside the apply_chat_template function. The chat template is rendered with the conversation history so far, the add_generation_prompt flag, and the tokenizer's special tokens. If you follow the definition for the special tokens, the full list of attributes is:

"bos_token"
"eos_token"
"unk_token"
"sep_token"
"pad_token"
"cls_token"
"mask_token"
"additional_special_tokens"
I think the real issue here is, just as @vriesdemichael eloquently pointed out, that some models don't have a definitively unique BOS or EOS token. We could read from the tokenizer_config.json, but for example:

"bos_token": "<|endoftext|>",
"eos_token": "<|endoftext|>",
"unk_token": "<|endoftext|>"

Phi-2 has the same token defined for BOS, EOS, and UNK, and this isn't unique to Phi either. @slaren and @ggerganov would have more of an idea of how this is processed under the hood, though. I'm still trying to figure it out.
Hi @teleprint-me, I wrote a bit about this here: abetlen/llama-cpp-python#1096 (comment). The tl;dr is that models should set eos_token correctly, and the principled approach would be to just read it from the tokenizer config.
@Rocketknight1 Thank you for that clarification! It would be incredibly helpful if the transformers library offered some sort of expected structure for these various config files. Most config formats do so using a JSON schema, which also makes it very easy to validate any given config file.
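A minimal sketch of what such validation could look like with the jsonschema package; the schema fragment below is invented for illustration and is not an official Hugging Face schema.

```python
# Sketch: validating a tokenizer_config.json against a hand-written schema.
# The schema is illustrative only -- Hugging Face does not publish one.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

TOKENIZER_CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        # Special tokens may be plain strings or AddedToken-style objects.
        name: {
            "oneOf": [
                {"type": "string"},
                {"type": "object", "required": ["content"],
                 "properties": {"content": {"type": "string"}}},
            ]
        }
        for name in ("bos_token", "eos_token", "unk_token", "pad_token")
    },
    "required": ["eos_token"],
}

with open("tokenizer_config.json", encoding="utf-8") as f:
    config = json.load(f)

try:
    validate(instance=config, schema=TOKENIZER_CONFIG_SCHEMA)
    print("tokenizer_config.json matches the expected shape")
except ValidationError as err:
    print("schema violation:", err.message)
```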
Still doesn't fix the problem with misconfigured models, but I guess that is best left to the model creators.
I think that might be difficult, given the diversity of models in the library! However, I do have something that might be interesting to you - we're working on allowing models to store lists of arbitrary stop strings. When this feature is implemented, and when (if) users start adding it to their repos, you can hopefully just read that field to resolve the issue here! huggingface/transformers#28932
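Assuming the feature in huggingface/transformers#28932 lands roughly as described (a per-repo list of arbitrary stop strings), downstream code could use it along these lines; the stop_strings field name and its location in generation_config.json are assumptions, not a shipped API.

```python
# Sketch under an assumption: the repo exposes a list of stop strings, e.g. in
# generation_config.json under a "stop_strings" key (hypothetical field name).
import json
from pathlib import Path

def load_stop_strings(repo_dir: str) -> list[str]:
    path = Path(repo_dir) / "generation_config.json"
    if not path.is_file():
        return []
    config = json.loads(path.read_text(encoding="utf-8"))
    return config.get("stop_strings", [])  # hypothetical key

def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Cut the generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for stop in stop_strings:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Example:
print(truncate_at_stop("Sure, here you go.<|im_end|>user: ...", ["<|im_end|>"]))
```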
Implementations typically use these tokens, but users won't (and often don't) always define them. The phi model by Microsoft is a perfect example of this. This isn't a fix Hugging Face can take care of, although I can see potential streamlining and guidance helping with the issue. The only reasonable expectation I can think of is suggesting users utilize some kind of consistent interface, but this cannot and will not be guaranteed (I can see some users rolling their eyes at this while others will comply). This doesn't account for the variety of implementations there are for tokenizers either.
I understand the complexity and definitely don't expect older models to adapt. I merely suggest taking the reins for future implementations, as huggingface has done with chat templates in the tokenizer. At some point in the future, a reliable set of params enforced by HF would be desirable for downstream implementations such as llama.cpp, GPTQ, etc. Before derailing this discussion any further: I believe the problems that can be solved in the gguf-py code have been fixed, which means this issue can be closed.
This issue is stale because it has been open for 30 days with no activity.
Feature Description

Include (at minimum) the eos_token and bos_token keys from the huggingface tokenizer_config.json as gguf metadata keys.

Motivation

For chat models these differ from the normal eos and bos tokens and are required to stop the model from generating user message tokens. For example, to correctly stop generation with chatml chat models you need to watch for the <|im_end|> token specified in the tokenizer_config.json below:

https://huggingface.co/mlabonne/NeuralBeagle14-7B/blob/main/tokenizer_config.json

However, the gguf metadata doesn't include this token, and using the chat_format alone without the eos_token causes generation to continue incorrectly past the end of the assistant response.

Possible Implementation

I'm not too familiar with the gguf-py package, but I think adding these to the SpecialVocab class and updating _try_load_from_tokenizer_json would be a start. Not sure what the correct gguf key should be for these so as to not create confusion.
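A rough sketch of the direction proposed above, assuming a converter reads tokenizer_config.json directly; the gguf metadata key names below are hypothetical placeholders, not established keys.

```python
# Sketch of the proposed direction: pull the chat-level BOS/EOS strings from
# tokenizer_config.json so a converter could emit them as extra GGUF metadata.
# The metadata key names used below are hypothetical, not established gguf keys.
import json
from pathlib import Path

def _token_str(entry) -> str | None:
    # tokenizer_config.json stores tokens either as plain strings or as
    # AddedToken-style dicts such as {"content": "<|im_end|>", ...}.
    if isinstance(entry, dict):
        return entry.get("content")
    return entry

def chat_special_tokens(model_dir: str) -> dict[str, str]:
    config = json.loads((Path(model_dir) / "tokenizer_config.json").read_text(encoding="utf-8"))
    metadata: dict[str, str] = {}
    for name, key in (("bos_token", "tokenizer.chat.bos_token"),   # hypothetical key
                      ("eos_token", "tokenizer.chat.eos_token")):  # hypothetical key
        value = _token_str(config.get(name))
        if value is not None:
            metadata[key] = value
    return metadata

# Usage (assumes tokenizer_config.json exists in the given directory):
# print(chat_special_tokens("path/to/model"))
# For a chatml-style model this should surface "<|im_end|>" as the chat eos.
```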