I noticed that the `tokenizer.json` looks different when saved with the environment from step #1 versus step #2 (see the system info section for environment details), and step #2 fails with an error.
When I looked further into it, I saw that the `add_prefix_space` field, which is present with the older packages, is no longer there, and two new fields were introduced: `prepend_scheme` and `split`. I believe this change in the serialization contract caused the failure.
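A minimal sketch of how to surface the difference (the directory names are placeholders for wherever each environment saved the tokenizer, and which component carries these fields depends on the tokenizer):

```python
# Compare how each environment serialized the tokenizer.
# The directory names below are placeholders for the two save locations.
import json

for path in ("saved_with_step1_env/tokenizer.json", "saved_with_step2_env/tokenizer.json"):
    with open(path) as f:
        config = json.load(f)
    # The changed fields typically show up in the pre_tokenizer and/or decoder sections.
    print(path)
    print("  pre_tokenizer:", config.get("pre_tokenizer"))
    print("  decoder:", config.get("decoder"))
```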
A couple of questions in this context:

1. Are the newer changes to the tokenizers and transformers packages expected to be backward compatible?
2. Will there be any tokenization differences between the two setups, i.e. when:
   - I save the tokenizer with the env in step #1 and tokenize a dataset, versus
   - I save the tokenizer with the env in step #2 and tokenize the same dataset?
I will launch tests for question 2 and share the output here (a sketch of the comparison follows), but if the answer is already known, please let me know.
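A minimal sketch of that test, assuming both tokenizers were saved to local directories (names are placeholders) and comparing token IDs on a few inputs that exercise prefix-space handling:

```python
# Tokenize the same texts with a tokenizer saved by each environment
# and compare the resulting IDs. Directory names are placeholders.
from transformers import AutoTokenizer

samples = [
    "Hello world",
    "  leading spaces matter for prefix-space handling",
    "multi\nline\ninput",
]

tok_old = AutoTokenizer.from_pretrained("saved_with_step1_env")
tok_new = AutoTokenizer.from_pretrained("saved_with_step2_env")

for text in samples:
    ids_old = tok_old(text)["input_ids"]
    ids_new = tok_new(text)["input_ids"]
    if ids_old != ids_new:
        print(f"mismatch on {text!r}:\n  old={ids_old}\n  new={ids_new}")
    else:
        print(f"match on {text!r}")
```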
Hey! What you are asking for is *forward* compatibility, not *backward* compatibility.
The issue lies with the tokenizers version, not transformers, and as such this is expected. You can probably hack around it by using tokenizers 0.19 with an older version of transformers.
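A quick way to confirm the forward-compatibility diagnosis is to try loading the `tokenizer.json` written by the newer tokenizers from the environment running the older one; a minimal sketch, with a placeholder path:

```python
# Try loading a tokenizer.json written by the newer tokenizers from the
# environment running the older tokenizers; if the new fields are the
# problem, deserialization fails here. The path is a placeholder.
from tokenizers import Tokenizer

try:
    tok = Tokenizer.from_file("saved_with_step2_env/tokenizer.json")
    print("loaded OK:", tok.get_vocab_size(), "tokens")
except Exception as exc:
    print("failed to load:", exc)
```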