
fix T5 tokenizer loading #3544

Merged: 1 commit merged into master on Oct 11, 2024
Conversation

@helpmefindaname (Collaborator)

Closes #3543

The fast T5 tokenizer is required to be loaded from the slow tokenizer when `add_prefix_space` is `True`. See the code.
I am not sure exactly why it is implemented that way, as the code's history does not reveal a specific reason, but with a small special case on our side we can easily support all T5 tokenizers.
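
For context, here is a minimal sketch of the transformers behavior this PR works around, assuming a recent transformers version (the `t5-small` checkpoint is only an example): passing `add_prefix_space=True` to a T5 fast tokenizer makes transformers fall back to converting from the slow SentencePiece tokenizer, so the slow tokenizer files and the `sentencepiece` package must be available.

```python
from transformers import AutoTokenizer

# With add_prefix_space=True, transformers does not build the fast T5
# tokenizer directly; it sets from_slow=True internally and converts
# from the slow SentencePiece tokenizer instead. That conversion needs
# the `sentencepiece` package and the slow tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("t5-small", add_prefix_space=True)

print(tokenizer.is_fast)  # True: a fast tokenizer, built via the slow one
```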

@helpmefindaname added the bug (Something isn't working) label on Sep 13, 2024
@DhruvSondhi

Hello,

Yes, this indeed fixes the problem. I was able to load the best model from the set of trained models. I hope this can be merged into the master branch as soon as possible. Thanks, @helpmefindaname!

@alanakbik merged commit 2993108 into master on Oct 11, 2024
1 check passed
@alanakbik deleted the fix-t5-tokenizer branch on October 11, 2024 at 11:07
@alanakbik (Collaborator)

@helpmefindaname thanks for fixing this!

Labels
bug Something isn't working

Linked issue
[Bug]: Cannot load pre-trained models after fine-tuning (Transformers)