I trained a tokenizer with 'add_prefix_space' set to 'False'. How can I ensure that BBPE tokenizers correctly handle space division when decoding a sequence?
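For reference, a minimal sketch of the setting being described, assuming a transformers fast tokenizer; GPT-2's byte-level BPE is used here purely as an illustration, not as the actual tokenizer from this issue:

```python
from transformers import GPT2TokenizerFast

# Hypothetical illustration: a byte-level BPE tokenizer loaded without the automatic leading space.
tok = GPT2TokenizerFast.from_pretrained("gpt2", add_prefix_space=False)

ids = tok("Hello world")["input_ids"]
print(tok.decode(ids))  # 'Hello world' -- inner spaces survive the encode/decode round trip
```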
enze5088 changed the title from "When add_prefix_space=False, how to add space when decode English sentence." to "When decoding an English sentence with the 'add_prefix_space' parameter set to 'False', how can I add spaces?" on Oct 9, 2023
Hey! Could you elaborate on "How can I ensure that BBPE tokenizers correctly handle space division when decoding a sequence?" What is your concern / issue here?
I aim to develop a multilingual tokenizer. However, when processing multilingual text, especially text without space-based segmentation, such as Chinese, it occasionally introduces erroneous spaces before certain characters. On the other hand, if I add whitespace handling in the pre-tokenizer, the tokenizer does not correctly preserve spaces when decoding generated English text.
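For what it's worth, here is a minimal sketch of the behaviour in question, assuming the Hugging Face tokenizers library and a byte-level BPE (BBPE) model trained from scratch; the corpus, vocabulary size, and special tokens are placeholders. With a ByteLevel pre-tokenizer and decoder, spaces inside English text are carried by the 'Ġ' byte-level symbol and restored on decode, while add_prefix_space=False only controls whether a space is prepended to the very first word, so Chinese text should not gain extra spaces:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders, trainers

# Byte-level BPE without an automatic space in front of the first word.
tokenizer = Tokenizer(models.BPE())
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
tokenizer.decoder = decoders.ByteLevel()  # maps byte-level symbols such as 'Ġ' back to real spaces

trainer = trainers.BpeTrainer(
    vocab_size=3000,                                       # placeholder size
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),  # keep all 256 byte symbols
    special_tokens=["<unk>", "<s>", "</s>"],               # placeholder specials
)
corpus = ["Hello world, this is a test.", "你好，世界。这是一个测试。"]  # placeholder data
tokenizer.train_from_iterator(corpus, trainer)

enc_en = tokenizer.encode("Hello world")
print(enc_en.tokens)                 # e.g. ['H', 'ello', 'Ġworld'] -- 'Ġ' marks the inner space
print(tokenizer.decode(enc_en.ids))  # 'Hello world' -- the ByteLevel decoder restores the space

enc_zh = tokenizer.encode("你好世界")
print(tokenizer.decode(enc_zh.ids))  # '你好世界' -- no spaces are injected between characters
```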