I have some ID values and I want to train BPE on them. The following is an example of the ID values.
26865, 5412, 26865, 26865, 26865, 26865, 5412, 5412, 25283, 26865, 3395, 26865, 3395, 19440, 25283, 3395, 24032, 1175, 3395, 3395, 3395, 26865, 1175, 26865, 15807, 15807, 27062, 27062, 26865, 4759, 26865, 26865, 27062, 1175, 1175, 1175, 382, 382, 382, 382, 27474, 23834, 29768, 11946, 11946, 27474, 17279
I want to extract a sequence such as [26865, 26865, ] as a single vocabulary entry.
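To make the goal concrete, here is a minimal sketch of the BPE merge idea applied directly to integer IDs, where each ID is treated as one atomic symbol. This is not SentencePiece's implementation or API; the function names (`most_frequent_pair`, `merge_pair`, `learn_merges`) are hypothetical and only illustrate what "extracting a repeated pair as a vocabulary entry" means for the example above.

```python
from collections import Counter


def most_frequent_pair(seq):
    """Return the most frequent adjacent pair in seq, or None if there is no pair."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return None
    return pairs.most_common(1)[0][0]


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of pair (left to right) with one merged symbol."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(pair)  # the tuple itself serves as the new symbol
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(ids, num_merges):
    """Greedily learn up to num_merges merges; return (merges, final sequence)."""
    seq = list(ids)
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(seq)
        if pair is None:
            break
        merges.append(pair)
        seq = merge_pair(seq, pair)
    return merges, seq


# The example IDs from this issue, kept as whole integers (never split into digits).
example_ids = [
    26865, 5412, 26865, 26865, 26865, 26865, 5412, 5412, 25283, 26865,
    3395, 26865, 3395, 19440, 25283, 3395, 24032, 1175, 3395, 3395,
    3395, 26865, 1175, 26865, 15807, 15807, 27062, 27062, 26865, 4759,
    26865, 26865, 27062, 1175, 1175, 1175, 382, 382, 382, 382,
    27474, 23834, 29768, 11946, 11946, 27474, 17279,
]

merges, merged_seq = learn_merges(example_ids, num_merges=1)
print(merges[0])  # (26865, 26865)
```

Because every ID stays a single symbol here, a merge can never cut through the middle of an ID, which is the behavior I am looking for.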
If I use BPE, split_by_number truncates the ID values regardless of whether split_by_whitespace is enabled. For example, print(sp.id_to_piece(111)) prints "65, 26", a piece that cuts across two IDs.
https://github.com/google/sentencepiece/blob/master/doc/options.md#:~:text=%2D%2Dsplit_by_number%20(split%20tokens%20by%20numbers%20(0%2D9))%20%20type%3A%20bool%20default%3A%20true%0A%20%20%20%2D%2Dsplit_by_whitespace%20(use%20a%20white%20space%20to%20split%20sentence%20pieces)%20%20type%3A%20bool%20default%3A%20true%0A%20%20%20%2D%2Dsplit_digits%20(split%20all%20digits%20(0%2D9)%20into%20separate%20pieces)%20%20type%3A%20bool%20default%3A%20false
@azimjonn Could you share your detailed configuration? The URL you gave only shows the default configuration.