Issues: google/sentencepiece
Issues list
BPE Dropout tokenizer generates unk at the beginning of sequence
#1071 opened Dec 1, 2024 by AnnaLebedeva

The pip command to install the SentencePiece Python module fails.
#1070 opened Nov 20, 2024 by tprrt

Asan detects memory leak in sentencepiece/_sentencepiece.cpython-312-x86_64-linux-gnu.so+0x6f7f4
#1065 opened Nov 4, 2024 by renxida

Compatibility Issue when using v0.2.0 with transformers and tensorflow
#1060 opened Oct 2, 2024 by aws-tianquaw

With unigram algorithm, constant piece at end of each sentences does not become a token
#1047 (label: bug) opened Aug 29, 2024 by jogardi

Error Attribute Error: type object 'SentencePieceTrainer' has no attribute 'train'. Did you mean: 'Train'?
#1046 opened Aug 23, 2024 by bop578530

When I set SPM_PROTOBUF_PROVIDER to "package" in CMakeLists.txt, the compilation fails.
#1029 opened Jun 25, 2024 by hhxdestiny

High frequency token segmented into letter sequence when input is a tsv file
#967 (label: bug) opened Jan 30, 2024 by TingxunShi

A recent EMNLP work to share about task-adaptive tokenization with variable segmentation
#924 opened Oct 24, 2023 by lsy641

Unexpected behavior with sampling of repeated character sequence.
#904 opened Aug 14, 2023 by kellymarchisio