Releases: google/sentencepiece
v0.1.81
v0.1.8
Feature: Removed the dependency on external protobuf.
Feature: added the (Encode|Decode)AsSerializedProto interface so that the Python module can get full access to the SentencePieceText proto, including the byte offsets/alignments.
Feature: added --treat_whitespace_as_suffix option to make _ a suffix of the word rather than a prefix.
Feature: Added normalization rules to remove control characters in the default nmt_* normalizers
Minor fix: simplified the error messages
Minor fix: do not emit full source path in LOG(INFO)
For more detail: v0.1.7...v0.1.8
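To illustrate what --treat_whitespace_as_suffix changes, here is a simplified sketch in plain Python. It does not call SentencePiece: `mark_whitespace` is a hypothetical helper, and it assumes the whitespace marker is U+2581 (▁, written as _ in the note above) and that the marker is also added before the first word (or after the last one in suffix mode).

```python
# Illustration only: mimics where the whitespace marker is placed.
# This is NOT SentencePiece's actual normalizer.
WS = "\u2581"  # the "▁" meta symbol used to represent whitespace


def mark_whitespace(text: str, as_suffix: bool = False) -> str:
    """Attach the whitespace marker to each word as a prefix or a suffix."""
    words = text.split()
    if as_suffix:
        # Suffix mode: the marker trails every word.
        return "".join(word + WS for word in words)
    # Default mode: the marker leads every word.
    return "".join(WS + word for word in words)


print(mark_whitespace("Hello World"))
print(mark_whitespace("Hello World", as_suffix=True))
```

In prefix mode the marker belongs to the start of each word; in suffix mode it belongs to the end, which changes which pieces the trained model can learn.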
v0.1.7
Deprecated: --mining_sentence_size and --training_sentence_size. Load all sentences by default. --input_sentence_size can be specified to limit the sentences to be loaded.
Feature: added --unk_piece/--bos_piece/--eos_piece/--pad_piece flags to change the surface representations of these special symbols.
Bug fix: added third_party directory for cmake's subdirectory.
For more detail: v0.1.6...v0.1.7
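As a rough sketch of how the v0.1.7 flags might be combined, the snippet below composes a training flag string. `build_train_args` is a hypothetical helper, not part of SentencePiece, and the file name, vocabulary size, and piece strings are made-up example values. The resulting string is the kind of argument list one would pass to spm_train or, assuming the Python module is installed, to its trainer.

```python
def build_train_args(input_path: str, model_prefix: str, vocab_size: int, **flags) -> str:
    """Compose a '--name=value' flag string (hypothetical helper)."""
    args = {
        "input": input_path,
        "model_prefix": model_prefix,
        "vocab_size": vocab_size,
    }
    args.update(flags)  # extra flags keep the order in which they were given
    return " ".join(f"--{name}={value}" for name, value in args.items())


train_args = build_train_args(
    "corpus.txt",               # example path, assumed to exist
    "m",                        # example model prefix
    8000,                       # example vocabulary size
    input_sentence_size=100000, # limit the number of sentences loaded
    unk_piece="[UNK]",          # custom surface forms for the special symbols
    bos_piece="[BOS]",
    eos_piece="[EOS]",
    pad_piece="[PAD]",
)
print(train_args)
```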
v0.1.6pre1
SentencePiece Windows release
v0.1.6
- Bug fix: do not apply normalization to the user-defined-symbols.
- Bug fix: stop adding extra whitespaces before user-defined symbols
- Feature: added --minloglevel flag to suppress LOG(INFO) message
- Feature: added --split_by_number flag to allow numbers to attach to other symbols.
- Feature: added --max_sentence_length flag to control the maximum byte length of input sentence for training.
- Used a tf-versioned .so file for _sentencepiece_processor_ops to minimize ABI incompatibility for the tf wrapper.
For more detail: v0.1.5...master
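Because --max_sentence_length is measured in bytes rather than characters, non-ASCII sentences reach the limit sooner than their character count suggests. The sketch below illustrates the difference in plain Python. `within_byte_limit` is a hypothetical helper, and the sketch assumes UTF-8 input, an inclusive limit, and that over-length sentences are simply skipped; none of these details are confirmed by the release note.

```python
def within_byte_limit(sentence: str, max_bytes: int) -> bool:
    """Return True if the UTF-8 encoding of the sentence fits in max_bytes."""
    return len(sentence.encode("utf-8")) <= max_bytes


sentences = ["hello", "héllo", "こんにちは"]

# All three sentences are 5 characters long, but their byte lengths differ.
byte_lengths = [len(s.encode("utf-8")) for s in sentences]

# Keep only the sentences that fit in a 5-byte limit (example value).
kept = [s for s in sentences if within_byte_limit(s, 5)]

print(byte_lengths)
print(kept)
```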