v0.2.0pre1

Pre-release
@taku910 released this 16 Jan 06:37
· 75 commits to master since this release

Major changes

N/A

New features

  • [ALL] Added the SentencePieceNormalizer class in C++ and Python. It provides nearly the same functionality as the spm_normalize tool. (Python sample, C++ sample)
  • [ALL] Added the SentencePieceProcessor::Normalize method in C++ and Python. (Python sample, C++ sample)
  • [ALL] Added the ability to override the normalization spec before processing. (Python sample)
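
A minimal Python sketch of the standalone normalizer, for orientation only. It assumes the Python wrapper exposes `SentencePieceNormalizer` with a `rule_name` keyword and a `normalize` method, and that `nfkc_cf` is one of the built-in rule names; check the linked samples for the authoritative usage. The helper name `normalize_nfkc_cf` and the standard-library fallback are illustrative additions, not part of SentencePiece, and the fallback only approximates the library's rule.

```python
import unicodedata

try:
    import sentencepiece as spm  # optional; needs a release that includes this feature
except ImportError:
    spm = None


def normalize_nfkc_cf(text):
    """Normalize text with NFKC plus case folding (hypothetical helper)."""
    # Assumption: the wrapper provides SentencePieceNormalizer(rule_name=...).
    normalizer_cls = getattr(spm, "SentencePieceNormalizer", None)
    if normalizer_cls is not None:
        normalizer = normalizer_cls(rule_name="nfkc_cf")
        return normalizer.normalize(text)
    # Stand-in when the class is unavailable: plain Unicode NFKC followed by
    # case folding. This is an approximation, not SentencePiece's own rule table.
    return unicodedata.normalize("NFKC", text).casefold()


# Full-width Latin letters are mapped to ASCII and then case-folded.
print(normalize_nfkc_cf("ＡＢＣ"))
```

In real code the normalizer would normally be constructed once and reused rather than rebuilt on every call.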

Bug fixes & minor changes

  • Introduce better support of using external abseil and protobuf #869
  • Build universal binary in OSX release package #892
  • Add the set_min_log_level function to the Python wrapper so the log level can be changed from Python. #893
  • Use the log-sum-exp technique when computing marginal probabilities of n-best tokenizations to avoid underflow.
  • Support Python 3.12 #932
  • Improve thread utilization in batch encoding/decoding.
  • Fix a nasty bug in BPE position encoding.
  • Fix bugs in the handling of duplicated bigrams.
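
The log-sum-exp item above can be illustrated with a short, self-contained sketch. This is not the library's implementation: the function name `logsumexp` and the log-scores are hypothetical, chosen only to show why summing probabilities directly underflows while the shifted form does not.

```python
import math


def logsumexp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without underflow."""
    m = max(log_values)
    if m == -math.inf:
        # Every term is exp(-inf) == 0, so the log of the sum is -inf.
        return -math.inf
    # Subtracting the maximum keeps the largest exponent at exactly 0.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


# Hypothetical log-scores of three candidate tokenizations.
log_scores = [-1000.0, -1001.0, -1002.0]

# Naive approach: each exp() underflows to 0.0, so the total carries no information.
naive_total = sum(math.exp(s) for s in log_scores)

# Stable approach: normalize in log space, then exponentiate the differences.
log_total = logsumexp(log_scores)
marginals = [math.exp(s - log_total) for s in log_scores]

print(naive_total)
print(marginals)
```

The marginals sum to 1 even though every individual probability is far too small to represent as a double.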