Version 2.5.1 (2021-09-02)
- Import ftfy and use its `uncurl_quotes` function to turn curly quotes into straight ones, so that the different forms of apostrophe are handled consistently.
- Set minimum version requirements on `regex`, `jieba`, and `langcodes` so that tokenization will give consistent results.
- Work around an inconsistency in the `msgpack` API around `strict_map_key=False`.
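To illustrate why the quote change matters, here is a minimal sketch of quote normalization using only the standard library. It is not ftfy's implementation: the helper name `straighten_quotes` and the particular set of characters it maps are assumptions chosen for the example.

```python
# Hypothetical helper, for illustration only (not ftfy's code):
# map a few common curly quote characters to their ASCII equivalents.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark, often used as an apostrophe
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
})


def straighten_quotes(text: str) -> str:
    """Return text with the curly quotes listed above replaced by straight ones."""
    return text.translate(CURLY_TO_STRAIGHT)


# Both spellings of the apostrophe now normalize to the same string,
# so they would be counted as the same word.
assert straighten_quotes("can\u2019t") == straighten_quotes("can't")
```

Without a step like this, "can’t" and "can't" would be treated as two different strings when looking up a word.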
Version 2.5 (2021-04-15)
- Incorporate data from the OSCAR corpus.