Commit
Handle when precompiled charsmap is empty (#1308)
* Handle when precompiled charsmap is empty

* Black

---------

Co-authored-by: Nicolas Patry <[email protected]>
kellymarchisio and Narsil authored Jul 31, 2023
1 parent c2664ae commit efea6c7
Showing 1 changed file with 9 additions and 6 deletions.
@@ -173,12 +173,15 @@ def from_spm(filename: str):

         tokenizer = Tokenizer(Unigram(vocab, unk_id, byte_fallback))

-        tokenizer.normalizer = normalizers.Sequence(
-            [
-                normalizers.Precompiled(precompiled_charsmap),
-                normalizers.Replace(Regex(" {2,}"), " "),
-            ]
-        )
+        if precompiled_charsmap:
+            tokenizer.normalizer = normalizers.Sequence(
+                [
+                    normalizers.Precompiled(precompiled_charsmap),
+                    normalizers.Replace(Regex(" {2,}"), " "),
+                ]
+            )
+        else:
+            tokenizer.normalizer = normalizers.Sequence([normalizers.Replace(Regex(" {2,}"), " ")])
         tokenizer.pre_tokenizer = pre_tokenizers.Metaspace(replacement=replacement, add_prefix_space=add_prefix_space)
         tokenizer.decoder = decoders.Metaspace(replacement=replacement, add_prefix_space=add_prefix_space)
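
The branching in this change can be illustrated without the `tokenizers` package. The sketch below is not the library's code: `select_steps` and `collapse_spaces` are hypothetical names, the charsmap is assumed to be a `bytes` value, and the standard-library `re` module stands in for `normalizers.Replace(Regex(" {2,}"), " ")`. It only shows that an empty charsmap is falsy, so the `Precompiled` step is skipped while the space-collapsing step is always kept.

```python
import re
from typing import List


def collapse_spaces(text: str) -> str:
    """Stand-in for the Replace step: turn runs of 2+ spaces into one space."""
    return re.sub(r" {2,}", " ", text)


def select_steps(precompiled_charsmap: bytes) -> List[str]:
    """Return the names of the normalizer steps the patched branch would pick.

    Hypothetical helper for illustration; the real code builds a
    normalizers.Sequence instead of a list of names.
    """
    steps: List[str] = []
    if precompiled_charsmap:  # b"" is falsy, so this step is skipped when empty
        steps.append("Precompiled")
    steps.append("Replace")  # the space-collapsing step is kept in both branches
    return steps


if __name__ == "__main__":
    print(select_steps(b""))            # ['Replace']
    print(select_steps(b"\x00\x01"))    # ['Precompiled', 'Replace']
    print(collapse_spaces("a   b  c"))  # a b c
```

Relying on truthiness rather than an explicit length check keeps the condition short; it would also treat `None` as "no charsmap", which may or may not matter for the value read from the model file.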
