Manual patching for BERT-like tokenizers
Rocketknight1 committed Feb 26, 2024
1 parent 76d2577 commit 3dd5dec
1 changed file, 2 additions and 0 deletions: src/transformers/generation/stopping_criteria.py
@@ -254,6 +254,8 @@ def _stop_string_get_matching_positions(
     def _cleanup_token(token: str) -> str:
         if token[0] in ["▁", "Ġ"]:
             token = " " + token[1:]
+        elif token[:2] == "##":  # "##" is two characters, so compare a slice rather than token[0]
+            token = token[2:]
         return token
 
     reversed_filtered_tok_list = [_cleanup_token(token)[::-1] for token in tok_list]
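In isolation, the patched helper behaves as below. This is a standalone sketch of the cleanup logic, not the library module itself (the name `cleanup_token` is chosen here for illustration): SentencePiece ("▁") and byte-level BPE ("Ġ") prefixes mark a token that begins with a space, while BERT-style WordPiece marks continuation pieces with a "##" prefix that should simply be stripped.

```python
def cleanup_token(token: str) -> str:
    # SentencePiece "▁" and byte-level BPE "Ġ" both encode a leading space.
    if token[0] in ["▁", "Ġ"]:
        token = " " + token[1:]
    # WordPiece continuation marker: "##" is two characters, so a slice
    # comparison is needed (token[0] == "##" could never match).
    elif token[:2] == "##":
        token = token[2:]
    return token

print(cleanup_token("▁Hello"))  # -> " Hello"
print(cleanup_token("Ġworld"))  # -> " world"
print(cleanup_token("##ing"))   # -> "ing"
print(cleanup_token("token"))   # -> "token"
```

The surrounding code in the diff then reverses each cleaned token (`[::-1]`) so that stop strings can be matched from their final characters backwards.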
