Manual patching for BERT-like tokenizers
Rocketknight1 committed Feb 26, 2024
1 parent a694827 commit 25ef298
Showing 1 changed file with 2 additions and 0 deletions.
src/transformers/generation/stopping_criteria.py: 2 additions & 0 deletions
```diff
@@ -254,6 +254,8 @@ def _stop_string_get_matching_positions(
     def _cleanup_token(token: str) -> str:
         if token[0] in ["▁", "Ġ"]:
             token = " " + token[1:]
+        elif token[:2] == "##":
+            token = token[2:]
         return token

     reversed_filtered_tok_list = [_cleanup_token(token)[::-1] for token in tok_list]
```

Note: the rendered page showed the new branch as `token[0] == "##"`, but `token[0]` is a single character and can never equal the two-character string `"##"`, so the comparison is written here as a two-character slice (`token[:2]`) so the WordPiece prefix is actually matched.
