Commit

py : fix missing added_tokens_dict for SPM vocab
ggerganov committed Jan 16, 2024
1 parent a0b3ac8 commit 9b464b4
Showing 1 changed file with 1 addition and 0 deletions.
1 change: 1 addition & 0 deletions convert.py
@@ -466,6 +466,7 @@ def __init__(
             )
 
         # Token pieces that were added to the base vocabulary.
+        self.added_tokens_dict = added_tokens
         self.added_tokens_list = [new_tokens[id] for id in actual_new_ids]
         self.vocab_size_base = vocab_size
         self.vocab_size = self.vocab_size_base + len(self.added_tokens_list)
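
For readers without the full file at hand, here is a minimal, self-contained sketch of what the added line accomplishes. It is not the code from convert.py: the class name `SketchVocab`, the filtering of `added_tokens` into `new_tokens`, and the error message are assumptions made for illustration. Only the four attribute assignments at the end mirror the hunk above.

```python
from typing import Dict, List


class SketchVocab:
    """Hypothetical stand-in for the vocab class touched by this commit."""

    def __init__(self, vocab_size: int, added_tokens: Dict[str, int]) -> None:
        # Assumption: only tokens with IDs beyond the base vocabulary count as new.
        new_tokens = {id: piece for piece, id in added_tokens.items() if id >= vocab_size}

        # Assumption: new IDs must directly follow the base vocabulary without gaps.
        expected_new_ids = list(range(vocab_size, vocab_size + len(new_tokens)))
        actual_new_ids = sorted(new_tokens.keys())
        if expected_new_ids != actual_new_ids:
            raise ValueError(
                f"Expected new token IDs {expected_new_ids} to be sequential; got {actual_new_ids}"
            )

        # Token pieces that were added to the base vocabulary.
        self.added_tokens_dict = added_tokens  # the attribute this commit adds
        self.added_tokens_list: List[str] = [new_tokens[id] for id in actual_new_ids]
        self.vocab_size_base = vocab_size
        self.vocab_size = self.vocab_size_base + len(self.added_tokens_list)


vocab = SketchVocab(vocab_size=3, added_tokens={"<pad>": 3, "<sep>": 4})
print(vocab.added_tokens_dict)
print(vocab.added_tokens_list, vocab.vocab_size)
```

Without the `added_tokens_dict` assignment, any caller that reads `vocab.added_tokens_dict` would raise `AttributeError`, which is presumably the failure the commit title refers to.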