Skip to content

Commit

Permalink
fix: only use eos_token_id as pad_token_id if int (#2774)
Browse files Browse the repository at this point in the history
LLama 3 has a list of values as eos_token_id:
  "['<|end_of_text|>', '<|eom_id|>', '<|eot_id|>']"
This breaks tokenizer since it expects single value. This
commit uses tokenizer.eos_token_id instead in such a case.

Fixes: #2440

Signed-off-by: Dmitry Rogozhkin <[email protected]>
  • Loading branch information
dvrogozh authored Dec 2, 2024
1 parent 2c74c55 commit 535149d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion server/text_generation_server/models/causal_lm.py
Original file line number Diff line number Diff line change
Expand Up @@ -630,7 +630,7 @@ def fallback(
if tokenizer.pad_token_id is None:
if model.config.pad_token_id is not None:
tokenizer.pad_token_id = model.config.pad_token_id
elif model.config.eos_token_id is not None:
elif model.config.eos_token_id is not None and isinstance(model.config.eos_token_id, int):
tokenizer.pad_token_id = model.config.eos_token_id
elif tokenizer.eos_token_id is not None:
tokenizer.pad_token_id = tokenizer.eos_token_id
Expand Down

0 comments on commit 535149d

Please sign in to comment.