Commit

added more descriptive exception
Martin Fajčík committed Oct 7, 2024
1 parent dc20800 commit 1008070
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions lm_eval/models/utils.py
@@ -750,6 +750,9 @@ def segmented_tok_encode(string: SegmentedString, tokenizer: PreTrainedTokenizer
     is a list of segment labels.
     """
+    if type(string)!=SegmentedString:
+        raise ValueError(f"Input must be of type SegmentedString (found type {str(type(string))}).\n"
+                         f"Do not use smart truncation strategy for language modeling tasks.")
     assert type(string) == SegmentedString, "string must be a SegmentedString"
     encoding = tokenizer(
         string,
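
The guard added in this hunk can be exercised in isolation. The following is a minimal sketch, not the repository's code: the real `SegmentedString` class is not shown in this diff, so the `str` subclass below and the helper name `check_segmented` are hypothetical stand-ins. Only the condition and the error message are copied from the added lines.

```python
class SegmentedString(str):
    """Hypothetical stand-in: a str subclass that also carries segment labels."""

    def __new__(cls, segments, labels):
        obj = super().__new__(cls, "".join(segments))
        obj.segments = tuple(segments)
        obj.labels = tuple(labels)
        return obj


def check_segmented(string):
    """Hypothetical helper mirroring the check added in this commit."""
    # Exact type comparison, as in the diff: subclasses and plain str are rejected.
    if type(string) != SegmentedString:
        raise ValueError(
            f"Input must be of type SegmentedString (found type {str(type(string))}).\n"
            f"Do not use smart truncation strategy for language modeling tasks."
        )
    return string


# A SegmentedString passes through unchanged.
ok = check_segmented(SegmentedString(["Question: ", "2+2?"], ["prompt", "query"]))

# A plain str triggers the descriptive ValueError.
try:
    check_segmented("plain text")
except ValueError as err:
    message = str(err)
```

With this check in place, the `assert` on the following line of the hunk can no longer fail, since any non-`SegmentedString` input has already raised. Unlike the `assert`, the `ValueError` also remains active when Python runs with `-O`, which strips assertions.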
