Skip to content

Commit

Permalink
Return empty tokens array if text is empty after normalization
Browse files Browse the repository at this point in the history
  • Loading branch information
xenova committed Jan 25, 2024
1 parent b07336d commit 5a9fc6a
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions src/tokenizers.js
Original file line number Diff line number Diff line change
Expand Up @@ -2744,6 +2744,12 @@ export class PreTrainedTokenizer extends Callable {
x = this.normalizer(x);
}

// If, after normalization, this section is empty (e.g., trimming whitespace),
// we return an empty array
if (x.length === 0) {
return [];
}

const sectionTokens = (this.pre_tokenizer !== null) ? this.pre_tokenizer(x, {
section_index,
}) : [x];
Expand Down

0 comments on commit 5a9fc6a

Please sign in to comment.