Commit

update Split pretokenizer docstrings
Dylan-Harden3 committed Dec 12, 2024
1 parent 24d29f4 commit 7c2dee8
Showing 2 changed files with 4 additions and 4 deletions.
4 changes: 2 additions & 2 deletions bindings/python/py_src/tokenizers/pre_tokenizers/__init__.pyi
@@ -422,10 +422,10 @@ class Split(PreTokenizer):
Args:
pattern (:obj:`str` or :class:`~tokenizers.Regex`):
A pattern used to split the string. Usually a string or a regex built with `tokenizers.Regex`.
- If you want to use a regex pattern, it has to be wrapped around a `tokenizer.Regex`,
+ If you want to use a regex pattern, it has to be wrapped around a `tokenizers.Regex`,
otherwise we consider is as a string pattern. For example `pattern="|"`
means you want to split on `|` (imagine a csv file for example), while
- `patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
+ `pattern=tokenizers.Regex("1|2")` means you split on either '1' or '2'.
behavior (:class:`~tokenizers.SplitDelimiterBehavior`):
The behavior to use when splitting.
Choices: "removed", "isolated", "merged_with_previous", "merged_with_next",
4 changes: 2 additions & 2 deletions bindings/python/src/pre_tokenizers.rs
@@ -335,10 +335,10 @@ impl PyWhitespaceSplit {
/// Args:
/// pattern (:obj:`str` or :class:`~tokenizers.Regex`):
/// A pattern used to split the string. Usually a string or a regex built with `tokenizers.Regex`.
- /// If you want to use a regex pattern, it has to be wrapped around a `tokenizer.Regex`,
+ /// If you want to use a regex pattern, it has to be wrapped around a `tokenizers.Regex`,
/// otherwise we consider is as a string pattern. For example `pattern="|"`
/// means you want to split on `|` (imagine a csv file for example), while
- /// `patter=tokenizer.Regex("1|2")` means you split on either '1' or '2'.
+ /// `pattern=tokenizers.Regex("1|2")` means you split on either '1' or '2'.
/// behavior (:class:`~tokenizers.SplitDelimiterBehavior`):
/// The behavior to use when splitting.
/// Choices: "removed", "isolated", "merged_with_previous", "merged_with_next",
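
The distinction the corrected docstring draws (a plain string is matched literally, while a `tokenizers.Regex` is interpreted as a regular expression) can be illustrated with a small standalone sketch. It does not call the library: `split_removed` is a hypothetical helper built on Python's `re` module, and it only approximates the "removed" behavior by discarding the delimiters. Dropping empty pieces is an assumption of this sketch, not a documented guarantee. In the library itself, the two cases would roughly correspond to `Split("|", "removed")` and `Split(Regex("1|2"), "removed")`.

```python
import re


def split_removed(text, pattern, is_regex=False):
    """Hypothetical helper: split `text` on `pattern` and drop the delimiters."""
    # A plain string is matched literally, so regex metacharacters are escaped.
    compiled = re.compile(pattern if is_regex else re.escape(pattern))
    # Assumption of this sketch: empty pieces are discarded.
    return [piece for piece in compiled.split(text) if piece]


# "|" as a plain string: split on the literal pipe character.
literal_pieces = split_removed("a|b|c", "|")

# "1|2" as a regex: split on either "1" or "2".
regex_pieces = split_removed("a1b2c", "1|2", is_regex=True)

# "1|2" as a plain string: the literal text "1|2" never occurs, so nothing splits.
unsplit_pieces = split_removed("a1b2c", "1|2")

print(literal_pieces)
print(regex_pieces)
print(unsplit_pieces)
```

The third call shows why the wrapper matters: the same pattern text yields a different split depending on whether it is treated as a literal or as a regex.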