Hello! I would like to request a fast pre-tokenizer that simply splits the input into consecutive segments of a pre-defined length. I know this is not a common need in NLP, but it is necessary for my use case: I'm processing DNA data, which has no spaces or separators of any kind, so I want to use fixed-length tokens.
For someone who actually knows Rust and the backend, implementing this would probably take less than half an hour, but I don't want to learn a new language just for this.
Many thanks!
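For illustration, the behavior I'm after is roughly what the sketch below produces. It is only a workaround built on the existing `Split` pre-tokenizer with a fixed-length regex via the Python bindings (assuming a segment length of 6; `K` is just a name I chose here), not the native fast pre-tokenizer I'm requesting:

```python
from tokenizers import Regex
from tokenizers.pre_tokenizers import Split

# Assumed segment length; adjust to your fixed token size.
K = 6

# Match every non-overlapping run of K characters; "isolated" keeps each
# matched run as its own pre-token. A trailing remainder shorter than K
# should be kept as a final, shorter piece.
pre_tokenizer = Split(Regex(f".{{{K}}}"), behavior="isolated")

print(pre_tokenizer.pre_tokenize_str("ACGTACGTACGTAC"))
# Expected (roughly): [('ACGTAC', (0, 6)), ('GTACGT', (6, 12)), ('AC', (12, 14))]
```

This works, but a dedicated fixed-length pre-tokenizer in the Rust backend would avoid the regex overhead on very long DNA sequences.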