return pytorch tensors like in transformers? #1578
In general, tokenizers pairs really well with transformers. But if this feature is requested I don't mind adding support for it! It's gonna be a Python-only "layer" though, in the sense that I don't think there is a Rust torch type.
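For what it's worth, a Python-only layer like that could look roughly like the sketch below. This is only an illustration of the idea, not an existing `tokenizers` API; the `encode_to_tensors` helper name and the padding assumption are mine.

```python
# Hypothetical helper (not part of tokenizers): batch-encode with the Rust
# tokenizer, then stack the plain-Python id lists into PyTorch tensors.
from typing import Dict, List

import torch
from tokenizers import Tokenizer


def encode_to_tensors(tokenizer: Tokenizer, texts: List[str]) -> Dict[str, torch.Tensor]:
    # Assumes padding is enabled on the tokenizer so every sequence has the same length.
    encodings = tokenizer.encode_batch(texts)
    return {
        "input_ids": torch.tensor([e.ids for e in encodings], dtype=torch.long),
        "attention_mask": torch.tensor([e.attention_mask for e in encodings], dtype=torch.long),
    }
```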
thanks for the quick answer! So I guess I'll just use …
@ArthurZucker There are Rust bindings for Torch, as seen here, but it is a bit more finicky to use. There are currently some issues with creating a Python package that uses it, since it requires setting up various flags that require a certain PyTorch version. Another way to avoid using Python lists, while supporting the latest PyTorch version, is to return NumPy arrays and then call …
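The truncated call at the end is presumably `torch.from_numpy`, which wraps an existing array without copying. A rough sketch of that route, with a made-up model name, and with the array still assembled in Python here (the suggestion above is that tokenizers itself would eventually return the array directly):

```python
# NumPy -> torch hand-off: torch.from_numpy shares memory with the array,
# so no data is copied on the torch side.
import numpy as np
import torch
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_pretrained("bert-base-uncased")  # example model, not from this thread
tokenizer.enable_padding()  # equal lengths so the batch forms a rectangular array

encodings = tokenizer.encode_batch(["hello world", "a second, longer example"])
ids = np.array([e.ids for e in encodings], dtype=np.int64)
input_ids = torch.from_numpy(ids)  # zero-copy view of the NumPy buffer
```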
we do support input sequences that are numpy arrays of strings; it's a matter of converting into a single encoding, where you keep the offsets and batch / same for tokens etc.
Currently more focused on getting tokenizers to …
Hi,

Sorry in advance because I feel like I'm missing something here.

The `Tokenizer` from `tokenizers` seems to have all of the same features as `transformers.PreTrainedTokenizer` except for one: `return_tensors="pt"`.

So, of course, I could convert the `List[int]` from `Encoding.ids` to a PyTorch `Tensor` myself, like so, but:
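(A minimal sketch of the kind of manual conversion being described, not the original snippet; the tokenizer file name is illustrative.)

```python
# Manual conversion from Encoding.ids (a List[int]) to a PyTorch tensor.
import torch
from tokenizers import Tokenizer

tokenizer = Tokenizer.from_file("char_level_tokenizer.json")  # hypothetical path
encoding = tokenizer.encode("a fairly long character-level input string")
input_ids = torch.tensor(encoding.ids, dtype=torch.long).unsqueeze(0)  # shape (1, seq_len)
```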
Or should I use `tokenizers` only for training the tokenizer and then switch to `transformers.PreTrainedTokenizer` for inference?

Best,
Paul

(In case you're wondering, this is a character-level tokenizer, that's why the sequences are so long in my example.)