Question on model_max_length (DeBERTa-V3) #16998
Comments
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
It's likely an error! Do you want to open a discussion on the model repo directly? https://huggingface.co/microsoft/deberta-v3-base/discussions/new
I get the same result: 1000000000000000019884624838656
I'm seeing the same for the 125m and 350m OPT tokenizers (haven't checked the larger ones):
>>> AutoTokenizer.from_pretrained("facebook/opt-350m")
PreTrainedTokenizer(name_or_path='facebook/opt-350m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})
>>> AutoTokenizer.from_pretrained("facebook/opt-125m")
PreTrainedTokenizer(name_or_path='facebook/opt-125m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})
Is this definitely a bug?
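If it does turn out to be just a missing value in the tokenizer config, one workaround (a sketch, not an official fix) is to set model_max_length explicitly when loading; the 2048 below is an assumption about OPT's context window and should be checked against the model card:

```python
from transformers import AutoTokenizer

# If the checkpoint's tokenizer config omits model_max_length, it can be passed explicitly.
# The value 2048 is an assumption about OPT's context window; verify it before relying on it.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m", model_max_length=2048)
print(tokenizer.model_max_length)
```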
DeBERTa v3 uses relative position embeddings, which means it isn't limited to the typical 512-token limit (see section A.5 in their paper).
That being said, it will slow down considerably once the sequence length gets larger than 512.
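One quick way to see the relevant settings is to inspect the model config directly; a small sketch (attribute names follow the DeBERTa-v2 config class, which the v3 checkpoints reuse):

```python
from transformers import AutoConfig

# Inspect position-related settings for DeBERTa-v3 (it uses the DeBERTa-v2 config class).
config = AutoConfig.from_pretrained("microsoft/deberta-v3-base")
print(config.max_position_embeddings)               # size used for absolute position embeddings
print(getattr(config, "relative_attention", None))  # whether relative position encoding is enabled
print(getattr(config, "position_buckets", None))    # bucket count for relative positions, if present
```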
Yes, I thought this might be the case; however, the same is true for DeBERTa v2 if I remember correctly, and the answer for that is different. What I was asking in the original post is why there is a difference between v2 and v3. Thanks for clarifying part of the question/answer.
I meant to add to my last post: there was a discussion about it here: https://huggingface.co/google/muril-base-cased/discussions/1
This is still an issue with the config file and/or config file parser.
What is the issue?
@nbroad1881 Is it as simple as sending in additional tokens totaling more than 512 to DeBERTa v3 to make use of the longer context window, or is there some config/architecture change that needs to be made first?
Send the tokens.
Won't the model config saying 512 for max_position_embeddings affect this? @nbroad1881
Send the tokens and see what happens.
I'm not getting an error, but I'm unsure whether it's automatically truncating the tokens to 512 or not.
It's not truncating.
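A rough sketch of how to check this yourself (assuming microsoft/deberta-v3-base): pass a sequence longer than 512 tokens with truncation disabled and compare the input and output lengths.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
model = AutoModel.from_pretrained("microsoft/deberta-v3-base")

# Build an input that is clearly longer than 512 tokens and disable truncation explicitly.
text = "hello " * 1000
inputs = tokenizer(text, return_tensors="pt", truncation=False)
print(inputs["input_ids"].shape)

with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # same sequence length as the input if nothing was truncated
```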
System Info
Who can help?
@LysandreJik @SaulLu
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I'm interested in finding out the max sequence length that a model can be run with. After some code browsing, my current understanding is that this is a property stored in the tokenizer, model_max_length. I wrote a simple script that loads a tokenizer for a pretrained model and prints the model max length. This is the important part:
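The snippet itself is not reproduced above; a minimal sketch of what such a script could look like (assuming a --model_name argument, as used in the commands below) is:

```python
import argparse

from transformers import AutoTokenizer

parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
args = parser.parse_args()

# Load the tokenizer for the requested checkpoint and report its model_max_length.
tokenizer = AutoTokenizer.from_pretrained(args.model_name)
print(f"Model max length {tokenizer.model_max_length}")
```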
I used this to print the max sequence length for models such as BERT, RoBERTa, etc., all with expected results. For DeBERTa, I get confusing results.
If I run my script with DeBERTa-V3 as follows:
I get
Model max length 1000000000000000019884624838656
If I understand correctly, this is a large integer used for models that can support "infinite" sequence lengths.
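For what it's worth, the number matches the float 1e30 converted to an int, which transformers appears to use as a sentinel value when the checkpoint does not specify a model_max_length:

```python
>>> int(1e30)
1000000000000000019884624838656
```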
If I run my script with --model_name microsoft/deberta-v2-xlarge, I get
Model max length 512
I don't understand if this is a bug or a feature :) My understanding is that the main difference between DeBERTa V2 and V3 is the use of an ELECTRA-style discriminator during MLM pretraining in V3. I don't understand why this difference would lead to a difference in the supported max sequence lengths between the two models.
I also don't understand why some properties are hardcoded in the Python files, e.g.,
I would expect these to be in the config files for the corresponding models.
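Purely as an illustration (the dictionary name and entry below are assumptions, not copied from the repository), the kind of hardcoding meant here is a per-checkpoint mapping inside a tokenizer module rather than a value read from each model's config:

```python
# Hypothetical sketch of a hardcoded per-checkpoint limit inside a tokenizer .py file;
# the real mapping in the library may use different names and cover more checkpoints.
PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "microsoft/deberta-v2-xlarge": 512,
}
```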
Expected behavior