Question on model_max_length (DeBERTa-V3) #16998

Closed

ioana-blue opened this issue Apr 28, 2022 · 17 comments

@ioana-blue

System Info

- `transformers` version: 4.18.0
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.3
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): 1.5.1 (False)
- Tensorflow version (GPU?): 2.4.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: N/A
- Using distributed or parallel set-up in script?: N/A

Who can help?

@LysandreJik @SaulLu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I'm interested in finding out the max sequence length that a model can be run with. After some code browsing, my current understanding is that this is a property stored in the tokenizer's model_max_length.

I wrote a simple script to load a tokenizer for a pretrained model and print the model max length. This is the important part:

    # initialize the tokenizer to be able to print model_max_length
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=model_args.use_fast_tokenizer,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )

    logger.info(f"Model max length {tokenizer.model_max_length}")

I used this to print the max sequence length for models such as BERT, RoBERTa, etc., all with the expected results. For DeBERTa, however, I get confusing results.

If I run my script with DeBERTa-v3 as follows:

python check_model_max_len.py --model_name microsoft/deberta-v3-large --output_dir ./tmp --cache_dir ./tmp/cache

I get Model max length 1000000000000000019884624838656

If I understand correctly, this is a large integer used as a placeholder for models that can support "infinite" sequence lengths.
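
For reference (a sketch based on my reading of the tokenizer code, so take the import path as an assumption about transformers internals): the number looks like it is simply int(1e30), which the library uses as a placeholder when no maximum is configured.

    # Hedged check: the magic number appears to be transformers'
    # VERY_LARGE_INTEGER placeholder, i.e. int(1e30); the import path below
    # assumes current transformers internals.
    from transformers.tokenization_utils_base import VERY_LARGE_INTEGER

    print(int(1e30))           # 1000000000000000019884624838656
    print(VERY_LARGE_INTEGER)  # same value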

If I run my script with --model_name microsoft/deberta-v2-xlarge, I get Model max length 512

I don't understand if this is a bug or a feature :) My understanding is that the main difference between DeBERTa V2 and V3 is the use of ELECTRA-style replaced-token-detection pretraining in V3 instead of MLM. I don't understand why this difference would lead to a difference in supported max sequence lengths between the two models.

I also don't understand why some properties are hardcoded in the Python files, e.g.,

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "microsoft/deberta-v2-xlarge": 512,
    "microsoft/deberta-v2-xxlarge": 512,
    "microsoft/deberta-v2-xlarge-mnli": 512,
    "microsoft/deberta-v2-xxlarge-mnli": 512,
}

I would expect these to be in the config files for the corresponding models.
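
For comparison, the per-model position limit does live in the model config (max_position_embeddings), just not in the tokenizer config; a small check, assuming the hub configs expose that field:

    # Sketch: read the position-embedding limit from the model configs on the
    # hub. Assumes both configs expose max_position_embeddings.
    from transformers import AutoConfig

    for name in ["microsoft/deberta-v2-xlarge", "microsoft/deberta-v3-large"]:
        config = AutoConfig.from_pretrained(name)
        print(name, getattr(config, "max_position_embeddings", None))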

Expected behavior

I would expect the max supported lengths for DeBERTa-V2 and DeBERTa-V3 models to be the same, unless I'm missing something. Thanks for your help!
ioana-blue added the bug label Apr 28, 2022

@LysandreJik
Member

It's likely an error! Do you want to open a discussion on the model repo directly? https://huggingface.co/microsoft/deberta-v3-base/discussions/new

@yu-xiang-wang

I get the same result: 1000000000000000019884624838656

@donaghhorgan

I'm seeing the same for the 125m and 350m OPT tokenizers (haven't checked the larger ones):

>>> AutoTokenizer.from_pretrained("facebook/opt-350m")
PreTrainedTokenizer(name_or_path='facebook/opt-350m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})
>>> AutoTokenizer.from_pretrained("facebook/opt-125m")
PreTrainedTokenizer(name_or_path='facebook/opt-125m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})

Is this definitely a bug?

@github-actions
Copy link

github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@nbroad1881
Contributor

DeBERTa v3 uses relative position embeddings, which means it isn't limited to the typical 512-token limit.

As taken from section A.5 in their paper:

With relative position bias, we choose to truncate the maximum relative distance to k as in equation 3.
Thus in each layer, each token can attend directly to at most (2k - 1) tokens and itself. By stacking
Transformer layers, each token in the l-th layer can attend to at most (2k-1)*l tokens implicitly.
Taking DeBERTa_large as an example, where k = 512, L = 24, in theory, the maximum sequence
length that can be handled is 24,528.

That being said, it will start to slow down a ton once the sequence length gets bigger than 512.
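
A minimal sketch of what that means in practice (model name taken from the thread; this is slow on CPU and memory-hungry for long inputs):

    # Sketch: run DeBERTa-v3 on a sequence longer than 512 tokens. Relative
    # position embeddings let the forward pass accept it; expect it to be slow.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "microsoft/deberta-v3-large"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    text = " ".join(["word"] * 1500)               # arbitrary long input
    inputs = tokenizer(text, return_tensors="pt")  # no truncation by default
    print(inputs["input_ids"].shape)               # well above 512 tokens

    with torch.no_grad():
        out = model(**inputs)
    print(out.last_hidden_state.shape)             # same sequence length out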

@ioana-blue
Author

Yes, I thought this might be the case; however, the same is true for DeBERTa v2 if I remember correctly, and the answer for that is different. What I was asking in the original post is why the difference between v2 and v3. Thanks for clarifying part of the question/answer.

@nbroad1881
Contributor

I meant to add to my last post:
The max length of 1000000000000000019884624838656 typically indicates that the max length is not specified in the tokenizer config file.

There was a discussion about it here: https://huggingface.co/google/muril-base-cased/discussions/1
And the solution was to modify the tokenizer config file: https://huggingface.co/google/muril-base-cased/discussions/2
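
If you just want a sensible value locally rather than waiting for the hub config to change, the limit can also be overridden at load time; a sketch (512 is an assumed choice, not something the repo ships):

    # Sketch: override model_max_length when loading the tokenizer instead of
    # editing tokenizer_config.json on the hub. The 512 here is an assumption.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "microsoft/deberta-v3-large", model_max_length=512
    )
    print(tokenizer.model_max_length)  # 512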

@bcdarwin

This is still an issue with the config file and/or config file parser.

@nbroad1881
Contributor

@bcdarwin

What is the issue?

@woofadu2

woofadu2 commented Jun 4, 2024

@nbroad1881 is it as simple as just sending in additional tokens totaling more than 512 to DeBERTa v3 to make use of the longer context window capability, or is there some config/architecture change that needs to be made first?

@nbroad1881
Contributor

Send the tokens

@woofadu2

Send the tokens

The model config saying 512 for max_position_embeddings won't affect this? @nbroad1881

@nbroad1881
Contributor

Send the tokens and see what happens.

@woofadu2

I'm not getting an error, but I'm unsure whether it's automatically truncating the tokens to 512 or not.

@nbroad1881
Contributor

It's not truncating
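
One way to confirm that (a sketch, not from the thread): compare the tokenized length with and without explicit truncation.

    # Sketch: with model_max_length left at the huge placeholder, nothing is
    # cut off unless truncation and max_length are requested explicitly.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
    text = " ".join(["word"] * 1500)

    default_ids = tokenizer(text)["input_ids"]
    truncated_ids = tokenizer(text, truncation=True, max_length=512)["input_ids"]
    print(len(default_ids), len(truncated_ids))  # e.g. well over 512 vs. 512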
