Question on model_max_length (DeBERTa-V3) #16998

Closed

ioana-blue opened this issue Apr 28, 2022 · 17 comments

@ioana-blue

System Info

- `transformers` version: 4.18.0
- Platform: macOS-10.16-x86_64-i386-64bit
- Python version: 3.8.3
- Huggingface_hub version: 0.5.1
- PyTorch version (GPU?): 1.5.1 (False)
- Tensorflow version (GPU?): 2.4.0 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: N/A
- Using distributed or parallel set-up in script?: N/A

Who can help?

@LysandreJik @SaulLu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I'm interested in finding out the max sequence length that a model can be run with. After some code browsing, my current understanding is that this is a property stored in the tokenizer's model_max_length.

I wrote a simple script to load a tokenizer for a pretrained model and print the model max length. This is the important part:

    # initialize the tokenizer to be able to print model_max_length
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=model_args.use_fast_tokenizer,
        revision=model_args.model_revision,
        use_auth_token=True if model_args.use_auth_token else None,
    )

    logger.info(f"Model max length {tokenizer.model_max_length}")

I used this to print the max sequence length for models such as BERT, RoBERTa, etc., all with the expected results. For DeBERTa, however, I get confusing results.

If I run my script with DeBERTa-v3 as follows:

python check_model_max_len.py --model_name microsoft/deberta-v3-large --output_dir ./tmp --cache_dir ./tmp/cache

I get Model max length 1000000000000000019884624838656

If I understand correctly, this is a large integer used as a placeholder for models that can support "infinite" sequence lengths.
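
For reference (a sketch based on my reading of the tokenizer code, so take the import path as an assumption about transformers internals): the number looks like it is simply int(1e30), which the library uses as a placeholder when no maximum is configured.

    # Hedged check: the magic number appears to be transformers'
    # VERY_LARGE_INTEGER placeholder, i.e. int(1e30); the import path below
    # assumes current transformers internals.
    from transformers.tokenization_utils_base import VERY_LARGE_INTEGER

    print(int(1e30))           # 1000000000000000019884624838656
    print(VERY_LARGE_INTEGER)  # same value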

If I run my script with --model_name microsoft/deberta-v2-xlarge, I get Model max length 512

I don't understand if this is a bug or a feature :) My understanding is that the main difference between DeBERTa V2 and V3 is the use of ELECTRA-style replaced-token-detection pretraining in V3 instead of MLM. I don't understand why this difference would lead to a difference in supported max sequence lengths between the two models.

I also don't understand why some properties are hardcoded in the Python files, e.g.,

PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES = {
    "microsoft/deberta-v2-xlarge": 512,
    "microsoft/deberta-v2-xxlarge": 512,
    "microsoft/deberta-v2-xlarge-mnli": 512,
    "microsoft/deberta-v2-xxlarge-mnli": 512,
}

I would expect these to be in the config files for the corresponding models.
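
For comparison, the per-model position limit does live in the model config (max_position_embeddings), just not in the tokenizer config; a small check, assuming the hub configs expose that field:

    # Sketch: read the position-embedding limit from the model configs on the
    # hub. Assumes both configs expose max_position_embeddings.
    from transformers import AutoConfig

    for name in ["microsoft/deberta-v2-xlarge", "microsoft/deberta-v3-large"]:
        config = AutoConfig.from_pretrained(name)
        print(name, getattr(config, "max_position_embeddings", None))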

Expected behavior

I would expect the max supported lengths for DeBERTa-V2 and DeBERTa-V3 models to be the same, unless I'm missing something. Thanks for your help!
ioana-blue added the bug label Apr 28, 2022

@LysandreJik
Member

It's likely an error! Do you want to open a discussion on the model repo directly? https://huggingface.co/microsoft/deberta-v3-base/discussions/new

@yu-xiang-wang

I get the same result: 1000000000000000019884624838656

@donaghhorgan

I'm seeing the same for the 125m and 350m OPT tokenizers (haven't checked the larger ones):

>>> AutoTokenizer.from_pretrained("facebook/opt-350m")
PreTrainedTokenizer(name_or_path='facebook/opt-350m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})
>>> AutoTokenizer.from_pretrained("facebook/opt-125m")
PreTrainedTokenizer(name_or_path='facebook/opt-125m', vocab_size=50265, model_max_len=1000000000000000019884624838656, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'bos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'eos_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'unk_token': AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=True), 'pad_token': AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=True)})

Is this definitely a bug?

@github-actions
Copy link

github-actions bot commented Jul 9, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@nbroad1881
Contributor

DeBERTa v3 uses relative position embeddings, which means it isn't limited to the typical 512-token limit.

As taken from section A.5 in their paper:

With relative position bias, we choose to truncate the maximum relative distance to k as in equation 3.
Thus in each layer, each token can attend directly to at most (2k - 1) tokens and itself. By stacking
Transformer layers, each token in the l-th layer can attend to at most (2k-1)*l tokens implicitly.
Taking DeBERTa_large as an example, where k = 512, L = 24, in theory, the maximum sequence
length that can be handled is 24,528.

That being said, it will start to slow down a ton once the sequence length gets bigger than 512.
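
A minimal sketch of what that means in practice (model name taken from the thread; this is slow on CPU and memory-hungry for long inputs):

    # Sketch: run DeBERTa-v3 on a sequence longer than 512 tokens. Relative
    # position embeddings let the forward pass accept it; expect it to be slow.
    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "microsoft/deberta-v3-large"
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)

    text = " ".join(["word"] * 1500)               # arbitrary long input
    inputs = tokenizer(text, return_tensors="pt")  # no truncation by default
    print(inputs["input_ids"].shape)               # well above 512 tokens

    with torch.no_grad():
        out = model(**inputs)
    print(out.last_hidden_state.shape)             # same sequence length out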

@ioana-blue
Author

Yes, I thought this might be the case; however, the same is true for DeBERTa v2 if I remember correctly, and the answer for that is different. What I was asking in the original post is why the difference between v2 and v3. Thanks for clarifying part of the question/answer.

@nbroad1881
Contributor

I meant to add to my last post:
The max length of 1000000000000000019884624838656 typically indicates that the max length is not specified in the tokenizer config file.

There was a discussion about it here: https://huggingface.co/google/muril-base-cased/discussions/1
And the solution was to modify the tokenizer config file: https://huggingface.co/google/muril-base-cased/discussions/2
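
If you just want a sensible value locally rather than waiting for the hub config to change, the limit can also be overridden at load time; a sketch (512 is an assumed choice, not something the repo ships):

    # Sketch: override model_max_length when loading the tokenizer instead of
    # editing tokenizer_config.json on the hub. The 512 here is an assumption.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        "microsoft/deberta-v3-large", model_max_length=512
    )
    print(tokenizer.model_max_length)  # 512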

@bcdarwin

This is still an issue with the config file and/or config file parser.

@nbroad1881
Contributor

@bcdarwin

What is the issue?

@woofadu2

woofadu2 commented Jun 4, 2024

@nbroad1881 is it as simple as just sending in additional tokens totaling more than 512 to DeBERTa v3 to make use of the longer context window capability, or is there some config/architecture change that needs to be made first?

@nbroad1881
Contributor

Send the tokens

@woofadu2

Send the tokens

The model config saying 512 for max_position_embeddings won't affect this? @nbroad1881

@nbroad1881
Contributor

Send the tokens and see what happens.

@woofadu2

I'm not getting an error, but I'm unsure whether it's automatically truncating the tokens to 512 or not.

@nbroad1881
Contributor

It's not truncating
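
One way to confirm that (a sketch, not from the thread): compare the tokenized length with and without explicit truncation.

    # Sketch: with model_max_length left at the huge placeholder, nothing is
    # cut off unless truncation and max_length are requested explicitly.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-large")
    text = " ".join(["word"] * 1500)

    default_ids = tokenizer(text)["input_ids"]
    truncated_ids = tokenizer(text, truncation=True, max_length=512)["input_ids"]
    print(len(default_ids), len(truncated_ids))  # e.g. well over 512 vs. 512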
