[⚠️ removed a default argument] Make AttentionMaskConverter compatible with torch.compile(..., fullgraph=True) #27868
Conversation
@@ -66,7 +66,7 @@ def to_causal_4d(
         batch_size: int,
         query_length: int,
         key_value_length: int,
-        dtype: torch.dtype = torch.float32,
+        dtype: torch.dtype,
It is fine to remove the default (it is the cause of the error): in the privately exposed helpers _prepare_4d_causal_attention_mask, _prepare_4d_attention_mask and _create_4d_causal_attention_mask, dtype is always passed explicitly.
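For illustration, a minimal sketch of the caller-side consequence: with the default removed, dtype now always has to be passed explicitly, typically the dtype of the hidden states. The constructor arguments and the device parameter below are assumptions not shown in the diff above.

```python
import torch
from transformers.modeling_attn_mask_utils import AttentionMaskConverter

# Sketch only: is_causal and device are assumptions, not part of the diff above.
converter = AttentionMaskConverter(is_causal=True)
hidden_states = torch.randn(2, 8, 16, dtype=torch.float16)

causal_4d_mask = converter.to_causal_4d(
    batch_size=hidden_states.shape[0],
    query_length=hidden_states.shape[1],
    key_value_length=hidden_states.shape[1],
    dtype=hidden_states.dtype,  # no longer defaults to torch.float32
    device=hidden_states.device,
)
print(causal_4d_mask.shape)  # expected: (2, 1, 8, 8)
```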
But to_causal_4d is part of the public API, no? Oh, this is breaking 😅
AFAIK it is used nowhere else than in _create_4d_causal_attention_mask, _prepare_4d_attention_mask and _prepare_4d_causal_attention_mask. cc @patrickvonplaten, I don't think the mask converter API was meant to be exposed?
I think it should be alright even if it is breaking, as it's mostly used for internal calls and hasn't been around for very long. Could you link the PyTorch issue?
But to_causal_4d is part of the public API, no? Oh, this is breaking 😅

Also,
Alright with me, let's just add a
As per the title, this fixes #27789. The issue only affects PyTorch 2.1 and has already been fixed in torch nightly.
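For context, a rough sketch of the pattern this change enables (this is not the reproduction from #27789, and the make_causal_mask/attend helpers below are hypothetical): with dtype required as an explicit argument rather than relying on a torch.dtype default, mask creation can sit inside a function traced with torch.compile(..., fullgraph=True) on PyTorch 2.1.

```python
import torch

# Hypothetical stand-in for mask creation; dtype is a required argument,
# mirroring this PR's removal of the torch.dtype default.
def make_causal_mask(query_length: int, key_value_length: int, dtype: torch.dtype):
    mask = torch.full((query_length, key_value_length), torch.finfo(dtype).min, dtype=dtype)
    i = torch.arange(query_length).unsqueeze(-1)
    j = torch.arange(key_value_length)
    # zero out positions a query token may attend to (past and present)
    return mask.masked_fill(j <= i + (key_value_length - query_length), 0.0)

@torch.compile(fullgraph=True)
def attend(q, k, v):
    mask = make_causal_mask(q.shape[-2], k.shape[-2], q.dtype)  # dtype passed explicitly
    scores = q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5 + mask
    return scores.softmax(dim=-1) @ v

q = k = v = torch.randn(1, 4, 8, 16)
print(attend(q, k, v).shape)  # torch.Size([1, 4, 8, 16])
```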