🚨All attention refactor🚨 #35235
Conversation
Force-pushed from 0dc9253 to d1aa9ce
src/transformers/modeling_utils.py (Outdated)
    class GradientCheckpointLayer(torch.nn.Module):
This should help with kwargs as well
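For context, a minimal sketch of what a kwargs-aware gradient-checkpointing wrapper could look like — an illustration only, not the PR's actual implementation; the class name comes from the diff above, everything else is assumed:

```python
import torch
from torch.utils.checkpoint import checkpoint


class GradientCheckpointLayer(torch.nn.Module):
    """Wraps a layer so its forward can be gradient-checkpointed while still
    forwarding keyword arguments (sketch, not the PR code)."""

    def __init__(self, layer: torch.nn.Module):
        super().__init__()
        self.layer = layer

    def forward(self, *args, **kwargs):
        if self.training and torch.is_grad_enabled():
            # Non-reentrant checkpointing accepts **kwargs directly.
            return checkpoint(self.layer.__call__, *args, use_reentrant=False, **kwargs)
        return self.layer(*args, **kwargs)
```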
Force-pushed from 8b56823 to ecd814b
Very impressive work, kudos to you both!
Confirmed slow tests with Llama, everything is similar to main!
run-slow: vit (just a check unrelated to this PR)
Great work guys! I think there might be value in keeping some comments, e.g. why contiguous is called for SDPA, and clarifying the flash-attention recast to half precision (which originates from PEFT and/or RoPE).
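For reference, a rough sketch of the two patterns that comment refers to; the function names and tensor shapes are assumptions, not the PR's actual code:

```python
import torch
import torch.nn.functional as F


def sdpa_attention_forward(query, key, value, attention_mask=None, scaling=None):
    # Assumed layout: (batch, num_heads, seq_len, head_dim).
    # Some SDPA backends have issues with non-contiguous q/k/v when a custom
    # mask is passed, which is why .contiguous() is called here.
    query, key, value = query.contiguous(), key.contiguous(), value.contiguous()
    out = F.scaled_dot_product_attention(
        query, key, value, attn_mask=attention_mask, scale=scaling
    )
    return out.transpose(1, 2).contiguous()


def maybe_downcast_for_flash_attention(query, key, value, target_dtype=torch.float16):
    # PEFT adapters or fp32 layer norms / RoPE can silently upcast activations
    # to float32; flash-attention kernels only accept fp16/bf16, hence the recast.
    if query.dtype == torch.float32:
        query, key, value = (t.to(target_dtype) for t in (query, key, value))
    return query, key, value
```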
    attention_mask: Optional[torch.Tensor],
    scaling: Optional[float] = None,
    softcap: Optional[float] = None,
    **kwargs,
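As a quick illustration of what `scaling` and `softcap` typically control (a sketch of the usual semantics, e.g. Gemma 2-style logit softcapping, not the exact code behind this signature):

```python
import torch


def attention_scores(query, key, scaling=None, softcap=None):
    # query/key assumed shaped (batch, num_heads, seq_len, head_dim).
    scores = torch.matmul(query, key.transpose(-1, -2))
    if scaling is not None:
        scores = scores * scaling  # usually 1/sqrt(head_dim)
    if softcap is not None:
        scores = softcap * torch.tanh(scores / softcap)  # soft-cap the logits
    return scores
```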
Head mask could be added as well, as done in gpt_neox.
It's more that gpt_neox will get a flex attention function that supports the head mask! But good point!
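For reference, flex attention exposes a `score_mod` hook that receives the head index, which is the natural place to route a per-head mask or bias through. A minimal sketch assuming torch >= 2.5 — this only illustrates the mechanism, not the actual transformers integration:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention


def flex_attention_with_head_bias(query, key, value, head_bias, scaling=None):
    # head_bias: (num_heads,) additive bias applied to every score of a head.
    # The real head-mask handling in transformers may differ from this sketch.
    def score_mod(score, batch, head, q_idx, kv_idx):
        return score + head_bias[head]

    return flex_attention(query, key, value, score_mod=score_mod, scale=scaling)
```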
@ArthurZucker @Cyrilvallez do you plan to refactor
Release will happen on 🎅🏻 🎁 !
I'm going on holidays too so I'll wait for January ^^ Happy holidays!
Happy holidays!
I noticed that there is still torch.reshape used in quite a few places, for instance in
Hey! Indeed, for gpt2 I used a `reshape` instead of the usual `view` because I hit an edge case that wasn't compatible with viewing at some point (I will recheck whether it still appears with the latest developments, it might have been an artifact of the debugging process). Whenever possible (most of the time), `reshape` is equivalent to `view`, so no worries there anyway 😉
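A tiny illustration of the difference (independent of this PR): `view` requires the requested shape to be expressible over the existing memory layout, while `reshape` silently falls back to a copy when it is not:

```python
import torch

x = torch.arange(6).reshape(2, 3)
t = x.transpose(0, 1)      # non-contiguous

print(t.reshape(6))        # works: reshape copies when a view is impossible
try:
    t.view(6)              # raises: view cannot change the memory layout
except RuntimeError as err:
    print("view failed:", err)
```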
What does this PR do?
Todo in this PR: