[ `Core`] Refactor modeling code #34987

ArthurZucker · 2024-11-28T07:03:49Z

What does this PR do?

Refactor LlamaAttention following #34282

HuggingFaceDocBuilderDev · 2024-11-28T07:31:31Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ruidazeng

What was the rationale behind using different attention functions as opposed to different attention classes?

Moreover, I think it would be nice for all the new attention functions to have some sort of docstrings like we had in the classes.

…into llama-refactor

refactor LlamaAttention

f14637a

ruidazeng reviewed Nov 28, 2024


only change Llama

f446bd4

ArthurZucker force-pushed the llama-refactor branch from f62d01f to f446bd4 on December 11, 2024, 11:21

ArthurZucker added 24 commits December 11, 2024 12:39

more refactoring

0384db9

nits

4e681b9

nits

893ef38

_output_embedding and _input_embeding

13a195a

oupts

39ab8b7

make auto for causal lm work

0418f97

nits

341b8ce

updates

556aa4e

pass attention

f61a5fe

cache concatenates on the wrong axis

dcf7a37

update

1baabd3

fix

38dd294

revert some stuff

4015481

there was an issue with tie weight keys

28829d2

style

1ef18f4

style

4b9a429

fix

e5d60b4

remove tanh

3bbae39

fix auto set

89d32d6

update

7a911ef

clean

20c512b

mm

d915636

fix!

6018982

fix attention_mask

e9d751a

ArthurZucker changed the title from ~~refactor LlamaAttention~~ to [ `Core`] Refactor modeling code on Dec 11, 2024

ArthurZucker and others added 16 commits December 11, 2024 19:44

update

7a608da

fixup

6028e85

fix some stuff

725d00c

fix some tests

c224f36

9 left!

3f68c7c

fix auto?

1a5a834

fix

53450ac

default init weights

2016bc4

nit?

4f36712

Merge branch 'main' into llama-refactor

f7395cc

nits

9461039

Merge branch 'llama-refactor' of github.com:huggingface/transformers into llama-refactor

57eece6

fix unpack import

584b443

be permissive

95cb944

tgi update

caaa5e5

remove layer_idx

5060a33

ArthurZucker mentioned this pull request Dec 20, 2024

Remove graph breaks for torch.compile() in flash_attention_forward when Lllama Model is padding free tuned #33932

Merged

5 tasks
