Change `repeat` to `expand` in GQA #628

sashaDoubov · 2023-09-25T22:17:26Z

This reduces memory usage and improves MFU. Note that there may be some risk of NaNs with small `d_head` (this was previously seen for MQA). However, there are significant performance benefits for standard `d_head`.
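For illustration, here is a minimal PyTorch sketch of the difference between the two approaches. It is not the exact diff in this PR: the `(batch, seq_len, kv_n_heads, d_head)` layout, the tensor sizes, and the helper names `repeat_kv_copy` and `expand_kv_view` are assumptions made for the example.

```python
import torch


def repeat_kv_copy(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Materialize n_rep copies of each KV head (allocates new memory)."""
    return kv.repeat_interleave(n_rep, dim=2)


def expand_kv_view(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Broadcast each KV head n_rep times as a stride-0 view (no copy)."""
    b, s, kv_n_heads, d = kv.shape
    return kv[:, :, :, None, :].expand(b, s, kv_n_heads, n_rep, d)


kv = torch.randn(2, 4, 2, 8)  # hypothetical sizes: 2 KV heads, d_head = 8
n_rep = 3                     # hypothetical number of query heads per KV head

copied = repeat_kv_copy(kv, n_rep)  # shape (2, 4, 6, 8)
viewed = expand_kv_view(kv, n_rep)  # shape (2, 4, 2, 3, 8)

# The expanded view reuses the original storage; the new axis has stride 0.
shares_storage = viewed.data_ptr() == kv.data_ptr()
rep_stride = viewed.stride(3)

# Merging the (kv_n_heads, n_rep) axes yields the same values as the copy.
same_values = torch.equal(viewed.reshape(2, 4, 6, 8), copied)
```

One caveat with this sketch: `reshape` cannot merge a stride-0 axis as a view, so the final `reshape` call copies. How much of the saving carries through depends on how the attention code consumes the expanded tensor.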

madhavatreplit · 2023-10-01T23:39:16Z

Will this be backwards compatible with any MPT model using GQA, trained using the code from a commit before this?

sashaDoubov · 2023-10-02T22:43:14Z

@madhavatreplit yes, this is backwards compatible: the model weights don't change. This is simply an optimization of the attention function that avoids allocating new memory.
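As a rough check of that claim, the sketch below runs a plain softmax attention with both ways of expanding the KV heads and compares the outputs. The `(batch, heads, seq_len, d_head)` layout, the sizes, and the helpers `attend` and `expand_heads` are hypothetical and stand in for the real attention function.

```python
import torch

torch.manual_seed(0)
b, s, n_heads, kv_n_heads, d = 1, 5, 4, 2, 16  # hypothetical sizes
n_rep = n_heads // kv_n_heads

q = torch.randn(b, n_heads, s, d)
k = torch.randn(b, kv_n_heads, s, d)
v = torch.randn(b, kv_n_heads, s, d)


def attend(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Plain scaled dot-product attention, without masking or dropout."""
    scores = (q @ k.transpose(-2, -1)) / d**0.5
    return torch.softmax(scores, dim=-1) @ v


def expand_heads(x: torch.Tensor) -> torch.Tensor:
    """Stride-0 broadcast of each KV head, then merge the two head axes."""
    return x[:, :, None].expand(b, kv_n_heads, n_rep, s, d).reshape(b, n_heads, s, d)


# Copy-based path: each KV head is repeated n_rep times.
out_repeat = attend(q, k.repeat_interleave(n_rep, dim=1),
                    v.repeat_interleave(n_rep, dim=1))

# Expand-based path: same head ordering, same inputs to attention.
out_expand = attend(q, expand_heads(k), expand_heads(v))

outputs_match = torch.allclose(out_repeat, out_expand)
```

Both paths feed identical key and value tensors into attention, so the same checkpoint should produce the same outputs either way.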

add initial commit

c2b10fd

sashaDoubov changed the title ~~Change expand to repeat in GQA~~ Change repeat to expand in GQA Sep 25, 2023

sashaDoubov marked this pull request as ready for review September 26, 2023 00:43

sashaDoubov requested review from vchiley and dakinggg September 26, 2023 00:43

vchiley approved these changes Sep 26, 2023

Merge branch 'main' into expand_gqa

9d56430

dakinggg enabled auto-merge (squash) September 26, 2023 01:03

dakinggg disabled auto-merge September 26, 2023 01:03

dakinggg enabled auto-merge (squash) September 26, 2023 01:03

Merge branch 'main' into expand_gqa

31e83da

dakinggg merged commit 61dfbd6 into mosaicml:main Sep 26, 2023
