Change repeat to expand in GQA #628

Merged: 3 commits, Sep 26, 2023

sashaDoubov (Contributor)

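When the key/value heads are broadcast across the query heads in GQA, this change uses `expand` instead of `repeat`. This reduces memory usage and improves MFU (model FLOPs utilization). Note that there may be some risk of NaNs with a small d_head (this was previously seen for MQA). However, there are significant performance benefits for a standard d_head.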
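A minimal sketch of the difference, assuming PyTorch tensors laid out as (batch, heads, seq_len, d_head). The helper names `repeat_kv` and `expand_kv` are hypothetical and are not the functions changed in this PR. `repeat_interleave` copies each key/value head into a new tensor, whereas `expand` returns a view of the original storage whose new group axis has stride 0, so no extra memory is allocated for the repeated heads.

```python
import torch


def repeat_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Materialize each KV head n_rep times (allocates a new tensor).

    kv: (batch, kv_n_heads, seq_len, d_head) -> (batch, kv_n_heads * n_rep, seq_len, d_head)
    """
    return kv.repeat_interleave(n_rep, dim=1)


def expand_kv(kv: torch.Tensor, n_rep: int) -> torch.Tensor:
    """Broadcast each KV head across its n_rep query heads without copying.

    kv: (batch, kv_n_heads, seq_len, d_head) -> (batch, kv_n_heads, n_rep, seq_len, d_head)
    The result is a view of kv; the inserted group axis has stride 0.
    """
    b, kv_n_heads, seq_len, d_head = kv.shape
    # Add a singleton group axis, then broadcast it to n_rep without copying data.
    return kv[:, :, None, :, :].expand(b, kv_n_heads, n_rep, seq_len, d_head)


if __name__ == "__main__":
    kv = torch.randn(2, 2, 5, 8)  # 2 KV heads shared by 8 query heads (n_rep = 4)
    repeated = repeat_kv(kv, 4)
    expanded = expand_kv(kv, 4)
    # The repeated tensor lives in new storage; the expanded one shares kv's storage.
    print(repeated.shape, repeated.data_ptr() == kv.data_ptr())
    print(expanded.shape, expanded.data_ptr() == kv.data_ptr(), expanded.stride())
```

The sketch keeps the group axis separate on purpose: flattening the expanded view back to (batch, n_heads, seq_len, d_head) with `.reshape` would generally materialize a copy, because a zero-stride axis usually cannot be merged with its neighbor into a view.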

sashaDoubov changed the title from "Change expand to repeat in GQA" to "Change repeat to expand in GQA" on Sep 25, 2023
sashaDoubov marked this pull request as ready for review on September 26, 2023 00:43
dakinggg enabled auto-merge (squash) on September 26, 2023 01:03
dakinggg disabled auto-merge on September 26, 2023 01:03
dakinggg enabled auto-merge (squash) on September 26, 2023 01:03
dakinggg merged commit 61dfbd6 into mosaicml:main on Sep 26, 2023

madhavatreplit

Will this be backwards compatible with MPT models that use GQA and were trained with code from a commit before this one?

sashaDoubov (Contributor, Author)

@madhavatreplit Yes, this is backwards compatible. The model weights don't change; this is simply an optimization in the attention function that avoids allocating new memory.
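One way to see why earlier checkpoints keep working is to check numerically that, for the same q, k, and v, attention over expanded key/value views matches attention over materialized repeated heads, up to floating-point rounding. This is a sketch assuming PyTorch, unmasked softmax attention, and (batch, heads, seq_len, d_head) inputs; `attention`, `gqa_repeat`, and `gqa_expand` are hypothetical names, not the PR's code.

```python
import math

import torch


def attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Unmasked softmax attention over the last two dims: (..., seq_len, d_head)."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    return scores.softmax(dim=-1) @ v


def gqa_repeat(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """GQA by copying KV heads. q: (b, n_heads, s, d); k, v: (b, kv_n_heads, s, d)."""
    n_rep = q.shape[1] // k.shape[1]
    return attention(q, k.repeat_interleave(n_rep, dim=1), v.repeat_interleave(n_rep, dim=1))


def gqa_expand(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """GQA via zero-stride KV views; same inputs and output shape as gqa_repeat."""
    b, n_heads, s, d = q.shape
    kv_n_heads = k.shape[1]
    n_rep = n_heads // kv_n_heads
    # Query head i uses KV head i // n_rep, so group query heads as (kv_n_heads, n_rep).
    q_g = q.reshape(b, kv_n_heads, n_rep, s, d)
    k_g = k[:, :, None].expand(b, kv_n_heads, n_rep, s, d)
    v_g = v[:, :, None].expand(b, kv_n_heads, n_rep, s, d)
    return attention(q_g, k_g, v_g).reshape(b, n_heads, s, d)
```

Only the way K/V are broadcast differs between the two functions; no parameters are involved, which matches the answer above. The sketch does not guarantee that `@` itself avoids an internal copy of the expanded views; that depends on the kernel.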
