
Batch matmul fast path in MHAWithCache #449

Open
wants to merge 1 commit into base: main

Conversation

rohan-varma
Contributor

Summary: When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and perform a single matmul instead of three. This adds that optimization to MHAWithCache.

Differential Revision: D48418780
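
To make the fast path concrete, here is a minimal self-attention sketch (an illustration only, not the TorchMultimodal implementation; the class and parameter names such as `qkv_proj` are assumptions) in which a single fused `nn.Linear` replaces the three separate Q/K/V projections, so one matmul does the work of three:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedQKVSelfAttentionSketch(nn.Module):
    """Illustrative sketch: fuse the Q/K/V input projections for self-attention.

    Instead of three separate nn.Linear layers (three matmuls per forward pass),
    a single nn.Linear with 3 * embed_dim output features does one matmul whose
    output is then split into Q, K, and V.
    """

    def __init__(self, embed_dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One fused projection instead of separate q_proj / k_proj / v_proj.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        # Single matmul, then split the result into the three projections.
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for attention.
        q = q.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
        # Requires PyTorch 2.0+ for the fused SDPA kernel.
        attn = F.scaled_dot_product_attention(q, k, v)
        return self.out_proj(attn.transpose(1, 2).reshape(b, s, d))
```

Because the fused weight is just the three individual projection weights concatenated along the output dimension, the result is numerically equivalent to running the three matmuls separately.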

@facebook-github-bot added the CLA Signed label on Aug 17, 2023.
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D48418780

rohan-varma added a commit to rohan-varma/multimodal that referenced this pull request Aug 17, 2023
Summary:
Pull Request resolved: facebookresearch#449

When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and perform a single matmul instead of three. This adds that optimization to MHAWithCache.

Differential Revision: D48418780

fbshipit-source-id: e8001eb870e827b05146221bb66f82939deae0c6

@codecov-commenter commented Aug 17, 2023

Codecov Report

Patch coverage: 77.77% and project coverage change: -0.01% ⚠️

Comparison is base (951a452) 69.11% compared to head (a2e0a70) 69.11%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #449      +/-   ##
==========================================
- Coverage   69.11%   69.11%   -0.01%     
==========================================
  Files         170      170              
  Lines       11524    11530       +6     
==========================================
+ Hits         7965     7969       +4     
- Misses       3559     3561       +2     
| Files Changed | Coverage Δ |
| --- | --- |
| ...hmultimodal/modules/layers/multi_head_attention.py | 96.82% <77.77%> (-3.18%) ⬇️ |


rohan-varma added a commit to rohan-varma/multimodal that referenced this pull request Aug 18, 2023
…th (facebookresearch#449)

Summary:
Pull Request resolved: facebookresearch#449

When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and perform a single matmul instead of three. This adds that optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`.

Note: we are primarily using a new module to avoid breaking checkpoint BC with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.

Differential Revision: D48418780

fbshipit-source-id: 5ad930ff27a4b131f8ff1f097a4c9e1548efb587
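
To illustrate the checkpoint-BC concern from the note above (a hedged sketch; `q_proj`, `k_proj`, `v_proj`, and `qkv_proj` are assumed parameter names, not necessarily the exact ones used in TorchMultimodal), fusing the projections changes the `state_dict` keys, so checkpoints saved from an unfused module would no longer load directly into a fused version of the same class:

```python
import torch.nn as nn


# Unfused layout: checkpoints store q_proj.*, k_proj.*, and v_proj.* keys.
class UnfusedProjections(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)


# Fused layout: the same parameters live under a single qkv_proj.* key.
class FusedProjections(nn.Module):
    def __init__(self, dim: int) -> None:
        super().__init__()
        self.qkv_proj = nn.Linear(dim, 3 * dim)


unfused, fused = UnfusedProjections(8), FusedProjections(8)
print(sorted(unfused.state_dict()))
# ['k_proj.bias', 'k_proj.weight', 'q_proj.bias', 'q_proj.weight',
#  'v_proj.bias', 'v_proj.weight']
print(sorted(fused.state_dict()))
# ['qkv_proj.bias', 'qkv_proj.weight']

# load_state_dict would report missing/unexpected keys if an old (unfused)
# checkpoint were loaded into the fused layout, which is why the fast path
# lands in a new module instead of changing MultiHeadAttentionWithCache.
```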


Labels
CLA Signed, fb-exported
3 participants