Batch matmul fast path in MHAWithCache #449
base: main
Conversation
This pull request was exported from Phabricator. Differential Revision: D48418780
Summary: Pull Request resolved: facebookresearch#449
When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and do a single matmul instead of three. This adds that optimization in MHAWithCache.
Differential Revision: D48418780
fbshipit-source-id: e8001eb870e827b05146221bb66f82939deae0c6
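The diff itself is not shown here. As a rough sketch of the technique only (assumed names, not the actual MHAWithCache code), the three separate Q/K/V `nn.Linear` projections become a single `nn.Linear` with three times the output width, and the result is split after one matmul:

```python
import torch
from torch import nn


class FusedQKVSelfAttention(nn.Module):
    """Illustrative sketch of the fused-QKV fast path; not the TorchMultimodal module."""

    def __init__(self, embed_dim: int, num_heads: int) -> None:
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # One (3 * embed_dim, embed_dim) weight instead of three (embed_dim, embed_dim) weights.
        self.qkv_proj = nn.Linear(embed_dim, 3 * embed_dim)
        self.out_proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, d = x.shape
        # A single matmul produces Q, K, and V together; split along the last dim.
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # Reshape to (batch, heads, seq, head_dim) for attention.
        q, k, v = (
            t.view(b, s, self.num_heads, self.head_dim).transpose(1, 2)
            for t in (q, k, v)
        )
        attn = nn.functional.scaled_dot_product_attention(q, k, v)
        return self.out_proj(attn.transpose(1, 2).reshape(b, s, d))
```

The output matches running three separate projections, because stacking the Q, K, and V weight matrices row-wise and doing one matmul computes the same products; the benefit is one larger GEMM instead of three smaller ones.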
Force-pushed from e1233cc to dfd2ec6
Force-pushed from dfd2ec6 to a2e0a70
Codecov Report
Patch coverage:
Additional details and impacted files

@@            Coverage Diff             @@
##              main     #449      +/-   ##
==========================================
- Coverage    69.11%   69.11%    -0.01%
==========================================
  Files          170      170
  Lines        11524    11530        +6
==========================================
+ Hits          7965     7969        +4
- Misses        3559     3561        +2

☔ View full report in Codecov by Sentry.
Summary: Pull Request resolved: facebookresearch#449
When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and do a single matmul instead of three. This adds that optimization for MHA with cache in a new module `MultiHeadSelfAttentionWithCache`. Note: we are primarily using a new module to avoid breaking checkpoint backward compatibility (BC) with respect to `MultiHeadAttentionWithCache`. In the future, we should consolidate these MHA implementations.
Differential Revision: D48418780
fbshipit-source-id: 5ad930ff27a4b131f8ff1f097a4c9e1548efb587
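On the checkpoint-BC note above: if the existing `MultiHeadAttentionWithCache` stores separate per-projection parameters (the key names `q_proj` / `k_proj` / `v_proj` below are assumptions for illustration, not the actual checkpoint schema), its checkpoints would not load directly into a fused `qkv_proj` layout, which is why the fast path lands in a new module rather than changing the old one. A hypothetical remapping helper, not part of this PR, might look like:

```python
import torch


def fuse_qkv_state_dict(old_sd: dict, prefix: str = "") -> dict:
    """Hypothetical helper: concatenate separate q/k/v projection parameters
    (assumed keys) into a fused qkv_proj entry. Illustrative only."""
    new_sd = dict(old_sd)
    for suffix in ("weight", "bias"):
        keys = [f"{prefix}{name}_proj.{suffix}" for name in ("q", "k", "v")]
        if all(k in old_sd for k in keys):
            # Row-wise concatenation is equivalent to one Linear with 3x the output dim.
            new_sd[f"{prefix}qkv_proj.{suffix}"] = torch.cat([old_sd[k] for k in keys], dim=0)
            for k in keys:
                del new_sd[k]
    return new_sd
```

Shipping `MultiHeadSelfAttentionWithCache` as a separate module sidesteps this kind of key remapping for existing checkpoints until the implementations are consolidated.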
Force-pushed from a2e0a70 to 919dc03
Force-pushed from 919dc03 to 6d67dae
Force-pushed from 6d67dae to 173699e
Force-pushed from 173699e to d459f16
Summary: When doing self-attention, an optimization is to combine the Q, K, and V input projection matrices and do a single matmul instead of three. This adds that optimization in MHAWithCache.
Differential Revision: D48418780