
Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. #1374

Merged
merged 26 commits into mosaicml:main from soft_cap_attn on Sep 23, 2024

Conversation

@ShashankMosaicML (Contributor) commented Jul 19, 2024

This PR bumps the flash attention version to 2.6.3.
It also adds the option for softcapping in attention and lm_head logits, to support Gemma-2-like models. The config names are the same as the Hugging Face names here: https://github.com/huggingface/transformers/blob/96a074fa7e2c04b904f72d9e827398d4c5f90f25/src/transformers/models/gemma2/modeling_gemma2.py#L371
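For context, here is a minimal sketch of what tanh-based logit softcapping does, following the Gemma 2 formulation linked above. The function and tensor names are illustrative only, not the actual llm-foundry implementation, and the cap values shown are the Gemma 2 config defaults:

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    """Tanh soft-capping as in Gemma 2: smoothly squashes logits into (-cap, cap)."""
    return cap * torch.tanh(logits / cap)

# Illustrative cap values, mirroring the Hugging Face Gemma 2 config names
# referenced in the PR description.
attn_logit_softcapping = 50.0
final_logit_softcapping = 30.0

# Applied to raw attention scores before the softmax...
attn_scores = torch.randn(2, 8, 16, 16)  # (batch, heads, q_len, kv_len)
attn_scores = soft_cap(attn_scores, attn_logit_softcapping)

# ...and to the lm_head output logits before the loss/sampling.
lm_logits = torch.randn(2, 16, 32000)  # (batch, seq_len, vocab)
lm_logits = soft_cap(lm_logits, final_logit_softcapping)
```

The attention-side capping can be fused into the kernel via the `softcap` argument that flash-attn added in its 2.6.x releases, which is why the version bump accompanies this feature.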

Loss curves for attn softcapping vs no attn softcapping: [screenshot]

MFU curves for attn softcapping vs no attn softcapping: [screenshot]

@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention, updating flash attention adding option for softcap in attention and logits. Jul 23, 2024
@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention and logits. adding option for softcap in attention and lm_head logits. Jul 23, 2024
@ShashankMosaicML ShashankMosaicML marked this pull request as ready for review July 23, 2024 22:33
@ShashankMosaicML ShashankMosaicML requested a review from a team as a code owner July 23, 2024 22:33
@ShashankMosaicML ShashankMosaicML requested a review from a team as a code owner August 30, 2024 23:49
@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention and lm_head logits. bumping flash attention and adding option for softcap in attention and lm_head logits. Aug 30, 2024
@ShashankMosaicML ShashankMosaicML changed the title bumping flash attention and adding option for softcap in attention and lm_head logits. Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. Aug 30, 2024
@ShashankMosaicML ShashankMosaicML merged commit 85403c0 into mosaicml:main Sep 23, 2024
9 checks passed
@ShashankMosaicML ShashankMosaicML deleted the soft_cap_attn branch September 23, 2024 14:37