
Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. #1374

Merged
merged 26 commits into mosaicml:main from soft_cap_attn on Sep 23, 2024

Conversation

@ShashankMosaicML (Contributor) commented Jul 19, 2024

This PR bumps the flash attention version to 2.6.3.
It also adds the option for softcapping in attention and lm_head logits, to support Gemma-2-like models. The config names are the same as the Hugging Face names here: https://github.com/huggingface/transformers/blob/96a074fa7e2c04b904f72d9e827398d4c5f90f25/src/transformers/models/gemma2/modeling_gemma2.py#L371
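For context, here is a minimal sketch of what tanh-based logit softcapping does, following the Gemma 2 formulation linked above. The function and tensor names are illustrative only, not the actual llm-foundry implementation, and the cap values shown are the Gemma 2 config defaults:

```python
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    """Tanh soft-capping as in Gemma 2: smoothly squashes logits into (-cap, cap)."""
    return cap * torch.tanh(logits / cap)

# Illustrative cap values, mirroring the Hugging Face Gemma 2 config names
# referenced in the PR description.
attn_logit_softcapping = 50.0
final_logit_softcapping = 30.0

# Applied to raw attention scores before the softmax...
attn_scores = torch.randn(2, 8, 16, 16)  # (batch, heads, q_len, kv_len)
attn_scores = soft_cap(attn_scores, attn_logit_softcapping)

# ...and to the lm_head output logits before the loss/sampling.
lm_logits = torch.randn(2, 16, 32000)  # (batch, seq_len, vocab)
lm_logits = soft_cap(lm_logits, final_logit_softcapping)
```

The attention-side capping can be fused into the kernel via the `softcap` argument that flash-attn added in its 2.6.x releases, which is why the version bump accompanies this feature.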

Loss curves for attn softcapping vs no attn softcapping: [screenshot]

MFU curves for attn softcapping vs no attn softcapping: [screenshot]

@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention, updating flash attention adding option for softcap in attention and logits. Jul 23, 2024
@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention and logits. adding option for softcap in attention and lm_head logits. Jul 23, 2024
@ShashankMosaicML ShashankMosaicML marked this pull request as ready for review July 23, 2024 22:33
@ShashankMosaicML ShashankMosaicML requested a review from a team as a code owner July 23, 2024 22:33
@ShashankMosaicML ShashankMosaicML requested a review from a team as a code owner August 30, 2024 23:49
@ShashankMosaicML ShashankMosaicML changed the title adding option for softcap in attention and lm_head logits. bumping flash attention and adding option for softcap in attention and lm_head logits. Aug 30, 2024
@ShashankMosaicML ShashankMosaicML changed the title bumping flash attention and adding option for softcap in attention and lm_head logits. Bumping flash attention version to 2.6.3 and adding option for softcap in attention and lm_head logits. Aug 30, 2024
@ShashankMosaicML ShashankMosaicML merged commit 85403c0 into mosaicml:main Sep 23, 2024
9 checks passed
@ShashankMosaicML ShashankMosaicML deleted the soft_cap_attn branch September 23, 2024 14:37