
fix FA2 when using quantization #28203

Merged
merged 1 commit into main from smangrul/fix-fa2-integration-quantization on Dec 26, 2023

Conversation

pacman100 (Contributor) commented:

What does this PR do?

  1. When using QLoRA + Flash Attention 2 with bf16, the following warning is emitted; it incorrectly states that the input will be cast to float16 when it should be cast to bf16:
     "The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.float16."

This PR resolves this issue.
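
For context, a minimal sketch of the setup that triggers the incorrect warning is shown below. It assumes a CUDA GPU with bf16 support and flash-attn installed; the checkpoint name is purely illustrative and is not taken from this PR.

```python
# Illustrative QLoRA + Flash Attention 2 + bf16 setup (not the exact code from this PR).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute/training dtype is bf16, not fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
# With layer norms upcast to float32 (as is typical for QLoRA training), the FA2 code
# path previously warned it would cast the input back to torch.float16 instead of bf16.
```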

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada (Contributor) left a comment:

Makes sense to me: we should first check whether autocast is enabled, even with quantization, and only then fall back to the quantization's original dtype. Thanks for the fix @pacman100!
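
A hedged sketch of the check order described above follows. The attribute name `_pre_quantization_dtype` and the helper function are illustrative conventions, not the exact diff in this PR.

```python
# Sketch of the dtype-selection order: autocast first, then the pre-quantization
# dtype, then the dtype of an attention projection weight as a last resort.
import torch

def resolve_fa2_target_dtype(module, config):
    if torch.is_autocast_enabled():
        # Autocast wins even when the model is quantized
        # (e.g. bf16 autocast on top of 4-bit weights).
        return torch.get_autocast_gpu_dtype()
    if hasattr(config, "_pre_quantization_dtype"):
        # Fall back to the dtype the model had before quantization.
        return config._pre_quantization_dtype
    # Last resort: infer from an attention projection weight.
    return module.q_proj.weight.dtype
```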

@amyeroberts (Collaborator) left a comment:

@pacman100 Thanks for fixing this!

pacman100 merged commit 3b7675b into main on Dec 26, 2023 (19 checks passed).
pacman100 deleted the smangrul/fix-fa2-integration-quantization branch on Dec 26, 2023 at 03:06.
susnato (Contributor) commented on Dec 28, 2023:

Hi, this behavior of casting to float16 is still present in other models (whisper, bart, phi, distilbert, ...). I will create a PR to fix it.

Saibo-creator pushed a commit to epfl-dlab/transformers-GCD-PR that referenced this pull request Jan 4, 2024
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024