
fix FA2 when using quantization #28203

Merged
merged 1 commit into main from smangrul/fix-fa2-integration-quantization on Dec 26, 2023

Conversation

pacman100 (Contributor) commented:

What does this PR do?

  1. When using QLoRA + Flash Attention 2 with bf16, the following warning is emitted; it incorrectly states that the input will be cast to float16 when it should be cast to bf16:
     "The input hidden states seems to be silently casted in float32, this might be related to the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in torch.float16."

This PR resolves this issue.
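
For context, a minimal sketch of the setup that triggers the incorrect warning is shown below. It assumes a CUDA GPU with bf16 support and flash-attn installed; the checkpoint name is purely illustrative and is not taken from this PR.

```python
# Illustrative QLoRA + Flash Attention 2 + bf16 setup (not the exact code from this PR).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute/training dtype is bf16, not fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
)
# With layer norms upcast to float32 (as is typical for QLoRA training), the FA2 code
# path previously warned it would cast the input back to torch.float16 instead of bf16.
```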

@HuggingFaceDocBuilderDev commented:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@younesbelkada (Contributor) left a comment:

Makes sense to me: we should first check whether autocast is enabled, even with quantization, and only then fall back to the quantization's original dtype. Thanks for the fix @pacman100!
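
A hedged sketch of the check order described above follows. The attribute name `_pre_quantization_dtype` and the helper function are illustrative conventions, not the exact diff in this PR.

```python
# Sketch of the dtype-selection order: autocast first, then the pre-quantization
# dtype, then the dtype of an attention projection weight as a last resort.
import torch

def resolve_fa2_target_dtype(module, config):
    if torch.is_autocast_enabled():
        # Autocast wins even when the model is quantized
        # (e.g. bf16 autocast on top of 4-bit weights).
        return torch.get_autocast_gpu_dtype()
    if hasattr(config, "_pre_quantization_dtype"):
        # Fall back to the dtype the model had before quantization.
        return config._pre_quantization_dtype
    # Last resort: infer from an attention projection weight.
    return module.q_proj.weight.dtype
```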

@amyeroberts (Collaborator) left a comment:

@pacman100 Thanks for fixing this!

pacman100 merged commit 3b7675b into main on Dec 26, 2023 (19 checks passed).
pacman100 deleted the smangrul/fix-fa2-integration-quantization branch on Dec 26, 2023 at 03:06.
susnato (Contributor) commented on Dec 28, 2023:

Hi, this behavior of casting to float16 is still present in other models (whisper, bart, phi, distilbert, ...). I will create a PR to fix it.

Saibo-creator pushed a commit to epfl-dlab/transformers-GCD-PR that referenced this pull request Jan 4, 2024
staghado pushed a commit to staghado/transformers that referenced this pull request Jan 15, 2024