
[FA-2] Final fix for FA2 dtype #26846

Merged: 6 commits into huggingface:main on Oct 18, 2023

Conversation

Contributor

@younesbelkada commented Oct 16, 2023

What does this PR do?

Replaces #26560
Fixes #26451

Proposes a simpler fix for FA-2 + PEFT + quantization fine-tuning, where users usually cast the other modules (e.g. LayerNorms) to fp32 for training stability.
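For context, a minimal sketch of the setup this targets (the model name, flag names and PEFT helpers below are illustrative and depend on your transformers/peft versions):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Hypothetical reproduction: a 4-bit quantized model fine-tuned with PEFT, where the
# k-bit preparation step upcasts the LayerNorms/embeddings to fp32 and Flash Attention 2
# then receives fp32 hidden states.
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b",  # any FA-2-capable checkpoint
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
    attn_implementation="flash_attention_2",  # flag name varies across transformers versions
    torch_dtype=torch.float16,
)
model = prepare_model_for_kbit_training(model)  # upcasts norms/embeddings to fp32
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))
# Without the fix, the fp32 hidden states produced by the upcasted LayerNorms could reach
# the FA-2 kernels, which only support fp16/bf16 inputs.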

With #26761 introduced, it is now much simpler to retrieve the model's original dtype. Note also that self.config._pre_quantization_dtype remains the single source of truth for quantized models, since that retrieval is not supported for them.
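Roughly, the target dtype is chosen as in this condensed sketch (the helper name _fa2_target_dtype is made up for illustration; the merged code inlines this logic in each model's FA-2 attention forward):

import torch

def _fa2_target_dtype(config, fallback_weight: torch.Tensor) -> torch.dtype:
    """Pick the dtype to cast fp32 hidden states back to before calling FA-2."""
    if torch.is_autocast_enabled():
        # Under autocast, follow the active autocast dtype.
        return torch.get_autocast_gpu_dtype()
    if hasattr(config, "_pre_quantization_dtype"):
        # Quantized model: the dtype recorded before quantization is the source of truth.
        return config._pre_quantization_dtype
    # Otherwise fall back to e.g. the dtype of the attention projection weights.
    return fallback_weight.dtype

In the modeling code this logic is applied to the query/key/value states right before the FA-2 call, together with the warning discussed in the review below.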

cc @ArthurZucker @pacman100

Also added a test for this.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

Collaborator

@ArthurZucker left a comment


Thanks! I think we can simplify a bit and remove the warning?

Comment on lines 417 to 421
  logger.warning_once(
-     "The input hidden states seems to be silently casted in float32, this might be related to"
-     " the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in"
-     " float16."
+     f"The input hidden states seems to be silently casted in float32, this might be related to"
+     f" the fact you have upcasted embedding or layer norm layers in float32. We will cast back the input in"
+     f" {target_dtype}."
  )
Collaborator


I think we can remove this now, no?

Contributor Author


Hmm, I think we need to keep it to inform users about the cast.

On src/transformers/models/falcon/modeling_falcon.py (outdated, resolved)
@younesbelkada younesbelkada merged commit 5a73316 into huggingface:main Oct 18, 2023
3 checks passed
@younesbelkada younesbelkada deleted the fa-2-final-fix branch October 18, 2023 21:13
staghado pushed a commit to staghado/transformers that referenced this pull request Oct 24, 2023
* final fix for FA2 dtype

* try

* oops

* Update src/transformers/models/falcon/modeling_falcon.py

Co-authored-by: Arthur <[email protected]>

* apply fix everywhere

---------

Co-authored-by: Arthur <[email protected]>
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
Development

Successfully merging this pull request may close these issues.

The hidden states in LlamaFlashAttention2 are cast in fp16 unexpectedly
3 participants