Add tie_weights() to LM heads and set bias in set_output_embeddings() #28948
Conversation
Thanks for digging into this and fixing for all these models!
Just two things I think we need to do before merging:
- Run the slow model tests for these models to confirm this change is backwards compatible.
- Apply the test to all models.
```diff
@@ -600,6 +600,24 @@ def test_model_from_pretrained(self):
         model = BertModel.from_pretrained(model_name)
         self.assertIsNotNone(model)
 
+    @slow
+    def test_save_and_load_low_cpu_mem_usage(self):
```
This should be applied to all the models. Could you move the test to ModelTesterMixin?
Done.
I also noticed that there were other non-slow tests in ModelTesterMixin that loaded/saved pretrained models.
I had a hunch that the saving/loading should actually be pretty fast, and so ran the tests without the @slow tag. The before/after runs on CircleCI showed similar timings for tests_torch, so my 2 cents is that we don't need to mark this test as @slow.
Before run (tests_torch ran in 6m58s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84140/workflows/0998b8ba-9336-49e5-9680-2ddd86443669
After run (tests_torch ran in 6m48s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84451/workflows/f5efa9af-c079-4303-8611-574f8a3bf7bd
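For reference, here is a minimal sketch of how such a shared test could look inside ModelTesterMixin. The helper attributes (model_tester, all_model_classes) and torch_device are the usual transformers test utilities, and the real test added in this PR may differ in detail:

```python
import tempfile

from transformers.testing_utils import torch_device


class ModelTesterMixin:
    # Only the new test is sketched here; the real mixin defines many more tests.

    def test_save_and_load_low_cpu_mem_usage(self):
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            with tempfile.TemporaryDirectory() as tmpdirname:
                model_to_save = model_class(config)
                model_to_save.save_pretrained(tmpdirname)

                # low_cpu_mem_usage=True first builds the model on the meta device,
                # then loads the saved weights into it.
                model = model_class.from_pretrained(tmpdirname, low_cpu_mem_usage=True)

                # If any parameter (for example an LM-head bias that was never tied)
                # is still on the meta device, moving to a real device fails here.
                model.to(torch_device)
```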
The bias was not tied correctly in some LM heads, and this change should fix that.
```diff
@@ -520,6 +520,10 @@ def test_initialization(self):
                     msg=f"Parameter {name} of model {model_class} seems not properly initialized",
                 )
 
+    @unittest.skip("Cannot be initialized on meta device as some weights are modified during the initialization")
```
This came from an existing comment:

# We can't initialize the model on meta device as some weights are modified during the initialization

It seems like the DetaForObjectDetection model has logic that manipulates the params during init, so it simply doesn't support initializing on a meta device.

With that said, should we add some kind of flag in PreTrainedModel that would throw an error when trying to initialize on the meta device for models that do not support it (rough sketch of the idea below)? So far, it seems like DetaModel is the only one, so maybe it's not worth the effort.
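Purely as an illustration of that idea (the `_supports_meta_device_init` attribute and this guard are hypothetical and do not exist in transformers), the flag could look roughly like this:

```python
from torch import nn


class PreTrainedModel(nn.Module):
    """Heavily trimmed stand-in for transformers.PreTrainedModel, just to show the flag."""

    # Hypothetical opt-out flag; a model such as DetaModel that rewrites parameters
    # during __init__ would override this with False.
    _supports_meta_device_init = True

    @classmethod
    def _check_meta_device_support(cls, low_cpu_mem_usage: bool) -> None:
        # Hypothetical guard that from_pretrained(..., low_cpu_mem_usage=True)
        # could call before materializing the model on the meta device.
        if low_cpu_mem_usage and not cls._supports_meta_device_init:
            raise ValueError(
                f"{cls.__name__} modifies its weights during initialization and cannot "
                "be instantiated on the meta device; pass low_cpu_mem_usage=False."
            )
```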
Let's keep as-is for the moment, until we have more affected models. DETR and DETR-like models are special but I think they're the only case.
Great work - thanks for adding this!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…huggingface#28948)

* Add tie_weights() to LM heads and set bias in set_output_embeddings()

  The bias was not tied correctly in some LM heads, and this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
…ddings() (huggingface#28948)" This reverts commit 725f4ad.
…#28948)

* Add tie_weights() to LM heads and set bias in set_output_embeddings()

  The bias was not tied correctly in some LM heads, and this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
What does this PR do?
This fixes a bug where the wrong bias was used in prediction heads in some situations: `predictions.bias` needs to be tied to `predictions.decoder.bias` inside `tie_weights()` (see the sketch below).

Repro steps: add the new `test_save_and_load_low_cpu_mem_usage` test to `test_modeling_bert.py` and run it.

The issue was uncovered in #28802.
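To make the fix concrete, here is a minimal, self-contained sketch of the pattern involved, loosely modeled on BERT-style prediction heads; the class and attribute names are simplified stand-ins rather than the exact transformers code:

```python
import torch
from torch import nn


class SketchLMPredictionHead(nn.Module):
    """Simplified BERT-style LM prediction head illustrating the bias tying."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Standalone bias parameter that must stay shared with the decoder.
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        self.decoder.bias = self.bias

    def _tie_weights(self):
        # Re-point the decoder bias at self.bias; without this, loading with
        # low_cpu_mem_usage=True (or swapping the decoder via
        # set_output_embeddings) can leave the two biases untied.
        self.decoder.bias = self.bias

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.decoder(hidden_states)
```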
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @amyeroberts