Add tie_weights() to LM heads and set bias in set_output_embeddings() #28948
Conversation
Thanks for digging into this and fixing for all these models!
Just two things I think we need to do before merging:
- Run the slow model tests for these models to confirm this change is backwards compatible.
- Apply the test to all models.
```diff
@@ -600,6 +600,24 @@ def test_model_from_pretrained(self):
         model = BertModel.from_pretrained(model_name)
         self.assertIsNotNone(model)
 
+    @slow
+    def test_save_and_load_low_cpu_mem_usage(self):
```
This should be applied to all the models. Could you move the test to ModelTesterMixin?
Done.
I also noticed that there were other non-slow tests in ModelTesterMixin that loaded/saved pretrained models.
I had a hunch that the saving/loading should actually be pretty fast, and so ran the tests without the @slow tag. The before/after runs on CircleCI showed similar timings for tests_torch, so my 2 cents is that we don't need to mark this test as @slow.
Before run (tests_torch ran in 6m58s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84140/workflows/0998b8ba-9336-49e5-9680-2ddd86443669
After run (tests_torch ran in 6m48s):
https://app.circleci.com/pipelines/github/huggingface/transformers/84451/workflows/f5efa9af-c079-4303-8611-574f8a3bf7bd
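For reference, here is a minimal sketch of how such a shared test could look inside ModelTesterMixin. The helper attributes (model_tester, all_model_classes) and torch_device are the usual transformers test utilities, and the real test added in this PR may differ in detail:

```python
import tempfile

from transformers.testing_utils import torch_device


class ModelTesterMixin:
    # Only the new test is sketched here; the real mixin defines many more tests.

    def test_save_and_load_low_cpu_mem_usage(self):
        config, _ = self.model_tester.prepare_config_and_inputs_for_common()
        for model_class in self.all_model_classes:
            with tempfile.TemporaryDirectory() as tmpdirname:
                model_to_save = model_class(config)
                model_to_save.save_pretrained(tmpdirname)

                # low_cpu_mem_usage=True first builds the model on the meta device,
                # then loads the saved weights into it.
                model = model_class.from_pretrained(tmpdirname, low_cpu_mem_usage=True)

                # If any parameter (for example an LM-head bias that was never tied)
                # is still on the meta device, moving to a real device fails here.
                model.to(torch_device)
```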
The bias was not tied correctly in some LM heads, and this change should fix that.
```diff
@@ -520,6 +520,10 @@ def test_initialization(self):
                     msg=f"Parameter {name} of model {model_class} seems not properly initialized",
                 )
 
+    @unittest.skip("Cannot be initialized on meta device as some weights are modified during the initialization")
```
This came from an existing comment:

# We can't initialize the model on meta device as some weights are modified during the initialization

It seems like the DetaForObjectDetection model has logic that manipulates the params during init, so it simply doesn't support initializing on a meta device.

With that said, should we add some kind of flag in PreTrainedModel that would throw an error when trying to initialize on the meta device for models that do not support it (rough sketch of the idea below)? So far, it seems like DetaModel is the only one, so maybe it's not worth the effort.
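Purely as an illustration of that idea (the `_supports_meta_device_init` attribute and this guard are hypothetical and do not exist in transformers), the flag could look roughly like this:

```python
from torch import nn


class PreTrainedModel(nn.Module):
    """Heavily trimmed stand-in for transformers.PreTrainedModel, just to show the flag."""

    # Hypothetical opt-out flag; a model such as DetaModel that rewrites parameters
    # during __init__ would override this with False.
    _supports_meta_device_init = True

    @classmethod
    def _check_meta_device_support(cls, low_cpu_mem_usage: bool) -> None:
        # Hypothetical guard that from_pretrained(..., low_cpu_mem_usage=True)
        # could call before materializing the model on the meta device.
        if low_cpu_mem_usage and not cls._supports_meta_device_init:
            raise ValueError(
                f"{cls.__name__} modifies its weights during initialization and cannot "
                "be instantiated on the meta device; pass low_cpu_mem_usage=False."
            )
```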
Let's keep as-is for the moment, until we have more affected models. DETR and DETR-like models are special but I think they're the only case.
Great work - thanks for adding this!
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…huggingface#28948)

* Add tie_weights() to LM heads and set bias in set_output_embeddings()

  The bias was not tied correctly in some LM heads, and this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
…ddings() (huggingface#28948)" This reverts commit 725f4ad.
…#28948)

* Add tie_weights() to LM heads and set bias in set_output_embeddings()

  The bias was not tied correctly in some LM heads, and this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin
* Adding _tie_weights() to MPNet and Vilt
* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device
* Rename the test to save_load to match the convention
What does this PR do?
This fixes a bug where the wrong bias was used in prediction heads in some situations: `predictions.bias` needs to be tied to `predictions.decoder.bias` inside `tie_weights()` (see the sketch below).

Repro steps: add the new `test_save_and_load_low_cpu_mem_usage` test to `test_modeling_bert.py` and run it.

The issue was uncovered in #28802.
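To make the fix concrete, here is a minimal, self-contained sketch of the pattern involved, loosely modeled on BERT-style prediction heads; the class and attribute names are simplified stand-ins rather than the exact transformers code:

```python
import torch
from torch import nn


class SketchLMPredictionHead(nn.Module):
    """Simplified BERT-style LM prediction head illustrating the bias tying."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)
        # Standalone bias parameter that must stay shared with the decoder.
        self.bias = nn.Parameter(torch.zeros(vocab_size))
        self.decoder.bias = self.bias

    def _tie_weights(self):
        # Re-point the decoder bias at self.bias; without this, loading with
        # low_cpu_mem_usage=True (or swapping the decoder via
        # set_output_embeddings) can leave the two biases untied.
        self.decoder.bias = self.bias

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.decoder(hidden_states)
```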
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@ArthurZucker @amyeroberts