Add support for OLMo's November release #34497

2015aroras · 2024-10-29T20:21:34Z

What does this PR do?

An updated OLMo model will be released in November. The new model has a few small architecture changes compared to the existing model in transformers:

RMSNorm is used instead of standard layer norm.
Norm is applied to attention queries and keys.
Norm is applied after attention/feedforward rather than before.
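
To make the three changes above concrete, here is a minimal pure-Python sketch. It is not the transformers implementation: the function names are illustrative, learned scale/bias weights are omitted, and the post-norm form shown (norm applied to the sublayer output before the residual add) is an assumption about what "after attention/feedforward" means here.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]  # stands in for attention or feedforward


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Standard layer norm (no learned weights): center, then divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm (no learned weights): divide by the root mean square, no centering."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def pre_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Norm before the sublayer: x + sublayer(norm(x))."""
    return [a + b for a, b in zip(x, sublayer(rms_norm(x)))]


def post_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    """Assumed new ordering: x + norm(sublayer(x))."""
    return [a + b for a, b in zip(x, rms_norm(sublayer(x)))]
```

In this sketch the query/key norm from the second item would be one more `rms_norm` call on the projected queries and keys before attention scores are computed.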

This PR updates the OLMo implementation in transformers to support the November release.

@ArthurZucker

Fixes #34496

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

2015aroras · 2024-10-29T21:04:03Z

Tested that an intermediate checkpoint can be converted and matches the original model (max logit diff <5e-5).

Tested that a random OLMo 1B checkpoint can still be converted and still matches the original model (max logit diff <5e-5).
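
A check of this kind can be sketched as follows. This is a hypothetical helper, not the script used for the PR: it assumes the logits of both models have already been computed and are available as nested lists of floats, and it reuses the 5e-5 threshold quoted above.

```python
from typing import Sequence

TOLERANCE = 5e-5  # threshold quoted in the comment above


def max_abs_diff(
    reference: Sequence[Sequence[float]],
    converted: Sequence[Sequence[float]],
) -> float:
    """Return the largest element-wise absolute difference between two logit matrices."""
    if len(reference) != len(converted):
        raise ValueError("logit matrices have different numbers of rows")
    worst = 0.0
    for ref_row, conv_row in zip(reference, converted):
        if len(ref_row) != len(conv_row):
            raise ValueError("logit rows have different lengths")
        for ref_value, conv_value in zip(ref_row, conv_row):
            worst = max(worst, abs(ref_value - conv_value))
    return worst


# Toy values for illustration only, not real model outputs.
original_logits = [[1.0, 2.0], [0.5, -3.0]]
converted_logits = [[1.0, 2.00001], [0.5, -3.0]]
assert max_abs_diff(original_logits, converted_logits) < TOLERANCE
```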

ArthurZucker

Thanks for the PR!
We should create a new model with https://huggingface.co/docs/transformers/en/modular_transformers 🤗

ArthurZucker · 2024-10-30T07:48:14Z

src/transformers/models/olmo/modeling_olmo.py

+        self.q_norm = (
+            get_layer_norm(config.layer_norm_type, self.num_heads * self.head_dim, config.rms_norm_eps)
+            if config.use_q_norm
+            else None
+        )
+        self.k_norm = (
+            get_layer_norm(config.layer_norm_type, self.num_key_value_heads * self.head_dim, config.rms_norm_eps)
+            if config.use_k_norm
+            else None
+        )
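
For readers unfamiliar with the pattern in this hunk, here is a small pure-Python sketch of config-gated query/key norms. Everything in it is hypothetical: `ToyConfig`, `make_rms_norm`, `build_qk_norms`, and `apply_optional` are stand-ins invented for illustration, and `get_layer_norm` from the diff is not reproduced because its body is not shown here. The sketch only mirrors the visible logic: the query norm is sized by `num_heads * head_dim`, the key norm by `num_key_value_heads * head_dim`, and each is `None` when its flag is off.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Norm = Callable[[Vector], Vector]


@dataclass
class ToyConfig:
    """Hypothetical stand-in for the model config fields used in the diff."""
    num_heads: int = 4
    num_key_value_heads: int = 2
    head_dim: int = 8
    rms_norm_eps: float = 1e-6
    use_q_norm: bool = True
    use_k_norm: bool = False


def make_rms_norm(size: int, eps: float) -> Norm:
    """Build an RMSNorm (no learned weight) that only accepts vectors of length `size`."""
    def norm(x: Vector) -> Vector:
        if len(x) != size:
            raise ValueError(f"expected {size} features, got {len(x)}")
        rms = math.sqrt(sum(v * v for v in x) / size + eps)
        return [v / rms for v in x]
    return norm


def build_qk_norms(config: ToyConfig) -> Tuple[Optional[Norm], Optional[Norm]]:
    """Create the optional query and key norms, or None when a flag is disabled."""
    q_size = config.num_heads * config.head_dim
    k_size = config.num_key_value_heads * config.head_dim
    q_norm = make_rms_norm(q_size, config.rms_norm_eps) if config.use_q_norm else None
    k_norm = make_rms_norm(k_size, config.rms_norm_eps) if config.use_k_norm else None
    return q_norm, k_norm


def apply_optional(norm: Optional[Norm], x: Vector) -> Vector:
    """Apply the norm if it exists; otherwise pass the vector through unchanged."""
    return x if norm is None else norm(x)
```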


Hey! This goes a bit against the transformers philosophy: we never change an old model to support a new architecture. We need an Olmo2 model 🤗
With modular this should be fairly simple to implement!

2015aroras · Oct 30, 2024

OK, I've re-implemented this as a separate new model using modular (the original model is unchanged). I'll put out the PR once we have decided on a suitable model name internally (we don't intend to call this release OLMo 2).

HuggingFaceDocBuilderDev · 2024-10-30T08:31:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

2015aroras · 2024-10-31T23:38:35Z

Making a fresh PR with modular transformers.

2015aroras added 7 commits October 29, 2024 10:31

Add OLMo repo updates to converter

fb348b2

Add RMSNorm functionality to Olmo

d15b814

Add norm for attention queries and keys

c1789ce

Add option to apply norm after the attention/feedforward layers

86a17e8

Read rope theta from OLMo model, in converter

02b7158

Run make style

30844ec

Ignore None values in converter

75dabae

ArthurZucker reviewed Oct 30, 2024

2015aroras closed this Oct 31, 2024

2015aroras mentioned this pull request Oct 31, 2024

Add OLMo November 2024 #34551

Merged

