[Liger] add native liger-kernel orpo loss #2482

kashif · 2024-12-15T12:55:12Z

What does this PR do?

Adds support for Liger ORPO loss kernel to the ORPO Trainer natively.

HuggingFaceDocBuilderDev · 2024-12-15T12:58:48Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

qgallouedec · 2024-12-15T15:34:21Z

Two questions/remarks:

Can you run a benchmark so that we can (1) quantify the improvement and (2) check that the results with and without Liger are the same?
We could add an extra tag on the Hub when a model is trained with Liger.
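
As a rough illustration of point (2), the parity check could be as simple as comparing the per-step training losses of two otherwise identical runs within a tolerance. This is a minimal sketch, not the benchmark used for this PR: the helper name `curves_match`, the tolerances, and the loss values are all made up for the example.

```python
import math


def curves_match(reference, candidate, rel_tol=1e-3, abs_tol=1e-5):
    """Compare two per-step loss curves.

    Returns (True, None) if every step agrees within tolerance,
    otherwise (False, index_of_first_mismatching_step).
    """
    if len(reference) != len(candidate):
        raise ValueError("loss curves must have the same number of steps")
    for step, (ref, cand) in enumerate(zip(reference, candidate)):
        if not math.isclose(ref, cand, rel_tol=rel_tol, abs_tol=abs_tol):
            return False, step
    return True, None


# Illustrative numbers only, not measurements from this PR.
baseline_losses = [2.31, 1.98, 1.75, 1.60]       # hypothetical run without Liger
liger_losses = [2.3101, 1.9799, 1.7502, 1.6001]  # hypothetical run with Liger
ok, first_bad_step = curves_match(baseline_losses, liger_losses)
```

Fused kernels typically change the order of floating-point operations, so an exact equality check would likely be too strict; the tolerance here is an assumption and would need tuning per dtype.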

qgallouedec · 2024-12-15T15:48:46Z

I think we should bump the Liger version to v0.5 (earlier releases don't include the loss); see https://github.com/linkedin/Liger-Kernel/releases/tag/v0.5.0
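
A guard along these lines would cover both the version floor and the "flag set but Liger not installed" case mentioned in the commit history. It is only a sketch under assumptions: `check_liger_requirement`, `parse_version`, and `installed_liger_version` are hypothetical helpers written for this example, not functions from TRL or Liger-Kernel, and the distribution name `liger-kernel` is the one used on PyPI.

```python
import re
from importlib import metadata

MIN_LIGER_VERSION = (0, 5, 0)  # floor suggested in the review comment above


def parse_version(text):
    """Parse 'X.Y.Z...' into a 3-tuple of ints, ignoring suffixes such as 'rc1' or '.dev0'."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_liger_version():
    """Return the installed liger-kernel version string, or None if it is not installed."""
    try:
        return metadata.version("liger-kernel")
    except metadata.PackageNotFoundError:
        return None


def check_liger_requirement(use_liger_loss, installed_version):
    """Raise ImportError if the Liger loss is requested but unavailable or too old."""
    if not use_liger_loss:
        return
    if installed_version is None:
        raise ImportError("use_liger_loss=True requires liger-kernel to be installed")
    if parse_version(installed_version) < MIN_LIGER_VERSION:
        raise ImportError(
            f"use_liger_loss=True requires liger-kernel>=0.5.0, found {installed_version}"
        )
```

A trainer could call `check_liger_requirement(args.use_liger_loss, installed_liger_version())` once at construction time so that a misconfiguration fails early rather than mid-training.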

SumanthRH

Thanks for this PR! I stumbled upon this and wanted to highlight an important point (maybe this change is in progress already, in which case, great!)

SumanthRH · 2024-12-17T10:47:26Z

trl/trainer/orpo_trainer.py

-            loss = loss_fct(logits, labels)
-            return loss
+        if self.args.use_liger_loss:
+            # skip the lm head and get the last hidden state


Nice!

I guess we don't have much of an option beyond using a config parameter for now.

Given that we run the forward pass on a submodule, it would be good to have some validation so that there are no unexpected failures under different distributed training settings. In this case, I suspect there may be compatibility issues with FSDP, given this limitation from the docs: https://pytorch.org/docs/stable/fsdp.html

"FSDP does not support running the forward pass of a submodule that is contained in an FSDP instance. This is because the submodule’s parameters will be sharded, but the submodule itself is not an FSDP instance, so its forward pass will not all-gather the full parameters appropriately."

(This might be fixed by simply making the base model attribute an FSDP instance as well.)

Beyond that, this looks fine! I have a couple of nits (they don't matter much):

Does model.get_decoder() always work for AutoModelForCausalLM instances? I was wondering whether that's a cleaner way to get the base model attribute. That said, I think some base model classes add further wrapping over the actual decoder (to format outputs, etc.): https://github.com/huggingface/transformers/blob/a7f5479b45a8040392af80bf1107a2bdd796931c/src/transformers/models/opt/modeling_opt.py#L1044

Maybe the config could be called base_model_attribute_name, since it's the attribute name of the base model in the CausalLM object?

Thanks @SumanthRH, yes, you are right: get_decoder() will work.

Yes, the next step is to verify the distributed training cases.
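
To make the two options from this exchange concrete, here is a small sketch of resolving the base model by preferring `get_decoder()` and falling back to a configurable attribute name. The helper `resolve_base_model` and the toy classes are hypothetical stand-ins written for this example; they are not the code merged in this PR, and real `transformers` models are not involved.

```python
class TinyDecoder:
    """Stand-in for a decoder stack; returns fake 'hidden states'."""

    def __call__(self, input_ids):
        return [[float(token)] for token in input_ids]


class TinyCausalLM:
    """Toy model that exposes its base model through get_decoder()."""

    def __init__(self):
        self.model = TinyDecoder()

    def get_decoder(self):
        return self.model


class LegacyCausalLM:
    """Toy model without get_decoder(); the base model lives under a custom attribute."""

    def __init__(self):
        self.transformer = TinyDecoder()


def resolve_base_model(model, base_model_attribute_name="model"):
    """Prefer model.get_decoder(); otherwise look up the named attribute."""
    get_decoder = getattr(model, "get_decoder", None)
    if callable(get_decoder):
        return get_decoder()
    base = getattr(model, base_model_attribute_name, None)
    if base is None:
        raise AttributeError(
            f"model has neither get_decoder() nor attribute {base_model_attribute_name!r}"
        )
    return base


modern = TinyCausalLM()
legacy = LegacyCausalLM()
modern_base = resolve_base_model(modern)
legacy_base = resolve_base_model(legacy, base_model_attribute_name="transformer")
```

Note that this sketch says nothing about the FSDP concern raised above: calling the resolved submodule directly is exactly the pattern the quoted FSDP limitation warns about, so distributed settings would still need separate verification.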

kashif · 2024-12-17T12:33:34Z

tests/test_orpo_trainer.py

+    def test_orpo_trainer_with_liger(self):
+        """Test ORPO trainer with Liger loss enabled."""
+        with tempfile.TemporaryDirectory() as tmp_dir:
+            training_args = ORPOConfig(


So the plot used the same parameters with and without the liger_loss flag, rather than being a general comparison. I'm trying to figure out why there is a difference between the two settings.

kashif · 2024-12-18T10:46:56Z

waiting on linkedin/Liger-Kernel#486

kashif · 2024-12-19T10:09:55Z

waiting on #2502

qgallouedec · 2024-12-19T10:33:44Z

@kashif can you share the curves once it's ready?

add native liger-kernl orpo loss

b480fff

kashif requested a review from qgallouedec December 15, 2024 12:55

qgallouedec reviewed Dec 15, 2024

tests/test_orpo_trainer.py (outdated, resolved)

qgallouedec reviewed Dec 15, 2024

trl/trainer/orpo_trainer.py (resolved)

kashif and others added 2 commits December 15, 2024 16:52

Update tests/test_orpo_trainer.py

44aa20c

Co-authored-by: Quentin Gallouédec

passing self.args.use_liger_loss without liger installed should raise…

7682e31

…d an error

qgallouedec reviewed Dec 15, 2024

trl/trainer/orpo_trainer.py (outdated, resolved)

qgallouedec reviewed Dec 15, 2024

trl/trainer/orpo_trainer.py (resolved)

kashif added 2 commits December 15, 2024 17:46

update liger version

c383bf6

make import more readable

220f754

kashif changed the title from "[Liger] add native liger-kernl orpo loss" to "[Liger] add native liger-kernel orpo loss" on Dec 15, 2024

SumanthRH reviewed Dec 16, 2024

trl/trainer/orpo_trainer.py (outdated, resolved)

skip the lm_head when use_liger_loss is true

b3f3270

SumanthRH reviewed Dec 17, 2024

use get_decoder()

afaf5a8

qgallouedec mentioned this pull request Dec 17, 2024

[Tracking issue] Integrate native liger-kernel losses #2495

Open

5 tasks

make it a bit more robust

5776a4e

austin362667 reviewed Dec 17, 2024

Merge branch 'main' into liger-orpo

aa3c3b7

kashif added 4 commits December 19, 2024 11:35

Merge branch 'main' into liger-orpo

6f7918f

add back missing line

568e21a

pass is_enc_dec

f4979b0

call orpo_loss_fn with shifted inputs

5c6744f
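
The last commit title mentions "shifted inputs". The thread does not show the change itself, so the following is only a guess at what it refers to: the usual causal-LM alignment in which the hidden state at position t is scored against the label at position t + 1. The helper `shift_for_next_token` is hypothetical, and plain lists stand in for tensors.

```python
IGNORE_INDEX = -100  # conventional "no loss on this position" label value in PyTorch


def shift_for_next_token(hidden_states, labels):
    """Drop the last hidden state and the first label so position t predicts token t + 1."""
    if len(hidden_states) != len(labels):
        raise ValueError("hidden_states and labels must have the same sequence length")
    return hidden_states[:-1], labels[1:]


# Toy sequence: strings stand in for per-position hidden-state vectors.
hidden = ["h0", "h1", "h2", "h3"]
labels = [IGNORE_INDEX, 11, 12, 13]  # first (prompt) position is masked out
shifted_hidden, shifted_labels = shift_for_next_token(hidden, labels)
```

When a model's own forward pass computes the loss, this shift normally happens internally; if the lm_head is skipped and hidden states are handed to an external loss function, the caller has to do it explicitly.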
