Conversational dataset support for `ORPOTrainer` #2184

qgallouedec · 2024-10-05T13:50:21Z

What does this PR do?

Part of #2071

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2024-10-05T13:53:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

lewtun

Nice refactor! LGTM 🔥

lewtun · 2024-10-11T14:29:53Z

docs/source/orpo_trainer.md

@@ -2,107 +2,128 @@

 [![](https://img.shields.io/badge/All_models-ORPO-blue)](https://huggingface.co/models?other=orpo,trl)

-[Odds Ratio Preference Optimization](https://huggingface.co/papers/2403.07691) (ORPO) by Jiwoo Hong, Noah Lee, and James Thorne studies the crucial role of SFT within the context of preference alignment. Using preference data the method posits that a minor penalty for the disfavored generation together with a strong adaption signal to the chosen response via a simple log odds ratio term appended to the NLL loss is sufficient for preference-aligned SFT.
+## Overview
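
For readers of this hunk: the removed paragraph describes ORPO as the NLL loss on the chosen response plus a log odds ratio term that penalizes the rejected response. Below is a minimal sketch of that idea for a single preference pair. It is not code from this PR or from TRL; the names `log_odds` and `orpo_loss`, the weight `lam`, and the use of length-averaged log-probabilities as inputs are assumptions for illustration, based on the linked paper.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp).

    avg_logp is assumed to be a length-averaged log-probability,
    so it must be strictly negative (p < 1).
    """
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """Sketch of an ORPO-style loss for one (chosen, rejected) pair.

    lam is a hypothetical weight on the odds-ratio term.
    """
    # NLL (SFT) term on the chosen response.
    nll = -chosen_avg_logp
    # Log odds ratio between the chosen and rejected responses.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(x)) written as log(1 + exp(-x)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return nll + lam * odds_ratio_term
```

In this sketch the loss decreases as the chosen response becomes more likely than the rejected one, and the odds-ratio term reduces to log 2 when both responses are equally likely.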


default learning rate

669723e

qgallouedec mentioned this pull request Oct 5, 2024

[Tracking issue] General dataset support #2071

Open

29 tasks

qgallouedec and others added 10 commits October 8, 2024 12:19

Merge branch 'main' into orpo-conversational

f025be2

Merge branch 'main' into orpo-conversational

7bbffe9

Merge branch 'main' into orpo-conversational

0588199

update trainer

6cf8d57

update test

17cdd95

update script

be579f3

update dataset format

a0b04b7

add line in dpo doc

d2c253c

update orpo doc

9b0e416

refine implicit/explicit

1af055d

qgallouedec requested review from kashif, edbeeching and lewtun October 11, 2024 11:49

qgallouedec marked this pull request as ready for review October 11, 2024 11:49

edbeeching approved these changes Oct 11, 2024

qgallouedec and others added 2 commits October 11, 2024 14:03

update demo chat

e3a3733

Merge branch 'main' into orpo-conversational

d58ce82

lewtun approved these changes Oct 11, 2024

qgallouedec merged commit d0aa421 into main Oct 11, 2024
9 of 10 checks passed

qgallouedec deleted the orpo-conversational branch October 11, 2024 15:08
