Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Judges] use the pair-judges in online-preference trainers (#2243)
* use the pair-judges
* add test
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* decode and skip special characters
* initial nash
* return tensors
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* add back the logging
* use batch_decode
* add judges api to XPO trainer
* Update tests/test_online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* judge in examples
* judge in config
* add back logs when using reward model
* typo
* add back model_scores logging when using reward model
* log scores for reward model only
* better cond on what to log
* same for rlhf reward
* Update trl/trainer/online_dpo_trainer.py
  Co-authored-by: Quentin Gallouédec <[email protected]>
* use decode_and_strip_padding
* error if both reward and judge or none are set
* remove unused check
* Uniform way to pass conversation into judge
* heading -> leading
* LogCompletionsCallback compat with online method
* Update Online DPO doc
* check if data is conversational for judges
* update example
* remove comment
* use zip
* fix stats xpo
* Replace judge with PairRMJudge and import AutoModelForSequenceClassification
* update xpo documentation
* Remove doc duplication
* update nash doc
* XPO trl chat
* nash md doc
* HfPairwiseJudge

---------

Co-authored-by: Quentin Gallouédec <[email protected]>
Co-authored-by: Quentin Gallouédec <[email protected]>
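The item "error if both reward and judge or none are set" describes a mutual-exclusion check between the two ways of scoring completion pairs. A minimal sketch of that kind of check follows. It is not the code from this commit: the names `check_scorer_choice` and `length_judge` are hypothetical, and the toy judge only stands in for a real pairwise judge.

```python
from typing import List, Optional, Sequence


def check_scorer_choice(reward_model: Optional[object], judge: Optional[object]) -> str:
    """Return which scorer to use, requiring that exactly one is configured.

    Hypothetical helper for illustration; not TRL's actual API.
    """
    if reward_model is not None and judge is not None:
        raise ValueError("Set either a reward model or a judge, not both.")
    if reward_model is None and judge is None:
        raise ValueError("Set either a reward model or a judge.")
    return "judge" if judge is not None else "reward_model"


def length_judge(prompts: Sequence[str], completion_pairs: Sequence[Sequence[str]]) -> List[int]:
    """Toy pairwise judge: for each pair, return the index (0 or 1) of the shorter completion."""
    return [0 if len(a) <= len(b) else 1 for a, b in completion_pairs]


if __name__ == "__main__":
    mode = check_scorer_choice(reward_model=None, judge=length_judge)
    ranks = length_judge(["Say hi"], [["Hi!", "Hello there, friend!"]])
    print(mode, ranks)
```

Raising at configuration time, rather than silently preferring one scorer, surfaces an ambiguous setup before any completions are generated.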
1 parent 1699473, commit 9c376c5
Showing 15 changed files with 502 additions and 161 deletions.