Eugene-hu committed Aug 25, 2023
1 parent e8a65f2 commit b156a6d
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion openvalidators/reward/dpo.py
@@ -33,7 +33,7 @@ def name(self) -> str: return RewardModelType.dpo.value
     def __init__(self, device: str):
         super().__init__()
         self.device = device
-        self.penalty = 1.2
+        self.penalty = 1.2 # Same penalty as the original [paper](https://arxiv.org/pdf/1909.05858.pdf).
         self.tokenizer = AutoTokenizer.from_pretrained(DirectPreferenceRewardModel.reward_model_name)
         self.model = AutoModelForCausalLM.from_pretrained(DirectPreferenceRewardModel.reward_model_name,
                                                           trust_remote_code=True,
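
For context on the value referenced in the new comment: the linked paper (arXiv 1909.05858) describes penalized sampling, where the logits of tokens that have already been generated are divided by a penalty of about 1.2 before the softmax. The sketch below illustrates that formula only. How `dpo.py` applies `self.penalty` is not shown in this hunk, and the function name `penalized_softmax` and its parameters are hypothetical.

```python
import math


def penalized_softmax(logits, generated_ids, penalty=1.2, temperature=1.0):
    """Softmax with a repetition penalty on already-generated token ids.

    Hypothetical helper for illustration; it is not taken from dpo.py.
    """
    scaled = []
    for token_id, logit in enumerate(logits):
        # Previously generated tokens get the extra divisor `penalty`.
        # Note: dividing lowers a score only when the logit is positive.
        divisor = temperature * (penalty if token_id in generated_ids else 1.0)
        scaled.append(logit / divisor)

    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Token 0 was already generated, so its probability drops relative to the
# unpenalized distribution.
logits = [2.0, 1.0, 0.5]
with_penalty = penalized_softmax(logits, generated_ids={0})
without_penalty = penalized_softmax(logits, generated_ids=set())
```

With positive logits, a penalty above 1.0 shifts probability mass away from repeated tokens; a penalty of exactly 1.0 leaves the distribution unchanged.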
