It seems like `chosen_input_ids` and `rejected_input_ids` (and other inputs as well) are concatenated and fed into the behavior model to get concatenated logits: `[chosen_logits | rejected_logits]`.
Doesn't this introduce a slight bias, since the model can look at `chosen_input_ids` when computing `rejected_logits`? I wonder whether the impact of this on DPO performance is negligible relative to the efficiency gain.
Any comments would be appreciated! Thanks.
@DaehanKim I think there may be some confusion here. The inputs are concatenated only at the batch level, to avoid multiple forward passes; the logits for chosen and rejected are still computed independently.
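To make the batch-level point concrete, here is a minimal sketch in plain Python. It is not the library's code: `toy_causal_model` is a hypothetical stand-in for a forward pass, and the token ids are made up. It assumes what holds for a standard causal language model, namely that each row of a batch is processed on its own and only attends to earlier positions in the same row.

```python
from typing import List

Batch = List[List[int]]  # token ids, indexed as [batch_row][position]


def toy_causal_model(batch: Batch) -> List[List[float]]:
    """Hypothetical stand-in for a forward pass (not a real LM).

    Each row is handled separately. The output at position t is the running
    mean of that row's token ids up to and including t, so it depends only
    on earlier tokens of the same row.
    """
    outputs = []
    for row in batch:
        running, row_out = 0, []
        for t, tok in enumerate(row, start=1):
            running += tok
            row_out.append(running / t)
        outputs.append(row_out)
    return outputs


# Made-up example data: two chosen and two rejected sequences.
chosen_input_ids = [[5, 1, 4], [2, 2, 9]]
rejected_input_ids = [[5, 7, 3], [2, 8, 1]]

# Batch-level concatenation: rows are stacked, so 2 rows + 2 rows -> 4 rows.
concatenated = chosen_input_ids + rejected_input_ids
all_logits = toy_causal_model(concatenated)  # one pass instead of two

# Split the result back into its chosen and rejected halves.
n_chosen = len(chosen_input_ids)
chosen_logits = all_logits[:n_chosen]
rejected_logits = all_logits[n_chosen:]

# Same values as two separate passes: no row sees another row.
same_as_separate = (
    chosen_logits == toy_causal_model(chosen_input_ids)
    and rejected_logits == toy_causal_model(rejected_input_ids)
)

# For contrast, sequence-level concatenation (what the question worried about):
# here the rejected positions do depend on the chosen tokens before them.
seq_concat = [c + r for c, r in zip(chosen_input_ids, rejected_input_ids)]
seq_rejected = [
    out[len(c):] for out, c in zip(toy_causal_model(seq_concat), chosen_input_ids)
]
leaks = seq_rejected != toy_causal_model(rejected_input_ids)

print(same_as_separate, leaks)
```

In this toy setup the batch-level version matches two separate passes exactly, while the sequence-level version does not. With a real model, the batch-level results should agree up to small floating-point differences, and the sequences would typically also need padding to a common length before stacking.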