Question on DPO's `concatenated_forward` #1113

DaehanKim · 2023-12-20T08:44:05Z

Hi!

It seems that `chosen_input_ids` and `rejected_input_ids` (and the other inputs as well) are concatenated and fed into the behavior model to get concatenated logits: [chosen_logits | rejected_logits]

Doesn't this introduce a slight bias, since the model could look at `chosen_input_ids` when computing `rejected_logits`? I wonder whether the impact on DPO performance is negligible compared to the efficiency gain.

Any comments would be appreciated! Thanks.


lvwerra · 2023-12-21T15:27:36Z

tagging @kashif here :)

raghavgarg97 · 2023-12-27T07:35:07Z

@DaehanKim I think there may be a slight confusion here. The inputs are only concatenated at the batch level to avoid multiple forward passes; the chosen and rejected logits are computed independently.
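To make the batch-level vs. sequence-level distinction concrete, here is a minimal sketch in plain Python. It is not TRL's actual code: `toy_causal_model`, `pad_to`, and `concatenated_forward_sketch` are hypothetical names, and the "model" is just a running sum per row, standing in for a causal LM where each position sees only earlier tokens of the same row.

```python
from itertools import accumulate

PAD = 0  # assumed padding id, for illustration only


def pad_to(seq, length, pad=PAD):
    """Right-pad a token list to a fixed length."""
    return seq + [pad] * (length - len(seq))


def toy_causal_model(batch):
    """Hypothetical stand-in for a causal LM: every row is processed on
    its own, and position t depends only on tokens 0..t of that row."""
    return [list(accumulate(row)) for row in batch]


def concatenated_forward_sketch(chosen, rejected):
    """Stack chosen and rejected rows along the batch axis, run a single
    forward pass, then split the outputs back into the two halves."""
    max_len = max(len(s) for s in chosen + rejected)
    batch = [pad_to(s, max_len) for s in chosen + rejected]
    out = toy_causal_model(batch)
    n = len(chosen)
    return out[:n], out[n:]


chosen = [[5, 1, 2], [7, 3]]
rejected = [[5, 9], [7, 4, 4, 1]]

# Batch-level concat: one pass over 4 rows.
chosen_out, rejected_out = concatenated_forward_sketch(chosen, rejected)

# Reference: run the rejected rows in a separate pass.
rejected_alone = toy_causal_model([pad_to(s, 4) for s in rejected])

# Sequence-level concat (what the question assumed): the rejected tokens
# are appended to the chosen tokens in the SAME row, so they see them.
joined = toy_causal_model([chosen[0] + rejected[0]])[0]
rejected_after_chosen = joined[len(chosen[0]):]

print(rejected_out == rejected_alone)               # batch-level: identical
print(rejected_after_chosen, rejected_alone[0][:2])  # sequence-level: differs
```

In the batch-level case the rejected rows produce the same outputs whether or not the chosen rows share the batch, which is the independence described above. Only the sequence-level variant would let the chosen tokens influence the rejected outputs.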

DaehanKim · 2023-12-27T08:16:41Z

@raghavgarg97
I thought it was a sequence-level concat. Thanks for the correction!

lvwerra added the 🏋 DPO label Dec 21, 2023

DaehanKim closed this as completed Dec 27, 2023
