Question on DPO's concatenated_forward #1113

Closed
DaehanKim opened this issue Dec 20, 2023 · 3 comments
Labels
🏋 DPO Related to DPO

Comments

@DaehanKim

Hi!

It seems like chosen_input_ids and rejected_input_ids (and the other inputs as well) are concatenated and fed into the behavior model to get concatenated logits: [chosen_logits | rejected_logits].

Doesn't this slightly bias rejected_logits, since the model would see chosen_input_ids when computing them? I wonder whether the impact on DPO performance is negligible compared to the efficiency gain.

Any comments would be appreciated! Thanks.

@lvwerra lvwerra added the 🏋 DPO Related to DPO label Dec 21, 2023
@lvwerra
Member

lvwerra commented Dec 21, 2023

tagging @kashif here :)

@raghavgarg97

@DaehanKim I think there may be a slight confusion here. The inputs are only concatenated at the batch level, to avoid multiple forward passes. The chosen and rejected logits are still computed independently of each other.
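
To make the distinction concrete, here is a minimal sketch in plain Python. It is not TRL's actual code: `toy_causal_model` is a hypothetical stand-in for a causal language model (each output is the running mean of the tokens seen so far in the same row), and the token values are made up. It only illustrates that stacking chosen and rejected rows along the batch dimension leaves the rejected outputs unchanged, whereas joining them along the sequence dimension would not.

```python
from typing import List

Batch = List[List[float]]  # conceptually a tensor of shape (batch, seq_len)


def toy_causal_model(batch: Batch) -> Batch:
    """Hypothetical stand-in for a causal LM (an assumption, not a real API).

    Each output is the running mean of the tokens seen so far in the SAME
    row, so a position can depend on earlier positions but never on other rows.
    """
    outputs = []
    for row in batch:
        running, out_row = 0.0, []
        for t, tok in enumerate(row, start=1):
            running += tok
            out_row.append(running / t)
        outputs.append(out_row)
    return outputs


# Made-up "token" values for two chosen and two rejected sequences.
chosen = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
rejected = [[7.0, 8.0, 9.0], [1.0, 0.0, 2.0]]

# Batch-level concatenation: (2, 3) and (2, 3) become (4, 3).
batch_concat = chosen + rejected
all_out = toy_causal_model(batch_concat)
chosen_out = all_out[: len(chosen)]
rejected_out = all_out[len(chosen):]

# The rejected rows give the same result as a separate forward pass.
batch_level_matches = rejected_out == toy_causal_model(rejected)

# Sequence-level concatenation: (2, 3) and (2, 3) become (2, 6).
seq_concat = [c + r for c, r in zip(chosen, rejected)]
seq_out = toy_causal_model(seq_concat)
rejected_seq_out = [row[len(chosen[0]):] for row in seq_out]

# Here the rejected positions do see the chosen tokens, so outputs differ.
sequence_level_matches = rejected_seq_out == toy_causal_model(rejected)

print(batch_level_matches, sequence_level_matches)  # prints: True False
```

With a real model the sequences would also have to be padded to a common length and masked accordingly, but the same reasoning applies: rows in a batch do not attend to one another.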

@DaehanKim
Author

@raghavgarg97
I thought it was a sequence-level concat. Thanks for the correction!
