
[bug] objective/entropy < 0 when using rlootrainer and ppotrainer #2496

Open
macheng6 opened this issue Dec 17, 2024 · 1 comment
Labels
🙋 help from community wanted Open invitation for community members to contribute 🏋 PPO Related to PPO ❓ question Seeking clarification or more information 🏋 RLOO Related to RLOO

Comments

@macheng6

mean_entropy = (-logprobs).sum(1).mean()

objective/entropy can go negative because, earlier in the code, the padding positions of logprobs are filled with 1.0:

INVALID_LOGPROB = 1.0
logprobs = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)

Each padded position then contributes -1.0 to the entropy sum. I don't understand why INVALID_LOGPROB is set to 1.0; wouldn't it work fine if it were set to 0?
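A minimal sketch of the artifact described above, using hypothetical toy tensors (the shapes and values are illustrative, not taken from the TRL source). Each padded position filled with 1.0 contributes -1.0 to the entropy sum, which can push the reported statistic below zero; filling with 0.0 instead leaves the sum unaffected by padding:

```python
import torch

# Toy batch: 2 sequences, max length 4; the last two positions of the
# second sequence are padding (hypothetical values for illustration).
logprobs = torch.tensor([[-0.5, -1.0, -0.2, -0.3],
                         [-0.7, -0.4,  0.0,  0.0]])
padding_mask = torch.tensor([[False, False, False, False],
                             [False, False, True,  True]])

# Current behavior: padding filled with 1.0.
INVALID_LOGPROB = 1.0
filled = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)

# Each padded position contributes -1.0 to the per-sequence sum, so the
# reported "entropy" is biased downward and can go negative.
mean_entropy = (-filled).sum(1).mean()  # -> 0.55 here

# Alternative: fill padding with 0.0 so it drops out of the sum
# (equivalent to masking before summing, for this statistic).
masked = torch.masked_fill(logprobs, padding_mask, 0.0)
masked_entropy = (-masked).sum(1).mean()  # -> 1.55 here
```

With longer padded spans, the -1.0-per-token bias easily dominates and drives the logged entropy negative, even though true entropy is non-negative.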

@asparius
Contributor

This has been noted previously in #2281. I believe this was introduced in PPOv2, which was a replication of the OpenAI TL;DR paper; that code also uses INVALID_LOGPROB = 1.0, which does not break training because it cancels out in the KL reward. Perhaps @vwxyzjn can explain why this was used instead of a masked_mean version.
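A quick sketch of the cancellation argument, again with hypothetical toy values: because the policy and reference logprobs are both masked_fill-ed with the same constant, their difference (the per-token KL estimate) is exactly zero at padded positions, so the fill value never reaches the reward:

```python
import torch

INVALID_LOGPROB = 1.0
padding_mask = torch.tensor([[False, False, True, True]])

# Policy and reference logprobs, both filled with the same constant
# at padded positions (illustrative values).
logprobs = torch.masked_fill(
    torch.tensor([[-0.5, -1.0, -2.0, -3.0]]), padding_mask, INVALID_LOGPROB)
ref_logprobs = torch.masked_fill(
    torch.tensor([[-0.6, -0.9, -1.5, -2.5]]), padding_mask, INVALID_LOGPROB)

# Per-token KL estimate: at padded positions this is 1.0 - 1.0 == 0.0,
# so the choice of fill value cancels out in the KL reward.
kl = logprobs - ref_logprobs
```

The cancellation holds for any constant, which is consistent with training being unaffected; the entropy logging is the one place where the constant leaks through, since it is summed rather than differenced.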

@qgallouedec qgallouedec added 🙋 help from community wanted Open invitation for community members to contribute ❓ question Seeking clarification or more information 🏋 PPO Related to PPO 🏋 RLOO Related to RLOO labels Dec 20, 2024