[bug] objective/entropy < 0 when using RLOOTrainer and PPOTrainer #2496
Labels: 🙋 help from community wanted · 🏋 PPO · ❓ question · 🏋 RLOO
`trl/trl/trainer/rloo_trainer.py`, line 443 at commit `1661bc2`:
This happens because, in the code above, the padded positions of the log-probs are filled with 1.0:

```python
INVALID_LOGPROB = 1.0
logprobs = torch.masked_fill(logprobs, padding_mask, INVALID_LOGPROB)
```
I don't understand why INVALID_LOGPROB is set to 1.0. Wouldn't it work fine if it were set to 0?
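To make the symptom concrete, here is a minimal sketch (with made-up log-prob values, not taken from the trainer) of how a sentinel of 1.0 can drive the reported entropy negative when padded positions are averaged into `-mean(logprobs)`. Real log-probs are always ≤ 0, so a leaked +1.0 sentinel pulls the estimate below zero, which is presumably also why an "impossible" value like 1.0 was chosen as the sentinel in the first place: leaks become visible instead of silently blending in, as a 0.0 fill would.

```python
INVALID_LOGPROB = 1.0  # sentinel for padded positions, as in the snippet above

# Hypothetical per-token log-probs; the last three positions are padding
# that was masked_fill'ed with the sentinel.
logprobs = [-0.5, -0.3, INVALID_LOGPROB, INVALID_LOGPROB, INVALID_LOGPROB]
padding = [False, False, True, True, True]

# Naive entropy estimate that forgets to exclude padding:
naive_entropy = -sum(logprobs) / len(logprobs)

# Estimate restricted to real (non-padded) positions:
valid = [lp for lp, pad in zip(logprobs, padding) if not pad]
masked_entropy = -sum(valid) / len(valid)

print(naive_entropy)   # -0.44: negative, because the +1.0 sentinels leak in
print(masked_entropy)  # 0.4: non-negative, as an entropy estimate should be
```

With a 0.0 fill the naive average would still be biased (it would be 0.16 here instead of 0.4), just no longer obviously wrong, so the underlying question is whether the entropy logging excludes padded positions, not which fill value is used.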