Bug of PPO #1072
Comments
it should be like this:
Why does the negative value cause a failure in the actor loss?
I drew the loss plot and the reward plot; when there is a very small negative value, such as 1e-10, the loss becomes extremely large compared to normal, and the reward stops increasing.
Sorry for the late reply. What you mentioned might be caused by some numerical issues in tf.minimum if I understood correctly. Could you please print out an example case and paste it here? I'm a bit confused by your description since you mentioned both a large negative value (-1e10) and a small positive value (1e-10). A case showing how it causes a large loss value would be great.
ratio = tf.exp(pi.log_prob(action) - old_pi.log_prob(action))
surr = ratio * adv
...
loss = -tf.reduce_mean(tf.minimum(surr, tf.clip_by_value(ratio, 1. - self.epsilon, 1. + self.epsilon) * adv))
It should use ratio in tf.minimum rather than surr, because surr = ratio * adv and adv can contain negative values, so the result of tf.minimum may contain a value like -1e10 and cause the actor loss to fail.
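For illustration only (not from the original thread; the ratio and advantage values below are hypothetical), here is a minimal, runnable sketch that evaluates the clipped surrogate as written above with a negative advantage, so the behavior of tf.minimum can be inspected directly:

# Minimal sketch with hypothetical values: inspect the clipped PPO surrogate
# (tf.minimum over surr = ratio * adv and the clipped-ratio term) when adv < 0.
import tensorflow as tf

epsilon = 0.2
ratio = tf.constant([0.5, 1.0, 50.0])   # hypothetical probability ratios
adv = tf.constant([1.0, -1.0, -1.0])    # note the negative advantages

surr = ratio * adv
clipped = tf.clip_by_value(ratio, 1. - epsilon, 1. + epsilon) * adv

# Clipped objective as in the snippet above: elementwise minimum, then negated mean.
loss = -tf.reduce_mean(tf.minimum(surr, clipped))

print("surr:         ", surr.numpy())     # [ 0.5  -1.  -50. ]
print("clipped * adv:", clipped.numpy())  # [ 0.8  -1.   -1.2]
print("loss:         ", loss.numpy())     # ~16.83
# With adv < 0 and ratio >> 1 + epsilon, tf.minimum keeps the unclipped term
# (e.g. 50 * -1 = -50), so the negated mean becomes a large positive loss.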