pong_A2C might backward policy-loss to value net #60

aminzakizebarjad · 2022-07-10T05:50:33Z

While reviewing this code, I noticed that the policy loss might backpropagate into the value branch of the network.
On line 143, `adv_v = vals_ref_v - value_v.detach()` computes the advantage, and `value_v` is detached so that the policy loss cannot affect the value net during the backward pass. However, `vals_ref_v` is computed by the function `unpack_batch`, and on line 90, `last_vals_v = net(last_states_v)[1]`, the value net is also involved in computing `vals_ref_v`.
As a result, I think line 143 should be changed from `adv_v = vals_ref_v - value_v.detach()` to `adv_v = (vals_ref_v - value_v).detach()`.
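
A minimal sketch of the concern, using a hypothetical one-parameter value head and one-parameter policy instead of the actual network (`w`, `theta`, and `policy_grad_wrt_value` are made up for illustration). It checks whether the policy loss produces a gradient on the value parameter under each variant. Note that the leak only exists if `vals_ref_v` is still attached to the graph; if `unpack_batch` converts `last_vals_v` to NumPy or otherwise detaches it before building `vals_ref_v`, both forms of line 143 behave the same.

```python
import torch

# Hypothetical stand-ins for the real network (assumptions, not the repo's code).
w = torch.tensor(0.5, requires_grad=True)       # value-head parameter
theta = torch.tensor(-0.7, requires_grad=True)  # policy parameter
reward, gamma = 1.0, 0.99


def policy_grad_wrt_value(detach_whole_advantage, bootstrap_attached):
    """Return d(policy loss)/dw, or None if the loss does not depend on w."""
    value = w * 2.0        # toy V(s)
    last_value = w * 3.0   # toy V(s_last), used for bootstrapping
    if not bootstrap_attached:
        # Mimics a bootstrap value that went through NumPy or .detach()
        last_value = last_value.detach()
    vals_ref = reward + gamma * last_value

    if detach_whole_advantage:
        adv = (vals_ref - value).detach()   # proposed form
    else:
        adv = vals_ref - value.detach()     # current form

    log_prob = theta * 1.0                  # toy log pi(a|s)
    loss_policy = -(adv * log_prob)
    (grad,) = torch.autograd.grad(loss_policy, [w], allow_unused=True)
    return None if grad is None else grad.item()


print(policy_grad_wrt_value(False, True))   # current form, attached bootstrap
print(policy_grad_wrt_value(True, True))    # proposed form, attached bootstrap
print(policy_grad_wrt_value(False, False))  # current form, detached bootstrap
```

Only the first call yields a gradient on `w`; the other two return `None`, so the proposed change matters exactly when the bootstrap value reaches line 143 with its graph intact.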
