I read your code and implemented a version with experience replay.
However, I find that the loss explodes after a few frames (around 1000): the value loss becomes very large and the action loss becomes very negative. Is this a code error, or does A2C not support experience replay in theory?
It is an on-policy method. Old data effectively comes from another policy, so it isn't a good idea to update the policy network on old samples. I'm not quite sure about the value estimator, though; you might get away with using a replay buffer to train only the value network.
csxeba is right: A2C and A3C are on-policy methods. Old data was sampled by an old policy, so it clearly does not come from the same distribution as the current policy. We usually use a buffer to store only the data sampled by the current policy, and after each update we need to clear it.
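In other words, the buffer in an on-policy method is just short-lived rollout storage: fill it with samples from the current policy, do one update, then throw the samples away. A minimal sketch of that pattern, assuming hypothetical `envs`, `agent.act`, and `agent.update` names (not this repo's API):

```python
# Hypothetical on-policy rollout loop: the buffer never outlives one update.
rollout = []                      # holds data from the CURRENT policy only
obs = envs.reset()                # `envs` is an assumed vectorized env

for step in range(total_steps):
    action, value, log_prob = agent.act(obs)       # assumed agent API
    next_obs, reward, done, _ = envs.step(action)
    rollout.append((obs, action, reward, done, value, log_prob))
    obs = next_obs

    if len(rollout) == num_steps_per_update:
        agent.update(rollout)     # compute advantages, policy/value losses
        rollout.clear()           # samples are now stale: the policy changed
```

Keeping stale transitions in the buffer (as a true experience replay would) means the advantages and log-probabilities no longer match the current policy, which is consistent with the exploding losses reported above.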