Added experimental support for Apex-DQN, which is significantly faster than DQN. See https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/apex_dqn_atari_visual.py. In the game of Breakout, Apex-DQN takes less than 4 hours to reach an episode reward of around 360, whereas DQN took 25 hours to reach the same level.
Our implementation is a little different from the original. First, PyTorch's ecosystem does not have a well-maintained distributed prioritized experience replay buffer such as https://github.com/deepmind/reverb. So instead, we split a single prioritized replay buffer of size 100,000 into two prioritized replay buffers of size 50,000, each owned by a data-processor sub-process that prepares data for the worker. This is a workaround and somewhat of a hack, but according to our benchmark it works well empirically and is fast enough. A rough sketch of the setup follows.
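The snippet below is a minimal sketch of this split-buffer workaround, not the actual code in `apex_dqn_atari_visual.py`: two data-processor sub-processes each own half of the replay capacity and independently prepare sampled batches. For brevity, a plain uniform replay buffer stands in for the prioritized one, and all names (`data_processor`, queue sizes, batch size) are hypothetical.

```python
import random
from collections import deque
from multiprocessing import Process, Queue

BUFFER_SIZE = 50_000  # each data processor holds half of the 100,000 total capacity
BATCH_SIZE = 64


def data_processor(transition_queue: Queue, batch_queue: Queue):
    """Own one half of the replay memory and keep the learner fed with batches."""
    buffer = deque(maxlen=BUFFER_SIZE)
    while True:
        # Drain any new transitions produced by the actors.
        while not transition_queue.empty():
            buffer.append(transition_queue.get())
        # Once enough data is stored, sample a batch and hand it to the learner
        # (put() blocks when the learner falls behind, which throttles sampling).
        if len(buffer) >= BATCH_SIZE:
            batch_queue.put(random.sample(buffer, BATCH_SIZE))


if __name__ == "__main__":
    transition_queues = [Queue(), Queue()]   # one inbound queue per data processor
    batch_queue = Queue(maxsize=4)           # shared queue of prepared batches
    processors = [
        Process(target=data_processor, args=(q, batch_queue), daemon=True)
        for q in transition_queues
    ]
    for p in processors:
        p.start()
    # Actors would round-robin transitions into transition_queues, and the
    # learner would consume ready-made batches from batch_queue in its loop.
```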
During our experiments, we found that normalizing observations and rewards has a huge impact on PPO's performance, probably because of the large range of rewards in CarRacing-v0 (e.g., the agent receives a -100 reward when it dies, and PPO is anecdotally sensitive to such large rewards). A sketch of this kind of normalization is shown below.
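The wrapper below is an illustrative sketch of the kind of normalization we mean (running mean/std normalization of observations, plus reward scaling and clipping), not the exact wrapper used in the repo; the class names and the clipping threshold are assumptions.

```python
import gym
import numpy as np


class RunningMeanStd:
    """Track a running mean and variance via incremental batch updates."""
    def __init__(self, shape):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = 1e-4

    def update(self, x):
        batch_mean, batch_var, batch_count = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = batch_mean - self.mean
        total = self.count + batch_count
        self.mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        self.var = (m_a + m_b + delta ** 2 * self.count * batch_count / total) / total
        self.count = total


class NormalizeObsAndReward(gym.Wrapper):
    """Normalize observations to roughly N(0, 1) and scale/clip rewards."""
    def __init__(self, env, gamma=0.99, clip_reward=10.0):
        super().__init__(env)
        self.obs_rms = RunningMeanStd(env.observation_space.shape)
        self.ret_rms = RunningMeanStd(())
        self.gamma = gamma
        self.clip_reward = clip_reward
        self.ret = 0.0

    def _normalize_obs(self, obs):
        self.obs_rms.update(obs[None].astype(np.float64))
        return (obs - self.obs_rms.mean) / np.sqrt(self.obs_rms.var + 1e-8)

    def reset(self, **kwargs):
        self.ret = 0.0
        return self._normalize_obs(self.env.reset(**kwargs))

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Scale rewards by the std of the discounted return, then clip, so that
        # outliers such as the -100 reward on death do not dominate the update.
        self.ret = self.gamma * self.ret + reward
        self.ret_rms.update(np.array([self.ret]))
        reward = np.clip(reward / np.sqrt(self.ret_rms.var + 1e-8),
                         -self.clip_reward, self.clip_reward)
        if done:
            self.ret = 0.0
        return self._normalize_obs(obs), float(reward), done, info
```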