CleanRL v0.4.0

@vwxyzjn vwxyzjn released this 24 Sep 02:50
f4ec8af

What's new in the 0.4.0 release

Atari Results

| gym_id | apex_dqn_atari_visual | c51_atari_visual | dqn_atari_visual | ppo_atari_visual |
|---|---|---|---|---|
| BeamRiderNoFrameskip-v4 | 2936.93 ± 362.18 | 13380.67 ± 0.00 | 7139.11 ± 479.11 | 2053.08 ± 83.37 |
| QbertNoFrameskip-v4 | 3565.00 ± 690.00 | 16286.11 ± 0.00 | 11586.11 ± 0.00 | 17919.44 ± 383.33 |
| SpaceInvadersNoFrameskip-v4 | 1019.17 ± 356.94 | 1099.72 ± 14.72 | 935.40 ± 93.17 | 1089.44 ± 67.22 |
| PongNoFrameskip-v4 | 19.06 ± 0.83 | 18.00 ± 0.00 | 19.78 ± 0.22 | 20.72 ± 0.28 |
| BreakoutNoFrameskip-v4 | 364.97 ± 58.36 | 386.10 ± 21.77 | 353.39 ± 30.61 | 380.67 ± 35.29 |

MuJoCo Results

| gym_id | ddpg_continuous_action | td3_continuous_action | ppo_continuous_action |
|---|---|---|---|
| Reacher-v2 | -6.25 ± 0.54 | -6.65 ± 0.04 | -7.86 ± 1.47 |
| Pusher-v2 | -44.84 ± 5.54 | -59.69 ± 3.84 | -44.10 ± 6.49 |
| Thrower-v2 | -137.18 ± 47.98 | -80.75 ± 12.92 | -58.76 ± 1.42 |
| Striker-v2 | -193.43 ± 27.22 | -269.63 ± 22.14 | -112.03 ± 9.43 |
| InvertedPendulum-v2 | 1000.00 ± 0.00 | 443.33 ± 249.78 | 968.33 ± 31.67 |
| HalfCheetah-v2 | 10386.46 ± 265.09 | 9265.25 ± 1290.73 | 1717.42 ± 20.25 |
| Hopper-v2 | 1128.75 ± 9.61 | 3095.89 ± 590.92 | 2276.30 ± 418.94 |
| Swimmer-v2 | 114.93 ± 29.09 | 103.89 ± 30.72 | 111.74 ± 7.06 |
| Walker2d-v2 | 1946.23 ± 223.65 | 3059.69 ± 1014.05 | 3142.06 ± 1041.17 |
| Ant-v2 | 243.25 ± 129.70 | 5586.91 ± 476.27 | 2785.98 ± 1265.03 |
| Humanoid-v2 | 877.90 ± 3.46 | 6342.99 ± 247.26 | 786.83 ± 95.66 |

PyBullet Results

| gym_id | ddpg_continuous_action | td3_continuous_action | ppo_continuous_action |
|---|---|---|---|
| MinitaurBulletEnv-v0 | -0.17 ± 0.02 | 7.73 ± 5.13 | 23.20 ± 2.23 |
| MinitaurBulletDuckEnv-v0 | -0.31 ± 0.03 | 0.88 ± 0.34 | 11.09 ± 1.50 |
| InvertedPendulumBulletEnv-v0 | 742.22 ± 47.33 | 1000.00 ± 0.00 | 1000.00 ± 0.00 |
| InvertedDoublePendulumBulletEnv-v0 | 5847.31 ± 843.53 | 5085.57 ± 4272.17 | 6970.72 ± 2386.46 |
| Walker2DBulletEnv-v0 | 567.61 ± 15.01 | 2177.57 ± 65.49 | 1377.68 ± 51.96 |
| HalfCheetahBulletEnv-v0 | 2847.63 ± 212.31 | 2537.34 ± 347.20 | 2347.64 ± 51.56 |
| AntBulletEnv-v0 | 2094.62 ± 952.21 | 3253.93 ± 106.96 | 1775.50 ± 50.19 |
| HopperBulletEnv-v0 | 1262.70 ± 424.95 | 2271.89 ± 24.26 | 2311.20 ± 45.28 |
| HumanoidBulletEnv-v0 | -54.45 ± 13.99 | 937.37 ± 161.05 | 204.47 ± 1.00 |
| BipedalWalker-v3 | 66.01 ± 127.82 | 78.91 ± 232.51 | 272.08 ± 10.29 |
| LunarLanderContinuous-v2 | 162.96 ± 65.60 | 281.88 ± 0.91 | 215.27 ± 10.17 |
| Pendulum-v0 | -238.65 ± 14.13 | -345.29 ± 47.40 | -1255.62 ± 28.37 |
| MountainCarContinuous-v0 | -1.01 ± 0.01 | -1.12 ± 0.12 | 93.89 ± 0.06 |

Other Results

| gym_id | ppo | dqn |
|---|---|---|
| CartPole-v1 | 500.00 ± 0.00 | 182.93 ± 47.82 |
| Acrobot-v1 | -80.10 ± 6.77 | -81.50 ± 4.72 |
| MountainCar-v0 | -200.00 ± 0.00 | -142.56 ± 15.89 |
| LunarLander-v2 | 46.18 ± 53.04 | 144.52 ± 1.75 |

  • Added experimental support for Apex-DQN, which is significantly faster than DQN. See https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/apex_dqn_atari_visual.py. In Breakout, Apex-DQN takes less than 4 hours to reach an episode reward of around 360, whereas DQN took 25 hours to reach the same level.
    • Our implementation differs slightly from the original. The PyTorch ecosystem does not have a well-maintained distributed prioritized experience replay buffer such as https://github.com/deepmind/reverb. Instead, we split a single prioritized replay buffer of size 100000 into two prioritized replay buffers of size 50000, each held by a separate data-processor sub-process that prepares data for the worker. This is a workaround, but according to our benchmark it works well empirically and is fast enough.
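
The buffer split described above can be illustrated with a minimal, single-process sketch. It is not the code from apex_dqn_atari_visual.py: the class names `PrioritizedShard` and `ShardedReplay`, the `alpha` exponent, the round-robin insertion, and the alternating sampling are all assumptions made for illustration, and the sub-processes are left out so the example stays short.

```python
import random


class PrioritizedShard:
    """Tiny proportional prioritized buffer (hypothetical, O(n) sampling)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha  # assumed priority exponent, not from the release notes
        self.items = []
        self.priorities = []
        self.next_idx = 0

    def add(self, transition, priority=1.0):
        weight = priority ** self.alpha
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(weight)
        else:
            # Buffer is full: overwrite the oldest slot (ring buffer).
            self.items[self.next_idx] = transition
            self.priorities[self.next_idx] = weight
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size, rng):
        # Draw with probability proportional to the stored weights.
        return rng.choices(self.items, weights=self.priorities, k=batch_size)

    def __len__(self):
        return len(self.items)


class ShardedReplay:
    """Splits one logical buffer into equally sized shards.

    In the setup described above each shard would live in its own
    data-processor sub-process; here they share one process for brevity.
    """

    def __init__(self, total_capacity=100_000, num_shards=2):
        self.shards = [
            PrioritizedShard(total_capacity // num_shards)
            for _ in range(num_shards)
        ]
        self.add_count = 0
        self.sample_count = 0

    def add(self, transition, priority=1.0):
        # Round-robin insertion keeps the shards balanced (an assumption).
        shard = self.shards[self.add_count % len(self.shards)]
        shard.add(transition, priority)
        self.add_count += 1

    def sample(self, batch_size, rng):
        # Each call serves a whole batch from one shard, alternating shards,
        # loosely mimicking data-processors that take turns feeding the worker.
        shard = self.shards[self.sample_count % len(self.shards)]
        self.sample_count += 1
        return shard.sample(batch_size, rng)
```

With the default arguments this yields two shards of capacity 50000 each, matching the sizes mentioned above. Note that priorities are only comparable within a shard, which is one way such a split departs from a single global prioritized buffer.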
Benchmarked Learning Curves Atari
Metrics, logs, and recorded videos are at cleanrl.benchmark/reports/Atari
 
  • Added support for CarRacing-v0 with PPO in the Experimental Domains. It is our first example with a pixel observation space and a continuous action space. See https://github.com/vwxyzjn/cleanrl/blob/master/cleanrl/experiments/ppo_car_racing.py.
    • During our experiments, we found that normalizing observations and rewards seems to have a huge impact on PPO's performance, probably because of the large range of rewards in CarRacing-v0 (e.g. dying yields a reward of -100, and PPO is anecdotally sensitive to such large rewards).
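
To make the normalization point concrete, here is a minimal sketch of one common approach: keep running estimates of mean and variance, then standardize and clip each value. It is an illustration under assumptions, not necessarily what ppo_car_racing.py does; `RunningMeanStd`, `normalize`, the `epsilon` prior, and the clip value of 10 are all hypothetical choices. The scalar version below would be applied per element for pixel observations.

```python
import math


class RunningMeanStd:
    """Running mean and variance of a scalar stream (hypothetical helper)."""

    def __init__(self, epsilon=1e-4):
        self.mean = 0.0
        self.var = 1.0
        self.count = epsilon  # small prior count avoids division by zero

    def update(self, x):
        # Merge one new sample into the running statistics
        # (parallel-variance update with a batch of size 1).
        delta = x - self.mean
        total = self.count + 1.0
        m2 = self.var * self.count + delta * delta * self.count / total
        self.mean += delta / total
        self.var = m2 / total
        self.count = total


def normalize(x, stats, clip=10.0, eps=1e-8):
    """Standardize x with the running statistics, then clip to [-clip, clip]."""
    z = (x - stats.mean) / math.sqrt(stats.var + eps)
    return max(-clip, min(clip, z))


# Example: mostly small rewards plus one large penalty of -100.
stats = RunningMeanStd()
rewards = [1.0] * 99 + [-100.0]
for r in rewards:
    stats.update(r)
scaled_penalty = normalize(-100.0, stats)
```

After normalization the -100 penalty is bounded by the clip range, so a single extreme reward cannot dominate the scale of the returns the way the raw value would.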