Releases: vwxyzjn/cleanrl
v1.0.0 CleanRL Release 🎉
🎉 We are thrilled to announce the v1.0.0 CleanRL Release. Along with our CleanRL paper's recent publication in the Journal of Machine Learning Research, the v1.0.0 release includes reworked documentation, new algorithm variants, support for Google's JAX ML framework, hyperparameter tuning utilities, and more. CleanRL has come a long way in making high-quality deep reinforcement learning implementations easy to understand and reproducible. This release is a major milestone for the project, and we are excited to share it with you. Over 90 PRs were merged to make it happen, and we would like to thank all the contributors who made this release possible.
Reworked documentation
One of the biggest changes of the v1 release is the added documentation at docs.cleanrl.dev. Having great documentation is important for building a reliable and reproducible project. We have reworked the documentation to make it easier to understand and use. For each implemented algorithm, we have documented as much as we can to promote transparency:
- Short description of the algorithm and references
- A list of implemented variants
- The usage information
- The explanation of the logged metrics
- The documentation of implementation details
- Experimental results
The full list of algorithm variants and links to their documentation is available at docs.cleanrl.dev.
We also improved the contribution guide to make it easier for new contributors to get started. We are still working on improving the documentation. If you have any suggestions, please let us know in the GitHub Issues.
New algorithm variants, support for JAX
We now support JAX-based learning algorithm variants, which are usually faster than their torch equivalents! Here are the docs of the new JAX-based DQN, TD3, and DDPG implementations (a minimal sketch of the jitted-update pattern these variants rely on follows this list):

- `dqn_atari_jax.py` by @kinalmehta in vwxyzjn/cleanrl#222 - about 25% faster than `dqn_atari.py`.
- `td3_continuous_action_jax.py` by @joaogui1 in vwxyzjn/cleanrl#225 - about 2.5-4x faster than `td3_continuous_action.py`.
- `ddpg_continuous_action_jax.py` by @vwxyzjn in vwxyzjn/cleanrl#187 - about 2.5-4x faster than `ddpg_continuous_action.py`.
- `ppo_atari_envpool_xla_jax.py` by @vwxyzjn in vwxyzjn/cleanrl#227 - about 3x faster than openai/baselines' PPO.
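Much of the speedup comes from JAX jit-compiling the entire update step into a single XLA program. Below is a minimal, illustrative sketch of that pattern for a DQN-style TD update, assuming `flax` and `optax` are available; the network architecture, names, and hyperparameters are hypothetical, and this is not CleanRL's actual implementation.

```python
import jax
import jax.numpy as jnp
import optax
from flax import linen as nn
from flax.training.train_state import TrainState


class QNetwork(nn.Module):
    action_dim: int

    @nn.compact
    def __call__(self, x):
        x = nn.relu(nn.Dense(120)(x))
        x = nn.relu(nn.Dense(84)(x))
        return nn.Dense(self.action_dim)(x)


@jax.jit  # the whole TD update compiles to a single XLA program
def update(q_state, target_params, obs, actions, rewards, next_obs, dones):
    gamma = 0.99  # hypothetical discount factor
    q_next = q_state.apply_fn(target_params, next_obs).max(axis=-1)
    td_target = rewards + (1.0 - dones) * gamma * q_next

    def loss_fn(params):
        q_pred = q_state.apply_fn(params, obs)
        q_pred = q_pred[jnp.arange(q_pred.shape[0]), actions]
        return ((q_pred - td_target) ** 2).mean()

    loss, grads = jax.value_and_grad(loss_fn)(q_state.params)
    return q_state.apply_gradients(grads=grads), loss


# hypothetical usage: build a TrainState and run one jitted update on dummy data
obs_dim, n_actions, batch = 4, 2, 32
net = QNetwork(action_dim=n_actions)
params = net.init(jax.random.PRNGKey(0), jnp.zeros((1, obs_dim)))
q_state = TrainState.create(apply_fn=net.apply, params=params, tx=optax.adam(2.5e-4))
q_state, loss = update(
    q_state,
    params,  # stand-in for the target network parameters
    jnp.zeros((batch, obs_dim)),
    jnp.zeros((batch,), dtype=jnp.int32),
    jnp.zeros((batch,)),
    jnp.zeros((batch, obs_dim)),
    jnp.zeros((batch,)),
)
```

Because the whole `update` runs on-device as one compiled program, per-step Python overhead largely disappears, which is a big part of where the reported speedups come from.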
For example, see the DDPG + JAX benchmark results in the docs for further details.
Other new algorithm variants include multi-GPU PPO, a PPO prototype that works with Isaac Gym, multi-agent Atari PPO, and refactored PPG and PPO-RND implementations:

- `ppo_atari_multigpu.py` by @vwxyzjn in vwxyzjn/cleanrl#178 - about 34% faster than `ppo_atari.py`, which uses `SyncVectorEnv`.
- `ppo_continuous_action_isaacgym.py` by @vwxyzjn in vwxyzjn/cleanrl#233 - achieves 4000+ score and 30M steps on IsaacGymEnvs' `Ant` in 4 mins.
- `ppo_pettingzoo_ma_atari.py` by @vwxyzjn in vwxyzjn/cleanrl#188 - achieves ~4000 episodic length (not episodic return) in Pong, creating competitive self-play agents.
- `ppg_procgen.py` by @Dipamc77 in vwxyzjn/cleanrl#186 - matches openai/baselines' PPO performance in StarPilot (easy), BossFight (easy), and BigFish (easy).
- [`ppo_rnd_envpool.py`](https://docs.cleanrl.dev/rl-algorithms/ppo-rnd/#ppo_rnd_env...
v1.0.0b2 JAX Support and Hyperparameter Tuning
🎉 I am thrilled to announce the v1.0.0b2 CleanRL Beta Release. This new release comes with exciting new features. First, we now support JAX-based learning algorithms, which are usually faster than their torch equivalents! Here are the docs of the new JAX-based DQN, TD3, and DDPG implementations:

- `dqn_atari_jax.py`
- `td3_continuous_action_jax.py`
- `ddpg_continuous_action_jax.py`
Also, we now have preliminary support for hyperparameter tuning via optuna (see docs), which is designed to help researchers find a single set of hyperparameters that works well across a set of games. The current API looks like this:
```python
import optuna

from cleanrl_utils.tuner import Tuner

tuner = Tuner(
    script="cleanrl/ppo.py",
    metric="charts/episodic_return",
    metric_last_n_average_window=50,
    direction="maximize",
    aggregation_type="average",
    target_scores={
        "CartPole-v1": [0, 500],
        "Acrobot-v1": [-500, 0],
    },
    params_fn=lambda trial: {
        "learning-rate": trial.suggest_loguniform("learning-rate", 0.0003, 0.003),
        "num-minibatches": trial.suggest_categorical("num-minibatches", [1, 2, 4]),
        "update-epochs": trial.suggest_categorical("update-epochs", [1, 2, 4, 8]),
        "num-steps": trial.suggest_categorical("num-steps", [5, 16, 32, 64, 128]),
        "vf-coef": trial.suggest_uniform("vf-coef", 0, 5),
        "max-grad-norm": trial.suggest_uniform("max-grad-norm", 0, 5),
        "total-timesteps": 100000,
        "num-envs": 16,
    },
    pruner=optuna.pruners.MedianPruner(n_startup_trials=5),
    sampler=optuna.samplers.TPESampler(),
)
tuner.tune(
    num_trials=100,
    num_seeds=3,
)
```
In addition, we added support for new algorithms and environments:

- Isaac Gym support in PPO for GPU-accelerated robotics environments: `ppo_continuous_action_isaacgym.py`
- Random Network Distillation (RND) for hard-exploration environments: `ppo_rnd_envpool.py` (see the conceptual sketch after this list)
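For readers unfamiliar with RND, here is a minimal conceptual sketch of the intrinsic-reward idea: a predictor network is trained to match a fixed, randomly initialized target network, and poorly predicted (i.e. novel) observations yield larger exploration bonuses. The sizes, names, and training loop below are hypothetical and simplified; this is not `ppo_rnd_envpool.py`'s actual implementation, which combines this bonus with PPO's extrinsic reward.

```python
import torch
import torch.nn as nn

obs_dim, embed_dim = 8, 64  # hypothetical sizes

# fixed, randomly initialized target network (never trained)
target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
for p in target.parameters():
    p.requires_grad_(False)

# predictor network trained to match the target's features
predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, embed_dim))
optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)


def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    # exploration bonus: squared prediction error against the frozen target network
    with torch.no_grad():
        target_feat = target(obs)
    pred_feat = predictor(obs)
    return ((pred_feat - target_feat) ** 2).mean(dim=-1)


# one (hypothetical) predictor update on a batch of collected observations
obs_batch = torch.randn(32, obs_dim)
loss = intrinsic_reward(obs_batch).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```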
I would like to cordially thank the core dev members @dosssman @yooceii @dipamc @kinalmehta for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @cool-RR, @Howuhh, @jseppanen, @joaogui1, @kinalmehta, and @ALPH2H.
New CleanRL Supported Publications
Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk, Viktor Makoviychuk, Zichen Liu, Yufan Song, Ting Luo, Yukun Jiang, Zhongwen Xu, & Shuicheng YAN (2022). EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track. https://openreview.net/forum?id=BubxnHpuMbG
New Features PR
- prototype jax with ddpg by @vwxyzjn in #187
- Isaac Gym Envs PPO updates by @vwxyzjn in #233
- JAX TD3 prototype by @joaogui1 in #225
- prototype jax with dqn by @kinalmehta in #222
- Poetry 1.2 by @vwxyzjn in #271
- Add rnd_ppo.py documentation and refactor by @yooceii in #151
- Hyperparameter optimization by @vwxyzjn in #228
- Update the hyperparameter optimization example script by @vwxyzjn in #268
Bug Fixes PR
- Td3 ddpg action bound fix by @dosssman in #211
- added gamma to reward normalization wrappers by @Howuhh in #209
- Seed envpool environment explicitly by @jseppanen in #238
- Fix PPO + Isaac Gym Benchmark Script by @vwxyzjn in #243
- Fix for noise sampling for the TD3 exploration by @dosssman in #260
Documentation PR
- Add a note on PPG's performance by @vwxyzjn in #199
- Clarify CleanRL is a non-modular library by @vwxyzjn in #200
- Fix documentation link by @vwxyzjn in #213
- JAX + DDPG docs fix by @vwxyzjn in #229
- Fix links in docs for `ppo_continuous_action_isaacgym.py` by @vwxyzjn in #242
- Fix docs (badge, TD3 + JAX, and DQN + JAX) by @vwxyzjn in #246
- Fix typos by @ALPH2H in #282
- Fix docs links in README.md by @vwxyzjn in #254
- chore: remove unused parameters in jax implementations by @kinalmehta in #264
Misc PR
- Show correct exception cause by @cool-RR in #205
- Remove pettingzoo's pistonball example by @vwxyzjn in #214
- Leverage CI to speed up poetry lock by @vwxyzjn in #235
- Ubuntu runner for poetry lock by @vwxyzjn in #236
- Remove the github pages CI in favor of vercel by @vwxyzjn in #241
- Clarify LICENSE info by @vwxyzjn in #253
- Update published paper citation by @vwxyzjn in #284
- Refactor dqn word choice by @vwxyzjn in #257
New Contributors
- @cool-RR made their first contribution in #205
- @Howuhh made their first contribution in #209
- @jseppanen made their first contribution in #238
- @joaogui1 made their first contribution in #225
- @kinalmehta made their first contribution in #222
- @ALPH2H made their first contribution in #282
Full Changelog: v1.0.0b1...v1.0.0b2
v1.0.0b1 CleanRL Beta Release 🎉
🎉 I am thrilled to announce the v1.0.0b1 CleanRL Beta Release. CleanRL has come a long way making high-quality deep reinforcement learning implementations easy to understand. In this release, we have put a huge effort into revamping our documentation site, making our implementations friendlier for new users.
I would like to cordially thank the core dev members @dosssman @yooceii @Dipamc77 @bragajj for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @ElliotMunro200 and @Dipamc77.
New CleanRL supported publications
- Huang, S., Dossa, R., Raffin, A., Kanervisto, A., & Wang, W. (2022). The 37 Implementation Details of Proximal Policy Optimization. International Conference on Learning Representations 2022 Blog Post Track.
- Huang, S., & Ontañón, S. (2022). A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The International FLAIRS Conference Proceedings, 35.
- Schmidt, D., & Schmied, T. (2021). Fast and Data-Efficient Training of Rainbow: An Experimental Study on Atari. Deep Reinforcement Learning Workshop at the 35th Conference on Neural Information Processing Systems.
New algorithm variants
- Match PPG implementation by @Dipamc77 in #186
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppg/
- Proper multi-gpu support with PPO by @vwxyzjn in #178
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy
- Support Pettingzoo Multi-agent Atari envs with PPO by @vwxyzjn in #188
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy
Refactoring changes
- Let `ppo_continuous_action.py` only run 1M steps by @vwxyzjn in #161
- Change `ppo.py`'s default timesteps by @vwxyzjn in #164
- Enable video recording for `ppo_procgen.py` by @vwxyzjn in #166
- Refactor replay based scripts by @vwxyzjn in #173
Documentation changes
A significant amount of documentation changes (tracked by #121).
See the overview documentation page here: https://docs.cleanrl.dev/rl-algorithms/overview/
- Add `ddpg_continuous_action.py` docs by @vwxyzjn in #137
- Fix DDPG docs' description by @vwxyzjn in #139
- Fix typo in DDPG docs by @vwxyzjn in #140
- Fix incorrect links in the DDPG docs by @vwxyzjn in #142
- DDPG documentation tweaks; added Q loss equations and light explanation by @dosssman in #145
- Add `dqn_atari.py` documentation by @vwxyzjn in #124
- Add documentation for `td3_continuous_action.py` by @vwxyzjn in #141
- SAC Documentation - Benchmarks - Minor code tweaks by @dosssman in #146
- Add docs for `c51.py` and `c51_atari.py` by @vwxyzjn in #159
- Add docs for `dqn.py` by @vwxyzjn in #157
- Address stale documentation by @vwxyzjn in #169
- Documentation improvement - fix links and mkdocs by @vwxyzjn in #181
- Improve documentation and contribution guide by @vwxyzjn in #189
- Fix documentation links in README.md by @vwxyzjn in #192
- Fix the implemented variants section in PPO by @vwxyzjn in #193
Miscellaneous changes
- Add Pull Request template by @vwxyzjn in #122
- Amend license to give proper attribution by @vwxyzjn in #152
- Introduce better contribution guide by @vwxyzjn in #154
- Fix the default wandb project name in `ppo_atari_envpool.py` by @vwxyzjn in #160
- Removes unmaintained scripts by @vwxyzjn in #170
- Add PPO documentation by @vwxyzjn in #163
- Add docs header by @vwxyzjn in #174
- Update README.md by @ElliotMunro200 in #177
- Update issue_template.md by @vwxyzjn in #180
- Temporarily Remove PPO-RND by @vwxyzjn in #190
Utility changes
- Export `requirements.txt` automatically by @vwxyzjn in #143
- Auto-upgrade syntax via `pyupgrade` by @vwxyzjn in #158
- Introduce benchmark utilities by @vwxyzjn in #165
New Contributors
- @ElliotMunro200 made their first contribution in #177
- @Dipamc77 made their first contribution in #186
Full Changelog: v0.6.0...v1.0.0b1
v0.6.0 Major Refactoring
What's Changed
- Update paper citation entry by @vwxyzjn in #91
- Clean up stale files by @vwxyzjn in #95
- Refactor formats in `parse_args` by @vwxyzjn in #78
- Add Gitpod support by @vwxyzjn in #94
- Reorganize README.md by @vwxyzjn in #93
- Downgrade setuptools by @vwxyzjn in #98
- Fix readme links by @vwxyzjn in #104
- Refactor value based methods by @vwxyzjn in #102
- Introduce pre-commit pipelines by @vwxyzjn in #107
- Refactor PPG and PPO for procgen by @vwxyzjn in #108
- Update documentation on PPG and PPO Procgen by @vwxyzjn in #112
- Add PPO Atari LSTM example by @vwxyzjn in #83
- Prototype Envpool Support by @vwxyzjn in #100
- Fix replay buffer compatibility with mujoco envs by @vwxyzjn in #113
- Add the isort and black badges by @vwxyzjn in #119
- Refactor `parse_args()` by @vwxyzjn in #118
- Add `ppo.py` documentation by @vwxyzjn in #120
- Replace `episode_reward` with `episodic_return` by @vwxyzjn in #125
- Refactor ppo_pettingzoo.py by @vwxyzjn in #128
- Update gym to 0.23.0 by @vwxyzjn in #129
- Add SPS and q-values metrics for value-based methods by @vwxyzjn in #126
- Make seed work again in value methods by @vwxyzjn in #134
- Remove offline DQN scripts by @vwxyzjn in #135
- Deprecate `apex_dqn_atari.py` by @vwxyzjn in #136
- Update to `gym==0.23.1` by @vwxyzjn in #138
Full Changelog: v0.5.0...v0.6.0
v0.5.0
What's Changed
- Use Poetry as the package manager by @vwxyzjn in #50
- Remove links to deleted code on README algorithms by @FelipeMartins96 in #54
- Add paper plotting utilities by @vwxyzjn in #55
- Reorganization of files. by @vwxyzjn in #56
- Bump Gym's version to 0.21.0 by @vwxyzjn in #61
- Automatically Download Atari Roms by @vwxyzjn in #62
- Make Spyder Editor Optional by @vwxyzjn in #66
- Support Python 3.7.1+ by @vwxyzjn in #67
- ddpg_continuous: Added env argument to actor and target actor by @dosssman in #69
- Add pytest as an optional dependency by @vwxyzjn in #71
- Remove SB3 dependency in ppo_continuous_action.py by @vwxyzjn in #72
- Add e2e tests by @vwxyzjn in #70
- Fix #74 SAC consistency in logging and training to match other scripts by @dosssman in #75
- Add MuJoCo environments support. by @vwxyzjn in #76
- Only run tests given changes to the `cleanrl` directory by @vwxyzjn in #77
- Prototype Documentation Site by @vwxyzjn in #64
- Cloud Utilities Improvement by @vwxyzjn in #65
- Import built docker image to local registry by @vwxyzjn in #80
- Remove docker dummy cache by @vwxyzjn in #81
- Allow buildx to save to local and push by @vwxyzjn in #82
- Roll back PyTorch version for better compatibility by @vwxyzjn in #84
- Cloud utilities refactor by @vwxyzjn in #85
- Prepare for 0.5.0 release by @vwxyzjn in #88
New Contributors
- @FelipeMartins96 made their first contribution in #54
Full Changelog: v0.4.8...v0.5.0