v1.0.0b1 CleanRL Beta Release 🎉
🎉 I am thrilled to announce the v1.0.0b1 CleanRL Beta Release. CleanRL has come a long way in making high-quality deep reinforcement learning implementations easy to understand. In this release, we have put a huge effort into revamping our documentation site and making our implementations friendlier for new users.
I would like to cordially thank the core dev members @dosssman @yooceii @Dipamc77 @bragajj for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @ElliotMunro200 and @Dipamc77.
New CleanRL-supported publications
- Huang, S., Dossa, R., Raffin, A., Kanervisto, A., & Wang, W. (2022). The 37 Implementation Details of Proximal Policy Optimization. International Conference on Learning Representations 2022, Blog Post Track.
- Huang, S., & Ontañón, S. (2022). A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The International FLAIRS Conference Proceedings, 35. (A minimal sketch of the masking technique follows this list.)
- Schmidt, D., & Schmied, T. (2021). Fast and Data-Efficient Training of Rainbow: An Experimental Study on Atari. Deep Reinforcement Learning Workshop at the 35th Conference on Neural Information Processing Systems.
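The invalid action masking technique from the FLAIRS paper replaces the logits of invalid actions with a large negative number, so those actions receive effectively zero probability and zero gradient. A minimal PyTorch sketch (the logits and mask below are made up for illustration):

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([1.0, 2.0, 3.0, 4.0])     # illustrative policy logits
mask = torch.tensor([True, False, True, True])  # False marks an invalid action

# Push invalid-action logits to a large negative value so softmax assigns
# them (effectively) zero probability, and no gradient flows through them.
masked_logits = torch.where(mask, logits, torch.tensor(-1e8))
dist = Categorical(logits=masked_logits)
action = dist.sample()  # action 1 is never sampled
```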
New algorithm variants
- Match PPG implementation by @Dipamc77 in #186
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppg/
- Proper multi-GPU support with PPO by @vwxyzjn in #178
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy
- Support PettingZoo multi-agent Atari envs with PPO by @vwxyzjn in #188 (see the environment-setup sketch after this list)
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy
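To give a flavor of the new multi-agent support, below is a minimal sketch of constructing a PettingZoo Atari environment with SuperSuit preprocessing wrappers. It assumes `pettingzoo[atari]` and `supersuit` are installed; the exact wrapper stack used by `ppo_pettingzoo_ma_atari.py` may differ, so consult its documentation page above.

```python
import supersuit as ss
from pettingzoo.atari import pong_v3

env = pong_v3.parallel_env()         # two-player Pong via the parallel API
env = ss.max_observation_v0(env, 2)  # max-pool consecutive frames to undo flickering
env = ss.frame_skip_v0(env, 4)       # repeat each action for 4 frames
env = ss.resize_v1(env, 84, 84)      # downsample observations to 84x84
env = ss.frame_stack_v1(env, 4)      # stack the 4 most recent frames
```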
Refactoring changes
- Let `ppo_continuous_action.py` only run 1M steps by @vwxyzjn in #161
- Change `ppo.py`'s default timesteps by @vwxyzjn in #164
- Enable video recording for `ppo_procgen.py` by @vwxyzjn in #166 (see the sketch after this list)
- Refactor replay based scripts by @vwxyzjn in #173
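For reference, video recording in gym-based scripts generally amounts to wrapping the environment once at creation time. A minimal sketch, assuming a gym version that ships `gym.wrappers.RecordVideo` and the old 4-tuple `step` API; the `"videos"` folder name is illustrative:

```python
import gym

env = gym.make("CartPole-v1")
env = gym.wrappers.RecordVideo(env, "videos")  # writes episode videos to ./videos

obs = env.reset()
done = False
while not done:
    # random policy, just to produce a recorded episode
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```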
Documentation changes
A significant number of documentation changes landed in this release (tracked in #121).
See the overview documentation page here: https://docs.cleanrl.dev/rl-algorithms/overview/
- Add `ddpg_continuous_action.py` docs by @vwxyzjn in #137
- Fix DDPG docs' description by @vwxyzjn in #139
- Fix typo in DDPG docs by @vwxyzjn in #140
- Fix incorrect links in the DDPG docs by @vwxyzjn in #142
- DDPG documentation tweaks; added Q loss equations and light explanation by @dosssman in #145
- Add `dqn_atari.py` documentation by @vwxyzjn in #124
- Add documentation for `td3_continuous_action.py` by @vwxyzjn in #141
- SAC documentation, benchmarks, and minor code tweaks by @dosssman in #146
- Add docs for `c51.py` and `c51_atari.py` by @vwxyzjn in #159
- Add docs for `dqn.py` by @vwxyzjn in #157
- Address stale documentation by @vwxyzjn in #169
- Documentation improvement - fix links and mkdocs by @vwxyzjn in #181
- Improve documentation and contribution guide by @vwxyzjn in #189
- Fix documentation links in README.md by @vwxyzjn in #192
- Fix the implemented variants section in PPO by @vwxyzjn in #193
Miscellaneous changes
- Add Pull Request template by @vwxyzjn in #122
- Amend license to give proper attribution by @vwxyzjn in #152
- Introduce better contribution guide by @vwxyzjn in #154
- Fix the default wandb project name in `ppo_atari_envpool.py` by @vwxyzjn in #160
- Remove unmaintained scripts by @vwxyzjn in #170
- Add PPO documentation by @vwxyzjn in #163
- Add docs header by @vwxyzjn in #174
- Update README.md by @ElliotMunro200 in #177
- Update issue_template.md by @vwxyzjn in #180
- Temporarily Remove PPO-RND by @vwxyzjn in #190
Utility changes
- Export `requirements.txt` automatically by @vwxyzjn in #143
- Auto-upgrade syntax via `pyupgrade` by @vwxyzjn in #158 (an illustrative example follows this list)
- Introduce benchmark utilities by @vwxyzjn in #165
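For context, `pyupgrade` mechanically rewrites older Python idioms into their modern equivalents. The pairs below illustrate the kind of changes the tool makes in general; they are not the actual diffs from #158:

```python
name = "CleanRL"

# before: str.format
greeting = "hello {}".format(name)
# after: the f-string pyupgrade rewrites it to
greeting = f"hello {name}"

# before: explicit `object` base class
class Agent(object):
    pass

# after: the implicit new-style class pyupgrade leaves behind
class Agent:
    pass
```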
New Contributors
- @ElliotMunro200 made their first contribution in #177
- @Dipamc77 made their first contribution in #186
Full Changelog: v0.6.0...v1.0.0b1