v1.0.0b1 CleanRL Beta Release 🎉
🎉 I am thrilled to announce the v1.0.0b1 CleanRL Beta Release. CleanRL has come a long way in making high-quality deep reinforcement learning implementations easy to understand. In this release, we have put a huge effort into revamping our documentation site and making our implementations friendlier for new users.
I would like to cordially thank the core dev members @dosssman @yooceii @Dipamc77 @bragajj for their efforts in helping maintain the CleanRL repository. I would also like to give a shout-out to our new contributors @ElliotMunro200 and @Dipamc77.
New CleanRL-supported publications
- Huang, S., Dossa, R., Raffin, A., Kanervisto, A., & Wang, W. (2022). The 37 Implementation Details of Proximal Policy Optimization. International Conference on Learning Representations 2022, Blog Post Track.
- Huang, S., & Ontañón, S. (2022). A Closer Look at Invalid Action Masking in Policy Gradient Algorithms. The International FLAIRS Conference Proceedings, 35. (A minimal sketch of the masking technique follows this list.)
- Schmidt, D., & Schmied, T. (2021). Fast and Data-Efficient Training of Rainbow: An Experimental Study on Atari. Deep Reinforcement Learning Workshop at the 35th Conference on Neural Information Processing Systems.
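The invalid action masking technique from the FLAIRS paper replaces the logits of invalid actions with a large negative number, so those actions receive effectively zero probability and zero gradient. A minimal PyTorch sketch (the logits and mask below are made up for illustration):

```python
import torch
from torch.distributions import Categorical

logits = torch.tensor([1.0, 2.0, 3.0, 4.0])     # illustrative policy logits
mask = torch.tensor([True, False, True, True])  # False marks an invalid action

# Push invalid-action logits to a large negative value so softmax assigns
# them (effectively) zero probability, and no gradient flows through them.
masked_logits = torch.where(mask, logits, torch.tensor(-1e8))
dist = Categorical(logits=masked_logits)
action = dist.sample()  # action 1 is never sampled
```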
New algorithm variants
- Match PPG implementation by @Dipamc77 in #186
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppg/
- Proper multi-GPU support with PPO by @vwxyzjn in #178
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_atari_multigpupy
- Support PettingZoo multi-agent Atari envs with PPO by @vwxyzjn in #188 (see the environment-setup sketch after this list)
- See the documentation here: https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy
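To give a flavor of the new multi-agent support, below is a minimal sketch of constructing a PettingZoo Atari environment with SuperSuit preprocessing wrappers. It assumes `pettingzoo[atari]` and `supersuit` are installed; the exact wrapper stack used by `ppo_pettingzoo_ma_atari.py` may differ, so consult its documentation page above.

```python
import supersuit as ss
from pettingzoo.atari import pong_v3

env = pong_v3.parallel_env()         # two-player Pong via the parallel API
env = ss.max_observation_v0(env, 2)  # max-pool consecutive frames to undo flickering
env = ss.frame_skip_v0(env, 4)       # repeat each action for 4 frames
env = ss.resize_v1(env, 84, 84)      # downsample observations to 84x84
env = ss.frame_stack_v1(env, 4)      # stack the 4 most recent frames
```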
Refactoring changes
- Let `ppo_continuous_action.py` only run 1M steps by @vwxyzjn in #161
- Change `ppo.py`'s default timesteps by @vwxyzjn in #164
- Enable video recording for `ppo_procgen.py` by @vwxyzjn in #166 (see the sketch after this list)
- Refactor replay based scripts by @vwxyzjn in #173
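For reference, video recording in gym-based scripts generally amounts to wrapping the environment once at creation time. A minimal sketch, assuming a gym version that ships `gym.wrappers.RecordVideo` and the old 4-tuple `step` API; the `"videos"` folder name is illustrative:

```python
import gym

env = gym.make("CartPole-v1")
env = gym.wrappers.RecordVideo(env, "videos")  # writes episode videos to ./videos

obs = env.reset()
done = False
while not done:
    # random policy, just to produce a recorded episode
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```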
Documentation changes
A significant number of documentation changes landed in this release (tracked in #121).
See the overview documentation page here: https://docs.cleanrl.dev/rl-algorithms/overview/
- Add `ddpg_continuous_action.py` docs by @vwxyzjn in #137
- Fix DDPG docs' description by @vwxyzjn in #139
- Fix typo in DDPG docs by @vwxyzjn in #140
- Fix incorrect links in the DDPG docs by @vwxyzjn in #142
- DDPG documentation tweaks; added Q loss equations and light explanation by @dosssman in #145
- Add `dqn_atari.py` documentation by @vwxyzjn in #124
- Add documentation for `td3_continuous_action.py` by @vwxyzjn in #141
- SAC documentation, benchmarks, and minor code tweaks by @dosssman in #146
- Add docs for `c51.py` and `c51_atari.py` by @vwxyzjn in #159
- Add docs for `dqn.py` by @vwxyzjn in #157
- Address stale documentation by @vwxyzjn in #169
- Documentation improvement - fix links and mkdocs by @vwxyzjn in #181
- Improve documentation and contribution guide by @vwxyzjn in #189
- Fix documentation links in README.md by @vwxyzjn in #192
- Fix the implemented variants section in PPO by @vwxyzjn in #193
Miscellaneous changes
- Add Pull Request template by @vwxyzjn in #122
- Amend license to give proper attribution by @vwxyzjn in #152
- Introduce better contribution guide by @vwxyzjn in #154
- Fix the default wandb project name in `ppo_atari_envpool.py` by @vwxyzjn in #160
- Remove unmaintained scripts by @vwxyzjn in #170
- Add PPO documentation by @vwxyzjn in #163
- Add docs header by @vwxyzjn in #174
- Update README.md by @ElliotMunro200 in #177
- Update issue_template.md by @vwxyzjn in #180
- Temporarily Remove PPO-RND by @vwxyzjn in #190
Utility changes
- Export `requirements.txt` automatically by @vwxyzjn in #143
- Auto-upgrade syntax via `pyupgrade` by @vwxyzjn in #158 (an illustrative example follows this list)
- Introduce benchmark utilities by @vwxyzjn in #165
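For context, `pyupgrade` mechanically rewrites older Python idioms into their modern equivalents. The pairs below illustrate the kind of changes the tool makes in general; they are not the actual diffs from #158:

```python
name = "CleanRL"

# before: str.format
greeting = "hello {}".format(name)
# after: the f-string pyupgrade rewrites it to
greeting = f"hello {name}"

# before: explicit `object` base class
class Agent(object):
    pass

# after: the implicit new-style class pyupgrade leaves behind
class Agent:
    pass
```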
New Contributors
- @ElliotMunro200 made their first contribution in #177
- @Dipamc77 made their first contribution in #186
Full Changelog: v0.6.0...v1.0.0b1