Implementation of TOP, an off-policy deep actor-critic framework for continuous control, from our paper Tactical Optimism and Pessimism for Deep Reinforcement Learning.
Running MuJoCo:
python train_top_agent.py
We've also included the saved runs from the paper, across 10 seeds for each environment, in the runs folder. Each file contains the reward curves used for Figure 3 and is structured as a 10 x 1000 matrix, with each row representing a different seed.
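As an illustration of how these matrices can be aggregated, here is a minimal sketch that computes the per-point mean and standard error across seeds. It assumes a file has already been loaded into a NumPy array of shape (10, 1000). The file format in the runs folder is not stated here, so the loading step (e.g. np.load for a .npy file) is an assumption, and the synthetic data below only stands in for a real file. The function name summarize_runs is hypothetical.

```python
import numpy as np


def summarize_runs(runs: np.ndarray):
    """Summarize a (num_seeds, num_points) matrix of reward curves.

    Each row is assumed to be one seed's reward curve (10 x 1000 for the
    saved runs). Returns the per-point mean and standard error across seeds.
    """
    runs = np.asarray(runs, dtype=float)
    if runs.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {runs.shape}")
    num_seeds = runs.shape[0]
    mean = runs.mean(axis=0)
    # Sample standard deviation across seeds (ddof=1), scaled to a standard error.
    stderr = runs.std(axis=0, ddof=1) / np.sqrt(num_seeds)
    return mean, stderr


if __name__ == "__main__":
    # Synthetic stand-in for one environment's file. Replace with the real data,
    # e.g. np.load(path) if the file is a .npy array (format assumed, not confirmed).
    rng = np.random.default_rng(0)
    fake_runs = np.cumsum(rng.normal(1.0, 0.5, size=(10, 1000)), axis=1)
    mean, stderr = summarize_runs(fake_runs)
    print(mean.shape, stderr.shape)
```

Mean plus or minus one standard error per evaluation point is a common way to draw shaded learning curves, but the exact statistic used for Figure 3 is not specified here.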
TOP-TD3 is built on top of the fantastic TD3 implementation by Philip Ball.
Running DM Control Suite:
python top_train.py
TOP-RAD is built on top of the original RAD implementation by Misha Laskin. The majority of the files are unchanged from the original repository.
We plan to add the saved training data from the DM Control experiments (as we have for the Mujoco experiments) soon!
Requirements:
- PyTorch >= 1.6.0
- Tensorboard
- mujoco-py >= 2.0.2.13 (MuJoCo only)
- OpenAI Gym >= 0.15.7
- DM Control Suite (DM Control only)
- dmc2gym (DM Control only)
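To check an environment against the minimum versions listed above, a small sketch using only the standard library could look like the following. The distribution names in REQUIREMENTS (for example "mujoco-py" and "dm-control") are assumptions and may not match how the packages are registered on every system. The helper names are hypothetical, and version parsing is deliberately simple: it compares only the leading numeric release, so local tags like "+cu101" are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirements list; None means "any version".
# Distribution names are assumptions and may differ on your system.
REQUIREMENTS = {
    "torch": "1.6.0",
    "tensorboard": None,
    "gym": "0.15.7",
    "mujoco-py": "2.0.2.13",  # MuJoCo experiments only
    "dm-control": None,       # DM Control experiments only
    "dmc2gym": None,          # DM Control experiments only
}


def parse_version(text: str) -> tuple:
    """Extract the leading numeric release, e.g. '1.6.0+cu101' -> (1, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unparseable version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum) -> bool:
    """True if `installed` >= `minimum`, padding shorter versions with zeros."""
    if minimum is None:
        return True
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(requirements=REQUIREMENTS) -> dict:
    """Map each distribution name to 'ok', 'missing', or 'too old (<version>)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

A "missing" entry for mujoco-py or the DM Control packages is expected if you only plan to run the other set of experiments.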