Implementation of Tactical Optimistic and Pessimistic value estimation

tedmoskovitz/TOP

Tactical Optimistic and Pessimistic estimation (TOP)

Implementation of TOP, an off-policy deep actor-critic framework for continuous control, from our paper Tactical Optimism and Pessimism for Deep Reinforcement Learning.

Running MuJoCo:

python train_top_agent.py

We've also included the saved runs from the paper in the runs folder, covering 10 seeds for each environment. Each file contains the reward curves used for Figure 3 and is structured as a 10 x 1000 matrix, with each row corresponding to a different seed.
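
As a rough illustration of how a 10 x 1000 matrix of this kind could be turned into a mean curve with an error band, here is a minimal sketch. It is not part of the repository: the helper name `summarize_runs` is hypothetical, the on-disk format of the saved files is not assumed (the data is taken to be already loaded as a list of rows, one per seed), and the input below is synthetic. It uses only the Python standard library so that it runs without any of the project's dependencies.

```python
import math
import random
import statistics


def summarize_runs(curves):
    """Return the per-step mean and standard error across seeds.

    `curves` is a list of rows, one row per seed, each row holding that
    seed's reward at every recorded point (assumed layout: seeds x steps).
    """
    n_seeds = len(curves)
    if n_seeds < 2:
        raise ValueError("need at least two seeds to estimate spread")
    if len({len(row) for row in curves}) != 1:
        raise ValueError("all seeds must have the same number of points")

    means, sems = [], []
    # zip(*curves) walks the matrix column by column, i.e. step by step.
    for step_values in zip(*curves):
        means.append(statistics.fmean(step_values))
        # Sample standard deviation over sqrt(n) gives the standard error.
        sems.append(statistics.stdev(step_values) / math.sqrt(n_seeds))
    return means, sems


# Synthetic stand-in for one saved file: 10 seeds x 1000 points.
rng = random.Random(0)
fake_curves = [
    [step * 0.01 + rng.gauss(0.0, 1.0) for step in range(1000)]
    for _ in range(10)
]
means, sems = summarize_runs(fake_curves)
print(len(means), len(sems))
```

To use this on the real data, replace `fake_curves` with the rows of one file from the runs folder, loaded with whatever reader matches that file's format.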

TOP-TD3 is built on top of the fantastic TD3 implementation by Philip Ball.

Running DM Control Suite:

python top_train.py

TOP-RAD is built on top of the original RAD implementation by Misha Laskin; the majority of the files are unchanged from the original repository.

We plan to add the saved training data from the DM Control experiments soon, as we have for the MuJoCo experiments!

Requirements:
