# How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

Jason Yecheng Ma, Jason Yan, Dinesh Jayaraman, Osbert Bastani

University of Pennsylvania

This is a PyTorch implementation of our paper *How Far I'll Go: Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression*; this code can be used to reproduce Sections 5.1 and 5.2 of the paper.

Here is a teaser video comparing GoFAR against state-of-the-art offline GCRL algorithms on a real robot!

## Setup

### Requirements

- MuJoCo 2.0.0

### Setup Instructions

1. Create the conda environment and activate it:

   ```
   conda env create -f environment.yml
   conda activate gofar
   pip install --upgrade numpy
   pip install torch==1.10.0 torchvision==0.11.1 torchaudio==0.10.0 gym==0.17.3
   ```

2. (Optional) Install the Robel environment for the D'Claw experiment.
3. Download the offline dataset here and place `offline_data` in the project root directory.
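
After setup, a quick sanity check can confirm that MuJoCo and the gym robotics environments load correctly. This is a minimal sketch, not part of the repository's scripts; it assumes the `gofar` environment is active and uses the standard gym robotics environment ID `FetchReach-v1`:

```bash
# Sanity check: verify that mujoco-py and the gym Fetch environments import and reset.
# Assumes the `gofar` conda environment is active; FetchReach-v1 is the standard gym robotics ID.
python -c "import gym; obs = gym.make('FetchReach-v1').reset(); print(sorted(obs.keys()))"
# Expected output: ['achieved_goal', 'desired_goal', 'observation']
```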

## Experiments

We provide commands for reproducing the main GCRL results (Table 1), the ablations (Figure 3), and the stochastic offline GCRL experiment (Figure 4).

1. The main results (Table 1) can be reproduced with the following command:

   ```
   mpirun -np 1 python train.py --env $ENV --method $METHOD
   ```

   | Flags and Parameters | Description |
   | --- | --- |
   | `--env $ENV` | Offline GCRL tasks: `FetchReach`, `FetchPush`, `FetchPick`, `FetchSlide`, `HandReach`, `DClawTurn` |
   | `--method $METHOD` | Offline GCRL algorithms: `gofar`, `gcsl`, `wgcsl`, `actionablemodel`, `ddpg` |
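
   For example, to train GoFAR on FetchPush (a concrete instantiation of the command above; this particular task/method pairing is just an illustration):

   ```
   mpirun -np 1 python train.py --env FetchPush --method gofar
   ```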
2. To run the ablations (Figure 3), adjust the relevant command arguments. For example, to disable HER:

   ```
   mpirun -np 1 python train.py --env $ENV --method $METHOD --relabel False
   ```

   Note that `gofar` defaults to not using HER, so this command is only relevant to the baselines. Relevant flags are listed here:

   | Flags and Parameters | Description |
   | --- | --- |
   | `--relabel` | Whether hindsight experience replay is enabled: `True`, `False` |
   | `--relabel_percent` | The fraction of minibatch transitions that have relabeled goals: `0.0`, `0.2`, `0.5`, `1.0`. These are the values attempted in the paper; you may try other fractions too. |
   | `--f` | Choice of f-divergence for GoFAR: `kl`, `chi` |
   | `--reward_type` | Choice of reward function for GoFAR: `disc`, `binary` |
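
   As one illustration (an arbitrary combination of the flags above, not a configuration prescribed by the paper), GoFAR can be trained with the chi-square divergence and the binary reward:

   ```
   mpirun -np 1 python train.py --env FetchPick --method gofar --f chi --reward_type binary
   ```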
3. The following command runs the stochastic environment experiment (Figure 4):

   ```
   mpirun -np 1 python train.py --env FetchReach --method $METHOD --noise True --noise-eps $NOISE_EPS
   ```

   where `$NOISE_EPS` can be chosen from `0.5`, `1.0`, `1.5`.
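
   For instance (an arbitrary illustrative choice of method and noise level):

   ```
   mpirun -np 1 python train.py --env FetchReach --method gofar --noise True --noise-eps 1.0
   ```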

## Acknowledgements

We borrowed some code from the following repositories: