Code for the paper Not Only Domain Randomization: Universal Policy with Embedding System Identification.
This repo uses robolite, a modified version of robosuite that supports domain randomisation and inverse kinematics (IK). The same modified environment is also used in another project.
If you're installing this repo for the first time, please ensure that you have Anaconda installed and that IsaacGym_Preview_3_Package.tar.gz, downloaded from the official NVIDIA website, is in this folder (upesi). Then run ./initialization.sh using bash, without superuser privileges.
You'll then get a conda environment named rlgpu. The directories isaacgymenvs and robolite will be created as siblings of this directory.
You should always use bash to run commands.
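Before running ./initialization.sh, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the repo: the helper name missing_prerequisites is hypothetical, and it only checks that conda and bash are on the PATH and that the Isaac Gym archive sits in the given folder.

```python
import shutil
from pathlib import Path

# Archive name taken from the installation instructions above.
ISAACGYM_ARCHIVE = "IsaacGym_Preview_3_Package.tar.gz"


def missing_prerequisites(repo_dir="."):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper for illustration; the repo does not ship it.
    """
    problems = []
    # shutil.which returns None when the executable is not on PATH.
    if shutil.which("conda") is None:
        problems.append("conda not found on PATH")
    if shutil.which("bash") is None:
        problems.append("bash not found on PATH")
    if not (Path(repo_dir) / ISAACGYM_ARCHIVE).is_file():
        problems.append(f"{ISAACGYM_ARCHIVE} not found in {Path(repo_dir).resolve()}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

Running it from the upesi folder prints one line per missing item and nothing when everything is found.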
Please cite our paper if you make use of this repo:
@article{ding2021not,
title={Not Only Domain Randomization: Universal Policy with Embedding System Identification},
author={Ding, Zihan},
journal={arXiv preprint arXiv:2109.13438},
year={2021}
}
For the universal policy (UP) with embedding system identification (ESI), we use the following commands.
First, a pretrained model is needed for each environment to roll out samples, which are later used to learn the dynamics prediction model in our method:
- Get pretrained model
Remember to suspend parameter randomization (set randomized_params=None in ./default_params.py) when training this policy. For example, for the InvertedDoublePendulum environment:
python train.py basic.env_name=inverteddoublependulum
This trains the policy with the TD3 algorithm. After training, the weights are saved in the data folder. Replace the model path in the later scripts with the path to the weights you obtained.
Go to the directory:
cd dynamics_predict
- Collect training and testing dataset
python train_dynamics.py --collect_train_data --env Env_NAME
python train_dynamics.py --collect_test_data --env Env_NAME
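The two collection commands above can also be driven from a small Python script, which is convenient when collecting data for several environments. This is a sketch under assumptions: the helper names are hypothetical, it must be run from dynamics_predict/, and the environment name passed to --env (here the one from the pretraining example) is assumed to be accepted by train_dynamics.py.

```python
import subprocess
import sys


def collection_commands(env_name):
    """Build the train- and test-data collection commands for one environment."""
    base = [sys.executable, "train_dynamics.py"]
    return [
        base + ["--collect_train_data", "--env", env_name],
        base + ["--collect_test_data", "--env", env_name],
    ]


def collect(env_name, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in collection_commands(env_name):
        print(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step fails,
            # so the test set is not collected after a failed train run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: only prints the commands. The env name is an assumption.
    collect("inverteddoublependulum")
```

Call collect(env_name, dry_run=False) to actually launch the two runs in order.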
- Normalize data
Run
cd ../data/dynamics_data
jupyter notebook
then open data_process_*ENV_NAME*.ipynb and go through each cell.
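The notebook's exact procedure is not described here. As a rough illustration of what a normalization step typically does, the sketch below applies per-feature standardization (zero mean, unit variance) and reuses the training statistics on the test set. This is an assumed, common technique rather than the notebook's confirmed method; the function names and toy data are hypothetical.

```python
from statistics import fmean, pstdev


def fit_normalizer(rows, eps=1e-8):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # eps keeps constant columns from causing a division by zero.
    stds = [pstdev(col) + eps for col in columns]
    return means, stds


def normalize(rows, means, stds):
    """Apply (x - mean) / std column-wise with the given statistics."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Toy data standing in for collected samples (two features per row).
train = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
test = [[2.0, 40.0]]

means, stds = fit_normalizer(train)
train_n = normalize(train, means, stds)
# The test set is scaled with the training statistics, not its own.
test_n = normalize(test, means, stds)
```

Fitting the statistics on the training set only keeps the test data from leaking into the scaling.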
- Train dynamics embedding (encoder, decoder and dynamics prediction model)
Go back to the terminal in dynamics_predict/ and run the following to launch training:
python train_dynamics.py --train_embedding --env Env_NAME
Use tensorboard --logdir runs to monitor the training process.
- Test learned encoder and dynamics predictor
Test the performance of the learned encoder and dynamics predictor by applying them in ESI on the collected test data. Run
jupyter notebook
then open test_dynamics_*ENV_NAME*.ipynb and go through each cell, including the Bayesian optimization (BO) process.
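To give a feel for what this test does, the sketch below treats system identification as a search over the embedding that minimizes the dynamics predictor's error on observed transitions. It is a toy under stated assumptions, not the notebook's code: the predictor is a hand-written one-dimensional function instead of the learned model, the embedding is a single scalar, and plain random search is swapped in for Bayesian optimization. All names are hypothetical.

```python
import random


def predict_next(state, action, embedding):
    """Toy stand-in for a learned dynamics predictor: s' = s + embedding * a."""
    return state + embedding * action


def prediction_error(embedding, transitions):
    """Mean squared prediction error over (s, a, s_next) tuples."""
    errors = [
        (predict_next(s, a, embedding) - s_next) ** 2
        for s, a, s_next in transitions
    ]
    return sum(errors) / len(errors)


def identify_embedding(transitions, low=-2.0, high=2.0, n_trials=2000, seed=0):
    """Random search over a scalar embedding; a BO routine would replace this loop."""
    rng = random.Random(seed)
    best, best_err = None, float("inf")
    for _ in range(n_trials):
        candidate = rng.uniform(low, high)
        err = prediction_error(candidate, transitions)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


# Synthetic "test data" generated with a hidden embedding value.
data_rng = random.Random(1)
true_embedding = 0.7
transitions = []
for _ in range(50):
    s, a = data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)
    transitions.append((s, a, predict_next(s, a, true_embedding)))

estimate, error = identify_embedding(transitions)
print(f"estimated embedding: {estimate:.3f}, error: {error:.2e}")
```

Bayesian optimization serves the same role as the random-search loop but chooses candidates more sample-efficiently, which matters when each evaluation is expensive.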
- Train UP
cd ..
python train.py --train --env *ENV_NAME*dynamics --process NUM
Select the encoder-decoder type in ./environment/*ENV_NAME*dynamics.py to match the one used in ./dynamics_predict/train_dynamics.py.
- Test ESI with UP against other methods
cd dynamics_predict
python compare_methods_*ENV_NAME*.py