In NeurIPS 2018 [Project Website] [Demo Video] [pdf]
Tao Chen, Adithya Murali, Abhinav Gupta
The Robotics Institute, Carnegie Mellon University
This is a PyTorch-based implementation of our NeurIPS 2018 paper on hardware conditioned policies. The idea is to augment the policy input (state) with a hardware-specific encoding vector for better multi-robot skill transfer. The encoding vector can be either explicitly constructed (HCP-E) or learned implicitly via back-propagation (HCP-I). The approach is compatible with most existing deep reinforcement learning algorithms; we demonstrate it with DDPG+HER and PPO.

If you find this work useful in your research, please cite:
@inproceedings{chen2018hardware,
title={Hardware Conditioned Policies for Multi-Robot Transfer Learning},
author={Chen, Tao and Murali, Adithyavairavan and Gupta, Abhinav},
booktitle={Advances in Neural Information Processing Systems},
pages={9355--9366},
year={2018}
}
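As a rough illustration of the idea described above (augmenting the policy input with a hardware encoding vector), the following is a minimal sketch in plain Python. It is not code from this repository: the function names `augment_state` and `make_embedding_table`, the example link lengths, and the vector sizes are all hypothetical. In the HCP-I case, the embedding would be updated by back-propagation together with the policy during training, which this sketch does not do.

```python
import random


def augment_state(state, encoding):
    """Concatenate the robot state with a hardware encoding vector."""
    return list(state) + list(encoding)


def make_embedding_table(num_robots, dim, seed=0):
    """Create one randomly initialised embedding vector per robot.

    Assumption: in an HCP-I style setup these vectors would be trainable
    parameters; here they are only initialised, never trained.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_robots)]


# Hypothetical 4-dimensional robot state (e.g. joint angles).
state = [0.1, -0.4, 0.7, 0.0]

# HCP-E style (assumption): an explicit encoding built from hypothetical
# kinematic parameters, here three made-up link lengths.
explicit_encoding = [0.30, 0.25, 0.10]
policy_input_explicit = augment_state(state, explicit_encoding)

# HCP-I style (assumption): look up the embedding of robot number 3.
embedding_table = make_embedding_table(num_robots=5, dim=2)
policy_input_implicit = augment_state(state, embedding_table[3])

print(len(policy_input_explicit))  # state dims + explicit encoding dims
print(len(policy_input_implicit))  # state dims + embedding dims
```

In both cases the policy network itself is unchanged; only its input grows by the size of the encoding vector, which is why the idea can be combined with different reinforcement learning algorithms.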
The code has been tested on Ubuntu 16.04.
- Install Anaconda
- Download code repo:
cd ~
git clone https://github.com/taochenshh/hcp.git
cd hcp
- Create python environment
conda env create -f environment.yml
conda activate hcp
- Install MuJoCo and mujoco-py 1.50
- Generate robot xml files
cd gen_robots
chmod +x gen_multi_dof_simrobot.sh
## generate both peg_insertion and reacher environments
./gen_multi_dof_simrobot.sh peg_insertion reacher
## generate peg_insertion environments only
./gen_multi_dof_simrobot.sh peg_insertion
## generate reacher environments only
./gen_multi_dof_simrobot.sh reacher
- Train the policy model
cd ../HCP-E
## HCP-E: peg_insertion
python main.py --env=peg_insertion --with_kin --train_ratio=0.9 --save_interval=200 --robot_dir=../xml/gen_xmls/simrobot/peg_insertion --save_dir=peg_data/HCP-E
## HCP-E: reacher
cd util
python gen_start_and_goal.py
cd ..
python main.py --env=reacher --with_kin --train_ratio=0.9 --save_interval=200 --robot_dir=../xml/gen_xmls/simrobot/reacher --save_dir=reacher_data/HCP-E
- Test the policy model
## HCP-E: peg_insertion
python main.py --env=peg_insertion --with_kin --train_ratio=0.9 --save_interval=200 --robot_dir=../xml/gen_xmls/simrobot/peg_insertion --save_dir=peg_data/HCP-E --test
## HCP-E: reacher
python main.py --env=reacher --with_kin --train_ratio=0.9 --save_interval=200 --robot_dir=../xml/gen_xmls/simrobot/reacher --save_dir=reacher_data/HCP-E --test
Add --render at the end of the command if you want to visually test the policy.
- Generate robot xml files
cd gen_robots
python gen_hoppers.py --robot_num=1000
- Train the policy model
cd ../HCP-I
python main.py --env=hopper --with_embed --robot_dir=../xml/gen_xmls/hopper --save_dir=hopper_data/HCP-I
- Test the policy model
python main.py --env=hopper --with_embed --robot_dir=../xml/gen_xmls/hopper --save_dir=hopper_data/HCP-I --test
Add --render at the end of the command if you want to visually test the policy.