
[arXiv 2024] Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies. Part 1: Train & Deploy of iDP3

YanjieZe/Improved-3D-Diffusion-Policy

Our project is fully open-sourced and split into two repos: Learning & Deployment of iDP3 and Humanoid Teleoperation. This repo covers the training and deployment of iDP3.

(Demo video: idp3.mov)

News

  • 2024-11-04 The full data and checkpoints (all 3 tasks) are released on Google Drive.
  • 2024-10-13 Released the full code for learning/teleoperation. Have a try!

Training & Deployment of iDP3

This repo is for training and deployment of iDP3. We provide a training data example in this Google Drive so that you can try training the model without collecting data. The full data and checkpoints are available in this Google Drive.

More info:

  • For the training machine, we use a local computer with an NVIDIA RTX 4090 (24 GB memory).
  • For the deployment machine, we use the CPU of the onboard computer of the Fourier GR1.
  • We use a RealSense L515 as the depth camera. Note that the RealSense D435 provides very imprecise depth data and is not recommended for training the 3D policy.

iDP3 is a general 3D visuomotor policy for any robot: it requires neither camera calibration nor point-cloud segmentation. Please check our RealSense wrapper for the proposed egocentric 3D visual representation; a minimal capture sketch follows.
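
For intuition, here is a minimal sketch of grabbing an egocentric point cloud from an L515 with pyrealsense2. It is illustrative only, not the repo's actual wrapper; the stream settings and the zero-depth filtering are assumptions.

import numpy as np
import pyrealsense2 as rs

# Start an L515 depth stream (1024x768 @ 30 fps is a native L515 mode).
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 1024, 768, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    depth = frames.get_depth_frame()

    # Deproject the depth frame to 3D points in the camera frame.
    # Because the representation is egocentric, no extrinsic calibration
    # to a world/robot frame is needed.
    pc = rs.pointcloud()
    points = pc.calculate(depth)
    xyz = np.asanyarray(points.get_vertices()).view(np.float32).reshape(-1, 3)

    # Drop invalid (zero-depth) returns; the policy consumes raw points,
    # with no object segmentation.
    xyz = xyz[xyz[:, 2] > 0]
    print(xyz.shape)
finally:
    pipeline.stop()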

Installation

Create the conda env and install packages on both the learning and deployment machines:

conda remove -n idp3 --all
conda create -n idp3 python=3.8
conda activate idp3

# for CUDA >= 12.1
pip3 install torch==2.1.0 torchvision --index-url https://download.pytorch.org/whl/cu121
# otherwise, install the torch build that matches your CUDA version

# install my visualizer
cd third_party
cd visualizer && pip install -e . && cd ..
pip install kaleido plotly open3d tyro termcolor h5py
cd ..


# install 3d diffusion policy
pip install --no-cache-dir \
    wandb ipdb gpustat visdom notebook mediapy torch_geometric natsort \
    scikit-video easydict pandas moviepy imageio imageio-ffmpeg termcolor \
    av open3d dm_control dill==0.3.5.1 hydra-core==1.2.0 einops==0.4.1 \
    diffusers==0.11.1 zarr==2.12.0 numba==0.56.4 pygame==2.1.2 \
    shapely==1.8.4 tensorboard==2.10.1 tensorboardx==2.5.1 absl-py==0.13.0 \
    pyparsing==2.4.7 jupyterlab==3.0.14 scikit-image yapf==0.31.0 \
    opencv-python==4.5.3.56 psutil matplotlib setuptools==59.5.0

cd Improved-3D-Diffusion-Policy
pip install -e .
cd ..

# install for diffusion policy if you want to use image-based policy
pip install timm==0.9.7

# install for r3m if you want to use image-based policy
cd third_party/r3m
pip install -e .
cd ../..

[Install on Deployment Machine] Install the RealSense packages for deployment:

# first, install the librealsense driver
# use this release for the RealSense L515: https://github.com/IntelRealSense/librealsense/releases/tag/v2.54.2

# then install the Python API
pip install pyrealsense2==2.54.2.5684
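
A quick sanity check that the driver and Python bindings can see the camera (illustrative, not part of the repo):

import pyrealsense2 as rs

# List every connected RealSense device with its serial number.
ctx = rs.context()
for dev in ctx.query_devices():
    print(dev.get_info(rs.camera_info.name),
          dev.get_info(rs.camera_info.serial_number))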

Usage

We provide a training data example in Google Drive so that you can try training the model without collecting data. Download and unzip it, then specify the dataset path in scripts/train_policy.sh.

For example, I put the dataset in /home/ze/projects/Improved-3D-Diffusion-Policy/training_data_example, and I set dataset_path=/home/ze/projects/Improved-3D-Diffusion-Policy/training_data_example in scripts/train_policy.sh.
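
If you want to peek at the data before training, the repo pins zarr==2.12.0, so the sketch below assumes the unzipped example is a zarr store; the path and the layout are assumptions, adjust them to what you actually downloaded.

import zarr

# Hypothetical path; point this at your unzipped training data example.
root = zarr.open("training_data_example", mode="r")
print(root.tree())  # prints the group/array hierarchy and shapes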

Then you can train and deploy the policy.

Train. To train a policy, run the training script with the algorithm name, the task name, and a run tag:

# 3d policy
bash scripts/train_policy.sh idp3 gr1_dex-3d 0913_example

# 2d policy
bash scripts/train_policy.sh dp_224x224_r3m gr1_dex-image 0913_example

Deploy. After you have trained the policy, deploy it with the following commands. For missing modules such as communication.py, see our other repo, Humanoid Teleoperation.

# 3d policy
bash scripts/deploy_policy.sh idp3 gr1_dex-3d 0913_example

# 2d policy
bash scripts/deploy_policy.sh dp_224x224_r3m gr1_dex-image 0913_example

Note that you cannot run the deployment code without a robot (different robots have different APIs). The code we provide is an example of how to deploy the policy; you can modify it to fit your own robot (any robot with a camera is OK). A minimal control-loop sketch follows.
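
As a concrete starting point, here is a minimal, hypothetical control loop. get_point_cloud, get_proprio, send_joint_targets, and policy.predict_action are placeholders for your camera driver, your robot's API, and the trained policy interface; none of them are fixed by this repo.

import time
import numpy as np

def get_point_cloud() -> np.ndarray:
    # Placeholder: replace with your camera driver (e.g. the RealSense
    # wrapper) returning an (N, 3) egocentric point cloud.
    return np.zeros((4096, 3), dtype=np.float32)

def get_proprio() -> np.ndarray:
    # Placeholder: replace with your robot's joint-state reader.
    return np.zeros(25, dtype=np.float32)

def send_joint_targets(action: np.ndarray) -> None:
    # Placeholder: replace with your robot's command interface.
    pass

def run(policy, hz: float = 10.0) -> None:
    dt = 1.0 / hz
    while True:
        obs = {"point_cloud": get_point_cloud(), "agent_pos": get_proprio()}
        actions = policy.predict_action(obs)  # hypothetical policy call
        for a in actions:                     # execute the predicted chunk
            send_joint_targets(a)
            time.sleep(dt)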

Visualize. You can visualize our training data example by running (remember to set the dataset path):

bash scripts/vis_dataset.sh

You can specify vis_cloud=1 to render the point cloud as in the paper.
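
If you prefer to eyeball a single frame yourself, open3d (installed above) can render any (N, 3) array; the random points below are a stand-in for one frame of your data.

import numpy as np
import open3d as o3d

# Stand-in for one frame of point-cloud data from the dataset.
xyz = np.random.rand(4096, 3)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(xyz)
o3d.visualization.draw_geometries([pcd])  # opens an interactive viewer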

BibTeX

Please consider citing our paper if you find this repo useful:

@article{ze2024humanoid_manipulation,
  title   = {Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies},
  author  = {Yanjie Ze and Zixuan Chen and Wenhao Wang and Tianyi Chen and Xialin He and Ying Yuan and Xue Bin Peng and Jiajun Wu},
  year    = {2024},
  journal = {arXiv preprint arXiv:2410.10803}
}

Acknowledgement

We thank the authors of the following repos for their great work: 3D Diffusion Policy, Diffusion Policy, VisionProTeleop, Open-TeleVision.
