ReGenNet: Towards Human Action-Reaction Synthesis

This repository contains the code for the following paper:

ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu¹,², Yizhou Zhou³, Yichao Yan¹, Xin Jin², Wenhan Zhu, Fengyun Rao³, Xiaokang Yang¹, Wenjun Zeng²
¹ Shanghai Jiao Tong University ² Eastern Institute of Technology, Ningbo ³ WeChat, Tencent Inc.

News

[2024.07.14] We release the training and evaluation code and the trained models.
[2024.03.18] We release the paper and project page of ReGenNet.

Framework

Installation

First, clone the repository with the following commands:

git clone https://github.com/liangxuy/ReGenNet.git
cd ReGenNet

Set up the environment
1. Set up the conda environment with the following commands:
- Install ffmpeg (if not already installed)
```
sudo apt update
sudo apt install ffmpeg
```
- Set up the conda environment
```
conda env create -f environment.yml
conda activate regennet
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
```
- Install mpi4py (for multi-GPU training)
```
sudo apt-get install libopenmpi-dev openmpi-bin
pip install mpi4py
```
We also provide a Dockerfile (docker/Dockerfile) if you want to build your own docker environment.
2. Download other required files
- Download the pretrained models from Google Drive and move them to the save folder to reproduce the results.
- Download the action recognition models from Google Drive and move them to recognition_training for evaluation.
- Download the SMPL neutral models from the SMPL website and the SMPL-X models from the SMPL-X website, then move them to body_models/smpl and body_models/smplx, respectively. We also provide a copy here for convenience.
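
Before training or evaluating, it can help to confirm that the downloaded files landed in the folders named above. The following is a minimal sketch and not part of the repository: the helper name `missing_dirs` is hypothetical, and it only checks that each folder exists and is non-empty, without validating the files inside.

```python
from pathlib import Path

# Folders named in the download instructions above.
REQUIRED_DIRS = [
    "save",
    "recognition_training",
    "body_models/smpl",
    "body_models/smplx",
]


def missing_dirs(repo_root="."):
    """Return the required folders that are absent or empty under repo_root.

    Hypothetical helper for illustration; it is not part of ReGenNet.
    """
    root = Path(repo_root)
    missing = []
    for rel in REQUIRED_DIRS:
        folder = root / rel
        # A folder counts as ready only if it exists and holds at least one entry.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_dirs():
        print(f"missing or empty: {rel}")
```

Run it from the repository root; an empty output means all four folders are populated.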

Data Preparation

NTU RGB+D 120

Since the license of the NTU RGB+D 120 dataset does not allow us to distribute its data and annotations, we cannot release the processed NTU RGB+D 120 dataset publicly. If you are interested in the processed data, please email me.

Chi3D

You can download the original dataset here and the actor-reactor order annotations here.

You can also download the processed dataset from Google Drive and place it under dataset/chi3d.

InterHuman

You can download the original dataset here and the actor-reactor order annotations here, then place them under dataset/interhuman.

Training

We provide scripts to train the model in the online, unconstrained setting for human action-reaction synthesis on the NTU120-AS and Chi3D datasets. --arch, --unconstrained and --dataset can be changed for other settings.

Training with 1 GPU:

# NTU RGB+D 120 Dataset
python -m train.train_mdm --setting cmdm --save_dir save/cmdm/ntu_smplx --dataset ntu --cond_mask_prob 0 --num_person 2 --layers 8 --num_frames 60 --arch online --overwrite --pose_rep rot6d --body_model smplx --data_path PATH/TO/xsub.train.h5 --train_platform_type TensorboardPlatform --vel_threshold 0.03 --unconstrained

# Chi3D dataset
python -m train.train_mdm --setting cmdm --save_dir save/cmdm/chi3d_smplx --dataset chi3d --cond_mask_prob 0 --num_person 2 --layers 8 --num_frames 150 --arch online --overwrite --pose_rep rot6d --body_model smplx --data_path PATH/TO/chi3d_smplx_train.h5 --train_platform_type TensorboardPlatform --vel_threshold 0.01 --unconstrained

Training with multiple GPUs (4 GPUs in the example):

mpiexec -n 4 --allow-run-as-root python -m train.train_mdm --setting cmdm --save_dir save/cmdm/ntu_smplx --dataset ntu --cond_mask_prob 0 --num_person 2 --layers 8 --num_frames 60 --arch online --overwrite --pose_rep rot6d --body_model smplx --data_path PATH/TO/xsub.train.h5 --train_platform_type TensorboardPlatform --vel_threshold 0.03 --unconstrained
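
The training commands above differ only in a few dataset-specific values and in the optional mpiexec prefix. As a rough sketch, the helper below assembles them from one place. The function `build_train_command` and the `DATASET_SETTINGS` table are hypothetical and not part of the repository; only flags and values that appear in the commands above are used, and `online` is the only --arch value shown in this README.

```python
import shlex

# Per-dataset values copied from the single-GPU commands above.
DATASET_SETTINGS = {
    "ntu": {"save_dir": "save/cmdm/ntu_smplx", "num_frames": 60, "vel_threshold": 0.03},
    "chi3d": {"save_dir": "save/cmdm/chi3d_smplx", "num_frames": 150, "vel_threshold": 0.01},
}


def build_train_command(dataset, data_path, arch="online", unconstrained=True, num_gpus=1):
    """Assemble the training command as a list of arguments.

    Hypothetical helper; only flags that appear in the commands above are used.
    """
    cfg = DATASET_SETTINGS[dataset]
    cmd = []
    if num_gpus > 1:
        # Multi-GPU runs are launched through mpiexec, as in the 4-GPU example.
        cmd += ["mpiexec", "-n", str(num_gpus), "--allow-run-as-root"]
    cmd += [
        "python", "-m", "train.train_mdm",
        "--setting", "cmdm",
        "--save_dir", cfg["save_dir"],
        "--dataset", dataset,
        "--cond_mask_prob", "0",
        "--num_person", "2",
        "--layers", "8",
        "--num_frames", str(cfg["num_frames"]),
        "--arch", arch,
        "--overwrite",
        "--pose_rep", "rot6d",
        "--body_model", "smplx",
        "--data_path", data_path,
        "--train_platform_type", "TensorboardPlatform",
        "--vel_threshold", str(cfg["vel_threshold"]),
    ]
    if unconstrained:
        cmd.append("--unconstrained")
    return cmd


if __name__ == "__main__":
    # Prints the Chi3D single-GPU command shown above.
    print(shlex.join(build_train_command("chi3d", "PATH/TO/chi3d_smplx_train.h5")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues when the data path contains spaces.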

Evaluation

For the action recognition model, you can either:

Directly download the trained action recognition model here; or

Train your own action recognition model.

The code for training the action recognition model is based on the ACTOR repository.

Commands for training your own action recognition model:

cd actor-x;
# Before training, you need to set up the `dataset` and folder of the `SMPL-X models`
### NTU RGB+D 120 ###
python -m src.train.train_stgcn --dataset ntu120_2p_smplx --pose_rep rot6d --num_epochs 100 --snapshot 10 --batch_size 64 --lr 0.0001 --num_frames 60 --sampling conseq --sampling_step 1 --glob --translation --folder recognition_training/ntu_smplx --datapath dataset/ntu120/smplx/conditioned/xsub.train.h5 --num_person 2 --body_model smplx

### Chi3D ###
python -m src.train.train_stgcn --dataset chi3d --pose_rep rot6d --num_epochs 100 --snapshot 10 --batch_size 64 --lr 0.0001 --num_frames 150 --sampling conseq --sampling_step 1 --glob --translation --folder recognition_training/chi3d_smplx --datapath dataset/chi3d/smplx/conditioned/chi3d_smplx_train.h5 --num_person 2 --body_model smplx

The following script evaluates the trained model at PATH/TO/model_XXXX.pt; --rec_model_path points to the action recognition model. The results are written to PATH/TO/evaluation_results_XXXX_full.yaml. We use ddim5 to accelerate the evaluation process.

python -m eval.eval_cmdm --model PATH/TO/model_XXXX.pt --eval_mode full --rec_model_path PATH/TO/checkpoint_0100.pth.tar --use_ddim --timestep_respacing ddim5
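
To give a sense of why `ddim5` speeds up evaluation, the sketch below shows how a `ddimN` spec is commonly resolved in guided-diffusion-style code: the sampler visits only N evenly strided timesteps instead of the full training schedule. The function name `ddim_timesteps` is hypothetical, and both the fixed-stride rule and the 1000-step schedule are assumptions; this README does not state how ReGenNet resolves the spec.

```python
def ddim_timesteps(num_train_steps, num_sample_steps):
    """Pick evenly strided timesteps for a `ddimN` respacing spec.

    Sketch of a fixed-stride rule in the style of guided diffusion;
    whether ReGenNet applies exactly this rule is an assumption.
    """
    for stride in range(1, num_train_steps):
        steps = list(range(0, num_train_steps, stride))
        if len(steps) == num_sample_steps:
            return steps
    raise ValueError(
        f"cannot create exactly {num_sample_steps} steps with an integer stride"
    )


if __name__ == "__main__":
    # Assumes a 1000-step training schedule, which this README does not state.
    print(ddim_timesteps(1000, 5))  # [0, 200, 400, 600, 800]
```

Under these assumptions, each generated sample needs 5 denoising passes rather than 1000.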

If you want to get a table with mean and interval, you can use this script:

python -m eval.easy_table PATH/TO/evaluation_results_XXXX_full.yaml
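
For readers who want to see what a "mean and interval" summary involves, here is a minimal sketch using a normal-approximation 95% confidence interval over repeated evaluation runs. The function `mean_and_interval` is hypothetical, and the formula is an assumption; `eval.easy_table` may compute its interval differently.

```python
import math
import statistics


def mean_and_interval(values, z=1.96):
    """Return (mean, half-width) of a normal-approximation confidence interval.

    Assumption: the half-width is z * std / sqrt(n) over repeated runs;
    the repository's `eval.easy_table` may compute it differently.
    """
    n = len(values)
    mean = statistics.fmean(values)
    if n < 2:
        # A single run has no spread to estimate.
        return mean, 0.0
    half_width = z * statistics.stdev(values) / math.sqrt(n)
    return mean, half_width


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not results from the paper.
    scores = [1.0, 2.0, 3.0]
    mean, half = mean_and_interval(scores)
    print(f"{mean:.2f} ± {half:.2f}")
```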

Motion Synthesis and Visualization

Generate the results, which will be saved to results.npy:

python -m sample.cgenerate --model_path PATH/TO/model_XXXX.pt --action_file assets/action_names_XXX.txt --num_repetitions 10 --dataset ntu --body_model smplx --num_person 2 --pose_rep rot6d --data_path PATH/TO/xsub.test.h5 --output_dir XXX

Render the results

Install additional dependencies

pip install trimesh
pip install pyrender
pip install imageio-ffmpeg

python -m render.crendermotion --data_path PATH/TO/results.npy --num_person 2 --setting cmdm --body_model smplx

TODO

Release the training and evaluation code and the trained models.
Release the annotation results.

Acknowledgments

We thank the authors of the following projects, on which our code is based:

ACTOR, motion diffusion model, guided diffusion, text-to-motion, HumanML3D

License

This code is distributed under the MIT License.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, PyTorch3D, and uses datasets that each have their own respective licenses that must also be followed.

Citation

If you find ReGenNet useful for your research, please cite us:

@inproceedings{xu2024regennet,
  title={ReGenNet: Towards Human Action-Reaction Synthesis},
  author={Xu, Liang and Zhou, Yizhou and Yan, Yichao and Jin, Xin and Zhu, Wenhan and Rao, Fengyun and Yang, Xiaokang and Zeng, Wenjun},
  booktitle={CVPR},
  pages={1759--1769},
  year={2024}
}
