
Diff-IP2D: Diffusion Models for Joint HOI Prediction

This is the official implementation for our paper:

Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos.

Junyi Ma1, Jingyi Xu1, Xieyuanli Chen2, Hesheng Wang1*

1SJTU  2NUDT  *Corresponding author

Diff-IP2D is the first work to use a denoising diffusion probabilistic model to jointly forecast future hand trajectories and object affordances from only 2D egocentric videos. It provides a foundation generative paradigm for the field of HOI prediction.

Qualitative prediction results (white: ours, blue: baseline, red: ground truth): Diff-IP2D generates plausible future hand waypoints and final hand positions with bidirectional constraints, even when there are large errors in the early stage.

If you find our work helpful to your research, please cite our paper as

@article{ma2024diffip2d,
  title={Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos},
  author={Ma, Junyi and Xu, Jingyi and Chen, Xieyuanli and Wang, Hesheng},
  journal={arXiv preprint arXiv:2405.04370},
  year={2024}}

1. Setup

Clone the repository (requires git):

git clone https://github.com/IRMVLab/Diff-IP2D.git
cd Diff-IP2D

Create a conda environment and install the dependencies into it:

conda create -n diffip python=3.8 pip
conda activate diffip
pip install -r requirements.txt
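
After installation, a quick sanity check (a minimal one-liner, assuming PyTorch with CUDA support is installed via requirements.txt) confirms that the environment sees your GPU:

python -c "import torch; print('torch', torch.__version__, '| CUDA available:', torch.cuda.is_available())"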

2. Data Structure

We suggest using the following data structure for faster reproduction:

├── base_models
│   └── model.pth.tar
├── common
│   ├── epic-kitchens-100-annotations # from OCT or merged ourselves
│   │   ├── EPIC_100_test_timestamps.csv
│   │   ├── EPIC_100_test_timestamps.pkl
│   │   ├── EPIC_100_train.csv
│   │   ├── EPIC_100_train.pkl
│   │   ├── EPIC_100_train_val_test.csv
│   │   ├── EPIC_100_verb_classes.csv
│   │   ├── EPIC_100_video_info.csv
│   │   ├── actions.csv
│   │   └── ...
│   └── rulstm # raw rulstm repo
│       ├── FEATEXT
│       ├── FasterRCNN
│       └── RULSTM
├── data
│   ├── ek100 # manually generated ourselves or from OCT or from raw EK
│   │   ├── feats_train
│   │   │   ├── full_data_with_future_train_part1.lmdb
│   │   │   └── full_data_with_future_train_part2.lmdb
│   │   ├── feats_test
│   │   │   └── data.lmdb
│   │   ├── labels
│   │   │   ├── label_0.pkl
│   │   │   └── ...
│   │   ├── ek100_eval_labels.pkl
│   │   └── video_info.json
│   ├── raw_images # raw EPIC-KITCHENS dataset
│   │   └── EPIC-KITCHENS
│   ├── homos_train # auto generated when first running
│   └── homos_test # auto generated when first running
├── diffip_weights # auto generated when first saving checkpoints
│   ├── checkpoint_1.pth.tar
│   └── ...
├── collected_pred_traj # auto generated when first eval traj
├── collected_pred_aff # auto generated when first eval affordance
├── log # auto generated when first running
└── uid2future_file_name.pickle
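
Before training, you can sanity-check this layout with a short script such as the one below (a hypothetical helper, not shipped with this repo; it only verifies that the paths which are not auto-generated exist):

# check_data_layout.py -- hypothetical helper, not part of this repo
from pathlib import Path

REQUIRED = [
    "base_models/model.pth.tar",
    "common/epic-kitchens-100-annotations",
    "common/rulstm",
    "data/ek100/feats_train",
    "data/ek100/feats_test",
    "data/ek100/labels",
    "data/ek100/ek100_eval_labels.pkl",
    "data/ek100/video_info.json",
    "data/raw_images/EPIC-KITCHENS",
    "uid2future_file_name.pickle",
]

missing = [p for p in REQUIRED if not Path(p).exists()]
if missing:
    print("Missing paths:\n  " + "\n  ".join(missing))
else:
    print("Data layout looks complete.")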

3. Access to Required Files

Here are the links to all of the above-mentioned files that cannot be generated automatically by the scripts in this repo:

4. How to Use

We have released the deployment of Diff-IP2D on EK100. The relevant code and data for EK55 and EG will be released soon.

4.1 Pretrained Weights

Version | Download link           | Notes
1.1     | OneDrive / Google Drive | pretrained on EK100 (two val sets)
1.2     | OneDrive / Google Drive | pretrained on EK100 (one val set)

Please change the paths to pretrained weights in run_train.py, run_val_traj.py, and run_val_affordance.py.
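
If you want to inspect a downloaded checkpoint before wiring its path into those scripts, standard PyTorch calls are enough (an illustrative snippet only; the file path and the keys inside the checkpoint are assumptions, not part of the repo's API):

# inspect_checkpoint.py -- illustrative only; point the path at your downloaded weights
import torch

ckpt_path = "diffip_weights/checkpoint_1.pth.tar"  # assumed location, change as needed
ckpt = torch.load(ckpt_path, map_location="cpu")
print("Loaded object type:", type(ckpt))
if isinstance(ckpt, dict):
    print("Top-level keys:", list(ckpt.keys()))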

4.2 Train

bash train.sh

4.3 Test

Test trajectory prediction with:

bash val_traj.sh

Test affordance prediction with:

bash val_affordance.sh

4.4 Other Notes

  • We are working hard to organize and release a more polished version of the code, along with its application to the new dataset.
  • You may obtain results that differ slightly from those presented in the paper due to the stochastic nature of diffusion inference with different seeds. Prediction clusters can be obtained by running inference with multiple different seeds (see the seed sketch after this list).
  • Homographies are automatically saved to data/homos_train and data/homos_test after the first training/test epoch for quick reuse.
  • Using separate validation sets leads to best checkpoints at different epochs for the two tasks.
  • Please modify the parameters in the config files before training and testing, e.g., change the paths in options/expopts.py. You can also set fast_test=True for faster inference without sacrificing much accuracy.
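
As mentioned in the notes above, running inference several times with different seeds yields a cluster of predictions. Below is a minimal seed-fixing sketch using standard Python/NumPy/PyTorch calls; it is a generic helper and not the repo's own seeding API:

# seed_utils.py -- generic helper, not part of this repo's API
import random

import numpy as np
import torch

def set_seed(seed: int) -> None:
    """Fix all relevant RNGs so a single diffusion sampling run is repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# Repeat evaluation with different seeds to collect a prediction cluster.
for seed in (0, 1, 2):
    set_seed(seed)
    # ... run the diffusion sampling / evaluation for this seed here ...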

5. Acknowledgment

We sincerely appreciate the fantastic pioneering works that provide codebases and datasets for this work. Please also cite them if you use the relevant code and data.

[1] Shaowei Liu, Subarna Tripathi, Somdeb Majumdar, and Xiaolong Wang. Joint hand motion and interaction hotspots prediction from egocentric videos. In CVPR, pages 3282–3292, 2022. Paper Code

[2] Dima Damen, Hazel Doughty, Giovanni Maria Farinella, Antonino Furnari, Evangelos Kazakos, Jian Ma, Davide Moltisanti, Jonathan Munro, Toby Perrett, Will Price, et al. Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100. IJCV, pages 1–23, 2022. Paper Code

[3] Antonino Furnari and Giovanni Maria Farinella. Rolling-unrolling lstms for action anticipation from first-person video. IEEE TPAMI, 43(11):4021–4036, 2020. Paper Code

[4] Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, and Lingpeng Kong. Diffuseq: Sequence to sequence text generation with diffusion models. In ICLR, 2023. Paper Code

[5] Wentao Bao, Lele Chen, Libing Zeng, Zhong Li, Yi Xu, Junsong Yuan, and Yu Kong. Uncertainty-aware state space transformer for egocentric 3d hand trajectory forecasting. In ICCV, pages 13702–13711, 2023. Paper Code
