Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications
European Conference on Computer Vision (ECCV), 2022
Lingzhi Zhang*, Shenghao Zhou*, Simon Stent, Jianbo Shi (* indicates equal contribution)
Our main goal is to provide a tool for better hand-object segmentation on in-the-wild egocentric videos.
- Linux
- Python 3
- NVIDIA GPU + CUDA + cuDNN
Table of Contents:
- Setup - download pretrained models and resources
- Datasets - download our egocentric hand-object segmentation datasets
- Checkpoints - download the checkpoints for all our models
- Inference on Images - quick usage on images
- Inference on Videos - quick usage on videos
- Other Resources - other resources used in our paper
- Clone this repo:
git clone https://github.com/owenzlz/EgoHOS
- Install dependencies:
pip install -r requirements.txt
pip install -U openmim
mim install mmcv-full==1.6.0
cd mmsegmentation
pip install -v -e .
For more information, please refer to MMSegmentation: https://mmsegmentation.readthedocs.io/en/latest/
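After the install steps above, a quick sanity check can confirm that the packages are importable. The sketch below is not part of the repository. It assumes the usual import names `mmcv` (for `mmcv-full`) and `mmseg` (for the editable `mmsegmentation` install), and the helper name `check_modules` is hypothetical.

```python
import importlib.util

# Import names assumed for the packages installed above:
# "mmcv" for mmcv-full and "mmseg" for the editable mmsegmentation install.
REQUIRED_MODULES = ("mmcv", "mmseg")


def check_modules(names=REQUIRED_MODULES):
    """Return {module_name: bool} telling whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_modules().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If a module is reported as missing, re-running the corresponding install command is usually the first thing to try.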
- Download our dataset with the following command:
bash download_datasets.sh
After downloading, the dataset is structured as follows:
- [egohos dataset root]
    |- train
        |- image
        |- label
        |- contact
    |- val
        |- image
        |- label
        |- contact
    |- test_indomain
        |- image
        |- label
        |- contact
    |- test_outdomain
        |- image
        |- label
        |- contact
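The layout above can be walked with a few lines of Python to collect the image, label, and contact files of each sample. This is a minimal sketch, not code from the repository. It assumes that the three files of one sample share the same file name stem (the README does not state this), and the helper name `list_samples` is hypothetical.

```python
from pathlib import Path

SPLITS = ("train", "val", "test_indomain", "test_outdomain")
SUBFOLDERS = ("image", "label", "contact")


def _files_by_stem(folder):
    """Map file stem -> path for every file in folder (empty if folder is missing)."""
    if not folder.is_dir():
        return {}
    return {p.stem: p for p in folder.iterdir() if p.is_file()}


def list_samples(root, split):
    """Pair each image with its label and contact files by file stem.

    Assumption (not stated in the README): the three files of one sample
    share the same stem; their extensions may differ.
    """
    split_dir = Path(root) / split
    by_stem = {sub: _files_by_stem(split_dir / sub) for sub in SUBFOLDERS}
    samples = []
    for stem in sorted(by_stem["image"]):
        # Keep only samples that have all three files.
        if stem in by_stem["label"] and stem in by_stem["contact"]:
            samples.append({sub: by_stem[sub][stem] for sub in SUBFOLDERS})
    return samples
```

For example, `list_samples("[egohos dataset root]", "val")` would return one dictionary per validation sample, with the placeholder replaced by the actual download location.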
In each label image, the category IDs are defined as listed below. In the contact labels, pixels with value 1 indicate the dense contact region.
0 -> background
1 -> left hand
2 -> right hand
3 -> 1st order interacting object by left hand
4 -> 1st order interacting object by right hand
5 -> 1st order interacting object by both hands
6 -> 2nd order interacting object by left hand
7 -> 2nd order interacting object by right hand
8 -> 2nd order interacting object by both hands
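The category table above maps directly onto a lookup dictionary. The sketch below shows how a label mask could be summarized or reduced to a binary hand mask. To stay dependency-free it treats a mask as rows of integers; in practice the label image would first be loaded with an image library, which is left out here. The function names are hypothetical and not part of the repository.

```python
from collections import Counter

# Category IDs exactly as listed in the table above.
CATEGORY_NAMES = {
    0: "background",
    1: "left hand",
    2: "right hand",
    3: "1st order interacting object by left hand",
    4: "1st order interacting object by right hand",
    5: "1st order interacting object by both hands",
    6: "2nd order interacting object by left hand",
    7: "2nd order interacting object by right hand",
    8: "2nd order interacting object by both hands",
}
HAND_IDS = {1, 2}


def count_categories(label_rows):
    """Count pixels per category name in a label mask given as rows of ints."""
    counts = Counter(value for row in label_rows for value in row)
    return {
        CATEGORY_NAMES.get(cid, f"unknown ({cid})"): n
        for cid, n in sorted(counts.items())
    }


def hand_mask(label_rows):
    """Collapse the label mask to binary: 1 for either hand, 0 elsewhere."""
    return [[1 if value in HAND_IDS else 0 for value in row] for row in label_rows]


# Toy 3x4 mask standing in for a real label image.
toy = [
    [0, 1, 1, 0],
    [0, 3, 3, 2],
    [0, 0, 6, 2],
]
print(count_categories(toy))
print(hand_mask(toy))
```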
- Download checkpoints and config files:
bash download_checkpoints.sh
- Let's first download a few test images for running the demo:
bash download_testimages.sh
Depending on your application scenario, you may want to use one of the commands below to generate the segmentation predictions. Please modify the image directory paths in the bash files if needed. The backend segmentation model uses a Swin-L backbone with a UPerNet head.
By default, the bash commands run on the images in "./testimages/images" and save the results in the "./testimages" folder. To test on your own images, either put them into the "./testimages/images" folder or change the directories in the bash files.
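For the first option, copying your own images into the default input folder, a small helper along these lines may be convenient. It is only a sketch: the helper name `stage_images` is hypothetical, and the set of file extensions it accepts is an assumption, since the README does not say which formats the scripts read.

```python
import shutil
from pathlib import Path

# Extensions treated as images here; this set is an assumption for the sketch.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def stage_images(source_dir, target_dir="./testimages/images"):
    """Copy image files from source_dir into the default demo input folder.

    Returns the names of the copied files in sorted order.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(source_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Files already present in the target folder with the same name are overwritten, so it may be worth staging into an empty folder.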
- Predict two hands, contact boundary, and interacting objects (1st order) sequentially.
cd mmsegmentation # if you are not in this directory
bash pred_all_obj1.sh
- Predict two hands, contact boundary, and interacting objects (1st and 2nd orders) sequentially.
cd mmsegmentation # if you are not in this directory
bash pred_all_obj2.sh
If you want to predict only the hand or contact segmentation, or want to use each module separately, see the commands below.
- Predict only the left and right hands.
cd mmsegmentation # if you are not in this directory
bash pred_twohands.sh
- Predict the dense contact boundary.
cd mmsegmentation # if you are not in this directory
bash pred_cb.sh
- Predict the (1st order) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj1.sh
- Predict the (both 1st and 2nd orders) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj2.sh
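The image commands above can also be driven from Python, for instance inside a larger pipeline. The sketch below simply wraps the same `bash` calls. The script names come from this README, while the task keys and the function names `build_command` and `run_task` are hypothetical.

```python
import subprocess

# Task keys are made up for this sketch; script names are those listed above.
IMAGE_SCRIPTS = {
    "all_obj1": "pred_all_obj1.sh",
    "all_obj2": "pred_all_obj2.sh",
    "twohands": "pred_twohands.sh",
    "contact_boundary": "pred_cb.sh",
    "obj1": "pred_obj1.sh",
    "obj2": "pred_obj2.sh",
}


def build_command(task):
    """Return the argv list for a task key, e.g. ["bash", "pred_cb.sh"]."""
    if task not in IMAGE_SCRIPTS:
        raise ValueError(f"unknown task {task!r}; choose from {sorted(IMAGE_SCRIPTS)}")
    return ["bash", IMAGE_SCRIPTS[task]]


def run_task(task, repo_dir="mmsegmentation"):
    """Run the script from inside the mmsegmentation directory."""
    # check=True raises CalledProcessError if the script exits with an error.
    subprocess.run(build_command(task), cwd=repo_dir, check=True)
```

Calling `run_task("all_obj1")` from the repository root would then be equivalent to `cd mmsegmentation` followed by `bash pred_all_obj1.sh`.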
- Let's first download a few test videos for running the demo:
bash download_testvideos.sh
- Predict hands and (1st order) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj1_video.sh
- Predict hands and (1st and 2nd orders) interacting objects.
cd mmsegmentation # if you are not in this directory
bash pred_obj2_video.sh
We used other resources for the application section, e.g. mesh reconstruction. Please refer to the links below:
- Image Inpainting - LaMa: https://github.com/saic-mdal/lama
- Video Inpainting - Flow-edge Guided Video Completion: https://github.com/vt-vl-lab/FGVC
- Mesh Reconstruction of Hand-Object Interaction: https://github.com/hassony2/homan
- Video Recognition - SlowFast Network: https://github.com/epic-kitchens/epic-kitchens-slowfast
If you wish to generate higher-quality masks, consider using a mask refinement model such as CascadePSP: https://github.com/hkchengrex/CascadePSP
If you use this code for your research, please cite our paper:
@inproceedings{zhang2022fine,
title={Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications},
author={Zhang, Lingzhi and Zhou, Shenghao and Stent, Simon and Shi, Jianbo},
booktitle={European Conference on Computer Vision},
pages={127--145},
year={2022},
organization={Springer}
}