Official pytorch implementation of MuST: Multi-Scale Transformers for Surgical Phase Recognition MICCAI 2024

MuST: Multi-Scale Transformers for Surgical Phase Recognition

MuST Architecture

[Figure: MuST architecture]

Multi-Term Frame Encoder (MTFE) Architecture

[Figure: MTFE architecture]

Model Description

We present Multi-Scale Transformers for Surgical Phase Recognition (MuST), a two-stage Transformer-based architecture designed to enhance the modeling of short-, mid-, and long-term information within surgical phases. The first stage is a frame encoder that leverages multi-scale surgical context across different temporal dimensions. It encodes a specific frame of interest, which we call a keyframe, by constructing temporal windows of diverse time spans around it, providing the temporal context needed for accurate phase classification. The resulting embeddings capture short- and mid-term dependencies. To further enhance long-term understanding, the second stage is a Temporal Consistency Module that establishes relationships among frame embeddings within an extensive temporal window, ensuring coherent phase recognition.
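To make the idea of temporal windows around a keyframe concrete, here is a minimal Python sketch of multi-scale index sampling. It is not taken from the MuST code base: the function names, the centered window layout, the window length, and the strides are all assumptions chosen for illustration, and the actual implementation may sample frames differently.

```python
from typing import Dict, List, Sequence


def temporal_window(keyframe: int, num_frames: int, length: int, stride: int) -> List[int]:
    """Return `length` frame indices around `keyframe`, spaced `stride` frames apart.

    Assumption: the window is centered on the keyframe. Indices that fall
    outside the video are clamped to the first or last frame.
    """
    half = length // 2
    indices = []
    for offset in range(-half, length - half):
        idx = keyframe + offset * stride
        indices.append(min(max(idx, 0), num_frames - 1))
    return indices


def multi_scale_windows(
    keyframe: int, num_frames: int, length: int, strides: Sequence[int]
) -> Dict[int, List[int]]:
    """Build one window per stride; a larger stride covers a longer time span."""
    return {s: temporal_window(keyframe, num_frames, length, s) for s in strides}


# Hypothetical values: a 1000-frame video, 4 frames per window, two scales.
windows = multi_scale_windows(keyframe=100, num_frames=1000, length=4, strides=(1, 4))
print(windows)
```

With the same number of frames per window, the stride alone controls how much surgical context each scale sees, which is one simple way to expose short- and mid-term information to a single encoder.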

  • Conference paper in Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Proceedings available at Springer

  • Preprint available at arXiv

  • Winning solution of the 2024 PhaKIR Challenge

  • You can also visit our Project Page

Installation

Please follow these steps to run MuST:

$ conda create --name must python=3.8 -y
$ conda activate must
$ conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

$ conda install av -c conda-forge
$ pip install -U iopath
$ pip install -U opencv-python
$ pip install -U pycocotools
$ pip install 'git+https://github.com/facebookresearch/fvcore'
$ pip install 'git+https://github.com/facebookresearch/fairscale'
$ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

$ git clone https://github.com/BCV-Uniandes/MuST
$ cd MuST
$ pip install -r requirements.txt

Data Preparation

The DATA_PREPARATION.md file contains detailed instructions for preparing the datasets used to validate our method, downloading pre-trained model weights, and guidelines for setting up your own custom dataset.

Running the code

| Dataset | Test Metric | Config | Run File | Frames Features | Model |
|---|---|---|---|---|---|
| GraSP | 79.14 (mAP) | GraSP TCM Config | Run GraSP TCM | ./data/GraSP/frames_features | GraSP Weights |
| MISAW | 98.08 (mAP) | MISAW TCM Config | Run MISAW TCM | ./data/misaw/frames_features | MISAW Weights |
| HeiChole | 77.25 (F1-score) | HeiChole TCM Config | Run HeiChole TCM | ./data/heichole/frames_features | HeiChole Weights |
| Cholec80 | 85.57 (F1-score) | Cholec80 TCM Config | Run Cholec80 TCM | ./data/cholec80/frames_features | Cholec80 Weights |

We provide bash scripts with the default parameters to evaluate each dataset. First download our preprocessed data files and pretrained models as described in DATA_PREPARATION.md, then run the following commands to evaluate each dataset:

# Calculate features running the script corresponding to the desired dataset
$ sh run_files/extract_features/{dataset}_phases
# Run the script corresponding to the desired dataset to evaluate
$ sh run_files/tcm/{dataset}_phases

Training MuST

You can modify the bash scripts to train our models. Set TRAIN.ENABLE True in the desired script to enable training, and set TEST.ENABLE False to skip testing before training. You may also want to point TRAIN.CHECKPOINT_FILE_PATH to the model weights you want to use as initialization. To change the architecture design, training schedule, video input design, and so on, edit either the config files or the bash scripts. Each hyperparameter is documented in the defaults script. For the Temporal Consistency Module (TCM), make sure temporal chunks are used by setting TEMPORAL_MODULE.CHUNKS True. For more details on training MuST, refer to TRAINING.md
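The options above are written as KEY VALUE pairs with dotted keys. The following Python sketch illustrates how such pairs can be applied on top of default values. It is a standalone illustration under assumptions, not the repository's config code: `apply_overrides`, the default values in `cfg`, and the checkpoint path are hypothetical, and the real scripts may parse options differently.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply ["KEY", "VALUE", ...] overrides with dotted keys to a nested dict, in place."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # raises KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"unknown option: {key}")
        try:
            value = ast.literal_eval(raw)  # "True" -> True, "8" -> 8
        except (ValueError, SyntaxError):
            value = raw  # anything that is not a Python literal stays a string
        node[leaf] = value
    return cfg


# Hypothetical defaults; the real ones live in the repository's defaults script.
cfg = {
    "TRAIN": {"ENABLE": False, "CHECKPOINT_FILE_PATH": ""},
    "TEST": {"ENABLE": True},
    "TEMPORAL_MODULE": {"CHUNKS": False},
}

apply_overrides(cfg, [
    "TRAIN.ENABLE", "True",
    "TEST.ENABLE", "False",
    "TRAIN.CHECKPOINT_FILE_PATH", "path/to/init_weights.pyth",  # hypothetical path
    "TEMPORAL_MODULE.CHUNKS", "True",
])
print(cfg)
```

Rejecting unknown keys, as this sketch does, catches typos such as TRAIN.ENABLED early instead of silently training with the defaults.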

Citation

If you find this repository helpful, please consider citing:

@inproceedings{perez2024must,
  title={MuST: Multi-scale Transformers for Surgical Phase Recognition},
  author={P{\'e}rez, Alejandra and Rodr{\'\i}guez, Santiago and Ayobi, Nicol{\'a}s and Aparicio, Nicol{\'a}s and Dessevres, Eug{\'e}nie and Arbel{\'a}ez, Pablo},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={422--432},
  year={2024},
  organization={Springer}
}
