MuST: Multi-Scale Transformers for Surgical Phase Recognition

MuST Architecture

Multi-Term Frame Encoder (MTFE) Architecture

Model Description

We present Multi-Scale Transformers for Surgical Phase Recognition (MuST), a two-stage Transformer-based architecture designed to model short-, mid-, and long-term information within surgical phases. In the first stage, a Multi-Term Frame Encoder (MTFE) leverages multi-scale surgical context across different temporal spans. The encoder focuses on a specific frame of interest, which we call the keyframe, and builds temporal windows of different lengths around it to provide the temporal context needed for accurate phase classification. The resulting keyframe embeddings capture short- and mid-term dependencies. In the second stage, a Temporal Consistency Module (TCM) captures long-term dependencies by relating the frame embeddings within an extended temporal window, which keeps phase predictions coherent over time.
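The keyframe-centred sampling described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the MuST implementation: the function name `multiscale_windows`, the number of frames per window, the strides (1, 4, 16), and the clamping of out-of-range indices at video boundaries are all hypothetical choices. Every window holds the same number of frames, so a larger stride simply covers a longer time span around the keyframe.

```python
from typing import List, Sequence


def multiscale_windows(
    keyframe: int,
    num_frames: int,
    frames_per_window: int = 8,
    strides: Sequence[int] = (1, 4, 16),  # hypothetical scales, not MuST's values
) -> List[List[int]]:
    """Return one list of frame indices per temporal scale around `keyframe`.

    Each window samples `frames_per_window` frames spaced `stride` apart and
    always contains the keyframe itself (offset 0). Indices that fall outside
    the video are clamped to [0, num_frames - 1] (an assumed boundary policy).
    """
    if not 0 <= keyframe < num_frames:
        raise ValueError("keyframe must lie inside the video")
    half = frames_per_window // 2
    windows = []
    for stride in strides:
        # Offsets -half .. frames_per_window - half - 1, scaled by the stride.
        offsets = range(-half, frames_per_window - half)
        window = [min(max(keyframe + o * stride, 0), num_frames - 1) for o in offsets]
        windows.append(window)
    return windows
```

For example, `multiscale_windows(100, 1000, frames_per_window=4, strides=(1, 10))` returns `[[98, 99, 100, 101], [80, 90, 100, 110]]`: both windows contain the keyframe, but the second spans ten times as much video with the same number of frames.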

Conference paper at Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Proceedings available at Springer
Preprint available at arXiv
Winning solution of the 2024 PhaKIR Challenge
You can also visit our Project Page

Installation

Please follow these steps to run MuST:

$ conda create --name must python=3.8 -y
$ conda activate must
$ conda install pytorch==1.9.0 torchvision==0.10.0 cudatoolkit=11.1 -c pytorch -c nvidia

$ conda install av -c conda-forge
$ pip install -U iopath
$ pip install -U opencv-python
$ pip install -U pycocotools
$ pip install 'git+https://github.com/facebookresearch/fvcore'
$ pip install 'git+https://github.com/facebookresearch/fairscale'
$ python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

$ git clone https://github.com/BCV-Uniandes/MuST
$ cd MuST
$ pip install -r requirements.txt

Data Preparation

The DATA_PREPARATION.md file contains detailed instructions for preparing the datasets used to validate our method, downloading pre-trained model weights, and guidelines for setting up your own custom dataset.

Running the code

Dataset	Test metric	Config	Run file	Frame features	Model weights
GraSP	79.14 (mAP)	GraSP TCM Config	Run GraSP TCM	./data/GraSP/frames_features	GraSP Weights
MISAW	98.08 (mAP)	MISAW TCM Config	Run MISAW TCM	./data/misaw/frames_features	MISAW Weights
HeiChole	77.25 (F1-score)	HeiChole TCM Config	Run HeiChole TCM	./data/heichole/frames_features	HeiChole Weights
Cholec80	85.57 (F1-score)	Cholec80 TCM Config	Run Cholec80 TCM	./data/cholec80/frames_features	Cholec80 Weights

We provide bash scripts with default parameters to evaluate on each dataset. First download our preprocessed data files and pretrained models as described in DATA_PREPARATION.md, then run the following commands for the desired dataset:

# Extract frame features with the script for the desired dataset
$ sh run_files/extract_features/{dataset}_phases
# Evaluate the Temporal Consistency Module with the script for the same dataset
$ sh run_files/tcm/{dataset}_phases

Training MuST

You can easily adapt the bash scripts to train our models. In the desired script, set TRAIN.ENABLE to True to enable training, and set TEST.ENABLE to False to skip evaluation before training. You may also want to set TRAIN.CHECKPOINT_FILE_PATH to the weights you want to use for initialization. The architecture design, training schedule, video input design, and other settings can be changed in the config files or the bash scripts; each hyperparameter is documented in the defaults script. For the Temporal Consistency Module (TCM), make sure temporal chunks are used by setting TEMPORAL_MODULE.CHUNKS to True. For more details on training MuST, refer to TRAINING.md.
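The options above are dotted keys into a nested configuration: TRAIN.ENABLE, for instance, is the ENABLE field of the TRAIN section. The Python sketch below shows how a list of KEY VALUE overrides can be applied to such a configuration. It assumes yacs-style KEY VALUE overrides; the helper `apply_overrides`, the starting values, and the checkpoint path are hypothetical and do not reproduce the repository's own config code.

```python
import ast
from typing import Any, Dict, List


def apply_overrides(cfg: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply flat ["SECTION.KEY", "VALUE", ...] pairs to a nested dict in place.

    Hypothetical helper mimicking yacs-style overrides. Values are parsed as
    Python literals when possible (True, 1e-4, 8) and kept as strings otherwise.
    """
    if len(overrides) % 2:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]  # an unknown section raises KeyError
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # not a literal, e.g. a checkpoint path
        node[leaf] = value
    return cfg


# Hypothetical starting point mirroring an evaluation-only run script.
cfg = {
    "TRAIN": {"ENABLE": False, "CHECKPOINT_FILE_PATH": ""},
    "TEST": {"ENABLE": True},
    "TEMPORAL_MODULE": {"CHUNKS": False},
}
apply_overrides(cfg, [
    "TRAIN.ENABLE", "True",
    "TEST.ENABLE", "False",
    "TRAIN.CHECKPOINT_FILE_PATH", "path/to/init_weights.pth",
    "TEMPORAL_MODULE.CHUNKS", "True",
])
```

Rejecting unknown keys instead of silently creating them catches typos such as TRAIN.ENABLED before a long training run starts.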

Citation

If you find this repository helpful, please consider citing:

@inproceedings{perez2024must,
  title={MuST: Multi-scale Transformers for Surgical Phase Recognition},
  author={P{\'e}rez, Alejandra and Rodr{\'\i}guez, Santiago and Ayobi, Nicol{\'a}s and Aparicio, Nicol{\'a}s and Dessevres, Eug{\'e}nie and Arbel{\'a}ez, Pablo},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={422--432},
  year={2024},
  organization={Springer}
}
