Requirements:
torch==2.0.0
torchvision==0.15.0
torchmetrics==0.10.3
albumentations
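The packages above are the environment used in this repo. As a convenience, a minimal sketch of installing them directly with pip (assuming no packaged requirements.txt; note that the albumentations version is not pinned in this list) is:
pip install torch==2.0.0 torchvision==0.15.0 torchmetrics==0.10.3 albumentations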
We use five datasets (Robust-MIS 2019, EndoVis 2017, EndoVis 2018, CholecSeg8k, AutoLaparo) in our paper.
For Robust-MIS 2019, you can download the dataset from here and then put the files in data/robomis.
For EndoVis 2017, you can apply for the dataset here by registration.
For EndoVis 2018, you can apply for the dataset here by registration.
For CholecSeg8k, you can download the dataset from here.
For AutoLaparo, you can request the dataset from here.
NOTE: The Robust-MIS 2019 dataset includes three testing stages; stage 3 consists of images that were unseen during training.
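The training commands below locate the data through --data_path; only data/robomis is specified above, so the following layout is a hypothetical sketch (all folder names other than robomis are placeholders to adapt to your own setup):
data/
  robomis/       # Robust-MIS 2019, used by --data_path ../data/robomis below
  endovis2017/   # placeholder name for EndoVis 2017
  endovis2018/   # placeholder name for EndoVis 2018
  cholecseg8k/   # placeholder name for CholecSeg8k
  autolaparo/    # placeholder name for AutoLaparo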
Train with ViT-L on a single GPU
python train.py \
--data_path ../data/robomis \
--output_dir .../eval_adapter_plus_vitl \
--arch vit_large \
--patch_size 14 \
--n_last_blocks 4 \
--imsize 588 \
--lr 0.01 \
--config_file dinov2/configs/eval/vitl14_pretrain.yaml \
--pretrained_weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth \
--num_workers 2 \
--epochs 500
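The --pretrained_weights argument above points at the public DINOv2 ViT-L/14 checkpoint URL. If you would rather not re-download it on every run, a possible alternative (assuming the script also accepts a local filesystem path) is to fetch the file once and pass that path:
wget -P checkpoints/ https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth
python train.py --pretrained_weights checkpoints/dinov2_vitl14_pretrain.pth ... (remaining arguments as above)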
Train with ViT-L on multiple GPUs
export CUDA_VISIBLE_DEVICES=0,1
export PYTHONPATH=.../AdapterSIS
python -m torch.distributed.launch --nproc_per_node=2 eval_paper.py \
--data_path .../data/robo \
--output_dir .../AdapterExp/paperonn \
--arch vit_large \
--patch_size 14 \
--n_last_blocks 4 \
--imsize 588 \
--lr 0.01 \
--config_file dinov2/configs/eval/vitl14_pretrain.yaml \
--pretrained_weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth \
--num_workers 2 \
--epochs 500 \
--batch_size_per_gpu 12
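Note that torch.distributed.launch is deprecated as of PyTorch 2.0 and emits a warning. If eval_paper.py reads the local rank from the LOCAL_RANK environment variable rather than a --local_rank argument (an assumption about the script), an equivalent launch with torchrun is:
export CUDA_VISIBLE_DEVICES=0,1
export PYTHONPATH=.../AdapterSIS
torchrun --nproc_per_node=2 eval_paper.py ... (same arguments as above)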
For evaluation, simply add the --evaluate flag to the training command:
python train.py \
--data_path ../data/robomis \
--output_dir .../eval_adapter_plus_vitl \
--arch vit_large \
--patch_size 14 \
--n_last_blocks 4 \
--imsize 588 \
--lr 0.01 \
--config_file dinov2/configs/eval/vitl14_pretrain.yaml \
--pretrained_weights https://dl.fbaipublicfiles.com/dinov2/dinov2_vitl14/dinov2_vitl14_pretrain.pth \
--num_workers 2 \
--epochs 500 \
--evaluate