This is the official implementation of the paper "Less is More: Focus Attention for Efficient DETR"
Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang.
Focus-DETR focuses attention on more informative tokens for a better trade-off between computational efficiency and model accuracy. Compared with the state-of-the-art sparse transformer-based detector under the same setting, Focus-DETR achieves comparable complexity while reaching 50.4 AP (+2.2) on COCO.
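To make the idea concrete, here is a minimal, illustrative sketch of scoring tokens and keeping only the top-k most informative ones for subsequent attention. The `ForegroundTokenSelector` name, the linear scoring head, and `keep_ratio` are assumptions for illustration only; this is not the paper's exact module or API.

```python
import torch
import torch.nn as nn

class ForegroundTokenSelector(nn.Module):
    """Toy sketch of the core idea: score every encoder token and keep
    only the top-k most informative ones. Illustrative simplification,
    not the paper's implementation."""

    def __init__(self, dim: int, keep_ratio: float = 0.3):
        super().__init__()
        self.scorer = nn.Linear(dim, 1)  # per-token "informativeness" score
        self.keep_ratio = keep_ratio

    def forward(self, tokens: torch.Tensor):
        # tokens: (batch, num_tokens, dim)
        scores = self.scorer(tokens).squeeze(-1)            # (B, N)
        k = max(1, int(tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(k, dim=1).indices                 # (B, k)
        gather_idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
        kept = torch.gather(tokens, 1, gather_idx)          # (B, k, dim)
        return kept, idx

# Usage: keep ~30% of 1,000 tokens from a 256-dim feature map.
selector = ForegroundTokenSelector(dim=256)
kept, idx = selector(torch.randn(2, 1000, 256))
print(kept.shape)  # torch.Size([2, 300, 256])
```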
Table of Contents

- Main Results with Pretrained Models
  - Pretrained focus_detr with ResNet Backbone
  - Pretrained focus_detr with Swin-Transformer Backbone
- Installation
- Training
- Evaluation
- Citing Focus-DETR
Here we provide the pretrained Focus-DETR weights based on detrex.
Pretrained focus_detr with ResNet Backbone

| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| Focus-DETR-R50-4scale | R-50 | IN1k | 12 | 100 | 48.8 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 24 | 100 | 50.3 | model |
| Focus-DETR-R50-4scale | R-50 | IN1k | 36 | 100 | 50.4 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 12 | 100 | 50.8 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-R101-4scale | R-101 | IN1k | 36 | 100 | 51.4 | model |
Pretrained focus_detr with Swin-Transformer Backbone

| Name | Backbone | Pretrain | Epochs | Denoising Queries | box AP | Download |
|---|---|---|---|---|---|---|
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 12 | 100 | 50.0 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 24 | 100 | 51.2 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN1k | 36 | 100 | 52.5 | model |
| Focus-DETR-Swin-T-224-4scale | Swin-Tiny-224 | IN22k to IN1k | 36 | 100 | 53.2 | model |
| Focus-DETR-Swin-B-384-4scale | Swin-Base-384 | IN22k to IN1k | 36 | 100 | 56.2 | model |
| Focus-DETR-Swin-L-384-4scale | Swin-Large-384 | IN22k to IN1k | 36 | 100 | 56.3 | model |
Note:
- Swin-X-384 means the backbone's pretraining resolution is 384 × 384, and IN22k to IN1k means the model is pretrained on ImageNet-22k and then fine-tuned on ImageNet-1k.
Please refer to the Installation Instructions for installation details.
All configs can be trained with:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --num-gpus 8
```
By default, we use 8 GPUs with a total batch size of 16 for training.
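detrex is built on detectron2's LazyConfig system, so individual config fields can also be overridden directly on the command line (the evaluation command below does exactly this with `train.init_checkpoint`). As a hypothetical example, assuming your config follows detrex's usual layout with a `dataloader.train.total_batch_size` field, a smaller 4-GPU run might look like:

```bash
cd detrex
# Hypothetical: 4 GPUs with the total batch size halved to match.
# The exact key names depend on your config file; check it before overriding.
python tools/train_net.py \
    --config-file projects/focus_detr/configs/path/to/config.py \
    --num-gpus 4 \
    dataloader.train.total_batch_size=8
```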
Model evaluation can be done as follows:

```bash
cd detrex
python tools/train_net.py --config-file projects/focus_detr/configs/path/to/config.py --eval-only train.init_checkpoint=/path/to/model_checkpoint
```
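For quick script-level inference with a downloaded checkpoint, the same config file can be loaded through detectron2's LazyConfig utilities, which detrex builds on. The snippet below is a minimal sketch, assuming the config exposes the standard `model` field and that inputs follow detectron2's usual list-of-dicts format; both paths are placeholders.

```python
import torch
from detectron2.checkpoint import DetectionCheckpointer
from detectron2.config import LazyConfig, instantiate

# Build the model from the training config's `model` field (placeholder path).
cfg = LazyConfig.load("projects/focus_detr/configs/path/to/config.py")
model = instantiate(cfg.model)
model.eval()

# Restore the downloaded weights (placeholder path).
DetectionCheckpointer(model).load("/path/to/model_checkpoint")

# detectron2-style models take a list of dicts with CHW float image tensors.
dummy = {"image": torch.zeros(3, 800, 1216), "height": 800, "width": 1216}
with torch.no_grad():
    outputs = model([dummy])
print(outputs[0]["instances"])
```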
If you find our work helpful for your research, please consider citing the following BibTeX entry.
```BibTeX
@misc{zheng2023more,
      title={Less is More: Focus Attention for Efficient DETR},
      author={Dehua Zheng and Wenhui Dong and Hailin Hu and Xinghao Chen and Yunhe Wang},
      year={2023},
      eprint={2307.12612},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```