## Quick start

### Setup

```shell
pip install -r requirements.txt
```

The table below lists the corresponding `torch` and `torchvision` versions.

| rtdetr | torch | torchvision |
| :---: | :---: | :---: |
| - | 2.4 | 0.19 |
| - | 2.2 | 0.17 |
| - | 2.1 | 0.16 |
| - | 2.0 | 0.15 |
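As a quick sanity check after installation, here is a minimal snippet to confirm that the installed pair matches one of the rows above (the CUDA check is only relevant if you plan to use the GPU deploy paths):

```python
# Minimal sanity check: confirm the installed torch/torchvision pair
# matches one of the rows in the compatibility table above.
import torch
import torchvision

print("torch:", torch.__version__)               # e.g. 2.4.x pairs with torchvision 0.19
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
```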

## Model Zoo

### Base models

| Model | Dataset | Input Size | $AP^{val}$ | $AP^{val}_{50}$ | #Params (M) | FPS | config | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| RT-DETRv2-S | COCO | 640 | 48.1 (+1.6) | 65.1 | 20 | 217 | config | url |
| RT-DETRv2-M\* | COCO | 640 | 49.9 (+1.0) | 67.5 | 31 | 161 | config | url |
| RT-DETRv2-M | COCO | 640 | 51.9 (+0.6) | 69.9 | 36 | 145 | config | url |
| RT-DETRv2-L | COCO | 640 | 53.4 (+0.3) | 71.6 | 42 | 108 | config | url |
| RT-DETRv2-X | COCO | 640 | 54.3 | 72.8 (+0.1) | 76 | 74 | config | url |

Notes:

- AP is evaluated on the MSCOCO val2017 dataset.
- FPS is evaluated on a single T4 GPU with $batch\_size = 1$, $fp16$, and $TensorRT>=8.5.1$.
- COCO + Objects365 in the table means the model was fine-tuned on COCO using weights pretrained on Objects365.

### Models with discrete sampling

| Model | Sampling Method | $AP^{val}$ | $AP^{val}_{50}$ | config | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| RT-DETRv2-S_dsp | discrete_sampling | 47.4 | 64.8 (-0.1) | config | url |
| RT-DETRv2-M\*_dsp | discrete_sampling | 49.2 | 67.1 (-0.4) | config | url |
| RT-DETRv2-M_dsp | discrete_sampling | 51.4 | 69.7 (-0.2) | config | url |
| RT-DETRv2-L_dsp | discrete_sampling | 52.9 | 71.3 (-0.3) | config | url |

Notes:

- The impact on inference speed depends on the specific device and software.
- `*_dsp` models inherit the corresponding `*_sp` model's knowledge and adapt to the discrete_sampling strategy, so you can run inference with TensorRT 8.4 (or even older versions) for these models.

### Ablation on sampling points

| Model | Sampling Method | #Points | $AP^{val}$ | $AP^{val}_{50}$ | checkpoint |
| :---: | :---: | :---: | :---: | :---: | :---: |
| rtdetrv2_r18vd_sp1 | grid_sampling | 21,600 | 47.3 | 64.3 (-0.6) | url |
| rtdetrv2_r18vd_sp2 | grid_sampling | 43,200 | 47.7 | 64.7 (-0.2) | url |
| rtdetrv2_r18vd_sp3 | grid_sampling | 64,800 | 47.8 | 64.8 (-0.1) | url |
| rtdetrv2_r18vd(_sp4) | grid_sampling | 86,400 | 47.9 | 64.9 | url |

Notes:

- The impact on inference speed depends on the specific device and software.
- `#Points` is the total number of sampling points in the decoder for a single image inference (a hedged back-of-the-envelope decomposition follows below).
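The `#Points` counts decompose cleanly under assumed `rtdetrv2_r18vd` defaults (300 queries, 3 decoder layers, 3 feature levels, 8 attention heads); none of these values are stated in the table itself, so treat this as a consistency check rather than the authoritative formula:

```python
# Hypothetical decomposition of #Points; every factor below is an assumed
# rtdetrv2_r18vd default, not a value stated in the table above.
num_queries, decoder_layers, num_levels, num_heads = 300, 3, 3, 8

for n in (1, 2, 3, 4):  # n = sampling points per head per level (the spN suffix)
    total = num_queries * decoder_layers * num_levels * num_heads * n
    print(f"sp{n}: {total:,}")  # prints 21,600 / 43,200 / 64,800 / 86,400
```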

## Usage

1. Training

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config --use-amp --seed=0 &> log.txt 2>&1 &
```
2. Testing

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -r path/to/checkpoint --test-only
```
3. Tuning

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --master_port=9909 --nproc_per_node=4 tools/train.py -c path/to/config -t path/to/checkpoint --use-amp --seed=0 &> log.txt 2>&1 &
```
4. Export ONNX

```shell
python tools/export_onnx.py -c path/to/config -r path/to/checkpoint --check
```
5. Inference

Inference is supported with torch, onnxruntime, tensorrt, and openvino backends; see details in references/deploy. A hedged onnxruntime sketch follows the commands below.

```shell
python references/deploy/rtdetrv2_onnx.py --onnx-file=model.onnx --im-file=xxxx
python references/deploy/rtdetrv2_tensorrt.py --trt-file=model.trt --im-file=xxxx
python references/deploy/rtdetrv2_torch.py -c path/to/config -r path/to/checkpoint --im-file=xxx --device=cuda:0
```
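For a rough idea of what `rtdetrv2_onnx.py` does under the hood, here is a minimal onnxruntime sketch. The input names `images` and `orig_target_sizes`, the `(w, h)` size layout, and the `labels, boxes, scores` output order are assumptions about the exported graph, so verify them with `sess.get_inputs()` / `sess.get_outputs()` before relying on this:

```python
# Hedged onnxruntime sketch; input/output names and ordering are assumptions
# about the exported graph, not guarantees -- check sess.get_inputs()/get_outputs().
import numpy as np
import onnxruntime as ort
from PIL import Image

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

im = Image.open("image.jpg").convert("RGB")                    # hypothetical input image
orig_size = np.array([[im.width, im.height]], dtype=np.int64)  # assumed (w, h) layout
blob = np.asarray(im.resize((640, 640)), dtype=np.float32)
blob = blob.transpose(2, 0, 1)[None] / 255.0                   # HWC -> NCHW, scaled to [0, 1]

labels, boxes, scores = sess.run(None, {"images": blob, "orig_target_sizes": orig_size})
keep = scores[0] > 0.5                                         # simple confidence threshold
print(labels[0][keep], boxes[0][keep])
```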

## Citation

If you use RT-DETR or RT-DETRv2 in your work, please use the following BibTeX entries:

```bibtex
@misc{lv2023detrs,
      title={DETRs Beat YOLOs on Real-time Object Detection},
      author={Wenyu Lv and Shangliang Xu and Yian Zhao and Guanzhong Wang and Jinman Wei and Cheng Cui and Yuning Du and Qingqing Dang and Yi Liu},
      year={2023},
      eprint={2304.08069},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
      title={RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
      author={Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
      year={2024},
      eprint={2407.17140},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2407.17140}
}
```