This repository is currently under development.
YOLO3D is inspired by Mousavian et al. and their paper 3D Bounding Box Estimation Using Deep Learning and Geometry. YOLO3D takes a different approach: the detector is YOLOv5 instead of Faster-RCNN, and the regressor is ResNet18/VGG11 instead of VGG19.
YOLO3D uses Hydra as the config manager; for usage details, please refer to the official website or ashleve/lightning-hydra-template.
You can use the pretrained weights from Release; download them with the script get_weights.py:
# download pretrained model
python script/get_weights.py \
--tag v0.1 \
--dir ./weights
Inference with inference.py:
python inference.py \
source_dir="./data/demo/videos/2011_09_26/image_02/data" \
detector.model_path="./weights/detector_yolov5s.pt" \
regressor_weights="./weights/mobilenetv3-best.pt"
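The key=value arguments in the command above are Hydra command-line overrides, and a dotted key such as detector.model_path addresses a nested config field. As a rough illustration of that idea only, the pure-Python sketch below merges such overrides into a nested dictionary. The helper `apply_overrides` is hypothetical, it is not Hydra's implementation, and the starting config is an assumption made up for the example.

```python
def apply_overrides(config, overrides):
    """Apply 'a.b=value' style overrides to a nested dict (illustration only)."""
    for item in overrides:
        # Split on the first '=' so values may themselves contain '='.
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for name in parents:
            # Walk down the nested dicts, creating them when missing.
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


# Assumed starting config; the real YOLO3D config has more fields.
config = apply_overrides(
    {"detector": {"model_path": None}},
    [
        "source_dir=./data/demo/videos/2011_09_26/image_02/data",
        "detector.model_path=./weights/detector_yolov5s.pt",
        "regressor_weights=./weights/mobilenetv3-best.pt",
    ],
)
print(config["detector"]["model_path"])
```

Hydra itself also handles typing, defaults, and config groups, none of which this sketch attempts.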
Two models are trained here: a detector and a regressor. For now, the only supported detector is YOLOv5, while the regressor can use any model supported by Torchvision.
For now, YOLO3D only supports the KITTI dataset. Going forward, we will try to add support for the Lyft and nuScenes datasets.
You can download the KITTI dataset from the official website. After that, extract the dataset to data/KITTI. Since we will be using two models, it is highly recommended to rename images_2 to images.
.
├── data
│ └── KITTI
│ ├── calib
│ ├── images # original images_2
│ └── labels_2
The KITTI label format in labels_2 is different from the format required by the YOLO model, so the KITTI labels have to be converted to YOLO format. The author has provided script/kitti_to_yolo.py for this.
python script/kitti_to_yolo.py \
--dataset_path ./data/KITTI \
--classes car, van, truck, pedestrian, cyclist \
--img_width 1224 \
--img_height 370
The script will generate a labels folder containing the labels for each image in YOLO format.
.
├── data
│ └── KITTI
│ ├── calib
│ ├── images # original images_2
│ ├── labels_2 # kitti labels
│ └── labels # yolo labels
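To make the conversion concrete, here is a minimal sketch of what turning one KITTI label line into a YOLO label line involves. It is not the code in script/kitti_to_yolo.py: the function name `kitti_line_to_yolo` is hypothetical, the class order is assumed from the --classes argument used earlier, and the sample line is illustrative rather than copied from the dataset. It relies on two facts: a KITTI line stores the 2D box as pixel coordinates (left, top, right, bottom) in fields 4 to 7, and YOLO expects `class x_center y_center width height` normalized by the image size.

```python
# Assumed class order, taken from the --classes argument of the conversion command.
CLASSES = ["car", "van", "truck", "pedestrian", "cyclist"]


def kitti_line_to_yolo(line, img_width, img_height, classes=CLASSES):
    """Convert one KITTI label line to a YOLO line, or None for skipped classes."""
    fields = line.split()
    name = fields[0].lower()
    if name not in classes:
        return None  # e.g. DontCare or Misc entries
    # KITTI fields 4..7: 2D box as left, top, right, bottom in pixels.
    left, top, right, bottom = map(float, fields[4:8])
    # YOLO: box center and size, normalized to [0, 1] by the image dimensions.
    x_center = (left + right) / 2 / img_width
    y_center = (top + bottom) / 2 / img_height
    width = (right - left) / img_width
    height = (bottom - top) / img_height
    class_id = classes.index(name)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Illustrative sample line in KITTI format (values made up for the example).
sample = "Car 0.00 0 -1.58 587.01 173.33 614.12 200.12 1.65 1.67 3.64 -0.65 1.71 46.70 -1.59"
yolo_line = kitti_line_to_yolo(sample, img_width=1224, img_height=370)
print(yolo_line)
```

The image width and height here mirror the --img_width and --img_height values passed to the script; images of a different size would need their own dimensions for the normalization to be correct.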
The next step is to generate training and validation sets of images/labels; these sets also serve as the partitions that divide the dataset. The author has provided script/generate_sets.py for this.
python script/generate_sets.py \
--images_path ./data/KITTI/images \
--dump_dir ./data/KITTI \
--postfix _yolo \
--train_size 0.8 \
--is_yolo
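The effect of --train_size 0.8 can be pictured with a small sketch of a train/validation split. This is an assumption about the general technique, not a copy of script/generate_sets.py: the helper `split_ids` is hypothetical, and whether the real script shuffles or uses a seed is not stated here.

```python
import random


def split_ids(image_ids, train_size=0.8, seed=0):
    """Split image ids into train and validation partitions (illustration only)."""
    ids = sorted(image_ids)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_size)
    return ids[:cut], ids[cut:]


# Ten placeholder ids in a zero-padded style; real ids come from the image file names.
ids = [f"{i:06d}" for i in range(10)]
train_ids, val_ids = split_ids(ids, train_size=0.8)
print(len(train_ids), len(val_ids))
```

Each id lands in exactly one partition, which is the property that matters when the same split is reused for both the detector and the regressor.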
Right now the author only uses the YOLOv5 model.
For YOLOv5 training on a single GPU, you can use the command below:
cd yolov5
python train.py \
--data ../configs/detector/yolov5_kitti.yaml \
--weights yolov5s.pt \
--img 640
As for training on multiple GPUs, you can use the command below:
cd yolov5
python -m torch.distributed.launch \
--nproc_per_node 4 train.py \
--epochs 10 \
--batch 64 \
--data ../configs/detector/yolov5_kitti.yaml \
--weights yolov5s.pt \
--device 0,1,2,3
⚠️ Under development
You can use any model available in Torchvision by adding some configuration to src/models/components/base.py. The author has already provided ResNet18 and VGG11, which can be used directly.
python src/train.py \
experiment=sample
@misc{mousavian20173d,
title={3D Bounding Box Estimation Using Deep Learning and Geometry},
author={Arsalan Mousavian and Dragomir Anguelov and John Flynn and Jana Kosecka},
year={2017},
eprint={1612.00496},
archivePrefix={arXiv},
primaryClass={cs.CV}
}