CMTCoop - Cross Modal Transformers for Cooperative perception

This work builds on "Cross Modal Transformer: Towards Fast and Robust 3D Object Detection" (CMT).

Introduction

CMT is a robust transformer-based detector for end-to-end multi-modal 3D object detection. CMTCoop extends this model to cooperative perception, performing deep multi-modal, multi-view feature fusion for 3D object detection. Extensive studies show that the proposed model achieves an mAP of 97.3% with multi-modal cooperative fusion (a 6.2% increase over vehicle-only perception) and 96.7% with LiDAR-only cooperative perception (CMTCoop-L), which runs at near real-time FPS, along with a 2.1% performance gain over the current SoTA, BEVFusionCoop.


Preparation

Docker installation

Docker provides an easy way to deal with package dependencies. Use the Dockerfile provided to build the image.

docker build . -t cmt-coop

Then run the image with the following command

nvidia-docker run -it --rm \
    --ipc=host --gpus all \
    -v <Path_to_datasets>:/mnt/datasets \
    -v <Path_to_pretrained_models>:/home/pretrained \
    --name cmt-coop \
    cmt-coop bash

Manual Installation

Create a new environment with Anaconda or venv if required:

conda create -n cmt-coop python=3.8
conda activate cmt-coop

Install the following packages

  • Python == 3.8
  • CUDA == 11.1
  • pytorch == 1.9.1
  • mmcv-full == 1.6.2
  • mmdet == 2.28.2
  • mmsegmentation == 0.30.0
  • mmdet3d == 1.0.0rc6
  • spconv-cu111 == 2.1.21
  • flash-attn == 0.2.2
  • pypcd
  • open3d

Note that the repository was tested on the above versions, but may also work with later versions.
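To confirm that an environment matches the tested versions, a small check along the following lines may help. This is a minimal sketch, not part of the repository: the pip distribution names used as keys (for example `torch` for pytorch) are assumptions, and `find_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Pins copied from the list above. The pip distribution names are assumptions.
PINNED = {
    "torch": "1.9.1",
    "mmcv-full": "1.6.2",
    "mmdet": "2.28.2",
    "mmsegmentation": "0.30.0",
    "mmdet3d": "1.0.0rc6",
    "spconv-cu111": "2.1.21",
    "flash-attn": "0.2.2",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Map each package that deviates from its pin to (pinned, found)."""
    mismatches = {}
    for name, wanted in pinned.items():
        found = lookup(name)
        # Builds may carry a local suffix such as "1.9.1+cu111"; ignore it.
        if found is None or found.split("+")[0] != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Since later versions may also work, a reported mismatch is a hint rather than an error.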

Dataset

Follow the mmdet3d instructions to process the nuScenes dataset. This is only required to reproduce the tests on the CMT model.

The dataset links will be released soon.

Download the TUMTraf Dataset Development Kit and follow its instructions to split the TUMTraf intersection dataset into train and val sets. The TUMTraf cooperative dataset is already split into train and val sets.

${Root}
└── datasets
    ├── tumtraf_intersection_dataset
    │   ├── train
    │   └── val
    └── tumtraf_cooperative_dataset
        ├── train
        └── val

Finally, ensure that the dataset folder is soft-linked to the CMTCoop/data folder.

ln -s /path_to_data_folder CMTCoop/data
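Before running data preparation, it can be worth checking that the link resolves to the expected split folders. The sketch below assumes it is run from the directory that contains `CMTCoop`; `missing_split_dirs` is a hypothetical helper, not a repository tool.

```python
from pathlib import Path

# Split folders taken from the directory layout described above.
EXPECTED = [
    "tumtraf_intersection_dataset/train",
    "tumtraf_intersection_dataset/val",
    "tumtraf_cooperative_dataset/train",
    "tumtraf_cooperative_dataset/val",
]


def missing_split_dirs(data_root):
    """Return the expected split directories that do not exist under data_root."""
    root = Path(data_root)
    # is_dir() follows symlinks, so a soft-linked data folder is handled.
    return [rel for rel in EXPECTED if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_split_dirs("CMTCoop/data")
    print("layout OK" if not missing else f"missing: {missing}")
```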

Data preparation

The TUMTraf dataset must be converted from the OpenLABEL format to be compatible with the mmdet3d framework.

TUMTraf Intersection Dataset

Run this script for data preparation:

python ./tools/create_data.py a9_nusc \
--root-path /home/CMTCoop/data/tumtraf_intersection_dataset \
--out-dir /home/CMTCoop/data/tumtraf_intersection_processed \
--splits training,validation

After data preparation, you will be able to see the following directory structure:

├── data
│   ├── tumtraf_intersection_dataset
│   │   ├── train
│   │   ├── val
│   ├── tumtraf_intersection_processed
│   │   ├── a9_nusc_gt_database
│   │   ├── train
│   │   ├── val
│   │   ├── a9_nusc_infos_train.pkl
│   │   ├── a9_nusc_infos_val.pkl
│   │   ├── a9_nusc_dbinfos_train.pkl

TUMTraf Cooperative Dataset

Run this script for data preparation:

python ./tools/create_data.py a9coop_nusc \
--root-path /home/CMTCoop/data/tumtraf_cooperative_dataset \
--out-dir /home/CMTCoop/data/tumtraf_cooperative_processed \
--splits training,validation

After data preparation, you will be able to see the following directory structure:

├── data
│   ├── tumtraf_cooperative_dataset
│   │   ├── train
│   │   ├── val
│   ├── tumtraf_cooperative_processed
│   │   ├── a9_nusc_coop_gt_database
│   │   ├── train
│   │   ├── val
│   │   ├── a9_nusc_coop_infos_train.pkl
│   │   ├── a9_nusc_coop_infos_val.pkl
│   │   ├── a9_nusc_coop_dbinfos_train.pkl
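One way to confirm that both conversions finished is to look for the files and folders listed in the directory trees. The following is a rough sketch under that assumption: `missing_outputs` is a hypothetical helper, and the relative `data/...` paths assume the script is run from the CMTCoop folder.

```python
from pathlib import Path


def expected_outputs(processed_dir, prefix):
    """List the entries the directory trees above show for one processed dataset."""
    root = Path(processed_dir)
    names = [
        f"{prefix}_gt_database",
        "train",
        "val",
        f"{prefix}_infos_train.pkl",
        f"{prefix}_infos_val.pkl",
        f"{prefix}_dbinfos_train.pkl",
    ]
    return [root / name for name in names]


def missing_outputs(processed_dir, prefix):
    """Return the names of expected entries that are not present on disk."""
    return [p.name for p in expected_outputs(processed_dir, prefix) if not p.exists()]


if __name__ == "__main__":
    datasets = [
        ("data/tumtraf_intersection_processed", "a9_nusc"),
        ("data/tumtraf_cooperative_processed", "a9_nusc_coop"),
    ]
    for folder, prefix in datasets:
        missing = missing_outputs(folder, prefix)
        print(folder, "OK" if not missing else f"missing: {missing}")
```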

Train & inference

# train
bash tools/dist_train.sh /path_to_your_config 8
# inference
bash tools/dist_test.sh /path_to_your_config /path_to_your_pth 8 --eval bbox

Main Results

Results on the TUMTraf cooperative validation set. The FPS is evaluated on a single RTX3080 GPU.

Evaluation Results of CMTCoop model on TUMTraf Cooperative Dataset Test Set

| Domain | Modality | mAPBEV | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
|---|---|---|---|---|---|---|
| Vehicle | Camera | 69.76 | 68.76 | 79.85 | 66.44 | 69.30 |
| Vehicle | LiDAR | 88.17 | 87.94 | 88.53 | 71.99 | 84.72 |
| Vehicle | Camera + LiDAR | 91.65 | 84.83 | 91.32 | 72.18 | 85.57 |
| Infra. | Camera | 71.89 | 70.86 | 80.38 | 58.72 | 71.66 |
| Infra. | LiDAR | 94.42 | 91.28 | 95.60 | 77.48 | 91.89 |
| Infra. | Camera + LiDAR | 96.09 | 91.94 | 95.15 | 82.35 | 92.16 |
| Coop. | Camera | 84.07 | 81.03 | 90.05 | 77.94 | 83.43 |
| Coop. | LiDAR | 96.68 | 92.18 | 96.77 | 82.20 | 93.43 |
| Coop. | Camera + LiDAR | 97.31 | 93.70 | 96.65 | 79.84 | 94.10 |
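The gain of cooperative fusion over the vehicle-only setup can be recomputed from the Camera + LiDAR rows. The sketch below uses the mAPBEV column and reports both the absolute and the relative gain; the relative figure appears to correspond to the 6.2% quoted in the introduction, although that mapping is an inference from the numbers, not something the table states.

```python
# mAPBEV values copied from the Camera + LiDAR rows of the table above.
MAP_BEV = {"Vehicle": 91.65, "Infra.": 96.09, "Coop.": 97.31}


def gain_over(baseline, improved):
    """Return (absolute gain in mAP points, relative gain in percent)."""
    absolute = improved - baseline
    relative = 100.0 * absolute / baseline
    return absolute, relative


if __name__ == "__main__":
    absolute, relative = gain_over(MAP_BEV["Vehicle"], MAP_BEV["Coop."])
    # Prints: absolute gain: 5.66 points, relative gain: 6.2%
    print(f"absolute gain: {absolute:.2f} points, relative gain: {relative:.1f}%")
```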

Evaluation Results of Infrastructure-only models on TUMTraf Intersection Dataset Test Set

| Model | FOV | Modality | mAP3D Easy | mAP3D Mod. | mAP3D Hard | mAP3D Avg. |
|---|---|---|---|---|---|---|
| InfraDet3D | South 1 | LiDAR | 75.81 | 47.66 | 42.16 | 55.21 |
| BEVFusionCoop | South 1 | LiDAR | 76.24 | 48.23 | 35.19 | 69.47 |
| CMTCoop | South 1 | LiDAR | 80.62 | 64.46 | 50.41 | 72.68 |
| InfraDet3D | South 2 | LiDAR | 38.92 | 46.60 | 43.86 | 43.13 |
| BEVFusionCoop | South 2 | LiDAR | 74.97 | 55.55 | 39.96 | 69.94 |
| CMTCoop | South 2 | LiDAR | 79.34 | 60.81 | 45.53 | 70.31 |
| InfraDet3D | South 1 | Camera + LiDAR | 67.08 | 31.38 | 35.17 | 44.55 |
| BEVFusionCoop | South 1 | Camera + LiDAR | 75.68 | 45.63 | 45.63 | 66.75 |
| CMTCoop | South 1 | Camera + LiDAR | 80.86 | 61.37 | 45.32 | 70.65 |
| InfraDet3D | South 2 | Camera + LiDAR | 58.38 | 19.73 | 33.08 | 37.06 |
| BEVFusionCoop | South 2 | Camera + LiDAR | 74.73 | 53.46 | 41.96 | 66.89 |
| CMTCoop | South 2 | Camera + LiDAR | 78.92 | 52.67 | 39.76 | 67.21 |

Visualization

Performance of Vehicular only model (CMT) from infrastructure perspective (left) and vehicular perspective (right)

Performance of Cooperative model (CMTCoop - left) vs. Vehicular only model (CMT - right) from infrastructure perspective.

Resource

Refer to the following links for other resources related to this project:

Citation

Please consider citing the original work on CMT if you find this work helpful.