Contrastive Learning for Lane Detection via Cross-Similarity
Contrastive Learning for Lane Detection via cross-similarity (CLLD) is a self-supervised learning method that tackles this challenge by enhancing lane detection models' resilience to real-world conditions that reduce lane visibility. CLLD is a novel multitask contrastive learning method that trains lane detection models to detect lane markings even in low-visibility situations by integrating local feature contrastive learning (CL) with our newly proposed cross-similarity operation (an illustrative sketch of the idea follows the overview below). For ease of understanding, some details are listed in the following:
- CLLD employs similarity learning to improve the performance of deep neural networks in lane detection, particularly in challenging scenarios.
- The approach aims to enrich the prior knowledge of the neural networks used in lane detection.
- Our experiments were carried out using ImageNet as the pretraining dataset. We employed pioneering lane detection models such as RESA, CLRNet, and UNet to evaluate the impact of our approach on model performance.
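The cross-similarity operation itself is implemented in this repository; the sketch below is only our illustration of the general idea, in which each local feature of one augmented view is compared against a window of features in the other view. The function name, tensor shapes, and the reading of `alpha` as the window half-width are assumptions for illustration, not the repository's API.

```python
# Illustrative sketch only; the actual CLLD loss is defined in this repository.
# Assumes two (B, C, H, W) feature maps from two augmented views, and reads
# `alpha` as the half-width of the cross-view comparison window.
import torch
import torch.nn.functional as F

def cross_similarity(feat_a, feat_b, alpha=1):
    """Cosine similarity between each feature in view A and the
    (2*alpha+1)**2 neighborhood around the same location in view B."""
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    b, c, h, w = feat_a.shape
    k = 2 * alpha + 1
    # Collect the k*k neighborhood of every spatial location in view B.
    neighbors = F.unfold(feat_b, kernel_size=k, padding=alpha)  # (B, C*k*k, H*W)
    neighbors = neighbors.view(b, c, k * k, h * w)
    anchors = feat_a.view(b, c, 1, h * w)
    return (anchors * neighbors).sum(dim=1)  # (B, k*k, H*W) similarity scores
```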
- Clone the repository:

  ```bash
  git clone https://github.com/sabadijou/clld_official.git
  ```

  We refer to this directory as `$RESA_ROOT`.
- Create an environment and activate it (we used conda, but this is optional):

  ```bash
  conda create -n clld python=3.9 -y
  conda activate clld
  ```
- Install dependencies:

  ```bash
  # Install PyTorch first; the cudatoolkit version should match your system.
  # (You can also use pip to install pytorch and torchvision.)
  conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

  # Install kornia and einops
  pip install kornia
  pip install einops

  # Install other dependencies
  pip install -r requirements.txt
  ```
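As an optional sanity check beyond the original steps, you can confirm that PyTorch was installed with CUDA support:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```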
We conducted pretraining using the training data from ImageNet. However, you are free to utilize other datasets and configurations as needed. The configuration file for our approach can be found in the `configs` folder.
Once the dataset and new configurations are in place, you can launch pretraining with the following command:
```bash
python main.py --dataset_path /Imagenet/train --encoder resnet50 --alpha 1 --batch_size 1024 --world_size 1 --gpus_id 0 1
```
The following is a quick guide to the arguments:

- `dataset_path`: Path to the training data directory.
- `encoder`: Encoder to train. Options: `resnet18`, `resnet34`, `resnet50`, `resnet101`, `resnet152`, `resnext50_32x4d`, `resnext101_32x8d`, `wide_resnet50_2`, `wide_resnet101_2`.
- `alpha`: Cross-similarity window size.
- `batch_size`: Select a batch size that suits the GPU infrastructure you are using.
- `world_size`: Total number of GPUs across all machines. For example, if you are training on a single machine with 4 GPUs, the world size is 4; if you have 2 machines, each with 4 GPUs, and use all of them for training, the world size is 8 (see the example run after this list).
- `gpus_id`: Specify all the GPU IDs used for training.
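For instance, a hypothetical single-machine run on four GPUs might look like the following; the flag values are illustrative, chosen according to the guide above rather than copied from the repository:

```bash
python main.py --dataset_path /Imagenet/train --encoder resnet50 \
               --alpha 2 --batch_size 512 --world_size 4 --gpus_id 0 1 2 3
```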
Upon completing the training phase, you can execute the command below to prepare the trained weights for use as prior knowledge in the backbone of a lane detection model.
```bash
python main.py --checkpoint path/to/checkpoint --encoder resnet50
```
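Once exported, the weights can be loaded into a standard backbone. The snippet below is a minimal sketch that assumes the exported file is a torchvision-compatible ResNet-50 state dict; the file name is hypothetical:

```python
# Minimal sketch: load CLLD-pretrained weights into a torchvision ResNet-50.
# Assumes the exported checkpoint is a plain state_dict; the path is hypothetical.
import torch
from torchvision.models import resnet50

backbone = resnet50()
state_dict = torch.load('clld_resnet50.pth', map_location='cpu')
# strict=False tolerates projection-head keys that the bare backbone lacks.
missing, unexpected = backbone.load_state_dict(state_dict, strict=False)
print('missing keys:', missing)
print('unexpected keys:', unexpected)
```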
We specifically chose to evaluate CLLD with UNet because it is a common encoder-decoder architecture used in many methods that treat lane detection as a segmentation problem. In addition, we tested our method with RESA, the current state-of-the-art semantic-segmentation lane detection method that is not based on the UNet architecture; this independent validation is necessary to ensure the accuracy of our results. Lastly, we evaluated CLLD with CLRNet, a leading anchor-based lane detection method.
Performance of UNet on CuLane and TuSimple with different contrastive learning methods.
| Method | # Epoch | Precision (CuLane) | Recall (CuLane) | F1-measure (CuLane) | Accuracy (TuSimple) |
|---|---|---|---|---|---|
| PixPro | 100 | 73.68 | 67.15 | 70.27 | 95.92 |
| VICRegL | 300 | 67.75 | 63.43 | 65.54 | 93.58 |
| DenseCL | 200 | 63.8 | 58.4 | 60.98 | 96.13 |
| MoCo-V2 | 200 | 63.08 | 57.74 | 60.29 | 96.04 |
| CLLD (α=1) | 100 | 71.98 | 69.2 | 70.56 | 95.9 |
| CLLD (α=2) | 100 | 70.69 | 69.36 | 70.02 | 95.98 |
| CLLD (α=3) | 100 | 71.31 | 69.59 | 70.43 | 96.17 |
Performance of RESA on CuLane and TuSimple with different contrastive learning methods.
| Method | # Epoch | Precision (CuLane) | Recall (CuLane) | F1-measure (CuLane) | Accuracy (TuSimple) |
|---|---|---|---|---|---|
| PixPro | 100 | 77.41 | 73.69 | 75.51 | 96.6 |
| VICRegL | 300 | 76.27 | 69.58 | 72.77 | 96.18 |
| DenseCL | 200 | 77.67 | 73.51 | 75.53 | 96.28 |
| MoCo-V2 | 200 | 78.12 | 73.36 | 75.66 | 96.56 |
| CLLD (α=1) | 100 | 79.01 | 72.99 | 75.88 | 96.74 |
| CLLD (α=2) | 100 | 78 | 73.45 | 75.66 | 96.78 |
| CLLD (α=3) | 100 | 78.34 | 74.29 | 76.26 | 96.81 |
Performance of CLRNet on CuLane and TuSimple with different contrastive learning methods.
| Method | # Epoch | Precision (CuLane) | Recall (CuLane) | F1-measure (CuLane) | Accuracy (TuSimple) |
|---|---|---|---|---|---|
| PixPro | 100 | 89.19 | 70.39 | 78.67 | 93.88 |
| VICRegL | 300 | 87.72 | 71.15 | 78.72 | 89.01 |
| DenseCL | 200 | 88.07 | 69.67 | 77.8 | 85.15 |
| MoCo-V2 | 200 | 88.91 | 71.02 | 78.96 | 93.87 |
| CLLD (α=1) | 100 | 88.72 | 71.33 | 79.09 | 90.68 |
| CLLD (α=2) | 100 | 87.95 | 71.44 | 78.84 | 93.48 |
| CLLD (α=3) | 100 | 88.59 | 71.73 | 79.27 | 94.25 |