
IITISoC-23-IVR1-LaneDetection-using-LimitedComputationPower

Goal :

To develop a robust lane detection pipeline that consumes meager computational resources (no GPU allowed, limited CPU and RAM usage) and can be deployed on an NVIDIA Jetson Nano board or even a Raspberry Pi board.

People Involved :

Mentors:

Members:

Outline :

This repository contains the implementation of a lane detection system using two different approaches. The main goal of this project was to understand the core fundamentals of lane detection in images. The two approaches utilized are as follows:

  • Approach-1 - Foundation Approach (Lane Detection using Canny Edge detection and Hough Transform)
  • Approach-2 - Advanced Approach using Deep Learning.

Approach-1 :

In this approach, traditional computer vision techniques were employed to detect lanes in an image. The input pipeline for Approach 1 consists of a sequence of techniques applied to the input image to detect lane lines, each of which is essential for accurate and reliable lane detection. The sequence is as follows (a minimal code sketch follows the list):

  • Preprocessing : The input image was preprocessed to enhance lane features and reduce noise.

  • Canny Edge Detection : The Canny edge detection algorithm was applied to extract edges in the image.

  • Region of Interest selection : When finding lane lines, we don't need to check the clouds and mountains in an image. The objective of this technique is to concentrate on the region of interest to us: the road and the lanes on it.

  • Hough Transform : The Probabilistic Hough transform algorithm was used to detect lines in the edge-detected image, which represent potential lane markings.

  • Post-processing : The detected lines were further processed to combine and extend them to form complete lane boundaries.
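A minimal OpenCV sketch of this pipeline is shown below. The Canny thresholds, the ROI polygon, and the Hough parameters are illustrative assumptions that would need tuning for a specific camera; this is not the exact code used in this repository.

```python
import cv2
import numpy as np

def detect_lanes(frame):
    """Classical lane detection: preprocess -> Canny -> ROI mask -> Hough -> overlay."""
    # Preprocessing: grayscale + Gaussian blur to suppress noise.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)

    # Canny edge detection (thresholds are illustrative).
    edges = cv2.Canny(blur, 50, 150)

    # Region of interest: keep a trapezoid covering the road ahead.
    h, w = edges.shape
    roi = np.array([[(0, h), (w // 2 - 50, h // 2 + 50),
                     (w // 2 + 50, h // 2 + 50), (w, h)]], dtype=np.int32)
    mask = np.zeros_like(edges)
    cv2.fillPoly(mask, roi, 255)
    masked = cv2.bitwise_and(edges, mask)

    # Probabilistic Hough transform to find candidate lane segments.
    lines = cv2.HoughLinesP(masked, rho=2, theta=np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=100)

    # Post-processing: draw the detected segments over the input frame.
    overlay = np.zeros_like(frame)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(overlay, (x1, y1), (x2, y2), (0, 255, 0), 5)
    return cv2.addWeighted(frame, 0.8, overlay, 1.0, 0)
```

Each stage maps directly onto a step above; in practice the post-processing step would also average and extrapolate the segments into a single left and right lane boundary.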

Predicted Results :

Disadvantages :

  • Fails on complex curved roads.
  • Does not give satisfactory results in rainy and foggy environments.

Approach-2 :

In this advanced approach, we have explored the effectiveness of deep learning models, including Convolutional Neural Networks (CNNs) and Transformer models, for accurate and efficient lane detection. While CNNs are widely known for their image analysis capabilities, we also investigated the potential of Transformer-based models, such as LSTR (Lane Shape Transformer), which are specifically designed for sequence-to-sequence tasks like lane detection.

Model Selection :

We carefully curated and tested several state-of-the-art deep learning models for lane detection. The following models were among those evaluated :

  • 3 CNN Models : We explored three different CNN architectures sourced from YouTube tutorials and GitHub repositories. These models were chosen for their effectiveness in image-analysis tasks and had demonstrated promising results in lane detection scenarios.

  • YOLOP and YOLOPv2 : We experimented with YOLOP and its upgraded version, YOLOPv2, which are well-known for their real-time object detection capabilities. We adapted these models for lane detection and evaluated their performance.

  • HybridNets : HybridNets is a popular deep learning architecture specifically designed for lane detection. We examined its performance and capabilities for detecting complex lane geometries.

  • LSTR (Lane Shape Transformer) : While the Transformer-based model LSTR showed impressive frames-per-second (FPS) performance, we found that its detection results did not meet our expectations. This Transformer-based lane detection architecture consists of several key components (a schematic sketch follows the list):
    • Backbone: The backbone extracts low-resolution features from the input image I and converts them into a sequence S by collapsing the spatial dimensions.
    • Reduced Transformer Network: The sequence S, along with positional embeddings Ep, is fed into the transformer encoder to produce a representation sequence Se. The transformer is responsible for capturing long-range dependencies and interactions within the sequence.
    • Decoder: The decoder generates an output sequence Sd by attending to an initial query sequence Sq and a learned positional embedding ELL, which implicitly learns positional differences. The decoder computes interactions with Se and Ep to attend to related features.
    • Feed-forward Networks (FFNs): Several feed-forward networks directly predict the parameters of the proposed lane outputs.
    • Hungarian Loss: The model utilizes the Hungarian Loss, a loss function tailored for set-prediction tasks like lane detection, to match predictions to ground-truth lanes and optimize the parameters for accurate lane predictions.

    The architecture leverages the power of the transformer model for sequence-to-sequence tasks, allowing for more effective lane detection, especially in scenarios involving curved lanes and complex lane geometries.
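A schematic PyTorch sketch of this kind of architecture is shown below. The layer sizes, the number of lane queries, the 4-parameter lane head, and the omission of positional embeddings and Hungarian matching are all simplifying assumptions made here for brevity; see the LSTR implementation in [7] for the real model.

```python
import torch
import torch.nn as nn

class TinyLSTRLikeModel(nn.Module):
    """Schematic LSTR-style detector: backbone -> sequence -> transformer -> FFN heads."""
    def __init__(self, d_model=64, n_queries=7, n_lane_params=4):
        super().__init__()
        # Backbone: a tiny conv stack standing in for the real reduced backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, d_model, 7, stride=4, padding=3), nn.ReLU(),
            nn.Conv2d(d_model, d_model, 3, stride=4, padding=1), nn.ReLU(),
        )
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, batch_first=True)
        self.query_embed = nn.Embedding(n_queries, d_model)  # learned lane queries (Sq)
        self.ffn = nn.Linear(d_model, n_lane_params)         # per-query lane parameters
        self.score = nn.Linear(d_model, 1)                   # lane/no-lane logit

    def forward(self, img):                    # img: (B, 3, H, W)
        feat = self.backbone(img)              # low-resolution features (B, C, h, w)
        B, C, h, w = feat.shape
        seq = feat.flatten(2).transpose(1, 2)  # collapse spatial dims -> S: (B, h*w, C)
        queries = self.query_embed.weight.unsqueeze(0).expand(B, -1, -1)
        dec = self.transformer(seq, queries)   # encoder (Se) + decoder (Sd): (B, n_queries, C)
        return self.ffn(dec), self.score(dec)  # lane parameters + existence scores

params, scores = TinyLSTRLikeModel()(torch.randn(1, 3, 256, 256))
```

During training, each query's prediction would be matched to a ground-truth lane via bipartite (Hungarian) matching before computing the loss, which is what lets the fixed set of queries specialize to different lanes.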

Predicted Results :

The results for the three CNN models, YOLOP, YOLOPv2, LSTR, and HybridNets can be found here

System Specifications :

All the work was done on three devices (two of which are identical): an ASUS Vivobook 15 Pro and an HP Pavilion Gaming 15-ec2008AX. The HP Pavilion Gaming 15-ec2008AX has an AMD hexa-core Ryzen 5 5600H processor and 8 GB of DDR4 RAM, while the Vivobook 15 Pro has a 12th Gen Intel Core H-series processor, 16 GB of LPDDR5 RAM, and an NVIDIA GeForce RTX 3050 Ti GPU.

Model Evaluation :

During our comprehensive testing, we considered multiple deep learning architectures: CNNs, HybridNets, YOLOP, YOLOPv2, and LSTR. Each model underwent rigorous evaluation using performance metrics such as Mean Average Precision (mAP), Intersection over Union (IoU), inference speed (FPS), precision, recall, and F1 score.
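For reference, the per-frame segmentation metrics reported in the tables below can be computed from binary lane masks roughly as follows (a generic sketch, not the exact evaluation code used in this project):

```python
import numpy as np

def segmentation_metrics(pred, target, eps=1e-7):
    """IoU, Dice, precision, recall, and F1 for binary lane masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # true positive pixels
    fp = np.logical_and(pred, ~target).sum()  # false positives
    fn = np.logical_and(~pred, target).sum()  # false negatives
    iou = tp / (tp + fp + fn + eps)
    dice = 2 * tp / (2 * tp + fp + fn + eps)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    return dict(iou=iou, dice=dice, precision=precision, recall=recall, f1=f1)
```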
Comparison of 3 CNN Models

| Model | Parameters | Size (KB) | Precision | Recall | F1 Score | FPS | Dice coefficient | IoU |
|-------|------------|-----------|-----------|--------|----------|-----|------------------|-----|
| CNN 2 | 129,498 | 580.15 | 0.939 | 0.747 | 0.8327 | 12 | 0.8327 | 0.72 |
| CNN 3 | 181,693 | 150.02 | 0.980 | 0.731 | 0.837 | 15 | 0.837 | 0.72 |
| CNN 1 | 125,947 | 55 | 0.97 | 0.984 | 0.99 | 5 | 0.987 | 0.976 |

Graphical representation of comparison of models


Comparison of YOLOP, YOLOPV2, Hybridnets :

| Model | Parameters (million) | Size (KB) | Accuracy | IoU (lane line) | IoU (drivable area) | FPS |
|-------|----------------------|-----------|----------|-----------------|---------------------|-----|
| YOLOP | 7.9 | 31,763 | 0.70 | 0.262 | 0.91 | 10 |
| YOLOPv2 | 48.64 | 38,955 | 0.87 | 0.27 | 0.93 | 41 |
| HybridNets | 13 | 54,482 | 0.85 | 0.31 | 0.95 | 12 |

Graphical representation of comparison of models


Visualization

The models used in all three comparisons were trained on the BDD100k dataset. Comparison.

A glimpse of the inference we obtained on our campus videos

Model Quantization:

In our pursuit of a balance between accuracy and computational efficiency, we explored post-training quantization for one of the satisfactory models, YOLOP, which also has a simpler architecture than YOLOPv2. We chose YOLOP for quantization because its smaller parameter count and model size make it more amenable to this process. In conclusion, post-training quantization of YOLOP proved to be a viable, optimized solution for lane detection with limited computational power: the quantized model achieves near-comparable accuracy to the original while benefiting from reduced parameters and model size, making it well suited for deployment in resource-constrained environments. A sketch of the procedure is shown below.
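Below is a minimal sketch of post-training static quantization in PyTorch's eager mode, applied to a toy stand-in model rather than YOLOP itself. The real YOLOP would additionally need QuantStub/DeQuantStub placement around its heads and fusion of its conv-bn-activation blocks, and the calibration data here is random and purely illustrative.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Stand-in for YOLOP: a small conv net wrapped with quant/dequant stubs."""
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3, padding=1)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)          # fp32 -> int8 at the model boundary
        x = self.relu(self.conv(x))
        return self.dequant(x)     # int8 -> fp32 for downstream consumers

model = TinyNet().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')   # x86 backend
torch.ao.quantization.fuse_modules(model, [['conv', 'relu']], inplace=True)
torch.ao.quantization.prepare(model, inplace=True)

# Calibration: run representative batches so the observers record activation ranges.
for _ in range(8):
    model(torch.randn(1, 3, 64, 64))

torch.ao.quantization.convert(model, inplace=True)  # int8 weights + activations
```

Static quantization (as opposed to dynamic) quantizes activations as well as weights, which is what yields most of the inference-time savings on CPU-only edge devices.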

Deployment and Future Improvements

After post-training static quantization we end up with a smaller model and a balance between accuracy and computational efficiency, ready to deploy on an edge computing device like the NVIDIA Jetson Xavier. In the future, we plan to deploy our lane detection pipeline on the NVIDIA Xavier platform, a powerful and energy-efficient system-on-a-chip (SoC) designed for edge computing and AI applications. Its advanced architecture and computational capabilities make it an ideal candidate for running deep learning models, even in real-time scenarios, and a successful deployment on Xavier will pave the way for scalable, practical integration of our lane detection solution in various real-time applications.

References :

[1] MLND Capstone project for Udacity's Machine Learning Nanodegree (2017), GitHub repository, https://github.com/mvirgo/MLND-Capstone

[2] PyTorch Profiler, PyTorch Recipes, https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html

[3] Vision Transformers for Computer Vision, https://towardsdatascience.com/using-transformers-for-computer-vision-6f764c5a078b

[4] Chuan-en Lin (2018, Dec. 17), "Tutorial: Build a lane detector", https://towardsdatascience.com/tutorial-build-a-lane-detector-679fd8953132 [Accessed: Apr 06, 2019]

[5] Article: https://link.springer.com/article/10.1007/s11633-022-1339-y

[6] HybridNets model, original paper: https://arxiv.org/abs/2203.09035

[7] ibaiGorordo / ONNX-LSTR-Lane-Detection (2021), GitHub repository, https://github.com/ibaiGorordo/ONNX-LSTR-Lane-Detection

[8] CAIC-AD / YOLOPv2 (2022), GitHub repository, https://github.com/CAIC-AD/YOLOPv2

[9] For quantization: Q-YOLOP: Quantization-aware You Only Look Once for Panoptic Driving Perception, https://www.researchgate.net/publication/372248473_Q-YOLOP_Quantization-aware_You_Only_Look_Once_for_Panoptic_Driving_Perception