- Developed an object detection model that identifies the presence, location, and type of one or more U.S. traffic signs in a given photograph using the LISA dataset — currently the largest known dataset of U.S. traffic signs with 47 different types of U.S. traffic signs.
- This group project involved exploratory data analysis (EDA), annotation preprocessing, memory reduction, parameter tuning, and training and testing a convolutional neural network-based object detection algorithm (YOLOv3). We optimized for three metrics: average precision (AP) for each type of U.S. traffic sign, average intersection over union (IoU), and mean average precision (mAP). Our best results were an average IoU of 68% and an mAP of 65%.
- Best Predictions:
- Stop (97%)
- Signal Ahead (97%)
- Pedestrian Crossing (97%)
- Worst Predictions:
- Speed Limit 50 (9%)
- Merge Right (33%)
- Speed Limit 40 (47%)
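For readers unfamiliar with the IoU metric reported above, here is a minimal sketch of how it is computed for one predicted box and one ground-truth box. The function name `iou` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration; Darknet computes this internally during evaluation.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes.

    Each box is (x_min, y_min, x_max, y_max) in pixels.
    This layout is an assumption for illustration only.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = area A + area B - overlap.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

An IoU of 1.0 means the predicted box matches the annotation exactly, and 0.0 means the boxes do not overlap at all.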
- The LISA Traffic Sign Dataset is a set of annotated frames of US traffic signs.
- The dataset has the following characteristics:
- 47 types of signs (classes)
- 7855 annotations (signs) on 6610 frames (images)
- Sign sizes from 6x6 to 167x168 pixels
- Image sizes from 640x480 to 1024x522 pixels
- Images vary between color and grayscale
- Annotations include sign type, position, size, occluded (yes/no), on side road (yes/no).
- Converted images from png to jpg
- Exported an annotation file for each image
- Updated the configuration file to train on our dataset
- Processed data from the LISA dataset
- Created files that stored information about the location of objects in images
- Made sure all file paths to images were accurate
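The per-image annotation files mentioned above can be sketched as follows. Darknet's YOLO training reads one text file per image, with one line per object in the form `<class_id> <x_center> <y_center> <width> <height>`, where all four box values are normalized by the image dimensions. The function name `to_yolo_line` and the assumption that the source annotations are pixel corner coordinates are ours, for illustration only; this is not necessarily the exact script we used.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert one pixel-space box into a Darknet/YOLO annotation line.

    Assumes the input box is given by its upper-left and lower-right
    corners in pixels (an assumption made for this sketch).
    """
    x_center = (x_min + x_max) / 2.0 / img_w
    y_center = (y_min + y_max) / 2.0 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Example with made-up values: a 40x40 sign in a 640x480 frame.
print(to_yolo_line(0, 100, 200, 140, 240, 640, 480))
```

Writing one such line per sign into a `.txt` file that shares the image's base name gives Darknet the label file it pairs with each image.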
- Trained on 75% of the images in LISA dataset.
- We utilized Darknet, a wrapper around the YOLOv3 algorithm.
- GOOGLE COLAB WAS OUR SAVIOR!
- We conducted training in Google Colab to use a hosted, GPU-connected runtime.
- Decreased runtime by a factor of 30 relative to running locally
- Uploaded all images and their annotations to Google Drive
- Saved weights to local machine every 100 iterations
- Darknet outputs a weights file which holds information about important features
- We trained our model for 3000 iterations.
- Recommended iterations = 2000 * classes
- Unable to do so due to the time constraints imposed by our professor
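To put the training setup above in numbers, here is a small sketch of a 75/25 split and of the recommended iteration budget. The helper names `split_train_test` and `recommended_iterations`, the random shuffle, and the fixed seed are assumptions for illustration; the exact way we selected the 75% is not shown here.

```python
import random


def split_train_test(image_paths, train_fraction=0.75, seed=0):
    """Shuffle image paths reproducibly and split them into train/test lists."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    cut = int(len(paths) * train_fraction)
    return paths[:cut], paths[cut:]


def recommended_iterations(num_classes, per_class=2000):
    """Rule of thumb quoted above: 2000 iterations per class."""
    return per_class * num_classes


# Hypothetical file names standing in for the 6610 LISA frames.
frames = [f"frame_{i:05d}.jpg" for i in range(6610)]
train, test = split_train_test(frames)
print(len(train), len(test))

budget = recommended_iterations(47)
print(budget, f"{3000 / budget:.1%}")
```

With 47 classes, the rule of thumb gives 94,000 iterations, so our 3000-iteration run covered roughly 3% of the recommended budget.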
- YOLOv3 is the third version of the "You Only Look Once" (YOLO) convolutional neural network (CNN) algorithm used for object detection. Object detection consists of determining the location of the target objects in an image and classifying those objects. YOLO takes in an image as input, passes it through a CNN, and outputs a vector of bounding boxes and class predictions.
- For an in-depth analysis on YOLOv3 theory (that's relatively digestible), we HIGHLY recommend these two links:
- Darknet is a wrapper around the YOLOv3 Convolutional Neural Network algorithm. More specifically, it is an open source neural network framework written in C and CUDA. It is fast, easy to install, and supports CPU and GPU computation.
- We used the following link for Darknet implementation instructions. In the README.md file, we followed bullet point (6) to learn how to train our model to detect custom objects.
- For additional details: https://pjreddie.com/darknet/yolo/
Nishant Sinha | Arjun Mitra |
Patrick Condie | Jose Canela |
Andreas Møgelmose, Mohan M. Trivedi, and Thomas B. Moeslund, "Vision based Traffic Sign Detection and Analysis for Intelligent Driver Assistance Systems: Perspectives and Survey," IEEE Transactions on Intelligent Transportation Systems, 2012.
- Our model can be implemented to assess traffic signs from cars in real time
- However, the classification of signs with few training examples needs to improve
- Future suggestions:
- Normalize all images to grayscale
- Oversample images to fix class imbalance through image augmentation
- Blur images (Gaussian Blur)
- Sharpen images (Unsharp Mask)
- Focus on training specific classes for higher accuracy
- Factor occluded signs into the model differently
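As a starting point for the oversampling suggestion above, here is a minimal sketch that works out how many augmented copies each class would need to match the most common class. The function name `augmentation_plan` and the class counts in the example are made up for illustration; they are not the real LISA class counts, and the augmentation itself (blurring, sharpening, and so on) is not shown.

```python
from collections import Counter


def augmentation_plan(labels, target=None):
    """Return how many extra (augmented) samples each class needs.

    `labels` holds one class name per annotation. By default every class
    is brought up to the size of the largest class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Made-up counts for illustration only.
labels = ["stop"] * 5 + ["merge"] * 2 + ["speedLimit50"] * 1
print(augmentation_plan(labels))
```

Classes that are already at the target size get a value of 0, so only the rare signs would be augmented.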