Perception
Image recognition is divided into traffic light detection and object detection.
For traffic light detection we use three sensors: a semantic segmentation camera, an RGB camera and a depth camera.
Traffic light detection is based on evaluating the semantic, RGB and depth images. First, the traffic light mask is applied to the semantic image to obtain the contours of the traffic lights. These contours identify the corresponding pixels in the depth image, from which the distance to the traffic light is determined. For the classification of the traffic light, an artificial neural network is used: the previously determined contours are cut out of the RGB image, and the cropped areas are classified by the ANN. From the classification and the calculated distance, the TrafficLightInfo is composed and sent to the vehicle control.
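The pipeline can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the semantic tag value and the helper name are assumptions, and the real code lives in the repo.

```python
import cv2
import numpy as np

# Assumed CARLA semantic tag for traffic lights; the exact value
# depends on the simulator version.
TRAFFIC_LIGHT_TAG = 18

def detect_traffic_lights(semantic_img, depth_img, rgb_img):
    """Find traffic light patches and their distances (illustrative sketch).

    semantic_img: HxW array of semantic class ids
    depth_img:    HxW array of metric depth values
    rgb_img:      HxWx3 color image
    Returns a list of (rgb_patch, distance) tuples.
    """
    # Apply the traffic light mask to the semantic image.
    mask = (semantic_img == TRAFFIC_LIGHT_TAG).astype(np.uint8) * 255

    # Extract the contours of the masked regions.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    detections = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        # The same pixels in the depth image give the distance;
        # the median is robust against stray background pixels.
        distance = float(np.median(depth_img[y:y + h, x:x + w]))
        # Crop the corresponding area from the RGB image for the classifier.
        patch = rgb_img[y:y + h, x:x + w]
        detections.append((patch, distance))
    return detections
```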
The basic structure of the traffic light detection is shown in the following figure.
As an output from the traffic light detection, the traffic light relevant for the vehicle is determined and transmitted as TrafficLightInfo to the VehicleControl.
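A message carrying this result might look like the following sketch; only the TrafficLightInfo name comes from the text, the fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrafficLightInfo:
    """Hypothetical message layout; only the type name is from the text."""
    state: str       # e.g. 'red', 'yellow', 'green' from the classifier
    distance: float  # distance in meters, taken from the depth image
```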
The artificial neural network consists of four convolutional layers and a final dense layer. This architecture was chosen because it achieves very high accuracy (about 99%) on our own generated dataset. Furthermore, the architecture is very simple and has only a few weights (about 1000), so the model can be retrained quickly when the dataset changes. For the current dataset, the weights are stored in the repo. The exact sequence of the layers can be seen in the following diagram.
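In addition to the diagram, the model might be assembled like this Keras sketch. Filter counts, kernel sizes, input resolution and the number of classes are assumptions chosen to roughly match the stated weight count; the trained weights in the repo define the real values.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_classifier(num_classes=4, input_shape=(32, 32, 3)):
    """Four convolutional layers followed by a single dense layer."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(4, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(4, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(8, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(8, 3, padding='same', activation='relu'),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation='softmax'),
    ])
    return model
```

With these placeholder widths the model has on the order of 1000 trainable weights, which is consistent with the description.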
Unfortunately, traffic light recognition does not work well in Town10, because the traffic lights of this town are underrepresented in the dataset. To recognize these traffic lights better, new images would need to be added to the dataset and the model re-trained.
For object detection we likewise use three sensors.
In object detection, vehicles and pedestrians are detected by evaluating the semantic and depth images. Contours are detected on the semantic image using the corresponding masks. The depth image is then converted into a local point cloud, and together with the contours the relative position of the objects can be determined, as sketched below. In addition to the detection, the objects are tracked with the help of an object tracker. As output, the object detection sends the identifier, the object class and the relative position of each detected object to the vehicle control.
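The back-projection of the depth image can be sketched as follows, assuming a pinhole camera with a known horizontal field of view; the actual projection parameters depend on the sensor configuration in the simulator.

```python
import numpy as np

def depth_to_point_cloud(depth_img, fov_deg=90.0):
    """Back-project a depth image into a local point cloud (sketch)."""
    h, w = depth_img.shape
    # Focal length in pixels from the assumed horizontal field of view.
    focal = w / (2.0 * np.tan(np.radians(fov_deg) / 2.0))
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - w / 2.0) * depth_img / focal  # right
    y = (v - h / 2.0) * depth_img / focal  # down
    z = depth_img                          # forward
    return np.stack([x, y, z], axis=-1)    # HxWx3, camera coordinates

def object_position(point_cloud, contour_mask):
    """Relative position of one object: mean of the points inside its contour."""
    return point_cloud[contour_mask.astype(bool)].mean(axis=0)
```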
The following GIF outlines how the object detection works: