GTSRB Challenge

Database

Preprocessing
The training and testing sets are first mapped from RGB to YUV space. A global contrast normalization is then applied to the Y channel, followed by a local contrast normalization to emphasize edges [1]. The U and V channels are left unchanged, as they are not used in the computations below.
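The preprocessing above can be sketched as follows. This is a minimal NumPy version, assuming BT.601 RGB-to-YUV weights and a uniform box window for the local normalization (the paper uses a Gaussian weighting, so this is an approximation):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rgb_to_yuv(rgb):
    """Convert an HxWx3 RGB image (floats in [0, 1]) to YUV (BT.601 weights)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return rgb @ m.T

def global_contrast_normalize(y, eps=1e-8):
    """Zero-mean, unit-variance normalization over the whole channel."""
    y = y - y.mean()
    return y / (y.std() + eps)

def local_contrast_normalize(y, size=7, eps=1e-8):
    """Subtract a local mean and divide by a local std, both computed over a
    size x size box around each pixel (uniform box instead of the paper's
    Gaussian kernel, an assumption for simplicity)."""
    pad = size // 2
    yp = np.pad(y, pad, mode='reflect')
    win = sliding_window_view(yp, (size, size))
    local_mean = win.mean(axis=(-1, -2))
    centered = y - local_mean
    cp = np.pad(centered, pad, mode='reflect')
    cwin = sliding_window_view(cp, (size, size))
    local_std = np.sqrt((cwin ** 2).mean(axis=(-1, -2)))
    return centered / (local_std + eps)
```

The window size (7) is a placeholder; any odd neighborhood size works the same way.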
Architecture
The neural network architecture we used for the GTSRB challenge is based on the paper by Sermanet and LeCun [1].
It is a classical 2-stage convolutional network, except that the output of the first stage is also down-sampled and fully connected to the final 1-layer classifier. This allows the use of multi-scale features and provides the classifier with receptive fields at different scales. Each stage is composed of a Tanh module to introduce non-linearity, an LP-pooling layer (a biologically inspired pooling layer modelled on complex cells [2]), and a subtractive normalization layer.
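LP-pooling replaces the usual max or average with a p-norm over each pooling window. A minimal NumPy sketch over non-overlapping windows (with uniform weights, whereas the original formulation uses a Gaussian weighting):

```python
import numpy as np

def lp_pool(x, size=2, p=2.0):
    """LP-pooling of a 2D map over non-overlapping size x size windows:
    out = (sum |x|^p)^(1/p).  p = 1 gives the window sum (average pooling
    up to a constant) and p -> infinity approaches max pooling."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size]            # drop any ragged border
    blocks = x.reshape(h2, size, w2, size)   # group pixels into windows
    return (np.abs(blocks) ** p).sum(axis=(1, 3)) ** (1.0 / p)
```

With p = 2 this behaves like a root-mean-square pool, which is the regime the complex-cell model in [2] motivates.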
Training
Here are some choices we made for the experiment:
32 filters in the first stage, 64 in the second, and 100 neurons in the hidden layer of the classifier: the best trade-off between the hardware we had and the time needed to train the network
use only the Y channel of the images: Sermanet and LeCun showed that results are slightly better with the Y channel alone, as color adds information in very few cases
4 different architectures were trained:
1-hidden layer neural network (64 neurons)
2-hidden layer neural network (32-32 neurons)
Single-scale convnet (32-64-100 features)
Multi-scale convnet (32-64-100 features)
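To make the single-scale vs. multi-scale difference concrete, here is a rough count of the features reaching the classifier, assuming 32x32 inputs, 5x5 'valid' convolutions, and 2x2 non-overlapping pooling (plausible values for illustration; the exact sizes in our setup may differ):

```python
# Hypothetical sizes: 32x32 input, 5x5 'valid' convolutions, 2x2 pooling.
def after_conv(n, k=5):   # a 'valid' convolution shrinks each side by k - 1
    return n - k + 1

def after_pool(n, s=2):   # non-overlapping pooling divides each side by s
    return n // s

s1 = after_pool(after_conv(32))     # stage 1: 32 -> 28 -> 14
s2 = after_pool(after_conv(s1))     # stage 2: 14 -> 10 -> 5

single_scale = 64 * s2 * s2         # stage-2 maps only
branch = after_pool(s1)             # stage-1 maps, extra 2x2 down-sampling
multi_scale = single_scale + 32 * branch * branch
```

Under these assumptions the multi-scale classifier sees roughly twice as many input features (3168 vs. 1600), which is also why its feature space is harder for the 1-NN experiment below.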
Test
Convolutional networks are first tested using the test set and its corresponding labels.
The output features of the CNN are then fed into a 1-NN classifier: the barycenter of each class is computed in feature space, and each test example is matched against the set of these averaged training examples.
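This barycenter scheme is a nearest-centroid classifier. A minimal NumPy sketch (Euclidean distance assumed, as the source does not state the metric):

```python
import numpy as np

def fit_centroids(features, labels):
    """Barycenter (mean feature vector) of each class in feature space."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_1nn(features, classes, centroids):
    """Assign each example to the class of its nearest centroid."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]
```

Here `features` would be the CNN outputs (first- or second-stage) for the training set, and prediction runs the same extraction on the test set.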
Results
The trainings and tests were conducted on a late-2011 MacBook Pro (2.2 GHz Intel Core i7, 8 GB RAM).
CNN
The best results we obtained for these architectures are:

| Set | 1HL-NN | 2HL-NN | SS architecture | MS architecture |
| --- | --- | --- | --- | --- |
| Training | 93.9% | 89.18% | 99.941% | 99.964% |
| Testing | 80.24% | 77.61% | 95.8% | 96% |
| Epochs | 64 | 80 | 16 | 16 |
| Training time | ~20 min | ~20 min | ~4 h | ~4 h |
Accuracy is better with the convolutional networks, but training time is also about 12 times longer. Note, however, that the CNN models already reached 99% training accuracy after only 3 epochs. The optimization method is SGD, with the learning rate hand-adjusted after each epoch by monitoring the objective function.
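The learning-rate adjustment was done by hand; the sketch below automates a similar rule on a toy quadratic objective, halving the rate whenever the objective stops improving (the rule and all constants are illustrative, not what we actually did epoch by epoch):

```python
import numpy as np

def sgd_with_adjustment(grad, obj, w, lr=0.5, epochs=20):
    """Plain gradient steps; after each epoch the learning rate is halved
    when the objective did not improve, a crude stand-in for the manual
    per-epoch tuning described above."""
    prev = obj(w)
    for _ in range(epochs):
        w = w - lr * grad(w)
        cur = obj(w)
        if cur >= prev:      # no progress: damp the step size
            lr *= 0.5
        prev = cur
    return w

# Toy objective: f(w) = ||w||^2, minimized at the origin.
w = sgd_with_adjustment(grad=lambda w: 2 * w,
                        obj=lambda w: float(w @ w),
                        w=np.array([3.0, -2.0]))
```

In the real runs the "objective" was the training loss on the full GTSRB training set, inspected after each epoch.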
Below is a representation of the 32 first-stage filters of the trained MS model. The filters have been scaled up, as they are originally 5x5. On the right are the output features of the first stage for the same input.
1NN Classifier
We then used the CNNs as feature extractors to feed a 1-nearest-neighbor classifier:
| Architecture | 1st stage | 2nd stage |
| --- | --- | --- |
| SS | 43.93% | 58.8% |
| MS | 44.54% | 54.53% |
These results are consistent: the deeper we go in the network, the more abstract the feature representation becomes. The MS network's second stage yields worse results because it concatenates features from both the first and second stages: the dimensionality is higher and the level of abstraction is somewhat diluted.
Conclusion
The results are slightly worse than those in the paper; we had poor accuracy on classes under-represented in the training set, which lowers the overall accuracy. For example, class 0 has only 210 examples whereas class 1 contains 2220, and as a result the accuracy for class 0 is only 60%. The multi-scale architecture nevertheless yields the best results of all the methods tested.
Future work
Use of a more homogeneous training set with the same number of examples per class
GPU implementation
Learning rate annealing
Change optimization methods (L-BFGS, conjugate gradient, …)
Try different architectures
Try the full YUV channels, change the pre-processing, etc.
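The first item, balancing the number of examples per class, could be approximated without collecting new data by oversampling the rare classes. A sketch of one possible approach (random resampling with replacement, assumed rather than taken from the paper):

```python
import numpy as np

def oversample_to_balance(labels, seed=0):
    """Return indices that repeat examples of rare classes so every class
    ends up with as many samples as the largest one (e.g. class 0 would be
    drawn 2220 times instead of its 210 originals)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(labels == c)
        idx.append(rng.choice(members, size=target, replace=True))
    return np.concatenate(idx)
```

Class-weighted losses would be an alternative that avoids duplicating inputs.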
References
[1] Sermanet, P., & LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In The 2011 International Joint Conference on Neural Networks (IJCNN) (pp. 2809-2813). IEEE.
[2] Hyvärinen, A., & Köster, U. (2007). Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems, 18(2), 81-100.