GTSRB Challenge

Database

Preprocessing
The training and testing sets are first mapped from RGB to YUV space. A global contrast normalization is then applied to the Y channel, followed by a local contrast normalization to emphasize edges [1]. The U and V channels are left unchanged, as they are not used in the computations below.
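The preprocessing above can be sketched as follows. This is a minimal NumPy version, assuming BT.601 RGB-to-YUV weights and a uniform box window for the local normalization (the paper uses a Gaussian weighting, so this is an approximation):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rgb_to_yuv(rgb):
    """Convert an HxWx3 RGB image (floats in [0, 1]) to YUV (BT.601 weights)."""
    m = np.array([[ 0.299,  0.587,  0.114],
                  [-0.147, -0.289,  0.436],
                  [ 0.615, -0.515, -0.100]])
    return rgb @ m.T

def global_contrast_normalize(y, eps=1e-8):
    """Zero-mean, unit-variance normalization over the whole channel."""
    y = y - y.mean()
    return y / (y.std() + eps)

def local_contrast_normalize(y, size=7, eps=1e-8):
    """Subtract a local mean and divide by a local std, both computed over a
    size x size box around each pixel (uniform box instead of the paper's
    Gaussian kernel, an assumption for simplicity)."""
    pad = size // 2
    yp = np.pad(y, pad, mode='reflect')
    win = sliding_window_view(yp, (size, size))
    local_mean = win.mean(axis=(-1, -2))
    centered = y - local_mean
    cp = np.pad(centered, pad, mode='reflect')
    cwin = sliding_window_view(cp, (size, size))
    local_std = np.sqrt((cwin ** 2).mean(axis=(-1, -2)))
    return centered / (local_std + eps)
```

The window size (7) is a placeholder; any odd neighborhood size works the same way.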
Architecture
The neural network architecture we used for the GTSRB challenge is based on the paper by Sermanet and LeCun [1].
It is a classical 2-stage convolutional network, except that the output of the first stage is also down-sampled and fully connected to the final 1-layer classifier. This allows the use of multi-scale features and provides the classifier with receptive fields at different scales. Each stage is composed of a Tanh module to introduce non-linearity, an LP-pooling layer (a biologically inspired pooling layer modelled on complex cells [2]), and a subtractive normalization layer.
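LP-pooling replaces the usual max or average with a p-norm over each pooling window. A minimal NumPy sketch over non-overlapping windows (with uniform weights, whereas the original formulation uses a Gaussian weighting):

```python
import numpy as np

def lp_pool(x, size=2, p=2.0):
    """LP-pooling of a 2D map over non-overlapping size x size windows:
    out = (sum |x|^p)^(1/p).  p = 1 gives the window sum (average pooling
    up to a constant) and p -> infinity approaches max pooling."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    x = x[:h2 * size, :w2 * size]            # drop any ragged border
    blocks = x.reshape(h2, size, w2, size)   # group pixels into windows
    return (np.abs(blocks) ** p).sum(axis=(1, 3)) ** (1.0 / p)
```

With p = 2 this behaves like a root-mean-square pool, which is the regime the complex-cell model in [2] motivates.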
Training
Here are some choices we made for the experiment:
32 filters in the first stage, 64 in the second, and 100 neurons in the hidden layer of the classifier: the best trade-off between the hardware we had and the time needed to train the network
use only the Y channel of the images: Sermanet and LeCun showed that results are slightly better with the Y channel alone, as color adds information in very few cases
4 different architectures were trained:
1-hidden layer neural network (64 neurons)
2-hidden layer neural network (32-32 neurons)
Single-scale convnet (32-64-100 features)
Multi-scale convnet (32-64-100 features)
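To make the single-scale vs. multi-scale difference concrete, here is a rough count of the features reaching the classifier, assuming 32x32 inputs, 5x5 'valid' convolutions, and 2x2 non-overlapping pooling (plausible values for illustration; the exact sizes in our setup may differ):

```python
# Hypothetical sizes: 32x32 input, 5x5 'valid' convolutions, 2x2 pooling.
def after_conv(n, k=5):   # a 'valid' convolution shrinks each side by k - 1
    return n - k + 1

def after_pool(n, s=2):   # non-overlapping pooling divides each side by s
    return n // s

s1 = after_pool(after_conv(32))     # stage 1: 32 -> 28 -> 14
s2 = after_pool(after_conv(s1))     # stage 2: 14 -> 10 -> 5

single_scale = 64 * s2 * s2         # stage-2 maps only
branch = after_pool(s1)             # stage-1 maps, extra 2x2 down-sampling
multi_scale = single_scale + 32 * branch * branch
```

Under these assumptions the multi-scale classifier sees roughly twice as many input features (3168 vs. 1600), which is also why its feature space is harder for the 1-NN experiment below.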
Test
Convolutional networks are first tested using the test set and its corresponding labels.
The output features of the CNN are then fed into a 1-NN classifier: the barycenter of each class is computed in feature space, and each test example is matched against the set of these averaged training examples.
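This barycenter scheme is a nearest-centroid classifier. A minimal NumPy sketch (Euclidean distance assumed, as the source does not state the metric):

```python
import numpy as np

def fit_centroids(features, labels):
    """Barycenter (mean feature vector) of each class in feature space."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_1nn(features, classes, centroids):
    """Assign each example to the class of its nearest centroid."""
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[d.argmin(axis=1)]
```

Here `features` would be the CNN outputs (first- or second-stage) for the training set, and prediction runs the same extraction on the test set.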
Results
The trainings and tests were conducted on a late-2011 MacBook Pro (2.2 GHz Intel Core i7, 8 GB RAM).
CNN
The best results we obtained for these architectures are:

| Set | 1HL-NN | 2HL-NN | SS architecture | MS architecture |
| --- | --- | --- | --- | --- |
| Training | 93.9% | 89.18% | 99.941% | 99.964% |
| Testing | 80.24% | 77.61% | 95.8% | 96% |
| Epochs | 64 | 80 | 16 | 16 |
| Training time | ~20 min | ~20 min | ~4 h | ~4 h |
Accuracy is better with the convolutional networks, but training time is also about 12 times longer. Note, however, that the CNN models already reached 99% training accuracy after only 3 epochs. The optimization method is SGD, with the learning rate hand-adjusted after each epoch by monitoring the objective function.
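The learning-rate adjustment was done by hand; the sketch below automates a similar rule on a toy quadratic objective, halving the rate whenever the objective stops improving (the rule and all constants are illustrative, not what we actually did epoch by epoch):

```python
import numpy as np

def sgd_with_adjustment(grad, obj, w, lr=0.5, epochs=20):
    """Plain gradient steps; after each epoch the learning rate is halved
    when the objective did not improve, a crude stand-in for the manual
    per-epoch tuning described above."""
    prev = obj(w)
    for _ in range(epochs):
        w = w - lr * grad(w)
        cur = obj(w)
        if cur >= prev:      # no progress: damp the step size
            lr *= 0.5
        prev = cur
    return w

# Toy objective: f(w) = ||w||^2, minimized at the origin.
w = sgd_with_adjustment(grad=lambda w: 2 * w,
                        obj=lambda w: float(w @ w),
                        w=np.array([3.0, -2.0]))
```

In the real runs the "objective" was the training loss on the full GTSRB training set, inspected after each epoch.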
Below is a representation of the 32 first-stage filters of the trained MS model. The filters have been scaled up, as they are originally 5x5. On the right are the output features of the first stage for the same input.
1NN Classifier
We then used the CNNs as feature extractors to feed a 1-nearest-neighbor classifier:
| Architecture | 1st stage | 2nd stage |
| --- | --- | --- |
| SS | 43.93% | 58.8% |
| MS | 44.54% | 54.53% |
These results are consistent: the deeper we go in the network, the more abstract the feature representation becomes. The MS network's second stage yields worse results because it concatenates features from both the first and second stages: the dimensionality is higher and the level of abstraction is somewhat diluted.
Conclusion
The results are slightly worse than those in the paper; we had poor accuracy on classes under-represented in the training set, which lowers the overall accuracy. For example, class 0 has only 210 examples whereas class 1 contains 2220, and as a result the accuracy for class 0 is only 60%. The multi-scale architecture nevertheless yields the best results of all the methods tested.
Future work
Use of a more homogeneous training set with the same number of examples per class
GPU implementation
Learning rate annealing
Change optimization methods (L-BFGS, conjugate gradient, …)
Try different architectures
Try the full YUV channels, change the pre-processing, etc.
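The first item, balancing the number of examples per class, could be approximated without collecting new data by oversampling the rare classes. A sketch of one possible approach (random resampling with replacement, assumed rather than taken from the paper):

```python
import numpy as np

def oversample_to_balance(labels, seed=0):
    """Return indices that repeat examples of rare classes so every class
    ends up with as many samples as the largest one (e.g. class 0 would be
    drawn 2220 times instead of its 210 originals)."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    idx = []
    for c in classes:
        members = np.flatnonzero(labels == c)
        idx.append(rng.choice(members, size=target, replace=True))
    return np.concatenate(idx)
```

Class-weighted losses would be an alternative that avoids duplicating inputs.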
References
[1] Sermanet, P., & LeCun, Y. (2011). Traffic sign recognition with multi-scale convolutional networks. In The 2011 International Joint Conference on Neural Networks (IJCNN) (pp. 2809-2813). IEEE.
[2] Hyvärinen, A., & Köster, U. (2007). Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems, 18(2), 81-100.