gtsrb challenge -hugo duthil- #1

Open · Swiiip opened this issue Jul 9, 2015 · 0 comments

Swiiip (Owner) commented Jul 9, 2015
GTSRB Challenge

Database

  • 43 sign classes
  • 39209 training examples
  • 12630 testing examples

Preprocessing

The training and test sets are first mapped from RGB space to YUV. A global contrast normalization is then applied to the Y channel, followed by a local contrast normalization to emphasize edges [1]. The U and V channels are left unchanged, as they are not used in the experiments below.
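As an illustration, here is a minimal preprocessing sketch in Python (not the original code): it converts an RGB image to YUV, applies a global contrast normalization to the Y channel, then a local contrast normalization over a Gaussian neighbourhood. The Gaussian width `sigma` and the stabilizing `eps` are assumptions, not values taken from the experiment.

```python
# Illustrative preprocessing sketch: RGB -> YUV, then global and local
# contrast normalization on the Y channel only.
import numpy as np
from scipy.ndimage import gaussian_filter

def rgb_to_yuv(rgb):
    """rgb: float array in [0, 1] of shape (H, W, 3); returns Y, U, V channels."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.14713 * r - 0.28886 * g + 0.436 * b
    v = 0.615 * r - 0.51499 * g - 0.10001 * b
    return y, u, v

def global_contrast_normalize(y, eps=1e-8):
    """Zero-mean, unit-variance normalization over the whole image."""
    return (y - y.mean()) / (y.std() + eps)

def local_contrast_normalize(y, sigma=2.0, eps=1e-8):
    """Subtract a Gaussian-weighted local mean, then divide by the local std."""
    local_mean = gaussian_filter(y, sigma)
    centered = y - local_mean
    local_std = np.sqrt(gaussian_filter(centered ** 2, sigma))
    return centered / np.maximum(local_std, local_std.mean() + eps)

def preprocess(rgb):
    y, _, _ = rgb_to_yuv(rgb)          # U and V are discarded (Y-only model)
    y = global_contrast_normalize(y)
    return local_contrast_normalize(y)
```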

Architecture

The neural network architecture we used for the GTSRB challenge is based on the paper by Sermanet and LeCun [1].

[Figure: 2-stage multi-scale ConvNet architecture]

It is a classical 2-stage network, but the output of the first stage is also down-sampled and fully connected to the final 1-layer classifier. This allows the use of multi-scale features and provides the classifier with receptive fields at different scales. Each stage is composed of a Tanh module to introduce non-linearity, an LP-pooling layer (a biologically inspired pooling layer modelled on complex cells [2]), and a subtractive normalization layer.
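Below is a hedged PyTorch sketch of such a 2-stage multi-scale network, using the filter counts given in the Training section (32-64-100, 43 classes). The kernel sizes, pooling windows, and the 32x32 Y-only input resolution are assumptions, and the subtractive normalization layer is approximated by subtracting a box-filtered local mean.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractiveNorm(nn.Module):
    """Subtract a local mean from each feature map (box-filter approximation)."""
    def __init__(self, kernel_size=5):
        super().__init__()
        self.kernel_size = kernel_size

    def forward(self, x):
        local_mean = F.avg_pool2d(x, self.kernel_size, stride=1,
                                  padding=self.kernel_size // 2)
        return x - local_mean

def stage(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, padding=2),
        nn.Tanh(),                                           # non-linearity
        nn.LPPool2d(norm_type=2, kernel_size=2, stride=2),   # LP-pooling (p=2)
        SubtractiveNorm(),
    )

class MultiScaleConvNet(nn.Module):
    def __init__(self, n_classes=43):
        super().__init__()
        self.stage1 = stage(1, 32)     # Y channel only -> 32 feature maps
        self.stage2 = stage(32, 64)
        # Stage-1 output is down-sampled and fed to the classifier alongside
        # the stage-2 output (the "multi-scale" skip connection).
        self.classifier = nn.Sequential(
            nn.Linear(32 * 8 * 8 + 64 * 8 * 8, 100),
            nn.Tanh(),
            nn.Linear(100, n_classes),
        )

    def forward(self, x):              # x: (N, 1, 32, 32)
        s1 = self.stage1(x)            # (N, 32, 16, 16)
        s2 = self.stage2(s1)           # (N, 64, 8, 8)
        s1_down = F.avg_pool2d(s1, 2)  # down-sample stage-1 maps to 8x8
        feats = torch.cat([s1_down.flatten(1), s2.flatten(1)], dim=1)
        return self.classifier(feats)
```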

Training

Here are some choices we made for the experiment:

  • 32 filters in the first stage, 64 in the second, and 100 neurons in the hidden layer of the classifier: the best trade-off between the hardware we had and the time needed to train the network
  • only the Y channel of the images is used: Sermanet and LeCun showed that results are slightly better with the Y channel alone, as color adds useful information in only very few cases [1]

4 different architectures were trained:

  • 1-hidden layer neural network (64 neurons)
  • 2-hidden layer neural network (32-32 neurons)
  • Single-scale convnet (32-64-100 features)
  • Multi-scale convnet (32-64-100 features)

Test

The convolutional networks are first tested on the test set and its corresponding labels.

The output features of the ConvNet are then fed into a 1-NN classifier: the barycenter of each class is computed in feature space, and the test examples are matched against this set of averaged training examples.
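A small sketch of this nearest-barycenter step, assuming the ConvNet features have already been extracted into NumPy arrays (shapes and helper names are illustrative):

```python
import numpy as np

def class_barycenters(features, labels, n_classes=43):
    """features: (N, D) ConvNet features; labels: (N,) integer class ids."""
    return np.stack([features[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def nearest_centroid_predict(test_features, barycenters):
    """Assign each test feature vector to the closest class barycenter."""
    # (M, 1, D) - (1, C, D) -> (M, C) pairwise distances
    dists = np.linalg.norm(test_features[:, None, :] - barycenters[None, :, :],
                           axis=-1)
    return dists.argmin(axis=1)
```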

Results

The trainings and tests were conducted on a late-2011 MacBook Pro (2.2 GHz Intel Core i7, 8 GB RAM).

ConvNet

The best results we got for these architectures are:

| Set           | 1HL-NN  | 2HL-NN  | SS architecture | MS architecture |
|---------------|---------|---------|-----------------|-----------------|
| Training      | 93.9%   | 89.18%  | 99.941%         | 99.964%         |
| Testing       | 80.24%  | 77.61%  | 95.8%           | 96%             |
| Epochs        | 64      | 80      | 16              | 16              |
| Training time | ~20 min | ~20 min | ~4 hours        | ~4 hours        |

Accuracy is better with the convolutional networks, but training also takes about 12 times longer. Note, however, that the ConvNet models already reached 99% training accuracy after only 3 epochs. The optimization method is SGD, and the learning rate is hand-adjusted after each epoch based on the objective function.
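A minimal sketch of such a training loop in PyTorch; since the learning rate was adjusted by hand, the halving-on-plateau rule below is only an illustrative stand-in for that manual tuning:

```python
import torch

def train(model, loader, epochs=16, lr=0.01):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()
    prev_loss = float("inf")
    for epoch in range(epochs):
        total_loss = 0.0
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        # Stand-in for the per-epoch hand tuning: shrink the learning rate
        # when the objective stops improving.
        if total_loss >= prev_loss:
            for group in optimizer.param_groups:
                group["lr"] *= 0.5
        prev_loss = total_loss
        print(f"epoch {epoch}: loss {total_loss:.4f}, "
              f"lr {optimizer.param_groups[0]['lr']:.5f}")
```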

Below is a representation of the 32 first-stage filters of the trained MS model. The filters have been scaled up, as they are originally 5x5. On the right are the output features of the first stage for the same input.

[Figure: first-stage filters (left) and corresponding output feature maps (right)]
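For reference, a figure like this one can be produced by tiling the first-stage filter weights, e.g. with matplotlib (this assumes the hypothetical PyTorch model sketched in the Architecture section):

```python
import matplotlib.pyplot as plt

def plot_first_stage_filters(model):
    # First module of stage 1 is the Conv2d layer: weights of shape (32, 1, 5, 5)
    filters = model.stage1[0].weight.detach().cpu().numpy()
    fig, axes = plt.subplots(4, 8, figsize=(8, 4))
    for ax, f in zip(axes.flat, filters):
        ax.imshow(f[0], cmap="gray", interpolation="nearest")  # 5x5, scaled up
        ax.axis("off")
    plt.tight_layout()
    plt.show()
```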

1NN Classifier

We then used the ConvNets as feature extractors to feed a 1-Nearest-Neighbor classifier:

| Architecture | 1st stage | 2nd stage |
|--------------|-----------|-----------|
| SS           | 43.93%    | 58.8%     |
| MS           | 44.54%    | 54.53%    |

The results are consistent: the deeper we go into the network, the more abstract the feature representations become. The MS network's second stage yields worse results because it combines features from both the first and second stages: the dimensionality is higher and the overall level of abstraction is somewhat reduced.

Conclusion

The results are slightly worse than those reported in the paper: accuracy is poor on classes that are under-represented in the training set, which lowers the overall accuracy. For example, class 0 has only 210 examples whereas class 1 contains 2220, and as a result the accuracy for class 0 is only 60%. Nevertheless, the multi-scale architecture yields the best results of all the methods tested.

Future work

  • Use of a more homogeneous training set with the same number of examples per class (see the sketch after this list)
  • GPU implementation
  • Learning rate annealing
  • Change optimization methods (L-BFGS, conjugate gradient, …)
  • Try different architectures
  • Try with all three YUV channels, changing the pre-processing, etc.
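For the first item, one way to obtain a more class-balanced training stream without collecting new data is to oversample rare classes with a weighted sampler; the helper below is a hypothetical sketch, not part of the original experiment:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=128):
    """labels: array of integer class ids, one per example in `dataset`."""
    counts = np.bincount(labels)     # e.g. 210 for class 0, 2220 for class 1
    weights = 1.0 / counts[labels]   # rare classes get sampled more often
    sampler = WeightedRandomSampler(torch.as_tensor(weights, dtype=torch.double),
                                    num_samples=len(labels), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```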

References

  • Sermanet, P., & LeCun, Y. (2011, July). Traffic sign recognition with multi-scale convolutional networks. In Neural Networks (IJCNN), The 2011 International Joint Conference on (pp. 2809-2813). IEEE.
  • Hyvärinen, A., & Köster, U. (2007). Complex cell pooling and the statistics of natural images. Network: Computation in Neural Systems, 18(2), 81-100.