
Batch Normalization Bug #28

Open

SpectralGuilem opened this issue Aug 14, 2017 · 2 comments

Comments

SpectralGuilem commented Aug 14, 2017

Hi, I'm reimplementing the Ladder and Tagger networks in TensorFlow and have found what I believe to be a (minor) bug. Could you please clarify this?

Batch normalization (BN) parameters in the decoder (lines 488-494 in ladder.py) are not annotated as having the role BNPARAM. As such, they are not replaced in the graph with the training-set statistics by TestMonitoring._get_bn_params. If one attempts to evaluate the model after training with very small batch sizes, performance is degraded because the mean and variance for the BN step are computed from a very small sample.
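To illustrate what I mean (a toy numpy sketch, not the repo's code; all names here are illustrative): normalizing a tiny evaluation batch with its own statistics diverges badly from normalizing with the training-set statistics that _get_bn_params is supposed to splice in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the training-set statistics that
# TestMonitoring._get_bn_params should splice into the graph.
train_acts = rng.normal(loc=2.0, scale=3.0, size=(10000, 4))
pop_mean, pop_var = train_acts.mean(axis=0), train_acts.var(axis=0)

def bn(x, mean, var, eps=1e-6):
    # Plain BN normalization step (learned scale/shift omitted).
    return (x - mean) / np.sqrt(var + eps)

eval_batch = rng.normal(loc=2.0, scale=3.0, size=(2, 4))

# Buggy path: without the BNPARAM role, the batch's own statistics
# (estimated from just 2 samples!) are used at test time.
buggy = bn(eval_batch, eval_batch.mean(axis=0), eval_batch.var(axis=0))

# Intended path: normalize with the training-set statistics.
fixed = bn(eval_batch, pop_mean, pop_var)

print(np.abs(buggy - fixed).max())  # the gap grows as batches shrink
```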

Thanks,
Guillem

hotloo (Contributor) commented Aug 18, 2017

Hi, Guillem!

You are right! It could indeed be a bug.

As for the implications of this issue, I think it has little effect because, in Ladder, we care about the validation/test classification error, and at test time it should not really affect the encoder output, AFAIK.

The main task in Ladder is carried by the encoder output, and the decoder is there to support the main task via an auxiliary training task. In Tagger, the story might be different.
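Schematically (a minimal sketch of the objective, not the actual code in this repo; the function and argument names are made up), the decoder only enters the cost through the auxiliary denoising term, while test-time predictions come from the encoder alone:

```python
import numpy as np

def ladder_cost(logits, labels, z_clean, z_hat, lambdas):
    """Sketch of the Ladder objective.

    logits:  encoder output for the labelled examples, shape (N, classes)
    labels:  integer class indices, shape (N,)
    z_clean: clean-encoder activations, one array per layer
    z_hat:   decoder reconstructions, one array per layer
    lambdas: per-layer denoising cost weights
    """
    # Supervised cross-entropy on the encoder output (the main task).
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

    # Auxiliary denoising cost: the only place the decoder contributes.
    denoising = sum(lam * np.mean((zc - zh) ** 2)
                    for lam, zc, zh in zip(lambdas, z_clean, z_hat))
    return ce + denoising
```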

While we are on this topic, I would like to let you know that there are Ladder implementations in TensorFlow:

https://github.com/rinuboney/ladder
https://robromijnders.github.io/ladder/
https://github.com/tarvaina/tensorflow-ladder

We do not have a public Tagger implementation in TensorFlow. But if you have any trouble implementing it, please ping me here or at the Tagger repo.

SpectralGuilem (Author)

Thanks for the reply!

I had already found those repos. Actually, I'm using rinuboney/ladder as a reference for my Tagger network, but I'm stripping out all the bits and pieces that aren't required by Tagger (denoising costs, corruption of the input...).

I noticed this same "bug" in the Theano Tagger network. There, none of the batch-normalization steps keep track of the BN parameters, even though the comment in FinalTestMonitoring in utils.py says they do. Again, this is not really a problem for the paper's results, since the network is never evaluated online.

By the way, may I ask whether using LayerNorm instead of BatchNorm is the recommended way to go, as suggested by the results in v2 of the arXiv paper? It's much nicer to implement :)
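(For context on "nicer": LayerNorm normalizes each sample over its feature dimension, so there are no batch statistics to track and no train/test switch. A minimal numpy sketch, with illustrative names:)

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Normalize each sample over its features; no running statistics,
    # so the same code path is used at training and test time.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```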

Thanks for your time,
Guillem
