
Batch Normalization Bug #28

Open

SpectralGuilem opened this issue Aug 14, 2017 · 2 comments

Comments

SpectralGuilem commented Aug 14, 2017

Hi, I'm reimplementing the Ladder and Tagger networks in TensorFlow and have found what I believe to be a (minor) bug. Could you please clarify this?

Batch normalization (BN) parameters in the decoder (lines 488-494 in ladder.py) are not annotated as having the role BNPARAM. As such, they are not replaced in the graph with the training-set statistics by TestMonitoring._get_bn_params. If one attempts to evaluate the model after training with very small batch sizes, performance is degraded because the mean and variance for the BN step are computed from a very small sample.
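To illustrate what I mean (a toy numpy sketch, not the repo's code; all names here are illustrative): normalizing a tiny evaluation batch with its own statistics diverges badly from normalizing with the training-set statistics that _get_bn_params is supposed to splice in.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the training-set statistics that
# TestMonitoring._get_bn_params should splice into the graph.
train_acts = rng.normal(loc=2.0, scale=3.0, size=(10000, 4))
pop_mean, pop_var = train_acts.mean(axis=0), train_acts.var(axis=0)

def bn(x, mean, var, eps=1e-6):
    # Plain BN normalization step (learned scale/shift omitted).
    return (x - mean) / np.sqrt(var + eps)

eval_batch = rng.normal(loc=2.0, scale=3.0, size=(2, 4))

# Buggy path: without the BNPARAM role, the batch's own statistics
# (estimated from just 2 samples!) are used at test time.
buggy = bn(eval_batch, eval_batch.mean(axis=0), eval_batch.var(axis=0))

# Intended path: normalize with the training-set statistics.
fixed = bn(eval_batch, pop_mean, pop_var)

print(np.abs(buggy - fixed).max())  # the gap grows as batches shrink
```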

Thanks,
Guillem

hotloo (Contributor) commented Aug 18, 2017

Hi, Guillem!

You are right! It could indeed be a bug.

As for the implications of this issue, I think it has little effect because, in Ladder, we care about the validation/test classification error, and at test time it should not really affect the encoder output, AFAIK.

The main task in Ladder is carried by the encoder output, and the decoder is there to support the main task via an auxiliary training task. In Tagger, the story might be different.
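Schematically (a minimal sketch of the objective, not the actual code in this repo; the function and argument names are made up), the decoder only enters the cost through the auxiliary denoising term, while test-time predictions come from the encoder alone:

```python
import numpy as np

def ladder_cost(logits, labels, z_clean, z_hat, lambdas):
    """Sketch of the Ladder objective.

    logits:  encoder output for the labelled examples, shape (N, classes)
    labels:  integer class indices, shape (N,)
    z_clean: clean-encoder activations, one array per layer
    z_hat:   decoder reconstructions, one array per layer
    lambdas: per-layer denoising cost weights
    """
    # Supervised cross-entropy on the encoder output (the main task).
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

    # Auxiliary denoising cost: the only place the decoder contributes.
    denoising = sum(lam * np.mean((zc - zh) ** 2)
                    for lam, zc, zh in zip(lambdas, z_clean, z_hat))
    return ce + denoising
```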

While we are on this topic, I would like to let you know that there are Ladder implementations in TensorFlow:

https://github.com/rinuboney/ladder
https://robromijnders.github.io/ladder/
https://github.com/tarvaina/tensorflow-ladder

We do not have a public Tagger implementation in TensorFlow. But if you have any trouble implementing it, please ping me here or at the Tagger repo.

SpectralGuilem (Author)

Thanks for the reply!

I had already found those repos. Actually, I'm using rinuboney/ladder as a reference for my Tagger network, but I'm stripping out all the bits and pieces that aren't required by Tagger (denoising costs, corruption of the input...).

I noticed this same "bug" in the Theano Tagger network. There, none of the batch-normalization steps keep track of the BN parameters, even though the comment in FinalTestMonitoring in utils.py says they do. Again, this is not really a problem for the paper's results, since the network is never evaluated online.

By the way, may I ask whether using LayerNorm instead of BatchNorm is the recommended way to go, as suggested by the results in v2 of the arXiv paper? It's much nicer to implement :)
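(For context on "nicer": LayerNorm normalizes each sample over its feature dimension, so there are no batch statistics to track and no train/test switch. A minimal numpy sketch, with illustrative names:)

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-6):
    # Normalize each sample over its features; no running statistics,
    # so the same code path is used at training and test time.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```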

Thanks for your time,
Guillem
