Batch Normalization Bug #28
Comments
Hi, Guillem! You are right there! It could be a bug. As for the implications of this issue, I think it has little effect because, in Ladder, we care about the validation/test classification error, and at test time it should not really affect the encoder output, AFAIK. The main output in Ladder is the encoder output; the decoder is there to support the main task via an auxiliary training task. In Tagger, the story might be different. While on this, I would like to let you know there are Ladder implementations in TensorFlow: https://github.com/rinuboney/ladder We do not have a public Tagger implementation in TensorFlow, but if you have any trouble implementing it, please ping me here or at the Tagger repo.
Thanks for the reply! I had already found those repos. Actually, I'm using rinuboney/ladder as a reference for my Tagger network, but I'm stripping out all the bits and pieces which aren't required by Tagger (denoising costs, corruption of the input, ...). I noticed this same "bug" in the Theano Tagger network: there, none of the batch-normalization steps keep track of the BN parameters, even though the comment in FinalTestMonitoring in utils.py says so. Again, not really a problem for the paper results, since the network is never evaluated online. By the way, may I ask whether using LayerNorm instead of BatchNorm is the recommended way to go, as suggested by the results in v2 of the arXiv paper? It's much nicer to implement :) Thanks for your time,
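(Editorial aside, not part of the thread: a rough NumPy sketch of why layer normalization is simpler to implement here. It normalizes each sample over its own feature dimension, so no batch or population statistics need to be tracked, and a test batch of one is a non-issue. All names below are made up for illustration.)

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each sample over its feature dimension; no running
    # averages or population statistics are needed, so training and
    # test time behave identically even with a single example.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.random.randn(1, 8)               # a single example is fine
gamma, beta = np.ones(8), np.zeros(8)
print(layer_norm(x, gamma, beta))
```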
Hi, I'm reimplementing the Ladder and Tagger networks in TensorFlow and have found what I believe to be a (minor) bug. Could you please clarify this?
Batch normalization (BN) parameters in the decoder (lines 488-494 in ladder.py) are not annotated as having the role BNPARAM. As such, they are not replaced in the graph with the training-set statistics by TestMonitoring._get_bn_params. If one attempts to evaluate the model after training with very small batch sizes, performance is degraded because the mean and variance for the BN step are computed from a very small sample.
Thanks,
Guillem
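(Editorial aside, not part of the thread: a minimal NumPy sketch of the mechanism under discussion; hypothetical code, not ladder.py. It tracks training-set statistics during training and substitutes them at test time, which is what replacing the BNPARAM-annotated parameters achieves. If that substitution is skipped, a tiny evaluation batch has to estimate the mean and variance from itself, which is the degradation described in the report above.)

```python
import numpy as np

class BatchNorm:
    """Toy 1-D batch norm that keeps running training-set statistics."""
    def __init__(self, dim, eps=1e-5, momentum=0.99):
        self.gamma, self.beta = np.ones(dim), np.zeros(dim)
        self.running_mean, self.running_var = np.zeros(dim), np.ones(dim)
        self.eps, self.momentum = eps, momentum

    def __call__(self, x, training):
        if training:
            mean, var = x.mean(axis=0), x.var(axis=0)
            # Accumulate training-set estimates; using these at test time
            # is what the BNPARAM replacement is meant to accomplish.
            m = self.momentum
            self.running_mean = m * self.running_mean + (1 - m) * mean
            self.running_var = m * self.running_var + (1 - m) * var
        else:
            # If this branch is never taken (as for the decoder BN layers
            # in question), a tiny evaluation batch must estimate the
            # mean/variance from itself, which is very noisy.
            mean, var = self.running_mean, self.running_var
        return self.gamma * (x - mean) / np.sqrt(var + self.eps) + self.beta
```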