accuracy drop #15
Comments
Did you divide your batch size by the number of GPUs you're using? Not really sure what else could be causing problems.
Sorry, I misspoke; what I meant to say was: if you followed the instructions, you probably multiplied your batch size by the number of GPUs you're using. This increases throughput, but it also makes your batches bigger, which can result in worse accuracy. If you kept the same batch size (and it is divisible by the number of GPUs), I would expect you to get the same performance. Assuming this is the issue, you could try playing with other hyperparameters like the learning rate or dropout, etc.
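A minimal arithmetic sketch of what the batch-size advice above implies, assuming make_parallel splits each incoming batch evenly across the GPUs; the GPU count and batch sizes below are placeholders, not values from this thread:

```python
n_gpus = 4                    # hypothetical GPU count
single_gpu_batch = 32         # batch size that worked on one GPU

# To match single-GPU behaviour, keep the *global* batch size unchanged
# (it must be divisible by the number of GPUs):
global_batch = single_gpu_batch
assert global_batch % n_gpus == 0
per_gpu_batch = global_batch // n_gpus      # 8 samples per GPU per step

# Multiplying by the GPU count (as discussed above) gives a bigger effective
# batch instead: faster per epoch, but it can converge to a worse optimum.
scaled_batch = single_gpu_batch * n_gpus    # 128 samples per step
```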
I tried the same batch size, but the accuracy still drops. Did you try to reproduce any model trained on a single GPU?
Below is my opinion: when you enlarge the batch size by the number of GPUs, large-batch training is in fact a well-known problem for parallel training. You can take a look at this paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (Goyal et al., 2017). The first trick you can try is to increase your learning rate in proportion to the batch-size increase.
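A small sketch of the linear learning-rate scaling rule described in that paper, with placeholder numbers (not values from this thread):

```python
base_lr = 0.01       # learning rate tuned for the single-GPU batch size
base_batch = 32      # single-GPU batch size
n_gpus = 4           # hypothetical GPU count
global_batch = base_batch * n_gpus

# Linear scaling rule: grow the learning rate by the same factor as the batch size.
scaled_lr = base_lr * (global_batch / base_batch)   # 0.04 in this example

# The paper also warms the learning rate up over the first few epochs instead of
# starting directly at the scaled value, which helps early-training stability.
print(scaled_lr)
```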
Hi @kuza55,
When I use a single GPU for training, the model reaches a training accuracy of 99.99%. But when I use make_parallel, the training accuracy gets stuck at 96%.
Minimum Loss:
Single GPU: 0.0063
Multi GPU: 0.1213
The loss is also not dropping much.
I am training a multi-label classifier based on ResNet-50, with a sigmoid output layer and binary crossentropy loss.
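For reference, a minimal Keras 2-style sketch of the setup described here (ResNet-50 backbone, sigmoid outputs, binary crossentropy, wrapped with make_parallel). The import path and signature of make_parallel, the number of labels, and the GPU count are assumptions, not values from this issue:

```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

from multi_gpu import make_parallel  # assumed location of this repo's helper

n_classes = 20   # hypothetical number of labels
n_gpus = 2       # hypothetical GPU count

base = ResNet50(include_top=False, weights='imagenet', input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
# Multi-label head: one independent sigmoid per label
preds = Dense(n_classes, activation='sigmoid')(x)
model = Model(inputs=base.input, outputs=preds)

# Wrap the model so each batch is split across the GPUs, then compile as usual.
parallel_model = make_parallel(model, n_gpus)
parallel_model.compile(optimizer='adam', loss='binary_crossentropy',
                       metrics=['accuracy'])
```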