
accuracy drop #15

Open
dhimanpd opened this issue May 25, 2017 · 4 comments
@dhimanpd

Hi @kuza55,
When I use a single GPU for training, the model reaches a training accuracy of 99.99%. But when I use make_parallel, the training accuracy gets stuck at 96%.

Minimum loss:
Single GPU: 0.0063
Multi GPU: 0.1213
The loss also does not drop much.

I am training a multi-label classifier based on ResNet-50, with sigmoid activations at the output and binary crossentropy loss.
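
Roughly, the setup looks like this (a minimal sketch; the label count, input size, and optimizer are placeholders, not my exact configuration):

```python
# Rough sketch of the setup (label count, input size, and optimizer are placeholders).
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model
from multi_gpu import make_parallel  # from this repo

n_labels = 20  # placeholder for the actual number of labels

base = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
x = GlobalAveragePooling2D()(base.output)
out = Dense(n_labels, activation='sigmoid')(x)  # multi-label head
model = Model(inputs=base.input, outputs=out)

# Single-GPU run: compile and fit `model` directly (reaches ~99.99% training accuracy).
# Multi-GPU run: wrap with make_parallel first (gets stuck around 96%).
parallel_model = make_parallel(model, gpu_count=2)
parallel_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```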

kuza55 commented Jun 8, 2017

Did you divide your batch size by the number of GPUs you're using? Not really sure what else could be causing problems.

kuza55 commented Jun 8, 2017

Sorry, I misspoke; what I meant to say was:

If you followed the instructions, you probably multiplied your batch size by the number of GPUs you're using. This increases throughput, but it also makes your batches bigger, which can result in worse accuracy.

If you kept the same batch size (and it is divisible by the number of GPUs), I would expect you to get the same performance.

Assuming this is the issue, you could try playing with other hyperparameters like the learning rate or dropout, etc.
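
To make that concrete, something like this (the numbers are just for illustration, not tuned values):

```python
# Illustration only; the numbers here are made up.
n_gpus = 2
single_gpu_batch = 32          # batch size that worked on one GPU
base_lr = 1e-3                 # learning rate tuned for that batch size

# Option A: keep the same global batch size as the single-GPU run.
# It must be divisible by n_gpus; each GPU then sees 32 // 2 = 16 samples per step,
# and I would expect roughly the same accuracy as before.
batch_size = single_gpu_batch

# Option B: scale the batch up for throughput, but then retune hyperparameters,
# e.g. raise the learning rate along with the batch size.
batch_size = single_gpu_batch * n_gpus
lr = base_lr * n_gpus
```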

dhimanpd commented Jul 9, 2017

I tried the same batch size, but the accuracy still drops. Did you try to reproduce a model trained on a single GPU?

@DarkForte

Here is my take:
When you train a multi-GPU model with the same batch size as on a single GPU, each GPU sees fewer training samples at a time, so the gradient estimated from each slice is noisier. As a result, accuracy can drop as well.

When you enlarge the batch size to n_gpu * batch_size, you get fewer opportunities to update your model within one epoch. For example, if you have 1000 training samples and you enlarge your batch size from 50 to 100 on 2 GPUs, each GPU still sees 50 samples per iteration, but you apply gradients to the model only half as often.

This is in fact a real problem for parallel training. Take a look at this paper: Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. The first trick to try is to increase your learning rate by a factor of n_gpu as well. The simplified explanation: with larger batches your estimated gradient is more accurate, so you can trust it more and take larger steps.
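
A sketch of that linear learning-rate scaling trick (the numbers are hypothetical, and the paper also uses a gradual warmup, which I leave out here):

```python
# Linear learning-rate scaling sketch (hypothetical numbers).
from keras.optimizers import SGD

n_gpus = 2
base_lr = 0.01                 # learning rate tuned for the single-GPU batch size
scaled_lr = base_lr * n_gpus   # scale linearly with the effective batch size

optimizer = SGD(lr=scaled_lr, momentum=0.9)
# parallel_model.compile(optimizer=optimizer,
#                        loss='binary_crossentropy',
#                        metrics=['accuracy'])
```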
