
Training with mscoco data set #61

Open
SINDHUN97 opened this issue Jun 26, 2020 · 5 comments
@SINDHUN97

  • SteganoGAN version or git commit: 0.1.3
  • Python version (output of python --version): 3.7.3
  • Pip version (output of pip --version): 20.1.1
  • PyTorch version (output of python -c "import torch; print(torch.__version__)"): 1.0.0
  • Operating System: Ubuntu Deep Learning instance (Instance Type: g4dn.2xlarge)

Description

I started training SteganoGAN with the MSCOCO dataset using the instance specification above. Completing one epoch takes almost 3 hours 30 minutes. What is the expected training time for this dataset, and which instance type would you suggest to improve performance?

What I Did

Tried 4 different GPU instance types:

  1. g4dn.2xlarge (1 epoch: 3 hrs 30 min)
  2. g4dn.8xlarge (1 epoch: 3 hrs 30 min)
  3. p2.xlarge (1 epoch: more than 4 hrs)
  4. g3.8xlarge (1 epoch: 3 hrs 30 min)
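For context, the reported epoch times can be converted into rough throughput numbers. A back-of-the-envelope sketch (the image count is an assumption: the MSCOCO 2017 train split has ~118,287 images, but the exact subset used here isn't stated):

```python
# Back-of-the-envelope throughput from the reported epoch times.
# NUM_IMAGES is an assumption: the MSCOCO 2017 train split has
# ~118,287 images; the exact subset used in this issue isn't stated.
NUM_IMAGES = 118_287

def images_per_second(epoch_hours):
    """Approximate images processed per second over one epoch."""
    return NUM_IMAGES / (epoch_hours * 3600)

for instance, hours in [
    ("g4dn.2xlarge", 3.5),
    ("g4dn.8xlarge", 3.5),
    ("p2.xlarge", 4.0),
    ("g3.8xlarge", 3.5),
]:
    print(f"{instance}: ~{images_per_second(hours):.1f} images/s")
```

The near-identical times across very different GPUs suggest the bottleneck may not be GPU compute at all (e.g., data loading or a CPU-bound stage), which is worth checking before scaling up the instance.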


@k15z
Contributor

k15z commented Jul 8, 2020

That running time doesn't seem unreasonable to me. It looks like you're running it through the Python API; just to confirm, are you setting cuda=True when you create the SteganoGAN instance?

@SINDHUN97
Author

Hi @k15z,
Yes, I am setting cuda=True in the API:

```python
SteganoGAN(1, BasicEncoder, BasicDecoder, BasicCritic, hidden_size=32, cuda=True, verbose=True)
```

@k15z
Contributor

k15z commented Jul 8, 2020

OK, the running time seems reasonable to me. We ran our original experiments on p2.8xlarge instances, and I believe it took around 2 hours per epoch.

@kveerama
Contributor

kveerama commented Jul 8, 2020

@k15z how many epochs are needed to train the model? Also, @udvattam reported 1.5 hours on the same dataset. Is there a recommendation on the batch_size?

@udvattam

udvattam commented Jul 8, 2020

@kveerama I trained on a modified version of the dataset, using only half of the data, which is why I think it was faster. However, I experimented with the batch size for training and validation, and epoch time decreased as I increased the batch size, even up to a size of 100. The decrease plateaued around a batch size of 32 for me.
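The plateau described above is what you would expect if each batch carries a fixed overhead (data loading, kernel launch, host-to-device transfer) on top of per-image compute: larger batches amortize the overhead until per-image work dominates. A toy cost model illustrating the shape of that curve (all constants are illustrative assumptions, not measurements from this thread):

```python
import math

# Toy cost model: each batch pays a fixed overhead plus a per-image
# compute cost. The constants below are illustrative assumptions,
# not measurements from this thread.
PER_BATCH_OVERHEAD = 0.05  # seconds per batch (loading, launch, transfer)
PER_IMAGE_COST = 0.02      # seconds of compute per image
NUM_IMAGES = 10_000

def epoch_seconds(batch_size):
    """Estimated epoch time under the fixed-overhead-per-batch model."""
    batches = math.ceil(NUM_IMAGES / batch_size)
    return batches * (PER_BATCH_OVERHEAD + batch_size * PER_IMAGE_COST)

for bs in (4, 8, 16, 32, 64, 100):
    print(f"batch_size={bs:>3}: ~{epoch_seconds(bs):.0f}s per epoch")
```

Under this model, going from batch size 4 to 32 saves far more time than going from 32 to 100, matching the plateau @udvattam observed; the gains stop once GPU memory or per-image compute becomes the limit.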
