I came to post this question here because the NVlabs stylegan and stylegan2 projects provide minimal instruction about training and don't allow creating issues. In the README, the authors list the expected training time for a given number of GPUs (1, 2, 4, or 8) and total number of training images at a certain resolution (measured on a DGX-1 box with 8 Tesla V100 GPUs, 32 GB each).
| Configuration | Resolution | Total kimg | 1 GPU | 2 GPUs | 4 GPUs | 8 GPUs | GPU mem |
| --- | --- | --- | --- | --- | --- | --- | --- |
| config-f | 1024×1024 | 25000 | 69d 23h | 36d 4h | 18d 14h | 9d 18h | 13.3 GB |
| config-f | 1024×1024 | 10000 | 27d 23h | 14d 11h | 7d 10h | 3d 22h | 13.3 GB |
| config-e | 1024×1024 | 25000 | 35d 11h | 18d 15h | 9d 15h | 5d 6h | 8.6 GB |
| config-e | 1024×1024 | 10000 | 14d 4h | 7d 11h | 3d 20h | 2d 3h | 8.6 GB |
| config-f | 256×256 | 25000 | 32d 13h | 16d 23h | 8d 21h | 4d 18h | 6.4 GB |
| config-f | 256×256 | 10000 | 13d 0h | 6d 19h | 3d 13h | 1d 22h | 6.4 GB |
Question: Is there a way to tune the parameters so that GPU usage is fully maximized for the host it runs on? If there isn't a magic flag like that, what are the key parameters I should dial up or down given my training host's technical specification?
On one extreme:
Each Tesla V100 has 32 GB of memory, yet the training only uses 6.4 GB of it, and on a DGX-2 the GPU count doubles from 8 to 16, so utilization would be only 8 * 6.4 / (16 * 32) = 10%. If we can tweak something like the minibatch size, does that mean we could cut the training time from 13 days to about 2 days?
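To sanity-check that figure, here is the back-of-the-envelope calculation I'm doing (the 8 × 6.4 GB comes from the 256×256 config-f row above; the 16 × 32 GB is the DGX-2 total I'm assuming):

```python
# Rough GPU-memory utilization: the 256x256 config-f benchmark touches
# ~6.4 GB on each of 8 GPUs, while a DGX-2 offers 16 GPUs x 32 GB each.
used_gb = 8 * 6.4         # memory actually used in the benchmark
avail_gb = 16 * 32        # total memory available on a DGX-2
print(f"utilization ~ {used_gb / avail_gb:.0%}")  # -> 10%
```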
On the other extreme:
I might only have two small gaming GPUs with 6 GB of memory each; that would require a different batch size, since every benchmark above uses more than 6 GB of GPU memory.
Looking at the stylegan2 run_training.py, the closest parameters I found are --total-kimg and --num-gpus, and maybe --config too:
```python
parser.add_argument('--num-gpus', help='Number of GPUs (default: %(default)s)', default=1, type=int, metavar='N')
parser.add_argument('--total-kimg', help='Training length in thousands of images (default: %(default)s)', metavar='KIMG', default=25000, type=int)
```
But --total-kimg feels like it just sets how many thousands of training images to run through, i.e. the length of training rather than its width (throughput). Looking into training_loop.py, there are another ~50 parameters such as minibatch_size_base=32 and minibatch_gpu_base=4 that I believe directly impact training throughput, but I don't fully understand which knob to turn.
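For what it's worth, this is the kind of tweak I have in mind, written as a sketch rather than a recipe. I'm assuming the `sched` EasyDict that run_training.py builds is forwarded to `training_schedule()` in training_loop.py, and the concrete values (16 and 128) are guesses that would need to be checked against actual memory usage:

```python
from dnnlib import EasyDict  # EasyDict ships with the stylegan2 repo

# Sketch of the override I have in mind (assumed wiring: run_training.py's
# `sched` dict reaches training_schedule() in training_loop.py).
sched = EasyDict()

# Samples processed at a time by one GPU. The 256x256 benchmark uses ~6.4 GB
# with the default of 4, so a 32 GB V100 can plausibly hold more -- and a
# 6 GB gaming card would need less.
sched.minibatch_gpu_base = 16     # default: 4 (guess; measure before trusting)

# Global minibatch summed over all GPUs; keep it a multiple of
# minibatch_gpu_base * num_gpus.
sched.minibatch_size_base = 128   # default: 32
```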
Thoughts?