How to reduce GPU memory? #58

Open
ckcz123 opened this issue Oct 30, 2016 · 5 comments

@ckcz123 commented Oct 30, 2016

What a wonderful project! I have used it to solve some problems.
But there is one problem that always bothers me.

In one of my cases, I have to use rnn_size=512, num_layers=2, seq_length=1200.
Other arguments: batch_size=10, num_epochs=50, grad_clip=5.0, and so on.
But it allocates 7.23GiB on the GPU, and my GPU only has 8GB free.
So I wonder whether I can reduce GPU memory usage to 7GiB or less; if so, I can run it on the GPU.
rnn_size, num_layers, and seq_length cannot be modified.

Here is some of the output:

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 7967745639
InUse: 7764832256
MaxInUse: 7764842496
NumAllocs: 60834
MaxAllocSize: 14428160

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Sorry for my poor English, and thanks a lot!

@ckcz123 (Author) commented Oct 30, 2016

And why does it need 7.23GiB of memory? Can anyone explain it?

@fujimotomh

I think your seq_length is very high; 1200 is quite long. TensorFlow has an issue with the way it builds these kinds of graphs using seq2seq (see this issue). You may try to rebuild the model using dynamic_rnn, as they suggest in the comments there. A quick fix might be to lower your batch size, though it is already low, so your loss may be noisier.
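
A rough back-of-the-envelope check against the allocator log above supports this. Interpreting the 8,388,608-byte chunks as float32 tensors of shape [1024, 2048] is an assumption, but it matches the shape in the OOM message and is consistent with the [2 * rnn_size, 4 * rnn_size] LSTM gate matmul for rnn_size=512 (plain Python, no TensorFlow needed):

# Back-of-the-envelope check of the numbers in the log above.
rnn_size = 512
bytes_per_float32 = 4

# A float32 tensor of shape [1024, 2048] = [2 * rnn_size, 4 * rnn_size]:
chunk_bytes = (2 * rnn_size) * (4 * rnn_size) * bytes_per_float32
print(chunk_bytes)                    # 8388608 -> matches "Chunks of size 8388608"

# The allocator reports 919 such chunks in use at the OOM:
print(919 * chunk_bytes / 2.0 ** 30)  # ~7.18  -> matches "totalling 7.18GiB"

Because the graph is statically unrolled, it builds per-timestep ops (and, presumably, per-timestep [1024, 2048] buffers during backprop), so memory grows with seq_length; 1200 steps pushes past the 8GB card. dynamic_rnn runs the same cell inside a loop at execution time, which is why it ends up much lighter here (as the later comments confirm).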

@ckcz123 (Author) commented Oct 31, 2016

@fujimotomh Thank you for your reply.
But how can I modify the code? I'm new to TensorFlow.

I just tried to use
outputs, last_state = tf.nn.rnn(cell, inputs, initial_state=self.initial_state, scope='rnnlm')
instead of
outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm')
and it works! (But I don't know if the result is correct.)

But if I tried
outputs, last_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=self.initial_state, scope='rnnlm')
, it would throw ValueError: Dimension must be 2 but is 3.

So how can I modify the code?
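
For context on that error: rnn_decoder and tf.nn.rnn take a Python list of seq_length tensors, each of shape [batch_size, rnn_size], while dynamic_rnn takes a single 3-D tensor of shape [batch_size, seq_length, rnn_size], which is presumably where the "2 but is 3" dimension complaint comes from. A minimal sketch of the two input forms, using the TensorFlow 0.x-era API from this thread and tiny sizes purely for illustration:

# Toy illustration of the two input formats (TensorFlow 0.x-era API, as in
# this thread). Sizes are tiny on purpose; the real run uses batch_size=10,
# seq_length=1200, rnn_size=512.
import tensorflow as tf

batch_size, seq_length, rnn_size, num_layers = 2, 5, 8, 2

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(rnn_size)] * num_layers)

# One batch of embedded inputs as a single 3-D tensor:
inputs_3d = tf.placeholder(tf.float32, [batch_size, seq_length, rnn_size])

# rnn_decoder / tf.nn.rnn want a LIST of seq_length [batch_size, rnn_size]
# tensors (old tf.split signature: split_dim, num_split, value):
inputs_list = [tf.squeeze(x, [1]) for x in tf.split(1, seq_length, inputs_3d)]
outputs_list, _ = tf.nn.rnn(cell, inputs_list, dtype=tf.float32,
                            scope='static_rnn')

# dynamic_rnn wants the 3-D tensor itself:
outputs_3d, _ = tf.nn.dynamic_rnn(cell, inputs_3d, dtype=tf.float32,
                                  scope='dynamic_rnn')

# Passing the list where the 3-D tensor is expected (or vice versa) leads to
# dimension errors like the ValueError above.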

@fujimotomh

@ckcz123 You almost have it. dynamic_rnn takes the input as a tensor and not a list. This works on my laptop with a seq_length of 1200.

outputs, last_state = tf.nn.dynamic_rnn(cell, tf.nn.embedding_lookup(embedding, self.input_data), initial_state=self.initial_state, scope='rnnlm')

To confirm correctness, I think the best thing to do would be to run it with the default parameters and see if you can get a low loss on the training set. I would expect it to work, though, as rnn_decoder and dynamic_rnn claim to compute the same thing.
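
One related note, hedged because it assumes model.py follows the usual char-rnn pattern of concatenating the per-step outputs before the softmax projection: rnn_decoder returns a list of seq_length tensors of shape [batch_size, rnn_size], while dynamic_rnn returns one tensor of shape [batch_size, seq_length, rnn_size], so the flattening step changes slightly but produces the same rows in the same order. A tiny check of that equivalence (TF 0.x-era tf.concat signature, toy sizes):

# Check that the two flattening paths agree in shape and row order.
import numpy as np
import tensorflow as tf

batch_size, seq_length, rnn_size = 2, 3, 4

data = np.arange(batch_size * seq_length * rnn_size,
                 dtype=np.float32).reshape(batch_size, seq_length, rnn_size)

outputs_3d = tf.constant(data)                       # dynamic_rnn-style output
outputs_list = [tf.constant(data[:, t, :])           # rnn_decoder-style output
                for t in range(seq_length)]

# dynamic_rnn path: reshape the 3-D tensor directly.
flat_dynamic = tf.reshape(outputs_3d, [-1, rnn_size])
# rnn_decoder path: concatenate along axis 1, then reshape
# (old signature: tf.concat(concat_dim, values)).
flat_static = tf.reshape(tf.concat(1, outputs_list), [-1, rnn_size])

with tf.Session() as sess:
    a, b = sess.run([flat_dynamic, flat_static])
    print(a.shape, b.shape)    # (6, 4) (6, 4)
    print(np.allclose(a, b))   # True -- same rows, same order

So the downstream reshape can drop the tf.concat when switching to dynamic_rnn; the rest of the softmax/loss code should be unaffected.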

@ckcz123 (Author) commented Nov 1, 2016

@fujimotomh Oh, it works! It uses only 1.1GB of GPU memory!
Thanks for your advice!
