How to reduce GPU memory? #58

Open
ckcz123 opened this issue Oct 30, 2016 · 5 comments

@ckcz123 commented Oct 30, 2016

What a wonderful project! I have used it to solve some problems.
But there is one problem that always bothers me.

In one of my cases, I have to use rnn_size=512, num_layers=2, seq_length=1200.
Other arguments: batch_size=10, num_epochs=50, grad_clip=5.0, and so on.
But it allocates 7.23GiB on the GPU, and my GPU only has 8GB free.
So I wonder whether I can reduce GPU memory usage to 7GiB or less; if so, I can run it on the GPU.
rnn_size, num_layers, and seq_length cannot be modified.

Here is some of the output:

I tensorflow/core/common_runtime/bfc_allocator.cc:689] Summary of in-use Chunks by size:
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 22 Chunks of size 256 totalling 5.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 512 totalling 2.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 1280 totalling 1.2KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 7499 Chunks of size 2048 totalling 14.65MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1087 Chunks of size 4096 totalling 4.25MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 4608 totalling 4.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 6144 totalling 6.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 616 Chunks of size 8192 totalling 4.81MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 9984 totalling 9.8KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 4 Chunks of size 10240 totalling 40.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 2 Chunks of size 12288 totalling 24.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 303 Chunks of size 14336 totalling 4.14MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 5 Chunks of size 198656 totalling 970.0KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 208384 totalling 203.5KiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 919 Chunks of size 8388608 totalling 7.18GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 10775552 totalling 10.28MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:692] 1 Chunks of size 14428160 totalling 13.76MiB
I tensorflow/core/common_runtime/bfc_allocator.cc:696] Sum Total of in-use chunks: 7.23GiB
I tensorflow/core/common_runtime/bfc_allocator.cc:698] Stats:
Limit: 7967745639
InUse: 7764832256
MaxInUse: 7764842496
NumAllocs: 60834
MaxAllocSize: 14428160

W tensorflow/core/common_runtime/bfc_allocator.cc:270] ****************************************************************************************************
W tensorflow/core/common_runtime/bfc_allocator.cc:271] Ran out of memory trying to allocate 8.00MiB. See logs for memory state.
W tensorflow/core/framework/op_kernel.cc:968] Resource exhausted: OOM when allocating tensor with shape[1024,2048]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 8.00G (8589934592 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY

Sorry for my poor English, and thanks a lot!

@ckcz123 (Author) commented Oct 30, 2016

And why does it need 7.23GiB of memory? Can anyone explain it?

@fujimotomh

I think your seq_length is very high; 1200 is quite long. TensorFlow has an issue with the way it builds these kinds of graphs using seq2seq (see this issue). You may try to rebuild the model using dynamic_rnn, as they suggest in the comments there. A quick fix might be to lower your batch size, though it is already low, so your loss may be noisier.
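
A rough back-of-the-envelope check against the allocator log above supports this. Interpreting the 8,388,608-byte chunks as float32 tensors of shape [1024, 2048] is an assumption, but it matches the shape in the OOM message and is consistent with the [2 * rnn_size, 4 * rnn_size] LSTM gate matmul for rnn_size=512 (plain Python, no TensorFlow needed):

# Back-of-the-envelope check of the numbers in the log above.
rnn_size = 512
bytes_per_float32 = 4

# A float32 tensor of shape [1024, 2048] = [2 * rnn_size, 4 * rnn_size]:
chunk_bytes = (2 * rnn_size) * (4 * rnn_size) * bytes_per_float32
print(chunk_bytes)                    # 8388608 -> matches "Chunks of size 8388608"

# The allocator reports 919 such chunks in use at the OOM:
print(919 * chunk_bytes / 2.0 ** 30)  # ~7.18  -> matches "totalling 7.18GiB"

Because the graph is statically unrolled, it builds per-timestep ops (and, presumably, per-timestep [1024, 2048] buffers during backprop), so memory grows with seq_length; 1200 steps pushes past the 8GB card. dynamic_rnn runs the same cell inside a loop at execution time, which is why it ends up much lighter here (as the later comments confirm).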

@ckcz123 (Author) commented Oct 31, 2016

@fujimotomh Thank you for your reply.
But how can I modify the code? I'm new to TensorFlow.

I just tried to use
outputs, last_state = tf.nn.rnn(cell, inputs, initial_state=self.initial_state, scope='rnnlm')
instead of
outputs, last_state = seq2seq.rnn_decoder(inputs, self.initial_state, cell, loop_function=loop if infer else None, scope='rnnlm')
and it works! (But I don't know if the result is correct.)

But if I tried
outputs, last_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=self.initial_state, scope='rnnlm')
, it would throw ValueError: Dimension must be 2 but is 3.

So how can I modify the code?
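
For context on that error: rnn_decoder and tf.nn.rnn take a Python list of seq_length tensors, each of shape [batch_size, rnn_size], while dynamic_rnn takes a single 3-D tensor of shape [batch_size, seq_length, rnn_size], which is presumably where the "2 but is 3" dimension complaint comes from. A minimal sketch of the two input forms, using the TensorFlow 0.x-era API from this thread and tiny sizes purely for illustration:

# Toy illustration of the two input formats (TensorFlow 0.x-era API, as in
# this thread). Sizes are tiny on purpose; the real run uses batch_size=10,
# seq_length=1200, rnn_size=512.
import tensorflow as tf

batch_size, seq_length, rnn_size, num_layers = 2, 5, 8, 2

cell = tf.nn.rnn_cell.MultiRNNCell(
    [tf.nn.rnn_cell.BasicLSTMCell(rnn_size)] * num_layers)

# One batch of embedded inputs as a single 3-D tensor:
inputs_3d = tf.placeholder(tf.float32, [batch_size, seq_length, rnn_size])

# rnn_decoder / tf.nn.rnn want a LIST of seq_length [batch_size, rnn_size]
# tensors (old tf.split signature: split_dim, num_split, value):
inputs_list = [tf.squeeze(x, [1]) for x in tf.split(1, seq_length, inputs_3d)]
outputs_list, _ = tf.nn.rnn(cell, inputs_list, dtype=tf.float32,
                            scope='static_rnn')

# dynamic_rnn wants the 3-D tensor itself:
outputs_3d, _ = tf.nn.dynamic_rnn(cell, inputs_3d, dtype=tf.float32,
                                  scope='dynamic_rnn')

# Passing the list where the 3-D tensor is expected (or vice versa) leads to
# dimension errors like the ValueError above.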

@fujimotomh

@ckcz123 You almost have it. dynamic_rnn takes the input as a tensor and not a list. This works on my laptop with a seq_length of 1200.

outputs, last_state = tf.nn.dynamic_rnn(cell, tf.nn.embedding_lookup(embedding, self.input_data), initial_state=self.initial_state, scope='rnnlm')

To confirm correctness, I think the best thing to do would be to run it with the default parameters and see if you can get a low loss on the training set. I would expect it to work, though, as rnn_decoder and dynamic_rnn claim to compute the same thing.
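
One related note, hedged because it assumes model.py follows the usual char-rnn pattern of concatenating the per-step outputs before the softmax projection: rnn_decoder returns a list of seq_length tensors of shape [batch_size, rnn_size], while dynamic_rnn returns one tensor of shape [batch_size, seq_length, rnn_size], so the flattening step changes slightly but produces the same rows in the same order. A tiny check of that equivalence (TF 0.x-era tf.concat signature, toy sizes):

# Check that the two flattening paths agree in shape and row order.
import numpy as np
import tensorflow as tf

batch_size, seq_length, rnn_size = 2, 3, 4

data = np.arange(batch_size * seq_length * rnn_size,
                 dtype=np.float32).reshape(batch_size, seq_length, rnn_size)

outputs_3d = tf.constant(data)                       # dynamic_rnn-style output
outputs_list = [tf.constant(data[:, t, :])           # rnn_decoder-style output
                for t in range(seq_length)]

# dynamic_rnn path: reshape the 3-D tensor directly.
flat_dynamic = tf.reshape(outputs_3d, [-1, rnn_size])
# rnn_decoder path: concatenate along axis 1, then reshape
# (old signature: tf.concat(concat_dim, values)).
flat_static = tf.reshape(tf.concat(1, outputs_list), [-1, rnn_size])

with tf.Session() as sess:
    a, b = sess.run([flat_dynamic, flat_static])
    print(a.shape, b.shape)    # (6, 4) (6, 4)
    print(np.allclose(a, b))   # True -- same rows, same order

So the downstream reshape can drop the tf.concat when switching to dynamic_rnn; the rest of the softmax/loss code should be unaffected.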

@ckcz123 (Author) commented Nov 1, 2016

@fujimotomh Oh, it works! It uses only 1.1GB of GPU memory!
Thanks for your advice!
