GPU-Training supported? #9

songpu2015617 · 2018-11-13T01:46:19Z

Dear Authors:

I would like to know whether this code supports GPU training. I tried installing tensorflow-gpu, both the latest version and version 0.12.1, and I get errors saying that tensor shapes don't match. Did you get the same error, and do you know how to fix it? Thanks

Bartzi · 2018-11-13T09:08:26Z

Well, our code should automatically put your data on your GPU...
What kind of errors do you get? I cannot help you if you do not provide the exact error you are encountering ;)

songpu2015617 · 2018-11-20T02:16:13Z

Hi, Bartzi:
Thanks for your reply. I installed tensorflow-gpu==1.12.0 and got the following error:
Logging to logs/2018-11-19-17-25-30
Traceback (most recent call last):
File "train.py", line 88, in
model_file_name = train(cli_args, log_dir)
File "train.py", line 38, in train
model = model_class.create_model(train_data_generator.get_input_shape(), config)
File "/home/pu.song/Documents/ASRDev/LID/crnn-lid/keras/models/topcoder_crnn_finetune.py", line 54, in create_model
model.add(Bidirectional(LSTM(512, return_sequences=False), merge_mode="concat"))
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/engine/topology.py", line 491, in call
self.build(input_shapes[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/wrappers.py", line 218, in build
self.forward_layer.build(input_shape)
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/recurrent.py", line 733, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 753, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1122, in concat
tensor_shape.scalar())
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 256, 512) and () are incompatible
When I installed tensorflow-gpu==0.12.1, I got the following errors:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 9.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:517 in _set_model.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:521 in _set_model.: init (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
Epoch 1/50
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
It seems your Keras code eats up all the GPU memory very quickly. Thanks.

Bartzi · 2018-11-20T11:13:44Z

Our code is not eating all the available memory. That is TensorFlow's behavior: by default, TensorFlow allocates all available GPU memory...
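For readers who want TensorFlow to claim GPU memory on demand rather than up front, a minimal sketch follows. It assumes the TensorFlow 0.12/1.x style session API (`tf.ConfigProto`, `tf.Session`) and a Keras version that provides `keras.backend.set_session`. The helper name `use_growing_gpu_memory` is hypothetical and not part of this repository.

```python
def use_growing_gpu_memory(memory_fraction=None):
    """Install a Keras session whose GPU memory grows on demand.

    Hypothetical helper, not part of the repository. Assumes the
    TensorFlow 0.12/1.x session API and the Keras TensorFlow backend.
    """
    # Imported lazily so this module can be loaded without TensorFlow.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Allocate GPU memory as needed instead of reserving it all at start-up.
    config.gpu_options.allow_growth = True
    if memory_fraction is not None:
        # Optionally cap the share of GPU memory this process may use.
        config.gpu_options.per_process_gpu_memory_fraction = memory_fraction

    session = tf.Session(config=config)
    K.set_session(session)
    return session
```

Calling `use_growing_gpu_memory()` once, before the model is created, would be the intended usage under these assumptions.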

Let's have a look at your problems:

TensorFlow 1.12.0: it seems that the data loader does not supply the correct data format... are you using the correct data?
TensorFlow 0.12.1: you have a newer CuDNN library installed than this TensorFlow build expects. The program tells you so:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

So you'll have to either compile the old TensorFlow version yourself, install a different version of CuDNN, or use a more recent version of TensorFlow.
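To read the version numbers in that log line, a small sketch can help. It assumes the integer encoding used by the cuDNN headers of these releases (major * 1000 + minor * 100 + patch level), so 6021 would be 6.0.21 and 5105 would be 5.1.5. The function names are hypothetical.

```python
import re

# Matches the mismatch message quoted in the log above.
_CUDNN_MISMATCH = re.compile(
    r"Loaded runtime CuDNN library: (\d+) .*? compiled with (\d+)"
)


def decode_cudnn_version(number):
    """Split an integer such as 6021 into (major, minor, patch)."""
    major, rest = divmod(number, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def parse_cudnn_mismatch(log_line):
    """Return ((loaded), (compiled)) version tuples, or None if no match."""
    match = _CUDNN_MISMATCH.search(log_line)
    if match is None:
        return None
    loaded, compiled = (int(group) for group in match.groups())
    return decode_cudnn_version(loaded), decode_cudnn_version(compiled)
```

Applied to the line above, this suggests that the runtime library is cuDNN 6.0 while the TensorFlow 0.12.1 binary was built against cuDNN 5.1.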

songpu2015617 · 2018-11-30T22:57:35Z

Thank you. I got it working on my machine.

nikhil031294 · 2019-12-11T10:59:14Z

@songpu2015617
Can you please tell me how you got it working on your machine?
Also, what are your cuDNN and CUDA versions?

Thanks

Arafat4341 · 2020-07-14T08:52:23Z

Hello everyone!
I am using Google Colab for training. I enabled the GPU, but it is not utilized. I get this message from Colab:
You are not utilizing GPU runtime, please switch to standard runtime

How can I make this code utilize Colab's GPU?

bytosaur · 2020-08-14T08:32:23Z

@nikhil031294

I used Ubuntu 16.04
disabled the nouveau driver and used the shipped NVIDIA driver (384.130)
installed CUDA 8.0 via runfile (https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html) but did not update the driver
then downloaded cuDNN 5.1 for CUDA 8.0 (https://developer.nvidia.com/rdp/cudnn-archive) and moved the library to /usr/local/cuda-8.0/lib64 and the header to /usr/local/cuda-8.0/include
set the paths:
-- $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
-- $ export PATH=/usr/local/cuda-8.0/bin:$PATH
cloned the repo and replaced tensorflow==0.12.1 with tensorflow-gpu==0.12.1 in requirements.txt before installing

You might also want to look here: (https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/refs/heads/r0.12/tensorflow/g3doc/get_started/os_setup.md)
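After following steps like these, one way to confirm which cuDNN version actually ended up in the CUDA include directory is to read the version macros from the header. This is only a sketch: it assumes the header is named `cudnn.h`, sits in `/usr/local/cuda-8.0/include` as described above, and defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL`. The function name is hypothetical.

```python
import re


def read_cudnn_header_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % name, header_text)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed location, taken from the install steps above.
    path = "/usr/local/cuda-8.0/include/cudnn.h"
    try:
        with open(path) as handle:
            print(read_cudnn_header_version(handle.read()))
    except IOError:
        print("cudnn.h not found at " + path)
```

For tensorflow-gpu==0.12.1, the error log earlier in this thread indicates that a 5.1.x version is what the binary expects.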
