GPU-Training supported? #9

Open
songpu2015617 opened this issue Nov 13, 2018 · 7 comments

Comments

@songpu2015617

Dear Authors:

I would like to know whether this code supports GPU training. I tried installing tensorflow-gpu, both the latest version and version 0.12.1, and I get errors saying that tensor shapes don't match. Did you run into the same error, and do you know how to fix it? Thanks

@Bartzi
Member

Bartzi commented Nov 13, 2018

Well, our code should put your data automatically on your GPU....
What kind of errors do you get? I cannot help you if you do not provide the exact error you are encountering ;)

@songpu2015617
Author

Hi, Bartzi:
Thanks for your reply. With tensorflow-gpu==1.12.0 installed, I get the following error:
Logging to logs/2018-11-19-17-25-30
Traceback (most recent call last):
File "train.py", line 88, in <module>
model_file_name = train(cli_args, log_dir)
File "train.py", line 38, in train
model = model_class.create_model(train_data_generator.get_input_shape(), config)
File "/home/pu.song/Documents/ASRDev/LID/crnn-lid/keras/models/topcoder_crnn_finetune.py", line 54, in create_model
model.add(Bidirectional(LSTM(512, return_sequences=False), merge_mode="concat"))
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/models.py", line 324, in add
output_tensor = layer(self.outputs[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/engine/topology.py", line 491, in __call__
self.build(input_shapes[0])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/wrappers.py", line 218, in build
self.forward_layer.build(input_shape)
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/layers/recurrent.py", line 733, in build
self.W = K.concatenate([self.W_i, self.W_f, self.W_c, self.W_o])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 753, in concatenate
return tf.concat(axis, [to_dense(x) for x in tensors])
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1122, in concat
tensor_shape.scalar())
File "/home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/tensorflow/python/framework/tensor_shape.py", line 848, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (4, 256, 512) and () are incompatible
When I install tensorflow-gpu==0.12.1, I get the following errors:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.6705
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 9.81GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:517 in _set_model.: merge_all_summaries (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.merge_all.
WARNING:tensorflow:From /home/pu.song/anaconda2/envs/LID/lib/python2.7/site-packages/keras/callbacks.py:521 in _set_model.: init (from tensorflow.python.training.summary_io) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.FileWriter. The interface and behavior is the same; this is just a rename.
Epoch 1/50
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
Aborted (core dumped)
It seems your Keras code eats up all the GPU memory very quickly. Thanks.

@Bartzi
Member

Bartzi commented Nov 20, 2018

Our code is not eating all the available memory. That is TensorFlow's behavior: by default, TensorFlow allocates all available GPU memory...

Let's have a look at your problems:

  • TensorFlow 1.12.0: it seems that the data loader does not supply the correct data format... are you using the correct data?
  • TensorFlow 0.12.1: you have a newer cuDNN library installed than the one this TensorFlow build was compiled against. The program tells you this:
E tensorflow/stream_executor/cuda/cuda_dnn.cc:378] Loaded runtime CuDNN library: 6021 (compatibility version 6000) but source was compiled with 5105 (compatibility version 5100). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.

So you'll have to either compile the old TensorFlow version yourself, install a cuDNN version that matches the one it was compiled with (5.1.x here), or use a more recent TensorFlow version.
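
The two numbers in that log line can be decoded to make the mismatch explicit. This is only a minimal sketch: it assumes the usual cuDNN encoding of major * 1000 + minor * 100 + patch (which holds for the 5.x and 6.x releases mentioned here), and `decode_cudnn_version` and `is_compatible` are hypothetical helper names, not part of TensorFlow. The compatibility rule is likewise an assumption modeled on the "compatibility version" values printed in the log.

```python
def decode_cudnn_version(code):
    """Split an integer such as 6021 into (major, minor, patch)."""
    major, rest = divmod(code, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def is_compatible(loaded, compiled):
    """Assumed rule: major and minor must be equal, mirroring the
    'compatibility version' values (6000 vs. 5100) in the log line."""
    return decode_cudnn_version(loaded)[:2] == decode_cudnn_version(compiled)[:2]


# Values taken from the error message quoted above.
loaded, compiled = 6021, 5105
print(decode_cudnn_version(loaded))      # (6, 0, 21)
print(decode_cudnn_version(compiled))    # (5, 1, 5)
print(is_compatible(loaded, compiled))   # False
```

In other words, cuDNN 6.0.21 is loaded at runtime while the binary expects cuDNN 5.1.x.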

@songpu2015617
Author

Thank you. I got it working on my machine.

@nikhil031294

@songpu2015617
Can you please tell me how you got it to work on your machine?
Also, what are your cuDNN and CUDA versions?

Thanks

@Arafat4341

Hello everyone!
I am using Google Colab for training. I enabled the GPU, but it is not being utilized. I get this message from Colab:
You are not utilizing GPU runtime, please switch to standard runtime

How can I make this code use Colab's GPU?

@bytosaur

bytosaur commented Aug 14, 2020

@nikhil031294

  • I used Ubuntu 16.04
  • disabled the nouveau driver and used the shipped NVIDIA driver (384.130)
  • installed CUDA 8.0 via the runfile (https://docs.nvidia.com/cuda/archive/8.0/cuda-installation-guide-linux/index.html) but did not update the driver
  • then downloaded cuDNN 5.1 for CUDA 8.0 (https://developer.nvidia.com/rdp/cudnn-archive), moved the libraries to /usr/local/cuda-8.0/lib64 and the header to /usr/local/cuda-8.0/include
  • set the paths:
    -- $ export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
    -- $ export PATH=/usr/local/cuda-8.0/bin:$PATH
  • cloned the repo and replaced tensorflow==0.12.1 with tensorflow-gpu==0.12.1 in requirements.txt before installing
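
The last step can also be scripted. The following is a minimal sketch, assuming the file contains a line that reads exactly `tensorflow==0.12.1`; `use_gpu_package` is a hypothetical helper name, and the sample text is a placeholder, not the repository's actual requirements.txt.

```python
def use_gpu_package(requirements_text, version="0.12.1"):
    """Swap the CPU TensorFlow pin for the GPU package of the same version."""
    cpu_pin = "tensorflow==" + version
    gpu_pin = "tensorflow-gpu==" + version
    lines = requirements_text.splitlines()
    swapped = [gpu_pin if line.strip() == cpu_pin else line for line in lines]
    # Preserve a trailing newline if the input had one.
    trailing = "\n" if requirements_text.endswith("\n") else ""
    return "\n".join(swapped) + trailing


# Placeholder contents for illustration only.
sample = "some-package==1.0\ntensorflow==0.12.1\nother-package==2.0\n"
print(use_gpu_package(sample))
```

To apply it to the real file, read requirements.txt, pass its contents through the function, and write the result back before running pip install.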

You might also want to look at the TensorFlow r0.12 setup guide: https://chromium.googlesource.com/external/github.com/tensorflow/tensorflow/+/refs/heads/r0.12/tensorflow/g3doc/get_started/os_setup.md
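
To double-check which cuDNN version actually ended up in the include directory, the version macros in the header can be read. This is a sketch under assumptions: it relies on `cudnn.h` defining `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` (as the 5.x headers do), `cudnn_version_from_header` is a hypothetical helper name, and the sample header text below is illustrative, not copied from a real installation.

```python
import re


def cudnn_version_from_header(header_text):
    """Return (major, minor, patch) from cudnn.h text, or None if a macro is missing."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+" + name + r"\s+(\d+)",
                          header_text, re.MULTILINE)
        if match is None:
            return None
        values.append(int(match.group(1)))
    return tuple(values)


# Illustrative header excerpt; on the setup described above one would instead
# read the file at /usr/local/cuda-8.0/include/cudnn.h.
sample_header = """
#define CUDNN_MAJOR      5
#define CUDNN_MINOR      1
#define CUDNN_PATCHLEVEL 5
"""
print(cudnn_version_from_header(sample_header))  # (5, 1, 5)
```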
