RuntimeError: CUDA error: invalid device ordinal #1

kazuma0606 · 2021-08-19T12:32:31Z

Hi,

In Google Colaboratory, the following error appears and train.py does not work:
----------------- End -------------------
Traceback (most recent call last):
  File "train.py", line 15, in <module>
    opt = TrainOptions().parse()
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/options/base_options.py", line 109, in parse
    torch.cuda.set_device(opt.gpu_ids[0])
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Do you know how to fix this?

Environment:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0

Cheers,
Hisanori Yoshimura

davidiommi · 2021-08-19T13:53:08Z

You have to change the GPU ids in base_options.py.

I was selecting '2,3' (I have 4 GPUs).

You probably have fewer GPUs available. Write '0,1' if you have 2 GPUs available.
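Both "invalid device ordinal" and "Invalid device id" mean that a requested GPU index does not exist: CUDA ordinals are zero-based, so the valid ids on a machine are 0 to device_count - 1. A minimal pre-flight check, as a sketch; `invalid_gpu_ids` is a hypothetical helper (not part of this repository), and the assumption is that you would call it with `device_count=torch.cuda.device_count()` before `torch.cuda.set_device` or `torch.nn.DataParallel` runs:

```python
def invalid_gpu_ids(gpu_ids, device_count):
    """Return the requested GPU ids that do not exist on this machine.

    Hypothetical helper. gpu_ids is a list of ints such as [2, 3];
    device_count would typically come from torch.cuda.device_count().
    An empty result means every requested id is usable.
    """
    # CUDA ordinals are zero-based: valid ids are 0 .. device_count - 1.
    return [i for i in gpu_ids if not 0 <= i < device_count]


# '2,3' is fine on a 4-GPU machine but not on a 2-GPU or 1-GPU one.
print(invalid_gpu_ids([2, 3], device_count=4))  # []
print(invalid_gpu_ids([2, 3], device_count=2))  # [2, 3]
print(invalid_gpu_ids([0, 1], device_count=1))  # [1]
```

Raising a readable error when the result is non-empty would surface the misconfiguration before any CUDA call is made.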

kazuma0606 · 2021-08-19T21:06:15Z

Sorry, when I pass --gpu_ids '0,1', the following error appears and train.py does not work:
----------------- End -------------------
lenght train list: 3
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py:481: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
cpuset_checked))
Traceback (most recent call last):
  File "train.py", line 31, in <module>
    model = create_model(opt) # creation of the model
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/models/__init__.py", line 42, in create_model
    instance.initialize(opt)
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/models/cycle_gan_model.py", line 88, in initialize
    not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/models/networks3D.py", line 93, in define_G
    return init_net(net, init_type, init_gain, gpu_ids)
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/models/networks3D.py", line 70, in init_net
    net = torch.nn.DataParallel(net, gpu_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 142, in __init__
    _check_balance(self.device_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/parallel/data_parallel.py", line 23, in _check_balance
    dev_props = _get_devices_properties(device_ids)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 455, in _get_devices_properties
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 455, in <listcomp>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 438, in _get_device_attr
    return get_member(torch.cuda)
  File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 455, in <lambda>
    return [_get_device_attr(lambda m: m.get_device_properties(i)) for i in device_ids]
  File "/usr/local/lib/python3.7/dist-packages/torch/cuda/__init__.py", line 312, in get_device_properties
    raise AssertionError("Invalid device id")
AssertionError: Invalid device id

Do you know how to fix this?
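The UserWarning in this output is separate from the crash: the DataLoader was asked for 8 worker processes while the system suggests at most 4. A sketch for capping the worker count, assuming the value ends up as the `num_workers` argument of `torch.utils.data.DataLoader`; `capped_num_workers` is a hypothetical helper, and it roughly mirrors how the warning counts usable CPUs (the CPU affinity set where available, otherwise `os.cpu_count()`):

```python
import os


def capped_num_workers(requested):
    """Return min(requested, usable CPUs), never below 0 (hypothetical helper)."""
    try:
        # CPUs this process is allowed to run on (available on Linux, e.g. Colab).
        usable = len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity does not exist on every platform (e.g. Windows, macOS).
        usable = os.cpu_count() or 1
    return max(0, min(requested, usable))


# The result depends on the machine; on the Colab runtime above it would be at most 4.
print(capped_num_workers(8))
```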

kazuma0606 · 2021-08-21T02:05:23Z

I was mistaken.
Training now works after changing these parameters:
--gpu_ids 0, --patch_size 64,64,32
However, I get the following error and test.py does not run:

0% 0/123 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 218, in <module>
  File "test.py", line 158, in inference
    model.set_input(batch)
  File "/content/drive/My Drive/Colab Notebooks/3D_CycleGAN_Project/3D-CycleGan-Pytorch-MedImaging/models/cycle_gan_model.py", line 118, in set_input
    self.real_B = input[1 if AtoB else 0].to(self.device)
IndexError: index 1 is out of bounds for dimension 0 with size 1

Which parameters should I change?
Please let me know. Thank you very much for your help.
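The IndexError says that `input` inside `set_input` has size 1 along its first dimension, so `input[1 if AtoB else 0]` fails: the test batch apparently carries a single image, while `set_input` expects an (A, B) pair. A defensive sketch, assuming the batch is indexable along its first dimension (a tensor or a list); `split_batch` is a hypothetical helper rather than the repository's code, and it only avoids the crash, it does not tell you which test.py option the author intended:

```python
def split_batch(batch, a_to_b=True):
    """Return (real_A, real_B); real_B is None if the batch holds one item.

    Hypothetical helper following the indexing in set_input:
    real_A = input[0 if AtoB else 1], real_B = input[1 if AtoB else 0].
    Works for a list or for a tensor whose first dimension indexes the domains.
    """
    if len(batch) >= 2:
        return (batch[0], batch[1]) if a_to_b else (batch[1], batch[0])
    # Only one domain present (e.g. inference on unpaired test images):
    # treat it as the source domain and leave the target empty.
    return batch[0], None


print(split_batch(["imgA", "imgB"]))         # ('imgA', 'imgB')
print(split_batch(["imgA", "imgB"], False))  # ('imgB', 'imgA')
print(split_batch(["imgA"]))                 # ('imgA', None)
```

Any later code that reads `real_B` (losses, fake_A) would also have to skip the `None` case.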

s-jafarpoor · 2022-09-23T10:21:43Z

Hello,
thanks for sharing the code.
I want to run on CPU, so I set gpu_ids to -1:

 python .\train.py  --gpu_ids -1

but I get the following error:

Traceback (most recent call last):
  File ".\train.py", line 15, in <module>
    opt = TrainOptions().parse()
  File "E:\cyclegan3d\3D-CycleGan-Pytorch-MedImaging\options\base_options.py", line 102, in parse
    str_ids.remove(',')
ValueError: list.remove(x): x not in list
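The traceback shows `base_options.py` calling `str_ids.remove(',')`, which raises `ValueError` whenever the list contains no comma, as happens with `-1`. A sketch of a more tolerant parser, assuming the common CycleGAN convention that a negative id means "run on CPU"; `parse_gpu_ids` is hypothetical, and the rest of the code would still have to skip `torch.cuda.set_device` and `DataParallel` when the list is empty:

```python
def parse_gpu_ids(text):
    """Parse a --gpu_ids string such as '0', '0,1' or '-1' into a list of ints.

    Hypothetical helper. Negative ids are dropped, so '-1' yields [] (CPU).
    """
    ids = []
    for token in text.split(","):
        token = token.strip()
        if not token:  # tolerate trailing commas such as '0,'
            continue
        gpu_id = int(token)
        if gpu_id >= 0:
            ids.append(gpu_id)
    return ids


print(parse_gpu_ids("-1"))   # []
print(parse_gpu_ids("0"))    # [0]
print(parse_gpu_ids("0,1"))  # [0, 1]
print(parse_gpu_ids("0,"))   # [0]
```

Splitting on commas instead of removing a single comma character handles one id, several ids, and the CPU case the same way.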
