
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. ...: 1333 #55

Open
mrkieumy opened this issue Mar 18, 2019 · 14 comments


@mrkieumy

Hi @andy-yun,
I'm hitting this error (the same as #33):
Traceback (most recent call last):
  File "train.py", line 385, in <module>
    main()
  File "train.py", line 160, in main
    nsamples = train(epoch)
  File "train.py", line 229, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 623, in __next__
    return self._process_next_batch(batch)
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 232, in <listcomp>
    return [default_collate(samples) for samples in transposed]
  File "/home/kieumy/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 209, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 480 and 416 in dimension 2 at /pytorch/aten/src/TH/generic/THTensorMoreMath.cpp:1333
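
For reference, the failure is reproducible outside the dataloader: default_collate ultimately calls torch.stack, which requires every tensor in a batch to have an identical shape. A minimal sketch with hypothetical sizes, not the actual dataset code:

import torch

a = torch.zeros(3, 480, 480)   # image resized for one random scale
b = torch.zeros(3, 416, 416)   # image resized for another scale
torch.stack([a, b], 0)         # RuntimeError: sizes of tensors must match ...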

I think the problem is caused by the get_different_scale() method, because when I turn it off by setting shape = (img.width, img.height), the error goes away.
I set my image width and height to 544 x 480 because the original size is 640x512 and I don't want to scale down too far (to 416x416), so I used 544 x 480 (it is still divisible by 32).
Do you have any recommendation to fix this error?
Thanks & best regards.

@andy-yun
Owner

andy-yun commented Mar 19, 2019

@mrkieumy You can refer to the same issue at https://github.com/marvis/pytorch-yolo2/issues/89

Here's the reason.
https://medium.com/@yvanscher/pytorch-tip-yielding-image-sizes-6a776eb4115b

The solution is to set batch_size=1, or in get_different_scale() change the 64 to self.batch_size (re-download dataset.py).
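
Another possible workaround is a custom collate_fn that forces every image in a batch to one size before stacking, so torch.stack never sees mixed shapes. This is only a sketch, assuming labels are fixed-size tensors (as in this codebase) and a PyTorch recent enough to have F.interpolate (older versions use F.upsample); resize_collate is a hypothetical name, not part of dataset.py:

import torch
import torch.nn.functional as F

def resize_collate(batch):
    target = batch[0][0].shape[-2:]   # (H, W) of the first image in the batch
    imgs = torch.stack([F.interpolate(img.unsqueeze(0), size=target,
                                      mode='bilinear', align_corners=False).squeeze(0)
                        for img, _ in batch], 0)
    labels = torch.stack([lbl for _, lbl in batch], 0)   # assumes fixed-size labels
    return imgs, labels

# usage: DataLoader(dataset, batch_size=8, collate_fn=resize_collate)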

@mrkieumy
Author

mrkieumy commented Mar 25, 2019

Thanks @andy-yun. I changed the 64 to self.batch_size (and re-downloaded the dataset file), but it still errors. If I set batch_size=1, does that mean the dataloader loads one image at a time and the network trains with batch=1? If so, that's not good, because we want to train with the largest possible batch_size.
Any help is appreciated.
Thanks & Best regards.

@andy-yun
Owner

@mickolka Yup, setting batch_size=1 is recommended for the test environment.
How many GPUs do you use? I wonder whether different image sets are being used together.

@mrkieumy
Author

Hi @andy-yun, I have only 1 GPU. For the test step the batch size is always 2 images; when I set it to 1, it errors. But for training we don't want to set batch_size=1, right? We want to train with as large a batch size as possible, and my GPU (GTX 1080) can train V3 with a batch size of 8 at most.
For now I have commented out the get_different_scale() call and train only with the constant shape (544, 480). But the result will be worse compared to training at different scales. How can I use different scales without setting batch_size=1?
Thanks.

@andy-yun
Owner

Hi @mrkieumy Would you change the following 64 to self.batch_size?
Line 57 of dataset.py:
if index % 64 == 0:
-->
if index % (self.batch_size * 10) == 0:

After checking the above code, please report back to me. Thanks.
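
For context, the intent of that line, as a simplified sketch of __getitem__ in dataset.py (load_sample is a hypothetical stand-in for the actual loading code), is roughly:

def __getitem__(self, index):
    # every (batch_size * 10) samples, pick a new random input resolution;
    # subsequent samples are resized to self.shape until the next trigger
    if self.train and index % (self.batch_size * 10) == 0:
        self.shape = self.get_different_scale()
    img, label = self.load_sample(index, self.shape)   # hypothetical helper
    return img, label

Note that index is the dataset index handed out by the sampler, which matters with shuffle=True, as the comments below show.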

@mrkieumy
Author

mrkieumy commented Apr 2, 2019

Hi @andy-yun,
I changed everything exactly as you said, but it still errors. I also tried crop=True with those sizes, but it errors the same way.
Do you know where the problem is? How do you train with different scales without this error?
If I understand correctly, every 10*batch_size samples the shape is re-randomized in the get_different_scale() function (with equal width and height), and the data loader then loads images at that shape. So the shape is supposed to be the same within a batch, yet it raises the mismatched-dimension error within a batch.
How do I make every batch have the same shape?
Thanks.

@andy-yun
Owner

andy-yun commented Apr 2, 2019

@mrkieumy I don't know what the exact problem is. But in my opinion the code works well for other people, so I suspect your dataset and environment. Cheers.

@mrkieumy
Author

mrkieumy commented Apr 3, 2019

@andy-yun,
Thanks for your reply.
After printing the index, I saw that the dataloader loads images shuffled, so the index is not in order. I noticed that self.seen does increase in order, so I changed:
if index % (self.batch_size * 10) == 0:
-->
if self.seen % (self.batch_size * 10) == 0:
It has worked for 20 epochs so far.
I hope that was the last piece needed to solve this problem; I don't know whether it is fully correct. I'll let you know if anything else comes up.
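
To illustrate why the switch helps (assuming self.seen is incremented once per loaded sample, as in this dataset class): with shuffle=True the sampler yields dataset indices in random order, so index % (self.batch_size * 10) == 0 can fire in the middle of a batch and re-randomize self.shape mid-batch, whereas self.seen grows monotonically and crosses a multiple of batch_size * 10 only at a batch boundary:

# shuffled indices arrive like [512, 7, 80, 300, ...]; with batch_size=8,
# index 80 satisfies 80 % (8 * 10) == 0 and could change self.shape as the
# third sample of a batch
if self.seen % (self.batch_size * 10) == 0:   # fires only between full batches
    self.shape = self.get_different_scale()
self.seen = self.seen + 1                     # monotonic, one step per sample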

One other thing: in your repo, you should fix lines 425 and 427 of darknet.py:
save_fc(fc, model) --> save_fc(fp, model)
because fc was never declared; it must be fp (the file handle). Since YOLOv3 has no fully connected layer, nobody has exercised this code, but in my case I added some fully connected layers.
The remaining problem is that I still cannot save the weight file for the fully connected layers, because the save_fc() function in cfg.py complains that the fc module has no bias and weight properties. For now I save the whole model instead.
Lastly, could you help me by explaining #59?
Thanks.

@andy-yun
Owner

andy-yun commented Apr 3, 2019

Thanks @mrkieumy, I updated the code.

@zhangguotai

I modified my code, but the problem still exists.
Traceback (most recent call last):
  File "train.py", line 379, in <module>
    main()
  File "train.py", line 156, in main
    nsamples = train(epoch)
  File "train.py", line 222, in train
    for batch_idx, (data, target) in enumerate(train_loader):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 336, in __next__
    return self._process_next_batch(batch)
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 357, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
RuntimeError: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 106, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 187, in default_collate
    return [default_collate(samples) for samples in transposed]
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.py", line 164, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 416 and 480 in dimension 2 at /pytorch/aten/src/TH/generic/THTensorMath.cpp:3616

I am training on the VOC dataset; the image size is 416*416, batch_size = 8, and I use 1 GPU.
Do you have any recommendation to fix this error?

@richard0326

I have the same problem, and it seems I have downloaded the updated source code.
Can you help me with this problem?
It happens after epoch 15.
[screenshot of the error]

@sgflower66

sgflower66 commented Aug 29, 2019

I met the same problem after epoch 15. (PyTorch 1.0, Python 3.6.3, my own data, 4 GPUs)
[screenshot of the error]

Through reading the previous problems and solutions, I guess the problem is in dataset.py, line 53:

def get_different_scale(self):
    if self.seen < 4000*self.batch_size:
        wh = 13*32                              # 416
    elif self.seen < 8000*self.batch_size:
        wh = (random.randint(0,3) + 13)*32      # 416, 480
    elif self.seen < 12000*self.batch_size:
        wh = (random.randint(0,5) + 12)*32      # 384, ..., 544
    .....
so maybe we get different shapes in the same batch (dataset.py, line 14):

def custom_collate(batch):
    data = torch.stack([item[0] for item in batch], 0)

e.g. [X, X, 416, X] and [X, X, 317, X]

Although the shape change only happens when self.seen crosses an xx*self.batch_size boundary, maybe the error is due to multi-GPU?
This is just a guess, and I don't know how to solve it. Many people seem to have the same question, so the problem may be important. Looking forward to your reply~
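
One thing worth checking, under the assumption that num_workers > 0 is set on the DataLoader: each worker process holds its own copy of the dataset object, so a counter like self.seen that is mutated inside __getitem__ is never shared between workers, and the workers' scale schedules can drift apart. A tiny self-contained demo of that pitfall (toy dataset, not the repo's code):

import torch

class Counting(torch.utils.data.Dataset):
    def __init__(self):
        self.seen = 0
    def __len__(self):
        return 8
    def __getitem__(self, i):
        self.seen += 1      # each worker increments its own copy
        return self.seen

if __name__ == '__main__':
    loader = torch.utils.data.DataLoader(Counting(), batch_size=4, num_workers=2)
    for batch in loader:
        print(batch)        # tensor([1, 2, 3, 4]) twice: state is per-worker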

@Ginbor

Ginbor commented Nov 2, 2019

In my case, the problem disappeared when I didn't use the savemodel() function; I suppose the problem appears after cur_model.save_weights(). Also, in my case the training set satisfies len(train_dataset) % batch_size == 0.
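
That divisibility can matter: if len(train_dataset) % batch_size != 0, the final batch of each epoch is shorter, which breaks any trigger that assumes full batches. A hedged sketch (the batch_size and num_workers values are placeholders): passing drop_last=True to the DataLoader restores the full-batch invariant.

train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=8, shuffle=True,
    num_workers=4, drop_last=True)   # discard the short final batch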
