I met the same problem after epoch 15 (PyTorch 1.0, Python 3.6.3, my own data, 4 GPUs).
After reading through earlier issues and solutions, I have an (uncertain) guess: the problem is in dataset.py, line 53:
```python
def get_different_scale(self):
    if self.seen < 4000*self.batch_size:
        wh = 13*32                           # 416
    elif self.seen < 8000*self.batch_size:
        wh = (random.randint(0,3) + 13)*32   # 416, 480
    elif self.seen < 12000*self.batch_size:
        wh = (random.randint(0,5) + 12)*32   # 384, ..., 544
    .....
```
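For reference, the scale-selection logic above can be re-created as a small standalone function (the thresholds and step values are taken from the snippet; `pick_scale` itself is a hypothetical helper for illustration, not part of the repo's API):

```python
import random

def pick_scale(seen, batch_size, rng=random):
    # Hypothetical standalone version of get_different_scale():
    # the network input size changes in 32-pixel steps as training progresses.
    if seen < 4000 * batch_size:
        return 13 * 32                        # always 416
    elif seen < 8000 * batch_size:
        return (rng.randint(0, 3) + 13) * 32  # one of 416, 448, 480, 512
    elif seen < 12000 * batch_size:
        return (rng.randint(0, 5) + 12) * 32  # one of 384, ..., 544
    # the original snippet continues with further ranges (".....");
    # fall back to the widest range shown above
    return (rng.randint(0, 5) + 12) * 32
```

Note that the scale only changes when `self.seen` crosses a multiple of `batch_size`, so all samples in a batch get the same size as long as every batch is full; a short final batch can straddle a threshold.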
So maybe we get different shapes within the same batch (dataset.py, line 14):
```python
def custom_collate(batch):
    data = torch.stack([item[0] for item in batch], 0)
```

e.g. one item with shape [X, X, 416, X] and another with [X, X, 317, X].
Although the scale change only happens when self.seen crosses xx*self.batch_size, maybe the error is due to multi-GPU?
This is just a guess and I don't know how to solve it. Many people seem to have the same question, so the problem may be important; looking forward to your reply~
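The stacking failure itself is easy to reproduce without the full pipeline: `torch.stack`, as used in `custom_collate`, requires every tensor in the batch to have exactly the same shape. A minimal sketch of that constraint in plain Python (shapes as tuples; `check_batch_shapes` is an illustrative helper, not part of the repo):

```python
def check_batch_shapes(shapes):
    # torch.stack raises "RuntimeError: ... size of tensor must match"
    # when batch items differ; this mimics that check on shape tuples.
    first = shapes[0]
    for s in shapes[1:]:
        if s != first:
            raise RuntimeError(f"size of tensor must match: got {first} and {s}")
    return first

check_batch_shapes([(3, 416, 416), (3, 416, 416)])    # fine
# check_batch_shapes([(3, 416, 416), (3, 317, 317)])  # raises RuntimeError
```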
sgflower66 changed the title from "I met the same problem after epoch 15. (pytorch1.0, python 3.6.3, my own data, 4 gpus)" to "About def get_different_scale(self) and RuntimeError: .....size of tensor must match" on Aug 29, 2019.
Now I set the sample amount to be an integer multiple of the batch size, and it seems to be working (it has run for more than 10 epochs so far, instead of erroring within 5 epochs).
I hope this solves the problem.
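Making the sample count an integer multiple of the batch size amounts to never producing a short final batch; in PyTorch the same effect can be had with `DataLoader(..., drop_last=True)`. The batch-count arithmetic works out as below (a pure-Python sketch for illustration):

```python
def batches_per_epoch(n_samples, batch_size, drop_last):
    # With drop_last=True the trailing partial batch is discarded, so
    # self.seen always advances in whole-batch_size steps and a batch
    # can never straddle one of the rescaling thresholds.
    if drop_last:
        return n_samples // batch_size
    return (n_samples + batch_size - 1) // batch_size  # ceiling division

print(batches_per_epoch(1002, 4, drop_last=False))  # 251 (last batch has only 2 samples)
print(batches_per_epoch(1002, 4, drop_last=True))   # 250
```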
In my case, the problem disappeared when I didn't use the savemodel() function, so I suspect it appears after cur_model.save_weights(). Also, in my case the training set already satisfies len(train_dataset) % batch_size == 0.
Originally posted by @sgflower66 in #55 (comment)