About def get_different_scale(self) and RuntimeError: .....size of tensor must match #81

Closed
sgflower66 opened this issue Aug 29, 2019 · 3 comments


sgflower66 commented Aug 29, 2019

I hit the same problem after epoch 15 (PyTorch 1.0, Python 3.6.3, my own data, 4 GPUs).

After reading the earlier issues and their solutions, I have an (uncertain) guess that the problem is in dataset.py, line 53:

    def get_different_scale(self):
        if self.seen < 4000*self.batch_size:
            wh = 13*32                          # 416
        elif self.seen < 8000*self.batch_size:
            wh = (random.randint(0,3) + 13)*32  # 416, 480
        elif self.seen < 12000*self.batch_size:
            wh = (random.randint(0,5) + 12)*32  # 384, ..., 544
        .....

So maybe samples in the same batch end up with different shapes, which breaks the stacking step (dataset.py, line 14):

    def custom_collate(batch):
        data = torch.stack([item[0] for item in batch], 0)

for example, [X,X,416,X] and [X,X,317,X].

Although the shape only changes according to the self.seen < xx*self.batch_size thresholds, maybe the error is due to multi-GPU training?
This is only a guess, and I don't know how to solve it. Many people seem to have the same question, so the problem may be important. Looking forward to your reply~
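
One way to see how a batch could end up with mixed sizes, as a rough pure-Python sketch rather than the repository's actual logic. It assumes the dataset keeps a running `seen` counter across epochs and re-draws the image size whenever `seen` reaches a multiple of `batch_size`; `scale_for` and `simulate` are hypothetical names introduced only for this illustration, and the size schedule is deliberately simplified and deterministic.

```python
def scale_for(seen, batch_size):
    """Hypothetical, deterministic stand-in for get_different_scale."""
    # Cycle through 416, 448, 480 with every full batch counted, so that
    # size changes are frequent enough to show up in a tiny simulation.
    return 416 + 32 * ((seen // batch_size) % 3)


def simulate(num_samples, batch_size, epochs):
    """Return, for each batch, the set of image sizes its samples received."""
    seen = 0  # running sample counter, not reset between epochs (assumption)
    size = scale_for(seen, batch_size)
    batches = []
    for _epoch in range(epochs):
        for start in range(0, num_samples, batch_size):
            stop = min(start + batch_size, num_samples)
            sizes = set()
            for _index in range(start, stop):
                # Assumption: the size is re-drawn only when `seen` is a
                # multiple of batch_size.
                if seen % batch_size == 0:
                    size = scale_for(seen, batch_size)
                sizes.add(size)
                seen += 1
            batches.append(sizes)
    return batches


# 10 samples with batch size 4: the short last batch shifts `seen` out of
# step with the batch boundaries, so later batches mix two sizes.
misaligned = simulate(num_samples=10, batch_size=4, epochs=2)

# 12 samples with batch size 4: every batch starts on a multiple of 4.
aligned = simulate(num_samples=12, batch_size=4, epochs=2)
```

Under these assumptions, `misaligned` contains batches with two different sizes from the second epoch onward, while every batch in `aligned` has a single size. A mixed batch is the situation in which stacking the items into one tensor would fail.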

Originally posted by @sgflower66 in #55 (comment)

sgflower66 changed the title from "I met the same problem after epoch 15. (pytorch1.0, python 3.6.3, my own data, 4 gpus)" to "About def get_different_scale(self) and RuntimeError: .....size of tensor must match" on Aug 29, 2019
sgflower66 (Author) commented

I have now set the number of samples to an integer multiple of the batch size, and it seems to be working well: training has run for more than 10 epochs so far, whereas before the error appeared within 5 epochs.
I hope this solves the problem.
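
A minimal sketch of that workaround, assuming the training samples are held in a plain Python list (for example, lines of image paths). The helper name `trim_to_batch_multiple` is hypothetical and not part of the repository.

```python
def trim_to_batch_multiple(samples, batch_size):
    """Drop trailing samples so the result length is a multiple of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    usable = len(samples) - len(samples) % batch_size
    return samples[:usable]


# Hypothetical list of 10 image paths with a batch size of 4:
# the last 2 entries are dropped, leaving 8.
lines = [f"img_{i}.jpg" for i in range(10)]
trimmed = trim_to_batch_multiple(lines, 4)
```

PyTorch's `DataLoader` also accepts `drop_last=True`, which discards the final incomplete batch. Whether that alone is enough here depends on how the dataset updates `seen`, which this thread does not confirm.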


Ginbor commented Oct 18, 2019

What helped for me: setting the number of samples to an integer multiple of the batch size, plus resetting the model.seen parameter.


Ginbor commented Nov 2, 2019

In my case, the problem disappeared when I stopped using the savemodel() function. I suspect the problem appears after cur_model.save_weights(). Also, in my case the training dataset already satisfies len(train_dataset) % batch_size == 0.
