Can not reproduce stably from ckpt #730

Open
ChrisDong-THU opened this issue Dec 2, 2024 · 0 comments

@ChrisDong-THU

While using the SubMConv2d module, I found that a checkpoint saved during training does not reproduce its performance when it is loaded for evaluation. Here is my test code:

```python
import os

import torch
import pytorch_lightning as pl

# SparseConv is my own module (definition not shown here); it wraps a
# spconv SubMConv2d layer.

if __name__ == "__main__":
    bs = 4
    num_gs = 256
    in_channels = 64
    out_channels = 64
    kernel_size = 5
    fm_shape = (32, 32)
    use_out_proj = False

    load = True  # False: save freshly initialized weights; True: load them back

    # Fix all RNG seeds and request deterministic cuDNN kernels
    pl.seed_everything(0)
    torch.set_float32_matmul_precision("high")
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

    spconv2d = SparseConv(in_channels, out_channels, kernel_size, fm_shape, use_out_proj)

    ckpt_path = './tmp/spconv2d.pth'
    if load:
        sd = torch.load(ckpt_path)
        spconv2d.load_state_dict(sd)
    else:
        os.makedirs(os.path.dirname(ckpt_path), exist_ok=True)
        torch.save(spconv2d.state_dict(), ckpt_path)

    spconv2d = spconv2d.cuda().eval()

    # Same seed and same RNG consumption, so these inputs match in every run
    instance_feature = torch.randn(bs, num_gs, in_channels).cuda()
    anchor = torch.randn(bs, num_gs, 5 + in_channels).cuda()

    with torch.no_grad():
        output = spconv2d(instance_feature, anchor)
```
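To check whether the variation already appears within a single process (which would rule out the checkpoint round-trip as the cause), one option is to call the same forward pass several times and group bitwise-identical results. The helper `count_distinct_outputs` below is hypothetical, and `fake_forward` is a stand-in for the real call; with the demo above, `run_once` could be something like `lambda: spconv2d(instance_feature, anchor).flatten().tolist()` inside `torch.no_grad()`, assuming `SparseConv.forward` returns a dense tensor (not confirmed here).

```python
from collections import Counter


def count_distinct_outputs(run_once, n_runs=10):
    """Call `run_once` n_runs times and count bitwise-identical results.

    `run_once` must return a flat sequence of floats (e.g. a flattened tensor
    converted with `.tolist()`). One key means the call was reproducible in
    this process; more than one key means the results vary between calls.
    """
    counts = Counter()
    for _ in range(n_runs):
        counts[tuple(run_once())] += 1
    return counts


# Hypothetical stand-in for a nondeterministic forward pass:
# it alternates between two slightly different results.
_calls = {"n": 0}


def fake_forward():
    _calls["n"] += 1
    return [0.1, 0.2] if _calls["n"] % 2 else [0.1, 0.20000001]


counts = count_distinct_outputs(fake_forward, n_runs=6)
print(len(counts))  # 2
```

If a single process already yields more than one result, the problem lies in the forward pass itself rather than in saving or loading the weights.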

SubMConv2d is a layer inside SparseConv. While debugging, I carefully checked its forward function; the relevant part is:

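```python
# Inside SparseConv.forward; batch_indices, spatial_shape and b are computed
# earlier in the method (not shown). flatten(0, 1) turns the [b, num_gs, C]
# features into the [N, C] layout SparseConvTensor expects.
input = SparseConvTensor(instance_feature.flatten(0, 1), indices=batch_indices, spatial_shape=spatial_shape, batch_size=b)
output = self.conv(input)  # self.conv is the SubMConv2d layer
```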
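With 256 points per sample placed on a 32 x 32 map, several points may land on the same (batch, y, x) cell. As far as we know, spconv expects each coordinate in `indices` to appear at most once and does not check this itself, so duplicates are worth ruling out as one possible source of run-dependent output; this is an assumption, not a confirmed diagnosis. The stdlib-only sketch below uses a hypothetical helper, `find_duplicate_indices`, on rows laid out like a spconv indices tensor (batch index first, then spatial coordinates).

```python
from collections import defaultdict


def find_duplicate_indices(indices):
    """Group row numbers of `indices` by coordinate.

    `indices` is a sequence of (batch, y, x) rows, the same per-row layout as a
    spconv indices tensor (batch index first). Returns {coordinate: [rows]}
    for every coordinate that occurs more than once.
    """
    rows_by_coord = defaultdict(list)
    for row, coord in enumerate(indices):
        rows_by_coord[tuple(coord)].append(row)
    return {coord: rows for coord, rows in rows_by_coord.items() if len(rows) > 1}


# Toy example: rows 0 and 2 hit the same cell in batch 0;
# (1, 3, 5) is a different batch, so it is not a duplicate.
indices = [(0, 3, 5), (0, 4, 4), (0, 3, 5), (1, 3, 5)]
dups = find_duplicate_indices(indices)
print(dups)  # {(0, 3, 5): [0, 2]}
```

With real data, `batch_indices.cpu().tolist()` produces a list of rows that can be passed in directly, since each row is converted to a tuple before grouping.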

The input is identical across runs, but the output alternates between two distinct results. Running the demo above several times shows this. Looking forward to a solution.
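A generic mechanism that makes identical inputs produce a small set of different outputs is a change in floating-point accumulation order, for example when a GPU kernel sums partial results with atomic adds whose order varies from run to run. Whether spconv's SubMConv2d kernels behave this way in this setup is an assumption, not something established in this issue. The stdlib-only sketch below (the helper `sum_in_order` is illustrative) shows that summing the same numbers in two orders can give different results, because floating-point addition is not associative.

```python
import math


def sum_in_order(values, order):
    """Accumulate values[i] left to right, visiting indices in `order`."""
    total = 0.0
    for i in order:
        total += values[i]
    return total


# Same four numbers, two accumulation orders (Python floats are IEEE-754 doubles).
values = [1e16, 1.0, -1e16, 1.0]
a = sum_in_order(values, [0, 1, 2, 3])  # 1e16 + 1.0 rounds back to 1e16, so the first 1.0 is lost
b = sum_in_order(values, [0, 2, 1, 3])  # the large terms cancel first, so both 1.0s survive
print(a, b, math.fsum(values))  # 1.0 2.0 2.0
```

In float32 on a GPU the discrepancies are usually tiny, but they are enough to make two runs differ bitwise and, after several layers, to shift evaluation metrics.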

@ChrisDong-THU ChrisDong-THU changed the title Can not reproduce stably from ckpt Can not reproduce stably from .ckpt Dec 2, 2024
@ChrisDong-THU ChrisDong-THU changed the title Can not reproduce stably from .ckpt Can not reproduce stably from ckpt Dec 2, 2024