Pre-trained conv weight is not the same as self.conv.weight #108

Open
aleemsidra opened this issue Aug 4, 2023 · 3 comments

@aleemsidra

Hi! I am trying to fine-tune Conv2d layers with LoRA. I first loaded the pre-trained model weights. Below is a snippet of the original Conv2d in the model and the weights of its first filter:

 model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  )
)

In [2]: model.init_path[0].weight[0][0]
Out[2]: 
tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

Now when I replace Conv2d with ConvLoRA (I tried both manually and dynamically replacing the layer), the model architecture is updated as follows, but strangely, the weights in model.init_path[0].conv.weight[0][0] are not the same as the original pre-trained weights:

 model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
  )
)

In [2]: model.init_path[0].conv.weight[0][0]
Out[2]: 
tensor([[-0.0576,  0.0696,  0.1721],
        [ 0.2691,  0.3037, -0.2643],
        [ 0.0839, -0.1434, -0.0365]])

Why are the conv weights different from what was in the original model?

@edwardjhu
Collaborator

Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?

@aleemsidra
Author

aleemsidra commented Aug 9, 2023

@edwardjhu, I am first loading the model as:

model.load_state_dict(torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"), strict=False)
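
Incidentally, with strict=False, load_state_dict also returns the missing and unexpected keys, which is a quick way to confirm what was actually restored from the checkpoint (a minimal sketch using the same call as above):

import torch

# strict=False returns an IncompatibleKeys named tuple; empty lists mean
# every checkpoint key matched a parameter/buffer in the model
result = model.load_state_dict(
    torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"),
    strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)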

Below is the structure of part of the loaded model:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
    )
  )
)

After loading the model, I am replacing the Conv2d instances in the nn.Sequential and in ResBlock as follows:

# Replacing Conv layers with LoRA layers

for name, sub_module in model.named_children():
    for name, layer in list(sub_module.named_children()):
        # Conv2d
        if isinstance(layer, nn.Conv2d):
            setattr(sub_module, name, lora.Conv2d(
                layer.in_channels,
                layer.out_channels,
                kernel_size=layer.kernel_size[0],
                r=2,
                lora_alpha=2))

        # ResBlock
        elif isinstance(sub_module, nn.Sequential):
            for name, layer in list(sub_module.named_children()):
                if isinstance(layer, ResBlock):
                    for i, preactivation_module in enumerate(layer.conv_path):
                        if isinstance(preactivation_module, PreActivationND) and isinstance(preactivation_module.layer, nn.Conv2d):
                            setattr(preactivation_module, 'layer', lora.Conv2d(
                                preactivation_module.layer.in_channels,
                                preactivation_module.layer.out_channels,
                                kernel_size=preactivation_module.layer.kernel_size[0],
                                r=2,
                                lora_alpha=2))
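
To check programmatically whether the pre-trained kernels survive this swap, one can snapshot a weight before the loop and compare it afterwards (a minimal sketch; the variable pretrained_w is introduced here only for illustration):

import torch

# before running the replacement loop: keep a copy of one pre-trained kernel
pretrained_w = model.init_path[0].weight.detach().clone()

# ... replacement loop above runs here ...

# after replacement the kernel lives under .conv
print(torch.allclose(pretrained_w, model.init_path[0].conv.weight.detach()))
# prints False, i.e. the pre-trained values were not carried over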

The updated model structure looks like this:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    )
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
      )
    )
  )
)

Then I checked whether the LoRA matrices have been injected correctly by printing the parameter names:

for name, param in model.named_parameters():
      print(name)
init_path.0.lora_A
init_path.0.lora_B
init_path.0.conv.weight
init_path.0.conv.bias
init_path.2.conv_path.0.bn.weight
init_path.2.conv_path.0.bn.bias
init_path.2.conv_path.0.layer.lora_A
init_path.2.conv_path.0.layer.lora_B
init_path.2.conv_path.0.layer.conv.weight
init_path.2.conv_path.0.layer.conv.bias
init_path.2.conv_path.1.bn.weight
init_path.2.conv_path.1.bn.bias
init_path.2.conv_path.1.layer.lora_A
init_path.2.conv_path.1.layer.lora_B
init_path.2.conv_path.1.layer.conv.weight
init_path.2.conv_path.1.layer.conv.bias

This shows that the LoRA layers have been added correctly. But when I compare the weights of a conv layer in the pre-trained model with the same layer after injecting the LoRA layers, they are not the same:

# Pre-trained model
 model.init_path[0].weight[0][0]

tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

# with LoRA
model.init_path[0].conv.weight[0][0]
tensor([[ 0.1168,  0.0223, -0.1227],
        [-0.2735, -0.2281, -0.2859],
        [ 0.2369, -0.1391, -0.0499]])

Moreover, in my original Conv2d, bias is set to False, but when I check model.init_path[0].conv.bias it gives:

Parameter containing:
tensor([-0.1540,  0.0532, -0.0386, -0.0889, -0.1558,  0.0867, -0.2746,  0.3279,
        -0.0516,  0.0622,  0.1098, -0.1297,  0.2631, -0.0025,  0.0273, -0.3173],
       requires_grad=True)

Its requires_grad is also True, but in the pre-trained conv layer bias was False, so where are these values coming from?

Can you please give your feedback on this?

@edwardjhu
Collaborator

After loading the model, I am replacing the Conv2d instances in the nn.Sequential and in ResBlock as follows:

This seems to be the problem. If you manually replace the layers after loading the ckpt, those layers are re-initialized with fresh random weights. They also have biases because you didn't pass bias=False to the constructor.
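
For reference, loralib's Conv2d forwards extra keyword arguments to the wrapped nn.Conv2d (worth verifying against the loralib version you have), so the original layer's settings can be carried over when building the replacement (a sketch, assuming layer is bound to the Conv2d being replaced, as in the loop above):

import loralib as lora

# carry over the original hyperparameters so the wrapped nn.Conv2d matches
# the checkpointed layer (no bias, same stride and padding)
new_layer = lora.Conv2d(
    layer.in_channels,
    layer.out_channels,
    kernel_size=layer.kernel_size[0],
    stride=layer.stride,
    padding=layer.padding,
    bias=False,
    r=2,
    lora_alpha=2)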

Can you try modifying the architecture and then loading the ckpt?
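
A rough sketch of that order, with one extra step suggested by the parameter listing above: the LoRA wrapper stores the kernel under a .conv prefix (init_path.0.weight becomes init_path.0.conv.weight), so the checkpoint keys likely need remapping before load_state_dict (hypothetical remapping, not tested):

import torch

# 1) build the model and swap in the LoRA layers first (replacement loop above)
# 2) remap checkpoint keys to the new parameter names before loading
state = torch.load("dc_model.pth")               # full path shortened here
param_names = set(dict(model.named_parameters()))

remapped = {}
for k, v in state.items():
    if k in param_names:
        remapped[k] = v                           # e.g. bn weights, unchanged
    else:
        prefix, _, leaf = k.rpartition(".")
        candidate = f"{prefix}.conv.{leaf}"       # insert the .conv prefix
        remapped[candidate if candidate in param_names else k] = v

result = model.load_state_dict(remapped, strict=False)
print("still missing:", result.missing_keys)      # likely just lora_A / lora_B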
