Pre-trained conv weight is not the same as self.conv.weight #108

Open
aleemsidra opened this issue Aug 4, 2023 · 3 comments

@aleemsidra

Hi! I am trying to fine-tune Conv2d layers with LoRA. I first loaded the pre-trained model weights. Below is a snippet of the original Conv2d in the model and the weights of its first filter:

 model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  )
)

In [2]: model.init_path[0].weight[0][0]
Out[2]: 
tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

Now when I replace Conv2d with ConvLoRA (I tried both manually and dynamically replacing the layer), the model architecture is updated as follows, but strangely, the weights in model.init_path[0].conv.weight[0][0] are not the same as the original pre-trained weights:

 model
Out[1]: 
UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    )
  )
)

In [2]: model.init_path[0].conv.weight[0][0]
Out[2]: 
tensor([[-0.0576,  0.0696,  0.1721],
        [ 0.2691,  0.3037, -0.2643],
        [ 0.0839, -0.1434, -0.0365]])

Why are the conv weights different from what was in the original model?

@edwardjhu
Collaborator

Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?

@aleemsidra
Author

aleemsidra commented Aug 9, 2023

@edwardjhu, I am first loading the model as:

model.load_state_dict(torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"), strict=False)
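
Incidentally, with strict=False, load_state_dict also returns the missing and unexpected keys, which is a quick way to confirm what was actually restored from the checkpoint (a minimal sketch using the same call as above):

import torch

# strict=False returns an IncompatibleKeys named tuple; empty lists mean
# every checkpoint key matched a parameter/buffer in the model
result = model.load_state_dict(
    torch.load("/home/sidra/Documents/Domain_Apatation/UDAS/src/checkpoints/base_model_mms_2023-07-06_12-45-28_PM/dc_model.pth"),
    strict=False)
print("missing keys:   ", result.missing_keys)
print("unexpected keys:", result.unexpected_keys)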

Below is the structure of part of the loaded model:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        )
      )
    )
  )
)

After loading the model, I am replacing the Conv2d instances in the nn.Sequential and in ResBlock as follows:

# Replacing Conv layers with LoRA layers

for name, sub_module in model.named_children():
    for name, layer in list(sub_module.named_children()):
        # Conv2d
        if isinstance(layer, nn.Conv2d):
            setattr(sub_module, name, lora.Conv2d(
                layer.in_channels,
                layer.out_channels,
                kernel_size=layer.kernel_size[0],
                r=2,
                lora_alpha=2))

        # ResBlock
        elif isinstance(sub_module, nn.Sequential):
            for name, layer in list(sub_module.named_children()):
                if isinstance(layer, ResBlock):
                    for i, preactivation_module in enumerate(layer.conv_path):
                        if isinstance(preactivation_module, PreActivationND) and isinstance(preactivation_module.layer, nn.Conv2d):
                            setattr(preactivation_module, 'layer', lora.Conv2d(
                                preactivation_module.layer.in_channels,
                                preactivation_module.layer.out_channels,
                                kernel_size=preactivation_module.layer.kernel_size[0],
                                r=2,
                                lora_alpha=2))
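
To check programmatically whether the pre-trained kernels survive this swap, one can snapshot a weight before the loop and compare it afterwards (a minimal sketch; the variable pretrained_w is introduced here only for illustration):

import torch

# before running the replacement loop: keep a copy of one pre-trained kernel
pretrained_w = model.init_path[0].weight.detach().clone()

# ... replacement loop above runs here ...

# after replacement the kernel lives under .conv
print(torch.allclose(pretrained_w, model.init_path[0].conv.weight.detach()))
# prints False, i.e. the pre-trained values were not carried over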

The updated model structure looks like this:

UNet2D(
  (init_path): Sequential(
    (0): Conv2d(
      (conv): Conv2d(1, 16, kernel_size=(3, 3), stride=(1, 1))
    )
    (1): ReLU()
    (2): ResBlock(
      (conv_path): Sequential(
        (0): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
        (1): PreActivationND(
          (bn): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (activation): ReLU()
          (layer): Conv2d(
            (conv): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1))
          )
        )
      )
    )
  )
)

Then I checked whether the LoRA matrices have been injected correctly by printing the parameter names:

for name, param in model.named_parameters():
      print(name)
init_path.0.lora_A
init_path.0.lora_B
init_path.0.conv.weight
init_path.0.conv.bias
init_path.2.conv_path.0.bn.weight
init_path.2.conv_path.0.bn.bias
init_path.2.conv_path.0.layer.lora_A
init_path.2.conv_path.0.layer.lora_B
init_path.2.conv_path.0.layer.conv.weight
init_path.2.conv_path.0.layer.conv.bias
init_path.2.conv_path.1.bn.weight
init_path.2.conv_path.1.bn.bias
init_path.2.conv_path.1.layer.lora_A
init_path.2.conv_path.1.layer.lora_B
init_path.2.conv_path.1.layer.conv.weight
init_path.2.conv_path.1.layer.conv.bias

This shows that the LoRA layers have been added correctly. But when I compare the weights of a conv layer in the pre-trained model with the same layer after injecting the LoRA layers, they are not the same:

# Pre-trained model
 model.init_path[0].weight[0][0]

tensor([[ 0.2988,  0.2760, -0.0493],
        [ 0.3431, -0.0962,  0.0716],
        [-0.1536,  0.1956,  0.2885]], grad_fn=<SelectBackward0>)

# with LoRA
model.init_path[0].conv.weight[0][0]
tensor([[ 0.1168,  0.0223, -0.1227],
        [-0.2735, -0.2281, -0.2859],
        [ 0.2369, -0.1391, -0.0499]])

Moreover, in my original Conv2d, bias is set to False, but when I check model.init_path[0].conv.bias it gives:

Parameter containing:
tensor([-0.1540,  0.0532, -0.0386, -0.0889, -0.1558,  0.0867, -0.2746,  0.3279,
        -0.0516,  0.0622,  0.1098, -0.1297,  0.2631, -0.0025,  0.0273, -0.3173],
       requires_grad=True)

Its requires_grad is also True, but in the pre-trained conv layer bias was False, so where are these values coming from?

Can you please give your feedback on this?

@edwardjhu
Collaborator

After loading the model, I am replacing the Conv2d instances in the nn.Sequential and in ResBlock as follows:

This seems to be the problem. If you manually replace the layers after loading the ckpt, those layers are re-initialized with fresh random weights. They also have biases because you didn't pass bias=False to the constructor.
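
For reference, loralib's Conv2d forwards extra keyword arguments to the wrapped nn.Conv2d (worth verifying against the loralib version you have), so the original layer's settings can be carried over when building the replacement (a sketch, assuming layer is bound to the Conv2d being replaced, as in the loop above):

import loralib as lora

# carry over the original hyperparameters so the wrapped nn.Conv2d matches
# the checkpointed layer (no bias, same stride and padding)
new_layer = lora.Conv2d(
    layer.in_channels,
    layer.out_channels,
    kernel_size=layer.kernel_size[0],
    stride=layer.stride,
    padding=layer.padding,
    bias=False,
    r=2,
    lora_alpha=2)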

Can you try modifying the architecture and then loading the ckpt?
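
A rough sketch of that order, with one extra step suggested by the parameter listing above: the LoRA wrapper stores the kernel under a .conv prefix (init_path.0.weight becomes init_path.0.conv.weight), so the checkpoint keys likely need remapping before load_state_dict (hypothetical remapping, not tested):

import torch

# 1) build the model and swap in the LoRA layers first (replacement loop above)
# 2) remap checkpoint keys to the new parameter names before loading
state = torch.load("dc_model.pth")               # full path shortened here
param_names = set(dict(model.named_parameters()))

remapped = {}
for k, v in state.items():
    if k in param_names:
        remapped[k] = v                           # e.g. bn weights, unchanged
    else:
        prefix, _, leaf = k.rpartition(".")
        candidate = f"{prefix}.conv.{leaf}"       # insert the .conv prefix
        remapped[candidate if candidate in param_names else k] = v

result = model.load_state_dict(remapped, strict=False)
print("still missing:", result.missing_keys)      # likely just lora_A / lora_B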
