Pre-trained conv weight is not same as that self.conv.weight #108
Comments
Thanks for sharing this. It might have something to do with how you load the checkpoint. Can you provide a minimal example where this happens?
@edwardjhu, I first load the model as:
Below is the structure of a part of loaded model:
After loading the model, I am replacing the
The updated model structure looks like this:
Then I checked whether the LoRA matrices had been injected correctly by inspecting the parameter names:
This shows that the LoRA layers have been added correctly. But when I compare the weights of the conv layers in the pre-trained model with those in the model after injecting the LoRA layers, they are not the same:
Moreover, in my original Conv2D, bias is set to
Its requires_grad is also True, but in the pre-trained conv layer bias was False, so where are these values coming from? Can you please give your feedback on this?
This seems to be the problem. If you manually replace the layers after loading the ckpt, the replacement layers are newly constructed and freshly initialized, so the loaded pre-trained weights in those layers are overwritten. These layers also have biases because you didn't pass bias=False to the constructor. Can you try modifying the architecture first and then loading the ckpt?
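A minimal sketch of the effect described above, using plain PyTorch only. `WrappedConv` and `wrap_pretrained_conv` are hypothetical names invented for illustration; `WrappedConv` is a stand-in for any wrapper that constructs its own inner `conv` and is not the library's ConvLoRA class. The sketch assumes the alternative of copying the pre-trained weights into the new layer by hand, which should have the same effect as modifying the architecture first and loading the ckpt afterwards.

```python
import torch
import torch.nn as nn


class WrappedConv(nn.Module):
    """Hypothetical stand-in for a wrapper that owns an inner `conv` layer."""

    def __init__(self, in_channels, out_channels, kernel_size, **kwargs):
        super().__init__()
        # A newly constructed nn.Conv2d is randomly initialised and,
        # unless bias=False is passed, also gets a new trainable bias.
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, **kwargs)

    def forward(self, x):
        return self.conv(x)


def wrap_pretrained_conv(old: nn.Conv2d) -> WrappedConv:
    """Build a wrapper that mirrors `old` and carries over its parameters."""
    new = WrappedConv(
        old.in_channels, old.out_channels, old.kernel_size,
        stride=old.stride, padding=old.padding, dilation=old.dilation,
        groups=old.groups,
        bias=old.bias is not None,  # mirror the original bias setting
    )
    # Copy the pre-trained weight (and bias, when the original has one).
    new.conv.load_state_dict(old.state_dict())
    return new


torch.manual_seed(0)
pretrained = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)

naive = WrappedConv(3, 8, 3, padding=1)   # replaced without copying anything
fixed = wrap_pretrained_conv(pretrained)  # replaced with weights carried over

naive_has_bias = naive.conv.bias is not None
fixed_has_bias = fixed.conv.bias is not None
weights_match = torch.equal(fixed.conv.weight, pretrained.weight)
```

Here `naive` ends up with a random weight and an unexpected bias, which matches the symptoms reported in this thread, while `fixed` keeps the pre-trained weight and has no bias. When loading a checkpoint into an already modified architecture instead, the parameter names in the checkpoint need to match the new module names.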
Hi! I am trying to fine-tune Conv2d with LoRA. I first loaded the pre-trained model weights. Below is a snippet of the original Conv2d in the model and the weights of its first matrix:
When I replaced Conv2d with ConvLoRA (I tried both manually and dynamically replacing the layer), the model architecture was updated as follows, but strangely, model.init_path[0].conv.weight[0][0] is not the same as the original pre-trained weights:
Why are the conv weights different from those in the original model?