Hi @rinongal, how are you?
I'm trying to train a person's face with Textual Inversion on two models: the standard 1.5 base model and Deliberate V2 (a very photorealistic DreamBooth fine-tuned model based on 1.5).
Training on the 1.5 model converges successfully, but training on Deliberate doesn't converge despite testing all kinds of configurations (different LRs, etc.), including going up to 10k steps (the only setting I've kept fixed is using two vectors to represent the token).
Someone told me it could be related to the EMA weights in the model, but that doesn't make much sense to me: during training we are only moving the vectors around, trying to find a position that represents the face, and I don't see how the EMA weights could prevent us from finding that position.
Do you have any idea/insight/intuition as to why I can find a vector that represents the face in the base model but can't find one in the other, given that the latter was fine-tuned from the former and is quite photorealistic?
I want to dig deeper into this; any direction on how to debug it is very appreciated!
Thanks,
Fran
First of all, a possible workaround may be to train the face in the V1.5 model, and then initialize your DeliberateV2 training using this learned face embedding. The embeddings tend to transfer reasonably well between fine-tuned models, so it might serve as a good initialization. I haven't actively tried doing this, but these sorts of tricks typically work with GANs.
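A minimal sketch of that transfer, assuming the embeddings.pt layout saved by this repo's EmbeddingManager (a dict with "string_to_token" / "string_to_param" entries keyed by the placeholder string); the paths and the "*" placeholder are just examples, so adjust them to your runs:

```python
# Sketch: inspect the face embedding learned on SD 1.5 so it can be reused
# as the starting point for the Deliberate V2 run. Paths are placeholders.
import torch

ckpt_path = "logs/v15_face_run/checkpoints/embeddings.pt"
ckpt = torch.load(ckpt_path, map_location="cpu")

face_vectors = ckpt["string_to_param"]["*"]   # e.g. shape (2, 768) for two vectors
print("learned embedding:", tuple(face_vectors.shape), face_vectors.dtype)

# To start the Deliberate V2 run from these vectors, either point the training
# script at this file (my copy of main.py exposes an --embedding_manager_ckpt
# flag for this) or copy them in manually before the first optimization step:
#   embedding_manager.string_to_param_dict["*"].data.copy_(face_vectors)
```

Since Deliberate V2 keeps (roughly) the same text encoder as 1.5, the transferred vectors should already land near a sensible region of the embedding space.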
On the broader question - I'm actually not sure why this would happen. I wouldn't expect EMA weights to have a large impact on embedding tuning. Two possible things that do come to mind:
If the weights are saved / loaded with a different precision than the baseline model, this may have an impact (see the dtype-check sketch after this list).
If Deliberate V2 is DreamBooth trained, does it come with its own keyword? This might conflict with the inversion process (e.g. the DB keyword might take attention away from the new TI word). Do your training prompts include this keyword? Have you tried removing / adding it to the prompts?
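On the first point, a quick way to compare precisions would be something like this (a sketch assuming the usual Stable Diffusion "state_dict" checkpoint layout; the file names are placeholders for wherever your checkpoints live):

```python
# Sketch: summarize the tensor dtypes stored in each checkpoint so a
# fp16-vs-fp32 mismatch between the two models is easy to spot.
import torch
from collections import Counter

def dtype_summary(path):
    ckpt = torch.load(path, map_location="cpu")
    sd = ckpt.get("state_dict", ckpt)  # plain state dicts load directly
    return Counter(str(t.dtype) for t in sd.values() if torch.is_tensor(t))

print("SD 1.5       :", dtype_summary("v1-5-pruned-emaonly.ckpt"))
print("Deliberate V2:", dtype_summary("deliberate_v2.ckpt"))
# If one reports mostly torch.float16 and the other torch.float32, cast the
# fp16 checkpoint to fp32 (or train in the matching precision) before
# comparing TI convergence.
```

The second point (the DreamBooth keyword) is more of a manual check on your training prompts.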
Thanks @rinongal!
Will try to initialize with the learned embedding.
I don't really know how the model was trained, only that it's a fine-tuned version of the 1.5 model.
How do I find characters that don't map to multiple embeddings with OpenCLIP (SD v2)? I've changed the code to work for my needs, but no matter which placeholder character I try, it always ends up with multiple embeddings. Can you help me @rinongal?
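For context, something like this is how I'd count the tokens a candidate placeholder maps to (a sketch assuming the open_clip_torch package; 49406/49407 are the start/end token IDs in the standard CLIP BPE vocab, and the candidate list is just an example):

```python
# Sketch: check which candidate placeholder strings the SD v2 (OpenCLIP)
# tokenizer maps to exactly one token, i.e. one embedding vector.
import open_clip

tokenizer = open_clip.get_tokenizer("ViT-H-14")  # text encoder used by SD v2

def num_tokens(word):
    ids = tokenizer(word)[0].tolist()
    # Drop the start/end-of-text tokens and the zero padding.
    return len([i for i in ids if i not in (49406, 49407, 0)])

for candidate in ["*", "#", "@", "sks", "mychr", "†"]:
    print(candidate, "->", num_tokens(candidate), "token(s)")
```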