Is it possible to use the SD3 16-channel VAE in SDXL? If yes, what can be done? #8713
The idea is to use the 16-channel VAE of SD3 in SDXL; the intuition is to improve the capability of SDXL with this VAE. Possible ideas that I had:
Replies: 2 comments 1 reply
I think you will still need to retrain the model with it, but I haven't really experimented with the VAEs yet. @jaretburkett is doing some experiments with a new VAE for SD 1.5. cc: @sayakpaul
I have done a LoRA to change the SDXL latent space to the SD1/2 latent space, which worked out well. With additional channels it gets a little trickier, but it's doable. I trained a kl f16 d42 VAE recently (16 depth, 42 channels) and have been testing training SD 1.5 to work with it. With 16 depth it doubles the output size. It is working, but will take much more time to train, I think because of so much additional information in the latent space.

I have a kl f8 d16 VAE training right now, which is almost done. Since it is the same depth, it will hopefully train faster. I personally don't want to use the SD3 VAE because it would inherit the restrictive license, since SAI has not released it and licensed it separately. Plus, VAEs are easy to train.

So it should work, and I plan to do it with my own VAE when I get there. Most of the UNet can be kept intact; only conv_in and conv_out need to be trained from scratch. To get the full potential of the 16-channel VAE, you would likely need a much longer training run to teach it the fine details it is missing, but a simple conversion to generate at the same quality shouldn't take too long.

LoRAs, embeddings, IP adapters, etc. should all still work when doing this. The one thing I can think of that will get weird is ControlNets, since they take a 4-channel latent input. So you would need to either do a clever merge on them from the new model, or continue to use the 4-channel VAE for their inputs; either way, you could still get them to work.
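The "only conv_in and conv_out need to be trained from scratch" step above can be sketched as below. `TinyUNet` and `widen_latent_io` are hypothetical stand-ins for illustration, not the actual SDXL UNet (the real one, e.g. diffusers' `UNet2DConditionModel`, exposes similarly named `conv_in`/`conv_out` layers, but this is not a drop-in migration script):

```python
# Sketch: keep the pretrained body of the UNet, replace only the first
# and last convolutions so it reads/writes 16-channel latents.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Hypothetical stand-in for a UNet: 4-channel latent in/out."""
    def __init__(self, latent_channels=4, hidden=32):
        super().__init__()
        self.conv_in = nn.Conv2d(latent_channels, hidden, 3, padding=1)
        self.mid = nn.Conv2d(hidden, hidden, 3, padding=1)  # "pretrained" body
        self.conv_out = nn.Conv2d(hidden, latent_channels, 3, padding=1)

    def forward(self, x):
        return self.conv_out(torch.relu(self.mid(torch.relu(self.conv_in(x)))))

def widen_latent_io(unet: nn.Module, new_channels: int = 16) -> nn.Module:
    """Swap conv_in/conv_out for freshly initialized layers with a wider
    latent dimension; everything in between keeps its weights."""
    old_in, old_out = unet.conv_in, unet.conv_out
    unet.conv_in = nn.Conv2d(new_channels, old_in.out_channels,
                             kernel_size=old_in.kernel_size,
                             padding=old_in.padding)
    unet.conv_out = nn.Conv2d(old_out.in_channels, new_channels,
                              kernel_size=old_out.kernel_size,
                              padding=old_out.padding)
    return unet

unet = widen_latent_io(TinyUNet(), new_channels=16)
latents = torch.randn(1, 16, 64, 64)  # 16-channel latents from the new VAE
out = unet(latents)
print(out.shape)  # torch.Size([1, 16, 64, 64])
```

The new in/out convs are random at this point, which is why a (short) fine-tuning run is still needed before the model produces sensible images again.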