Is it possible to use the SD3 16-channel VAE in SDXL? If yes, what can be done? #8713
The idea is to use the 16-channel VAE of SD3 in SDXL; the intuition is to improve the capability of SDXL with this VAE. Possible ideas that I had:
Replies: 2 comments 1 reply
I think you will still need to retrain the model with it, but I haven't really experimented with the VAEs yet. @jaretburkett is doing some experiments with a new VAE for SD 1.5. cc: @sayakpaul
I have done a LoRA to change the SDXL latent space to the SD1/2 latent space, which worked out well. With additional channels it gets a little trickier, but it's doable. I trained a kl f16 d42 VAE recently (16 depth, 42 channels) and have been testing training SD 1.5 to work with it. With 16 depth it doubles the output size. It is working, but will take much more time to train, I think because of so much additional information in the latent space.

I have a kl f8 d16 VAE training right now, which is almost done. Since it is the same depth, it will hopefully train faster. I personally don't want to use the SD3 VAE because it would inherit the restrictive license, since SAI has not released it and licensed it separately. Plus, VAEs are easy to train.

So it should work, and I plan to do it with my own VAE when I get there. Most of the UNet can be kept intact; only conv_in and conv_out need to be trained from scratch. To get the full potential of the 16-channel VAE, you would likely need a much longer training run to teach it the fine details it is missing, but a simple conversion to generate at the same quality shouldn't take too long.

LoRAs, embeddings, IP adapters, etc. should all still work when doing this. The one thing I can think of that will get weird is ControlNets, since they take a 4-channel latent input. So you would need to either do a clever merge on them from the new model, or continue to use the 4-channel VAE for their inputs; either way, you could still get them to work.
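The "only conv_in and conv_out need to be trained from scratch" step above can be sketched as below. `TinyUNet` and `widen_latent_io` are hypothetical stand-ins for illustration, not the actual SDXL UNet (the real one, e.g. diffusers' `UNet2DConditionModel`, exposes similarly named `conv_in`/`conv_out` layers, but this is not a drop-in migration script):

```python
# Sketch: keep the pretrained body of the UNet, replace only the first
# and last convolutions so it reads/writes 16-channel latents.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Hypothetical stand-in for a UNet: 4-channel latent in/out."""
    def __init__(self, latent_channels=4, hidden=32):
        super().__init__()
        self.conv_in = nn.Conv2d(latent_channels, hidden, 3, padding=1)
        self.mid = nn.Conv2d(hidden, hidden, 3, padding=1)  # "pretrained" body
        self.conv_out = nn.Conv2d(hidden, latent_channels, 3, padding=1)

    def forward(self, x):
        return self.conv_out(torch.relu(self.mid(torch.relu(self.conv_in(x)))))

def widen_latent_io(unet: nn.Module, new_channels: int = 16) -> nn.Module:
    """Swap conv_in/conv_out for freshly initialized layers with a wider
    latent dimension; everything in between keeps its weights."""
    old_in, old_out = unet.conv_in, unet.conv_out
    unet.conv_in = nn.Conv2d(new_channels, old_in.out_channels,
                             kernel_size=old_in.kernel_size,
                             padding=old_in.padding)
    unet.conv_out = nn.Conv2d(old_out.in_channels, new_channels,
                              kernel_size=old_out.kernel_size,
                              padding=old_out.padding)
    return unet

unet = widen_latent_io(TinyUNet(), new_channels=16)
latents = torch.randn(1, 16, 64, 64)  # 16-channel latents from the new VAE
out = unet(latents)
print(out.shape)  # torch.Size([1, 16, 64, 64])
```

The new in/out convs are random at this point, which is why a (short) fine-tuning run is still needed before the model produces sensible images again.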