Skip to content
Vladimir Mandic edited this page Oct 15, 2024 · 10 revisions

Models

List of popular text-to-image generative models with their respective parameters and architecture overview
Original URL: https://github.com/vladmandic/automatic/wiki/Models

Vendor Model Variant Size Architecture Model Params Text Encoder(s) TE Params VAE Other
StabilityAI Stable Diffusion 1.5 2.28GB UNet 0.86B CLiP ViT-L 0.12B VAE
StabilityAI Stable Diffusion 2.1 2.58GB UNet 0.86B CLiP ViT 0.34B VAE
StabilityAI Stable Diffusion XL 6.94GB UNet 2.56B CLiP ViT-L + ViT+G 0.12B + 0.69B VAE
StabilityAI Stable Diffusion 3 Medium 15.14GB MMDiT 2.0B CLiP ViT-L + ViT+G + T5-XXL 0.12B + 0.69B + 4.76B 16ch VAE
StabilityAI Stable Cascade Medium 11.82GB Multi-stage UNet 1.56B + 3.6B CLiP ViT-G 0.69B 42x VQE
StabilityAI Stable Cascade Lite 4.97GB Multi-stage UNet 0.7B + 1.0B CLiP ViT-G 0.69B 42x VQE
Black Forest Labs Flux 1 Dev 32.93GB MMDiT 11.9B CLiP ViT-L + T5-XXL 0.12B + 4.769B 16ch VAE
Black Forest Labs Flux 1 Schnell 32.93GB MMDiT 11.9B CLiP ViT-L + T5-XXL 0.12B + 4.769B 16ch VAE
FAL AuraFlow 0.3 31.90GB MMDiT 6.8B UMT5 12.1B VAE
AlphaVLLM Lumina Next SFT 8.67GB DiT 1.7B Gemma 2.5B LM
PixArt Alpha XL 2 21.3GB DiT 0.61B T5-XXL 4.76B VAE
PixArt Sigma XL 2 21.3GB DiT 0.61B T5-XXL 4.76B VAE
Segmind SSD-1B N/A 8.72GB UNet 1.33B CLiP ViT-L + ViT+G 0.12B + 0.69B VAE
Segmind Vega N/A 6.43GB UNet 0.75B CLiP ViT-L + ViT+G 0.12B + 0.69B VAE
Segmind Tiny N/A 1.03GB UNet 0.32B CLiP ViT-L 0.12B VAE
Kwai Kolors N/A 17.40GB UNnet 2.58B ChatGLM 6.24B VAE LM
PlaygroundAI Playground 1.0 4.95GB UNet 0.86B CLiP ViT-L 0.12B VAE
PlaygroundAI Playground 2.x 13.35GB UNet 2.57B CLiP ViT-L + ViT+G 0.12B + 0.69B VAE
Tencent HunyuanDiT 1.2 14.09GB DiT 1.5B BERT + T5-XL 3.52B + 1.67B VAE LM
Warp AI Wuerstchen N/A 12.16GB Multi-stage UNet 1.0B + 1.05B CLiP ViT-L + ViT+G 0.12B + 0.69B VQE
Kandinsky Kandinsky 2.2 5.15GB Unet 1.25B CLiP ViT-G 0.69B VQ
Kandinsky Kandinsky 3.0 27.72GB Unet 3.05B T5-XXXL 8.72B VQ
Thudm CogView 3 Plus 24.96GB DiT 2.85B T5-XXL 4.76B VAE

Notes:

  • Created using SD.Next built-in model analyzer
  • Number of parameters is proportional to model complexity and ability to learn
    Quality of generated images is also influenced by training data and duration of training
  • Size refers to original model variant in 16bit precision where available
    Quantized variations may be smaller
Clone this wiki locally