diff --git a/index.html b/index.html
index f3869cd..58ac88f 100644
--- a/index.html
+++ b/index.html
@@ -231,7 +231,9 @@

the text prompt, thus simplifying the learning of the mapping from embeddings to image outputs. Finally, to align the pre-trained Stable Diffusion model (1.4) with the embeddings of our modular encoder, we retrain the conditioning by fine-tuning the cross-attention weights (2.2).

- architecture figure
+ architecture figure 1
+ architecture figure 2
+ architecture figure 3