From 7184903bdba97f567e57233ff6dee9e54ce8075f Mon Sep 17 00:00:00 2001
From: HannahBenita <77296142+HannahBenita@users.noreply.github.com>
Date: Fri, 1 Dec 2023 16:42:43 +0100
Subject: [PATCH] Update index.html

---
 index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index 4f5db99..49986d2 100644
--- a/index.html
+++ b/index.html
@@ -251,9 +251,9 @@

Image Fidelity and Text-to-Image Alignment

First, we measure image fidelity and image-text alignment using the standard metrics FID-30K and CLIP score. We find that MultiFusion, prompted with text only, performs on par with Stable Diffusion despite the extension of the encoder to support multiple languages and modalities.
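
As a rough illustration of the alignment metric, a CLIP score is commonly reported as the cosine similarity between an image embedding and the embedding of its prompt, averaged over all image-prompt pairs. The sketch below assumes the embeddings have already been computed by some CLIP model. The function names and the toy vectors are hypothetical and are not taken from the MultiFusion evaluation code, and some variants of the metric additionally rescale or clip the similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_clip_score(image_embeddings, text_embeddings):
    """Average image-text cosine similarity over paired embeddings.

    Assumption: image_embeddings[i] and text_embeddings[i] belong to the
    same generated image and prompt, and both come from the same CLIP model.
    """
    if len(image_embeddings) != len(text_embeddings):
        raise ValueError("need exactly one text embedding per image embedding")
    if not image_embeddings:
        raise ValueError("need at least one image-text pair")
    scores = [
        cosine_similarity(img, txt)
        for img, txt in zip(image_embeddings, text_embeddings)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for real CLIP embeddings (hypothetical data).
    images = [[1.0, 0.0], [1.0, 0.0]]
    prompts = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_clip_score(images, prompts))
```

Higher values indicate that generated images sit closer to their prompts in the shared embedding space. FID, by contrast, compares feature statistics of generated and reference image sets and is not covered by this sketch.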


Compositional Robustness

-
+
- method
+ method

Image composition is a known limitation of diffusion models. Through evaluation on our new benchmark, MCC-250, we show that multimodal prompting leads to more compositional robustness as judged by humans. Each prompt is a complex conjunction of two different objects with different