diff --git a/index.html b/index.html
index 3745c3b..cfff8a8 100644
--- a/index.html
+++ b/index.html
@@ -231,7 +231,6 @@
We measure image fidelity and image-text alignment using the standard metrics FID-30K and CLIP score. We find that MultiFusion prompted with text only performs on par with Stable Diffusion, despite the extension of the encoder to support multiple languages and modalities.
-Image composition is a known limitation of diffusion models. By evaluating on our new benchmark, MCC-250, we show that multimodal prompting leads to more compositional robustness as judged by humans.