From c7cbc0b7386e49a95453ff733d91b5a176875023 Mon Sep 17 00:00:00 2001
From: HannahBenita <77296142+HannahBenita@users.noreply.github.com>
Date: Fri, 1 Dec 2023 17:09:37 +0100
Subject: [PATCH] Update index.html

---
 index.html | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index 381e437..fbfad43 100644
--- a/index.html
+++ b/index.html
@@ -259,6 +259,7 @@

Compositional Robustness

Image composition is a known limitation of diffusion models. Through evaluation on our new benchmark MCC-250, we show that multimodal prompting leads to greater compositional robustness, as judged by humans. Each prompt is a complex conjunction of two different objects with different colors; multimodal prompts contain one visual reference for each object, interleaved with the text input.

+

Multilinguality

Below we demonstrate the multilingual alignment of images generated by MultiFusion. All images were generated using the same seed and with the respective translation of the prompt ‘an image of an astronaut riding
@@ -275,9 +276,9 @@

Multilinguality

-Attention Manipulation for Multimodal inference
+Attention Manipulation for Multimodal Inference

-Attention Manipulation allows us to weight image and text tokens at inference time and guide their influence on the resulting generation.

+Attention Manipulation, based on AtMan, allows us to weight image and text tokens at inference time and guide their influence on the resulting generation.
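The token-weighting idea can be sketched as scaling the pre-softmax attention scores of selected prompt tokens, so that a factor above 1 amplifies a token's influence on the generation and a factor below 1 suppresses it. This is only an illustrative sketch of the mechanism, not the paper's or AtMan's actual implementation; all names and shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def manipulated_attention(q, k, v, token_factors):
    """Single-head attention with per-token score scaling.

    token_factors is one multiplier per input (key) token:
    > 1 amplifies that token's influence on the output,
    < 1 suppresses it. Illustrative only, not AtMan's exact rule.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # (n_queries, n_keys)
    scores = scores * token_factors    # scale each key token's scores
    weights = softmax(scores, axis=-1)
    return weights @ v
```

For example, doubling the factor of one token shifts attention mass toward it while the weights still sum to one, which is the sense in which the manipulation "guides" a token's influence at inference time.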

method