From c7cbc0b7386e49a95453ff733d91b5a176875023 Mon Sep 17 00:00:00 2001
From: HannahBenita <77296142+HannahBenita@users.noreply.github.com>
Date: Fri, 1 Dec 2023 17:09:37 +0100
Subject: [PATCH] Update index.html

---
 index.html | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index 381e437..fbfad43 100644
--- a/index.html
+++ b/index.html
@@ -259,6 +259,7 @@

Compositional Robustness

Image composition is a known limitation of diffusion models. Through evaluation on our new benchmark MCC-250, we show that multimodal prompting leads to greater compositional robustness, as judged by humans. Each prompt is a complex conjunction of two different objects with different colors; multimodal prompts contain one visual reference for each object, interleaved with the text input.

+

Multilinguality

Below we demonstrate the multilingual alignment of images generated by MultiFusion. All images were generated using the same seed and with the respective translation of the prompt ‘an image of an astronaut riding
@@ -275,9 +276,9 @@

Multilinguality

-Attention Manipulation for Multimodal inference
+Attention Manipulation for Multimodal Inference

-Attention Manipulation allows us to weight image and text tokens at inference time and guide their influence on the resulting generation.

+Attention Manipulation, based on AtMan, allows us to weight image and text tokens at inference time and guide their influence on the resulting generation.
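The token-weighting idea can be sketched as scaling the pre-softmax attention scores of selected prompt tokens, so that a factor above 1 amplifies a token's influence on the generation and a factor below 1 suppresses it. This is only an illustrative sketch of the mechanism, not the paper's or AtMan's actual implementation; all names and shapes here are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def manipulated_attention(q, k, v, token_factors):
    """Single-head attention with per-token score scaling.

    token_factors is one multiplier per input (key) token:
    > 1 amplifies that token's influence on the output,
    < 1 suppresses it. Illustrative only, not AtMan's exact rule.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)      # (n_queries, n_keys)
    scores = scores * token_factors    # scale each key token's scores
    weights = softmax(scores, axis=-1)
    return weights @ v
```

For example, doubling the factor of one token shifts attention mass toward it while the weights still sum to one, which is the sense in which the manipulation "guides" a token's influence at inference time.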

method