
Commit
update
lena-voita committed Jun 26, 2021
1 parent 9e679fa commit 537b984
Showing 33 changed files with 62 additions and 33 deletions.
9 changes: 5 additions & 4 deletions index.md
@@ -22,13 +22,14 @@ I (still) [teach NLP](https://github.com/yandexdataschool/nlp_course) at the [Ya

## <span style="color:darkblue">News </span>

* 03-06/2021 Invited talks at: [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/), CornellNLP, [MT@UPC](https://mt.cs.upc.edu/seminars/), <span style="color:#888">CambridgeNLP, [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/), ...TBU.</span>
* 10-12/2020 Invited talks at: CMU, [USC ISI](https://nlg.isi.edu/nl-seminar/), ENS Paris, [ML Street Talk](https://www.youtube.com/watch?v=Q0kN_ZHHDQY).
* 06/2021 Our [Source and Target Contributions paper](https://arxiv.org/pdf/2010.10907.pdf) is _accepted to __ACL__ 2021_.
* 03-06/2021 <span style="color:#888"><u>Invited talks at</u></span>: [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/), CornellNLP, [MT@UPC](https://mt.cs.upc.edu/seminars/), CambridgeNLP, [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/).
* 10-12/2020 <span style="color:#888"><u>Invited talks at</u></span>: CMU, [USC ISI](https://nlg.isi.edu/nl-seminar/), ENS Paris, [ML Street Talk](https://www.youtube.com/watch?v=Q0kN_ZHHDQY).
* 09/2020 __2__ papers _accepted to __EMNLP__ 2020_.
* 06-08/2020 Invited talks at: MIT, DeepMind, [Grammarly AI](https://grammarly.ai/information-theoretic-probing-with-minimum-description-length/), Unbabel, [NLP with Friends](https://nlpwithfriends.com).
* 06-08/2020 <span style="color:#888"><u>Invited talks at</u></span>: MIT, DeepMind, [Grammarly AI](https://grammarly.ai/information-theoretic-probing-with-minimum-description-length/), Unbabel, [NLP with Friends](https://nlpwithfriends.com).
* 04/2020 Our [BPE-dropout](https://arxiv.org/pdf/1910.13267.pdf) is _accepted to __ACL__ 2020_.
* 01/2020 I'm [awarded a Facebook PhD Fellowship](https://research.fb.com/blog/2020/01/announcing-the-recipients-of-the-2020-facebook-fellowship-awards/).
* 01/2020 Invited talks at: [Rasa](https://www.meetup.com/ru-RU/Bots-Berlin-Build-better-conversational-interfaces-with-AI/events/267058207/), Google Research Berlin, [Naver Labs Europe](https://europe.naverlabs.com/research/seminars/analyzing-information-flow-in-transformers/), NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp).
* 01/2020 <span style="color:#888"><u>Invited talks at</u></span>: [Rasa](https://www.meetup.com/ru-RU/Bots-Berlin-Build-better-conversational-interfaces-with-AI/events/267058207/), Google Research Berlin, [Naver Labs Europe](https://europe.naverlabs.com/research/seminars/analyzing-information-flow-in-transformers/), NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp).
* 08-09/2019 __2__ papers _accepted to __EMNLP__ 2019_, one at __NeurIPS__ _2019_.
* 05/2019 __2__ papers _accepted to __ACL__ 2019_, one is oral.

2 changes: 1 addition & 1 deletion posts.html
@@ -2,7 +2,7 @@
layout: default
title: Blog
description: Intuitive explanations for some of my papers.
menu: no
menu: yes
order: 1
---

17 changes: 7 additions & 10 deletions posts/nmt_inside_out.html
@@ -1048,21 +1048,18 @@ <h4>What will our model do?</h4>
<p><font face="arial">What will our model do: ignore the source or the prefix?</font>
Previous work shows that, in principle, our model can ignore either the source or the prefix.
</p>
<p>We see that at early generation steps, when the prefix is short, the model “recovers”.
It ignores the prefix: we see very high source contribution.

But later, when the prefix is long, the model starts to ignore the source: the source contribution
drops down significantly.
<p>As we see from the results, the model tends to fall into hallucination mode even when a random prefix
is very short, e.g., one token: we see a large drop in source influence at all positions.
This behavior is what we would expect when a model is hallucinating, and shows no self-recovery ability.
</p>

<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=85% alt="" />
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=65% alt="" />
</center>
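<p style="font-size:small;"><font color="#888">
A minimal sketch of how such a per-position curve could be computed, assuming a hypothetical
<code>token_relevances(model, src_tokens, prefix_tokens)</code> helper that returns one non-negative
relevance score per source token and per prefix token for the next prediction (any attribution method
that decomposes a prediction over the input tokens would do):
</font></p>
<pre><code>
import numpy as np

def source_contribution(model, src_tokens, prefix_tokens):
    # Hypothetical helper: one relevance score per input token for the next prediction.
    src_rel, prefix_rel = token_relevances(model, src_tokens, prefix_tokens)
    total = np.sum(src_rel) + np.sum(prefix_rel)
    # Normalize so that source and prefix contributions sum to one at this step.
    return np.sum(src_rel) / total

def contribution_curve(model, src_tokens, prefix_tokens):
    # Total source contribution at each generation step while force-feeding the prefix.
    return [source_contribution(model, src_tokens, prefix_tokens[:t])
            for t in range(len(prefix_tokens))]
</code></pre>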

<p>Overall, a model’s decision of which of these two contradicting parts to support
changes depending on the prefix length. If the prefix is short, it relies on the source;
if the prefix is long, it relies on the prefix.
</p>



<h3>More in the paper:</h3>
63 changes: 46 additions & 17 deletions posts/source_target_contributions_to_nmt.html
@@ -377,19 +377,38 @@ <h4><span style="margin-right:15px;">&#8226;</span><font face="arial">
reported this <font face="arial">self-recovery ability</font> of LMs.
</p>

<div style="margin-bottom:40px;">
<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_previous_work-min.png" style="margin-bottom:20px;" width=80% alt="" />
</center>
<div style="margin-left:15%;max-width:20%;float:left;margin-top:-20px;">
<p style="font-size:small;">
<font color="#888">
(<a href="https://www.aclweb.org/anthology/W17-3204/" target="_blank">Koehn & Knowles, 2017</a>;
<a href="https://www.aclweb.org/anthology/W19-5361/" target="_blank">Berard et al, 2019</a>)
</font>
</p>
</div>
<div style="margin-right:15%;max-width:30%;float:right;margin-top:-20px;">
<p style="font-size:small;">
<font color="#888">
(<a href="https://arxiv.org/abs/1905.10617" target="_blank">He et al, 2019</a>)
</font>
</p>
</div>
</div>

<p><font face="arial">What will our model do: ignore the source or the prefix?</font>
According to previous work, it can do either!</p>
<p>As we see from the results, it depends on the prefix length. When a random prefix is short,
the model recovers: it ignores the prefix and bases its predictions mostly on the source. When a random prefix becomes longer,
the model's choice shifts towards ignoring the source: source contribution drops drastically. This
behavior is what we would expect when a model is hallucinating.
<p>
As we see from the results, the model tends to fall into hallucination mode even when a random prefix
is very short, e.g., one token: we see a large drop in source influence at all positions.
This behavior is what we would expect when a model is hallucinating, and shows no self-recovery ability.
</p>
<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=85% alt="" />
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=65% alt="" />
</center>
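<p style="font-size:small;"><font color="#888">
A minimal sketch of how the curve above could be aggregated over a test set, assuming a hypothetical
<code>source_contribution(model, src_tokens, prefix_tokens)</code> helper that returns the total
(normalized) source contribution for the next prediction:
</font></p>
<pre><code>
import random
import numpy as np

def random_prefix_curve(model, test_set, prefix_len):
    # Pair each source with a prefix taken from an unrelated example and average
    # the source contribution at every position of that prefix.
    long_targets = [tgt for _, tgt in test_set if len(tgt) >= prefix_len]
    curves = []
    for src, _ in test_set:
        prefix = random.choice(long_targets)[:prefix_len]   # fluent, but unrelated to src
        curves.append([source_contribution(model, src, prefix[:t])
                       for t in range(prefix_len)])
    return np.mean(curves, axis=0)                          # one value per target position
</code></pre>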

<p>Next, we see that with a random prefix, the entropy of contributions is very high and is roughly constant across
@@ -421,11 +440,15 @@ <h3>Exposure Bias and Source Contributions</h3>

<img src="../resources/posts/src_dst_nmt/exposure_bias_lets_measure-min.png" style="margin-bottom:20px;" width=100% alt="" />

<h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>
<h4><u>How</u>: Feed Different Prefixes, Look at Contributions</h4>

<p>We want to check whether models that suffer from exposure bias to different extents are also prone to hallucinations to different extents.
For this, we feed fluent but unrelated to source prefixes and look whether a model is likely to fall into a
language modeling regime, i.e., <font face="arial">to what extent it ignores the source</font>.
For this, we feed different types of prefixes, either prefixes of model-generated translations
or random sentences, and look at the model's behavior.
While conditioning on model-generated prefixes shows what happens in the standard setting at inference time,
random prefixes (fluent, but unrelated to the source) show
whether the model is likely to fall into a
language modeling regime, i.e., <font face="arial">to what extent it ignores the source and hallucinates</font>.
</p>
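<p style="font-size:small;"><font color="#888">
For concreteness, a sketch of how the two prefix types could be constructed, assuming a hypothetical
<code>translate(model, src)</code> helper that returns a model-generated translation
(e.g., obtained with beam search):
</font></p>
<pre><code>
import random

def make_prefixes(model, test_set, prefix_len):
    # Two prefix types per source sentence:
    #   "model"  - the beginning of the model's own translation (standard inference setting);
    #   "random" - the beginning of a target from an unrelated example (hallucination probe).
    long_targets = [tgt for _, tgt in test_set if len(tgt) >= prefix_len]
    prefixes = []
    for src, _ in test_set:
        model_prefix = translate(model, src)[:prefix_len]        # hypothetical decoding helper
        random_prefix = random.choice(long_targets)[:prefix_len] # fluent, but unrelated to src
        prefixes.append((src, model_prefix, random_prefix))
    return prefixes
</code></pre>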

<img src="../resources/posts/src_dst_nmt/exposure_bias_models-min.png"
@@ -438,9 +461,9 @@ <h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>

</p>

<img src="../resources/posts/src_dst_nmt/exposure_bias_results-min.png" style="margin-bottom:20px;" width=100% alt="" />
<img src="../resources/posts/src_dst_nmt/mrt_all_prefixes_src_contribution-min.png" style="margin-bottom:20px;" width=100% alt="" />

<p>The results confirm our hypothesis:
<p>The results for both types of prefixes confirm our hypothesis:
</p>
<div style="font-size:18px;
border-left: 5px solid #b7db67;
@@ -455,8 +478,18 @@ <h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>
</div>

<p>Indeed, we see that MRT-trained models ignore the source less
than any other model; the second best is the target-side word dropout, which also reduces exposure bias.
than any other model; the second best for random prefixes is the target-side word dropout, which also reduces exposure bias.
</p>


<p>It is also interesting to look at the entropy of source contributions to see whether these objectives
make the model more or less "focused". We see that only MRT leads to more confident contributions
of source tokens: the entropy is lower. In contrast, both word dropout variants teach
the model to use broader context.
</p>
<center>
<img src="../resources/posts/src_dst_nmt/mrt_all_prefixes_entropy-min.png" style="margin-bottom:20px;" width=80% alt="" />
</center>
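<p style="font-size:small;"><font color="#888">
The entropy here is computed over the normalized contributions of individual source tokens at a given
generation step; a short sketch, assuming <code>src_rel</code> holds one non-negative relevance score
per source token:
</font></p>
<pre><code>
import numpy as np

def contribution_entropy(src_rel, eps=1e-12):
    # Turn per-token source contributions into a distribution and take its entropy.
    p = np.asarray(src_rel, dtype=float)
    p = p / (p.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))

# Lower entropy: the model concentrates on a few source tokens ("focused").
# Higher entropy: contributions are spread over a broad context.
# contribution_entropy([0.90, 0.05, 0.05])        -> ~0.39
# contribution_entropy([0.25, 0.25, 0.25, 0.25])  -> ~1.39
</code></pre>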

<br>

@@ -469,11 +502,7 @@ <h4>TL;DR: With more data, models use source more and do it more confidently.</h

<img src="../resources/posts/src_dst_nmt/data_amount_both-min.png" style="margin-bottom:20px;" width=100% alt="" />
<p>First, we see that, generally, models trained with more data use source more.
Surprisingly, this increase is not spread evenly across positions:
at approximately 80% of the target length, models trained with more data use
source more, but at the last positions, they switch to more actively using the prefix.
</p>
<p>Second, with more training data, the model becomes more confident in the choice of important tokens: the entropy
Second, with more training data, the model becomes more confident in the choice of important tokens: the entropy
of contributions becomes lower (in the paper, we also show entropy of target contributions).</p>

<!--
@@ -598,7 +627,7 @@ <h3>What's next?</h3>
<ul>
<li><font face="arial">measure how different training regimes change contributions</font><br>
Some method claims to reduce exposure bias?
That's great - now you can measure if indeed increases contribution of source.
That's great - now you can measure whether it indeed increases the contribution of the source.
</li>

<li><font face="arial">look deeper into a model's pathological behavior</font><br>
Binary file modified resources/posts/src_dst_nmt/average_over_set-min.png
Binary file modified resources/posts/src_dst_nmt/average_over_set.png
Binary file modified resources/posts/src_dst_nmt/changes_are_not_monotonic-min.png
Binary file modified resources/posts/src_dst_nmt/changes_are_not_monotonic.png
Binary file modified resources/posts/src_dst_nmt/contributions_converge_early_v2.png
Binary file modified resources/posts/src_dst_nmt/data_amount_both-min.png
Binary file modified resources/posts/src_dst_nmt/data_amount_both.png
Binary file modified resources/posts/src_dst_nmt/general_source_influence-min.png
Binary file modified resources/posts/src_dst_nmt/general_source_influence.png
Binary file modified resources/posts/src_dst_nmt/general_src_entropy-min.png
Binary file modified resources/posts/src_dst_nmt/general_src_entropy.png
Binary file modified resources/posts/src_dst_nmt/our_vs_frankle-min.png
Binary file modified resources/posts/src_dst_nmt/our_vs_frankle.png
Binary file modified resources/posts/src_dst_nmt/prefix_model_vs_reference-min.png
Binary file modified resources/posts/src_dst_nmt/prefix_model_vs_reference.png
Binary file modified resources/posts/src_dst_nmt/random_prefix_entropy-min.png
Binary file modified resources/posts/src_dst_nmt/random_prefix_entropy.png
Binary file modified resources/posts/src_dst_nmt/stages_with_positions-min.png
Binary file modified resources/posts/src_dst_nmt/stages_with_positions.png
Binary file modified resources/posts/src_dst_nmt/timeline-min.png
Binary file modified resources/posts/src_dst_nmt/timeline.png
4 changes: 3 additions & 1 deletion talks.md
@@ -24,6 +24,8 @@ Program committee:

### <span style="color:darkblue"> Speaker </span>

* 06/2021 <span class="talk-title">_Neural Machine Translation Inside Out_</span> <span style="font-size:0.9em">(keynote)</span>
<br/>&nbsp; &nbsp; &nbsp; [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/)
* 01/2020 <span class="talk-title">_Evolution of Representations in the Transformer_</span> <span style="font-size:0.9em">(keynote)</span> <a href="https://drive.google.com/file/d/1bEiWUZDNbKAgMkt5PPF9sQl3x9AYNVoZ/view?usp=sharing" class="label slides">Slides</a> <a href="https://youtu.be/ZyWLrBGiEpI" class="label video">Video</a>
<br/>&nbsp; &nbsp; &nbsp; AI & NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp), <span style="font-size:0.9em">Lausanne, Switzerland</span>
* 08/2019 <span class="talk-title">_Inside the Transformer_</span>
@@ -39,7 +41,7 @@ Program committee:
<br/>&nbsp; &nbsp; &nbsp; [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/)
<br/>&nbsp; &nbsp; &nbsp; Cornell NLP
<br/>&nbsp; &nbsp; &nbsp; [MT@UPC](https://mt.cs.upc.edu/seminars/)

<br/>&nbsp; &nbsp; &nbsp; Cambridge NLP

* 08-12/2020 <span class="talk-title">_Evaluating Source and Target Contributions to NMT Predictions_</span> <a href="https://drive.google.com/file/d/1ZW51j6Eas7qONdHyLf59EY1bMNrvhOBQ/view?usp=sharing" class="label slides">Slides</a>
<br/>&nbsp; &nbsp; &nbsp; Unbabel
