
Commit
update
lena-voita committed Jun 26, 2021
1 parent 9e679fa commit 537b984
Showing 33 changed files with 62 additions and 33 deletions.
9 changes: 5 additions & 4 deletions index.md
@@ -22,13 +22,14 @@ I (still) [teach NLP](https://github.com/yandexdataschool/nlp_course) at the [Ya

## <span style="color:darkblue">News </span>

* 03-06/2021 Invited talks at: [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/), CornellNLP, [MT@UPC](https://mt.cs.upc.edu/seminars/), <span style="color:#888">CambridgeNLP, [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/), ...TBU.</span>
* 10-12/2020 Invited talks at: CMU, [USC ISI](https://nlg.isi.edu/nl-seminar/), ENS Paris, [ML Street Talk](https://www.youtube.com/watch?v=Q0kN_ZHHDQY).
* 06/2021 Our [Source and Target Contributions paper](https://arxiv.org/pdf/2010.10907.pdf) is _accepted to __ACL__ 2021_.
* 03-06/2021 <span style="color:#888"><u>Invited talks at</u></span>: [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/), CornellNLP, [MT@UPC](https://mt.cs.upc.edu/seminars/), CambridgeNLP, [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/).
* 10-12/2020 <span style="color:#888"><u>Invited talks at</u></span>: CMU, [USC ISI](https://nlg.isi.edu/nl-seminar/), ENS Paris, [ML Street Talk](https://www.youtube.com/watch?v=Q0kN_ZHHDQY).
* 09/2020 __2__ papers _accepted to __EMNLP__ 2020_.
* 06-08/2020 Invited talks at: MIT, DeepMind, [Grammarly AI](https://grammarly.ai/information-theoretic-probing-with-minimum-description-length/), Unbabel, [NLP with Friends](https://nlpwithfriends.com).
* 06-08/2020 <span style="color:#888"><u>Invited talks at</u></span>: MIT, DeepMind, [Grammarly AI](https://grammarly.ai/information-theoretic-probing-with-minimum-description-length/), Unbabel, [NLP with Friends](https://nlpwithfriends.com).
* 04/2020 Our [BPE-dropout](https://arxiv.org/pdf/1910.13267.pdf) is _accepted to __ACL__ 2020_.
* 01/2020 I'm [awarded a Facebook PhD Fellowship](https://research.fb.com/blog/2020/01/announcing-the-recipients-of-the-2020-facebook-fellowship-awards/).
* 01/2020 Invited talks at: [Rasa](https://www.meetup.com/ru-RU/Bots-Berlin-Build-better-conversational-interfaces-with-AI/events/267058207/), Google Research Berlin, [Naver Labs Europe](https://europe.naverlabs.com/research/seminars/analyzing-information-flow-in-transformers/), NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp).
* 01/2020 <span style="color:#888"><u>Invited talks at</u></span>: [Rasa](https://www.meetup.com/ru-RU/Bots-Berlin-Build-better-conversational-interfaces-with-AI/events/267058207/), Google Research Berlin, [Naver Labs Europe](https://europe.naverlabs.com/research/seminars/analyzing-information-flow-in-transformers/), NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp).
* 08-09/2019 __2__ papers _accepted to __EMNLP__ 2019_, one at __NeurIPS__ _2019_.
* 05/2019 __2__ papers _accepted to __ACL__ 2019_, one is oral.

2 changes: 1 addition & 1 deletion posts.html
@@ -2,7 +2,7 @@
layout: default
title: Blog
description: Intuitive explanations for some of my papers.
menu: no
menu: yes
order: 1
---

17 changes: 7 additions & 10 deletions posts/nmt_inside_out.html
@@ -1048,21 +1048,18 @@ <h4>What will our model do?</h4>
<p><font face="arial">What will our model do: ignore the source or the prefix?</font>
Previous work shows that, in principle, our model can ignore either the source or the prefix.
</p>
<p>We see that at early generation steps, when the prefix is short, the model “recovers”.
It ignores the prefix: we see very high source contribution.

But later, when the prefix is long, the model starts to ignore the source: the source contribution
drops down significantly.
<p>As we see from the results, the model tends to fall into hallucination mode even when a random prefix
is very short, e.g., one token: we see a large drop in source influence at all positions.
This behavior is what we would expect when a model is hallucinating, and shows no self-recovery ability.
</p>

<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=85% alt="" />
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=65% alt="" />
</center>
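<p style="font-size:small;"><font color="#888">
A minimal sketch of how such a per-position curve could be computed, assuming a hypothetical
<code>token_relevances(model, src_tokens, prefix_tokens)</code> helper that returns one non-negative
relevance score per source token and per prefix token for the next prediction (any attribution method
that decomposes a prediction over the input tokens would do):
</font></p>
<pre><code>
import numpy as np

def source_contribution(model, src_tokens, prefix_tokens):
    # Hypothetical helper: one relevance score per input token for the next prediction.
    src_rel, prefix_rel = token_relevances(model, src_tokens, prefix_tokens)
    total = np.sum(src_rel) + np.sum(prefix_rel)
    # Normalize so that source and prefix contributions sum to one at this step.
    return np.sum(src_rel) / total

def contribution_curve(model, src_tokens, prefix_tokens):
    # Total source contribution at each generation step while force-feeding the prefix.
    return [source_contribution(model, src_tokens, prefix_tokens[:t])
            for t in range(len(prefix_tokens))]
</code></pre>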

<p>Overall, a model’s decision of which of these two contradicting parts to support
changes depending on the prefix length. If the prefix is short, it relies on the source;
if the prefix is long, it relies on the prefix.
</p>



<h3>More in the paper:</h3>
63 changes: 46 additions & 17 deletions posts/source_target_contributions_to_nmt.html
@@ -377,19 +377,38 @@ <h4><span style="margin-right:15px;">&#8226;</span><font face="arial">
reported this <font face="arial">self-recovery ability</font> of LMs.
</p>

<div style="margin-bottom:40px;">
<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_previous_work-min.png" style="margin-bottom:20px;" width=80% alt="" />
</center>
<div style="margin-left:15%;max-width:20%;float:left;margin-top:-20px;">
<p style="font-size:small;">
<font color="#888">
(<a href="https://www.aclweb.org/anthology/W17-3204/" target="_blank">Koehn & Knowles, 2017</a>;
<a href="https://www.aclweb.org/anthology/W19-5361/" target="_blank">Berard et al, 2019</a>)
</font>
</p>
</div>
<div style="margin-right:15%;max-width:30%;float:right;margin-top:-20px;">
<p style="font-size:small;">
<font color="#888">
(<a href="https://arxiv.org/abs/1905.10617" target="_blank">He et al, 2019</a>)
</font>
</p>
</div>
</div>

<p><font face="arial">What will our model do: ignore the source or the prefix?</font>
According to previous work, it can do either!</p>
<p>As we see from the results, it depends on the prefix length. When a random prefix is short,
the model recovers: it ignores the prefix and bases its predictions mostly on the source. When a random prefix becomes longer,
the model's choice shifts towards ignoring the source: source contribution drops drastically. This
behavior is what we would expect when a model is hallucinating.
<p>
As we see from the results, the model tends to fall into hallucination mode even when a random prefix
is very short, e.g., one token: we see a large drop in source influence at all positions.
This behavior is what we would expect when a model is hallucinating, and shows no self-recovery ability.
</p>
<center>
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=85% alt="" />
<img src="../resources/posts/src_dst_nmt/random_prefix_source_contribution-min.png" style="margin-bottom:20px;" width=65% alt="" />
</center>
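<p style="font-size:small;"><font color="#888">
A minimal sketch of how the curve above could be aggregated over a test set, assuming a hypothetical
<code>source_contribution(model, src_tokens, prefix_tokens)</code> helper that returns the total
(normalized) source contribution for the next prediction:
</font></p>
<pre><code>
import random
import numpy as np

def random_prefix_curve(model, test_set, prefix_len):
    # Pair each source with a prefix taken from an unrelated example and average
    # the source contribution at every position of that prefix.
    long_targets = [tgt for _, tgt in test_set if len(tgt) >= prefix_len]
    curves = []
    for src, _ in test_set:
        prefix = random.choice(long_targets)[:prefix_len]   # fluent, but unrelated to src
        curves.append([source_contribution(model, src, prefix[:t])
                       for t in range(prefix_len)])
    return np.mean(curves, axis=0)                          # one value per target position
</code></pre>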

<p>Next, we see that with a random prefix, the entropy of contributions is very high and is roughly constant across
@@ -421,11 +440,15 @@ <h3>Exposure Bias and Source Contributions</h3>

<img src="../resources/posts/src_dst_nmt/exposure_bias_lets_measure-min.png" style="margin-bottom:20px;" width=100% alt="" />

<h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>
<h4><u>How</u>: Feed Different Prefixes, Look at Contributions</h4>

<p>We want to check whether models that suffer from exposure bias to different extents are also prone to hallucinations to different extents.
For this, we feed fluent but unrelated to source prefixes and look whether a model is likely to fall into a
language modeling regime, i.e., <font face="arial">to what extent it ignores the source</font>.
For this, we feed different types of prefixes, either prefixes of model-generated translations
or random sentences, and look at the model's behavior.
While conditioning on model-generated prefixes shows what happens in the standard setting at inference time,
random prefixes (fluent, but unrelated to the source) show
whether the model is likely to fall into a
language modeling regime, i.e., <font face="arial">to what extent it ignores the source and hallucinates</font>.
</p>
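<p style="font-size:small;"><font color="#888">
For concreteness, a sketch of how the two prefix types could be constructed, assuming a hypothetical
<code>translate(model, src)</code> helper that returns a model-generated translation
(e.g., obtained with beam search):
</font></p>
<pre><code>
import random

def make_prefixes(model, test_set, prefix_len):
    # Two prefix types per source sentence:
    #   "model"  - the beginning of the model's own translation (standard inference setting);
    #   "random" - the beginning of a target from an unrelated example (hallucination probe).
    long_targets = [tgt for _, tgt in test_set if len(tgt) >= prefix_len]
    prefixes = []
    for src, _ in test_set:
        model_prefix = translate(model, src)[:prefix_len]        # hypothetical decoding helper
        random_prefix = random.choice(long_targets)[:prefix_len] # fluent, but unrelated to src
        prefixes.append((src, model_prefix, random_prefix))
    return prefixes
</code></pre>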

<img src="../resources/posts/src_dst_nmt/exposure_bias_models-min.png"
@@ -438,9 +461,9 @@ <h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>

</p>

<img src="../resources/posts/src_dst_nmt/exposure_bias_results-min.png" style="margin-bottom:20px;" width=100% alt="" />
<img src="../resources/posts/src_dst_nmt/mrt_all_prefixes_src_contribution-min.png" style="margin-bottom:20px;" width=100% alt="" />

<p>The results confirm our hypothesis:
<p>The results for both types of prefixes confirm our hypothesis:
</p>
<div style="font-size:18px;
border-left: 5px solid #b7db67;
@@ -455,8 +478,18 @@ <h4><u>How</u>: Feed Random Prefixes, Look at Contributions</h4>
</div>

<p>Indeed, we see that MRT-trained models ignore the source less
than any other model; the second best is the target-side word dropout, which also reduces exposure bias.
than any other model; the second best for random prefixes is the target-side word dropout, which also reduces exposure bias.
</p>


<p>It is also interesting to look at the entropy of source contributions to see whether these objectives
make the model more or less "focused". We see that only MRT leads to more confident contributions
of source tokens: the entropy is lower. In contrast, both word dropout variants teach
the model to use broader context.
</p>
<center>
<img src="../resources/posts/src_dst_nmt/mrt_all_prefixes_entropy-min.png" style="margin-bottom:20px;" width=80% alt="" />
</center>
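<p style="font-size:small;"><font color="#888">
The entropy here is computed over the normalized contributions of individual source tokens at a given
generation step; a short sketch, assuming <code>src_rel</code> holds one non-negative relevance score
per source token:
</font></p>
<pre><code>
import numpy as np

def contribution_entropy(src_rel, eps=1e-12):
    # Turn per-token source contributions into a distribution and take its entropy.
    p = np.asarray(src_rel, dtype=float)
    p = p / (p.sum() + eps)
    return float(-np.sum(p * np.log(p + eps)))

# Lower entropy: the model concentrates on a few source tokens ("focused").
# Higher entropy: contributions are spread over a broad context.
# contribution_entropy([0.90, 0.05, 0.05])        -> ~0.39
# contribution_entropy([0.25, 0.25, 0.25, 0.25])  -> ~1.39
</code></pre>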

<br>

@@ -469,11 +502,7 @@ <h4>TL;DR: With more data, models use source more and do it more confidently.</h

<img src="../resources/posts/src_dst_nmt/data_amount_both-min.png" style="margin-bottom:20px;" width=100% alt="" />
<p>First, we see that, generally, models trained with more data use source more.
Surprisingly, this increase is not spread evenly across positions:
at approximately 80% of the target length, models trained with more data use
source more, but at the last positions, they switch to more actively using the prefix.
</p>
<p>Second, with more training data, the model becomes more confident in the choice of important tokens: the entropy
Second, with more training data, the model becomes more confident in the choice of important tokens: the entropy
of contributions becomes lower (in the paper, we also show entropy of target contributions).</p>

<!--
@@ -598,7 +627,7 @@ <h3>What's next?</h3>
<ul>
<li><font face="arial">measure how different training regimes change contributions</font><br>
Some method claims to reduce exposure bias?
That's great - now you can measure if indeed increases contribution of source.
That's great - now you can measure whether it indeed increases the contribution of the source.
</li>

<li><font face="arial">look deeper into a model's pathological behavior</font><br>
Binary file modified resources/posts/src_dst_nmt/average_over_set-min.png
Binary file modified resources/posts/src_dst_nmt/average_over_set.png
Binary file modified resources/posts/src_dst_nmt/changes_are_not_monotonic-min.png
Binary file modified resources/posts/src_dst_nmt/changes_are_not_monotonic.png
Binary file modified resources/posts/src_dst_nmt/contributions_converge_early_v2.png
Binary file modified resources/posts/src_dst_nmt/data_amount_both-min.png
Binary file modified resources/posts/src_dst_nmt/data_amount_both.png
Binary file modified resources/posts/src_dst_nmt/general_source_influence-min.png
Binary file modified resources/posts/src_dst_nmt/general_source_influence.png
Binary file modified resources/posts/src_dst_nmt/general_src_entropy-min.png
Binary file modified resources/posts/src_dst_nmt/general_src_entropy.png
Binary file modified resources/posts/src_dst_nmt/our_vs_frankle-min.png
Binary file modified resources/posts/src_dst_nmt/our_vs_frankle.png
Binary file modified resources/posts/src_dst_nmt/prefix_model_vs_reference-min.png
Binary file modified resources/posts/src_dst_nmt/prefix_model_vs_reference.png
Binary file modified resources/posts/src_dst_nmt/random_prefix_entropy-min.png
Binary file modified resources/posts/src_dst_nmt/random_prefix_entropy.png
Binary file modified resources/posts/src_dst_nmt/stages_with_positions-min.png
Binary file modified resources/posts/src_dst_nmt/stages_with_positions.png
Binary file modified resources/posts/src_dst_nmt/timeline-min.png
Binary file modified resources/posts/src_dst_nmt/timeline.png
4 changes: 3 additions & 1 deletion talks.md
@@ -24,6 +24,8 @@ Program committee:

### <span style="color:darkblue"> Speaker </span>

* 06/2021 <span class="talk-title">_Neural Machine Translation Inside Out_</span> <span style="font-size:0.9em">(keynote)</span>
<br/>&nbsp; &nbsp; &nbsp; [DeeLIO workshop at NAACL 2021](https://sites.google.com/view/deelio-ws/)
* 01/2020 <span class="talk-title">_Evolution of Representations in the Transformer_</span> <span style="font-size:0.9em">(keynote)</span> <a href="https://drive.google.com/file/d/1bEiWUZDNbKAgMkt5PPF9sQl3x9AYNVoZ/view?usp=sharing" class="label slides">Slides</a> <a href="https://youtu.be/ZyWLrBGiEpI" class="label video">Video</a>
<br/>&nbsp; &nbsp; &nbsp; AI & NLP track at [Applied Machine Learning Days at EPFL](https://appliedmldays.org/tracks/ai-nlp), <span style="font-size:0.9em">Lausanne, Switzerland</span>
* 08/2019 <span class="talk-title">_Inside the Transformer_</span>
@@ -39,7 +41,7 @@ Program committee:
<br/>&nbsp; &nbsp; &nbsp; [Stanford NLP Seminar](https://nlp.stanford.edu/seminar/)
<br/>&nbsp; &nbsp; &nbsp; Cornell NLP
<br/>&nbsp; &nbsp; &nbsp; [MT@UPC](https://mt.cs.upc.edu/seminars/)

<br/>&nbsp; &nbsp; &nbsp; Cambridge NLP

* 08-12/2020 <span class="talk-title">_Evaluating Source and Target Contributions to NMT Predictions_</span> <a href="https://drive.google.com/file/d/1ZW51j6Eas7qONdHyLf59EY1bMNrvhOBQ/view?usp=sharing" class="label slides">Slides</a>
<br/>&nbsp; &nbsp; &nbsp; Unbabel
