New paper and blog post: Source and Target Contributions to NMT
lena-voita committed Oct 22, 2020
1 parent 7244779 commit 8ddeadc
Showing 73 changed files with 694 additions and 0 deletions.
16 changes: 16 additions & 0 deletions _data/papers/2020.yml
@@ -1,4 +1,20 @@

-
  layout: paper
  paper-type: inproceedings
  selected: y
  year: 2020
  img: src_dst_contributions-min.png
  title: "Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation"
  authors: "<u>Elena Voita</u>, Rico Sennrich, Ivan Titov"
  doc-url:
  conf_name:
  conf_year:
  url: "https://arxiv.org/pdf/2010.10907.pdf"
  code: "https://github.com/lena-voita/the-story-of-heads"
  blog: "https://lena-voita.github.io/posts/source_target_contributions_to_nmt.html"
  abstract: "In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature."

-
  layout: paper
  paper-type: inproceedings
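The abstract's "conservation principle" means that total relevance is preserved as it propagates through the network, so per-token relevances can be read as proportions of a single prediction rather than as unitless importance scores. A minimal sketch of that final step, with made-up relevance values (the actual LRP propagation for the Transformer lives in the linked the-story-of-heads repository):

```python
import numpy as np

# Hypothetical per-token relevances for one generation step, as an
# LRP-style attribution might produce them (the values are made up).
source_relevance = np.array([0.20, 0.35, 0.15])  # relevance of each source token
target_relevance = np.array([0.10, 0.20])        # relevance of each target-prefix token

# Conservation: total relevance is preserved through the layers, so the
# sums can be read as proportions of this single prediction.
total = source_relevance.sum() + target_relevance.sum()
source_contribution = source_relevance.sum() / total
target_contribution = target_relevance.sum() / total

print(f"source: {source_contribution:.2f}, target: {target_contribution:.2f}")
# -> source: 0.70, target: 0.30
assert abs(source_contribution + target_contribution - 1.0) < 1e-9
```

By construction the two proportions sum to one, which is what lets the paper compare source vs. target reliance across models, amounts of training data, and stages of training.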
6 changes: 6 additions & 0 deletions _layouts/post.html
@@ -9,6 +9,12 @@ <h1 style="font-size:38px;" class="post-title">{{ page.title }}</h1>
<p class="post-meta">{{ page.date | date: "%b %-d, %Y" }}{% if page.tag %} • {{ page.tag }}{% endif %}{% if page.author %} • {{ page.author }}{% endif %}{% if page.meta %} • {{ page.meta }}{% endif %}</p>
</header>

<style>
p {
text-align: justify;
}
</style>

<article class="post-content">
{{ content }}
</article>
Binary file added img/paper/src_dst_contributions-min.png
Binary file added img/paper/src_dst_contributions.png
46 changes: 46 additions & 0 deletions posts.html
@@ -37,6 +37,52 @@
</style>



<div class="fullCard" id="thumbnail" >
<div class="cardContent">

<h1 style="font-size:28px;">Source and Target Contributions to NMT Predictions</h1>

<video width="300" height="auto" style="float: right; margin-left: 15px;" loop autoplay muted>
<source src="../resources/posts/src_dst_nmt/src_dst_main.mp4" type="video/mp4">
</video>

<span style="font-size:14px;">
This is a post for the paper
<a href="https://arxiv.org/pdf/2010.10907.pdf">
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation.
</a>
</span>


<br/>
<br/>
<span style="font-size:15px;">
In NMT, the generation of a target token is based on two types of context: the source and the prefix of the target sentence.
We show how to evaluate the relative contributions of source and target to NMT predictions and find that:
<ul>
<li>models suffering from exposure bias are more prone to over-relying on target history (and hence to hallucinating) than
the ones where the exposure bias is mitigated;</li>
<li>models trained with more data rely on the source more and do it more confidently;</li>
<li>the training process is non-monotonic with several distinct stages.</li>
</ul>
</span>

<a class="pull-right" href="/posts/source_target_contributions_to_nmt.html" onMouseOver="document.readmore5.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore5.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore5" width="120" class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2010.10907.pdf" onMouseOver="document.readpaper5.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper5.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper5" width="120" class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" onMouseOver="document.viewcode5.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode5.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode5" width="120"></a>

<span style="font-size:15px; text-align: right; float: right; color:gray">October 2020</span>

</div>
</div>


<!-- ################################################################################### -->

<div class="fullCard" id="thumbnail" >
<div class="cardContent">

