New paper and blog post: Source and Target Contributions to NMT
lena-voita committed Oct 22, 2020
1 parent 7244779 commit 8ddeadc
Showing 73 changed files with 694 additions and 0 deletions.
16 changes: 16 additions & 0 deletions _data/papers/2020.yml
@@ -1,4 +1,20 @@

-
  layout: paper
  paper-type: inproceedings
  selected: y
  year: 2020
  img: src_dst_contributions-min.png
  title: "Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation"
  authors: "<u>Elena Voita</u>, Rico Sennrich, Ivan Titov"
  doc-url:
  conf_name:
  conf_year:
  url: "https://arxiv.org/pdf/2010.10907.pdf"
  code: "https://github.com/lena-voita/the-story-of-heads"
  blog: "https://lena-voita.github.io/posts/source_target_contributions_to_nmt.html"
  abstract: "In Neural Machine Translation (and, more generally, conditional language modeling), the generation of a target token is influenced by two types of context: the source and the prefix of the target sequence. While many attempts to understand the internal workings of NMT models have been made, none of them explicitly evaluates relative source and target contributions to a generation decision. We argue that this relative contribution can be evaluated by adopting a variant of Layerwise Relevance Propagation (LRP). Its underlying 'conservation principle' makes relevance propagation unique: differently from other methods, it evaluates not an abstract quantity reflecting token importance, but the proportion of each token's influence. We extend LRP to the Transformer and conduct an analysis of NMT models which explicitly evaluates the source and target relative contributions to the generation process. We analyze changes in these contributions when conditioning on different types of prefixes, when varying the training objective or the amount of training data, and during the training process. We find that models trained with more data tend to rely on source information more and to have more sharp token contributions; the training process is non-monotonic with several stages of different nature."

-
  layout: paper
  paper-type: inproceedings
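The abstract's "conservation principle" means that total relevance is preserved as it propagates through the network, so per-token relevances can be read as proportions of a single prediction rather than as unitless importance scores. A minimal sketch of that final step, with made-up relevance values (the actual LRP propagation for the Transformer lives in the linked the-story-of-heads repository):

```python
import numpy as np

# Hypothetical per-token relevances for one generation step, as an
# LRP-style attribution might produce them (the values are made up).
source_relevance = np.array([0.20, 0.35, 0.15])  # relevance of each source token
target_relevance = np.array([0.10, 0.20])        # relevance of each target-prefix token

# Conservation: total relevance is preserved through the layers, so the
# sums can be read as proportions of this single prediction.
total = source_relevance.sum() + target_relevance.sum()
source_contribution = source_relevance.sum() / total
target_contribution = target_relevance.sum() / total

print(f"source: {source_contribution:.2f}, target: {target_contribution:.2f}")
# -> source: 0.70, target: 0.30
assert abs(source_contribution + target_contribution - 1.0) < 1e-9
```

By construction the two proportions sum to one, which is what lets the paper compare source vs. target reliance across models, amounts of training data, and stages of training.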
6 changes: 6 additions & 0 deletions _layouts/post.html
@@ -9,6 +9,12 @@ <h1 style="font-size:38px;" class="post-title">{{ page.title }}</h1>
<p class="post-meta">{{ page.date | date: "%b %-d, %Y" }}{% if page.tag %} • {{ page.tag }}{% endif %}{% if page.author %} • {{ page.author }}{% endif %}{% if page.meta %} • {{ page.meta }}{% endif %}</p>
</header>

<style>
p {
text-align: justify;
}
</style>

<article class="post-content">
{{ content }}
</article>
Binary file added img/paper/src_dst_contributions-min.png
Binary file added img/paper/src_dst_contributions.png
46 changes: 46 additions & 0 deletions posts.html
@@ -37,6 +37,52 @@
</style>



<div class="fullCard" id="thumbnail" >
<div class="cardContent">

<h1 style="font-size:28px;">Source and Target Contributions to NMT Predictions</h1>

<video width="300" height="auto" style="float: right; margin-left: 15px;" loop autoplay muted>
<source src="../resources/posts/src_dst_nmt/src_dst_main.mp4" type="video/mp4">
</video>

<span style="font-size:14px;">
This is a post for the paper
<a href="https://arxiv.org/pdf/2010.10907.pdf">
Analyzing the Source and Target Contributions to Predictions in Neural Machine Translation.
</a>
</span>


<br/>
<br/>
<span style="font-size:15px;">
In NMT, the generation of a target token is based on two types of context: the source and the prefix of the target sentence.
We show how to evaluate the relative contributions of source and target to NMT predictions and find that:
<ul>
<li>models suffering from exposure bias are more prone to over-relying on target history (and hence to hallucinating) than
the ones where the exposure bias is mitigated;</li>
<li>models trained with more data rely on the source more and do it more confidently;</li>
<li>the training process is non-monotonic with several distinct stages.</li>
</ul>
</span>

<a class="pull-right" href="/posts/source_target_contributions_to_nmt.html" onMouseOver="document.readmore5.src='../resources/posts/buttons/button_read_more_push-min.png';" onMouseOut="document.readmore5.src='../resources/posts/buttons/button_read_more-min.png';">
<img src="../resources/posts/buttons/button_read_more-min.png" name="readmore5" width="120" class="pull-right"></a>
<a class="pull-right" href="https://arxiv.org/pdf/2010.10907.pdf" onMouseOver="document.readpaper5.src='../resources/posts/buttons/button_read_paper_push-min.png';" onMouseOut="document.readpaper5.src='../resources/posts/buttons/button_read_paper-min.png';">
<img src="../resources/posts/buttons/button_read_paper-min.png" name="readpaper5" width="120" class="pull-right"></a>
<a class="pull-right" href="https://github.com/lena-voita/the-story-of-heads" onMouseOver="document.viewcode5.src='../resources/posts/buttons/button_view_code_push-min.png';" onMouseOut="document.viewcode5.src='../resources/posts/buttons/button_view_code-min.png';">
<img src="../resources/posts/buttons/button_view_code-min.png" name="viewcode5" width="120"></a>

<span style="font-size:15px; text-align: right; float: right; color:gray">October 2020</span>

</div>
</div>


<!-- ################################################################################### -->

<div class="fullCard" id="thumbnail" >
<div class="cardContent">

