-
Notifications
You must be signed in to change notification settings - Fork 2
/
pronoun-task.html
345 lines (260 loc) · 30.2 KB
/
pronoun-task.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Pronoun Translation Task - ACL 2016 First Conference on Machine Translation</title>
<style> h3 { margin-top: 2em; } </style>
</HEAD>
<body>
<center>
<script src="title.js"></script>
<p><h2>Shared Task: Cross-lingual Pronoun Prediction</h2></p>
<script src="menu.js"></script>
</center>
<H3>NEWS</H3>
<p>
<ul>
<li>5th May: We are pleased to announce the results of the shared task. Please click on the following links for a <a href="http://data.statmt.org/wmt16/pronoun-task/results/WMT2016_pronoun_task.pdf">PDF containing an overview of the results</a> and <a href="http://data.statmt.org/wmt16/pronoun-task/results/WMT2016-submissions-scored.zip">an archive containing the individual submissions and their scores</a></li>
<li>28th April: All participants are invited to submit a system description paper. Details can be found at the bottom of this web page.</li>
<li>22nd April: Participants are invited to submit up to two systems per task. If you submit more than one system for a given task, please indicate which system is the <b>primary system</b>.</li>
<li>6th April: The test data is now available for download <a href="http://data.statmt.org/wmt16/pronoun-task/">here</a></li>
<li>29th March: Test data will now be released two days later, on 6th April. Submission deadline extended to 22nd April to allow extra time as compensation for issues affecting the baseline language models (see note from 28th March). Systems may be submitted at any time during the period 6th-22nd April.</li>
<li>28th March: An issue was reported for the baseline classification models, whereby the TEDdev data was included in the data used to train the language models. This has now been resolved. If you downloaded the <b>mono+para.5.xx.lemma.trie.kenlm</b> files prior to 20:00 CET on 28th March, please re-download the latest files.</li>
<li>14th March: Baseline models are now available for all sub-tasks.</li>
<li>8th March: The scorer script is now available. See Evaluation section below for details.</li>
<li>25th February: An alignment issue was reported for the filtered data. This has now been resolved. If you downloaded the training data prior to 17:00 CET on 25th February, please re-download the latest filtered data files.</li>
</ul>
</p>
<H3>OVERVIEW</H3>
<p>Pronoun translation poses a problem for current state-of-the-art SMT systems as pronoun systems do not map well across languages, e.g., due to differences in gender, number, case, formality, or humanness, and to differences in where pronouns may be used. Translation divergences typically lead to mistakes in SMT, as when translating the English "it" into French ("il", "elle", or "cela"?) or into German ("er", "sie", or "es"?). One way to model pronoun translation is to treat it as a cross-lingual pronoun prediction task.</p>
<p>We propose such a task, which asks participants to predict a target-language pronoun given a source-language pronoun in the context of a sentence. We further provide a lemmatised target-language human-authored translation of the source sentence, and automatic word alignments between the source sentence words and the target-language lemmata. In the translation, the words aligned to a subset of the source-language third-person pronouns are substituted by placeholders. The aim of the task is to predict, for each placeholder, the word that should replace it from a small, closed set of classes, using any type of information that can be extracted from the documents.</p>
<p>The cross-lingual pronoun prediction task will be similar to the task of the same name at DiscoMT 2015:</p>
<a href="http://www.idiap.ch/workshop/DiscoMT/shared-task">http://www.idiap.ch/workshop/DiscoMT/shared-task</a>
<p>Participants are invited to submit systems for the English-French and English-German language pairs, for both directions.</p>
<H3>GOALS</H3>
<p>The goals of the cross-lingual pronoun prediction task are:
<ul>
<li>To investigate and raise awareness of the problems that pronoun translation poses for SMT</li>
<li>To highlight and examine the specific problems posed by the translation of different source-language pronouns into a given target-language</li>
<li>To provide a self-contained task, which will serve as a way to introduce newcomers to the field of <i>discourse in SMT</i></li>
</ul>
</p>
<H3>TASK DESCRIPTION</H3>
<p>In the cross-lingual pronoun prediction task, you are given a source-language document with a <b>lemmatised and POS-tagged</b> human-authored translation and a set of word alignments between the two languages. In the translation, the lemmatised tokens aligned to the source-language third-person pronouns are substituted by placeholders. Your task is to predict, for each placeholder, <b>the fully inflected word token</b> that should replace the placeholder from a small, closed set of classes. I.e., to provide the fully inflected (German|French) translation of the English pronoun in the context sketched by the lemmatised/tagged target side (in the case of English-to-German|French translation). You may use any type of information that you can extract from the documents.</p>
<p><b>Lemmatised and POS-tagged</b> target-language data is provided in place of fully inflected text. The provision of lemmatised data is intended both to provide a challenging task, and to simulate a scenario that is more closely aligned with working with machine translation system output. POS tags provide additional information which may be useful in the disambiguation of lemmas (e.g. noun vs. verb, etc.) and in the detection of patterns of pronoun use.</p>
<p>The pronoun prediction task will be run for the following sub-tasks:
<ul>
<li>English-to-German</li>
<li>German-to-English</li>
<li>English-to-French</li>
<li>French-to-English</li>
</ul>
</p>
<p>Details of the source-language pronouns and the prediction classes that exist for each of the above sub-tasks are provided in the following section (below). The different combinations of source-language pronoun and target-language prediction classes represent some of the different problems that SMT systems face when translating pronouns for a given language pair and translation direction.</p>
<p>The task will be evaluated automatically by matching the predictions against the words found in the reference translation by computing the overall accuracy and precision, recall and F-score for each class. The primary score for the evaluation is the macro-averaged F-score over all classes. Compared to accuracy, the macro-averaged F-score favours systems that consistently perform well on all classes and penalises systems that maximise the performance on frequent classes while sacrificing infrequent ones.</p>
<p>The data supplied for the classification task consists of parallel source-target text with word alignments. In the target-language text, a subset of the words aligned to source-language occurrences of a specified set of pronouns have been replaced by placeholders of the form REPLACE_xx, where xx is the index of the source-language word the placeholder is aligned to. Your task is to predict one of the classes listed in the relevant source-target section below, for each occurrence of a placeholder.</p>
<p>The training and development data is supplied in a file format with five tab-separated columns:
<ul>
<li>the class label</li>
<li>the word actually removed from the text (may be different from the class label for class OTHER and in some edge cases)</li>
<li>the source-language segment</li>
<li>the target-language segment with pronoun placeholders</li>
<li>the word alignment (a space-separated list of alignments of the form SRC-TGT, where SRC and TGT are zero-based word indices in the source and target-language segment, respectively)</li>
</ul>
</p>
<p>A single segment may contain more than one placeholder. In that case, columns 1 and 2 contain multiple space-separated entries in the order of placeholder occurrence. A document segmentation of the data is provided in separate files for each corpus. These files contain one line per segment, but the precise format varies depending on the type of document markup available for the different corpora. In the development and test data, the files have a single column containing the ID of the document the segment is part of.</p>
<p>Here is an example line from one of the training data files:</p>
<pre>elles Elles They arrive first . REPLACE_0 arriver|VER en|PRP premier|NUM .|. 0-0 1-1 2-2 2-3 3-4</pre>
<p>The test set will be supplied in the same format, but with columns 1 and 2 empty, so that each line starts with two tab characters. Your submission should have the same format as column 1 above, so a correct solution would contain the class label <b>elles</b> in this case. Each line should contain as many space-separated class labels as there are REPLACE tags in the corresponding segment. For each segment not containing any REPLACE tags, an empty line should be emitted. Additional tab-separated columns may be present in the submission, but will be ignored. Note in particular that you are not required to predict the second column. The submitted files should be encoded in UTF-8 (like the data we provide).</p>
<p>The training, development and test datasets have been <b>filtered</b> to remove non-subject position pronouns. The filtering for the test set will be manually checked to ensure that no non-subject position pronouns remain. For more information, please see the section on <b>data filtering</b> below.</p>
<p>The complete test data for the classification task, including reference translations and word alignments, will be released on <b>6th April 2016</b>. Your submission is due on <b>22nd April 2016</b>. Detailed submission instructions can be found at the end of this page.</p>
<H3>SOURCE-LANGUAGE PRONOUN SETS AND TARGET-LANGUAGE PREDICTION CLASS DETAILS</H3>
<p>The following sections describe the set of source-language pronouns and target-language classes to be predicted, for each of the four sub-tasks. Please note that the sub-tasks are asymmetric in terms of the source-language pronouns and prediction classes. The selection of the source-language pronouns and their target-language prediction classes for each sub-task is based on the variation that is possible when translating a given source-language pronoun. For example, when translating the English pronoun "it" into French, a decision must be made as to the gender of the French pronoun, with "il" and "elle" both providing valid options. The translation of the English pronouns "he" and "she" into French, however, does not require such a decision. These may simply be mapped 1-to-1, as "il" and "elle" respectively. The translation of "he" and "she" from English into French is therefore not considered an "interesting" problem and as such, these pronouns are excluded from the source-language set for the English->French sub-task. In the opposite translation, the French pronoun "il" may be translated as "it" or "he", and "elle" as "it" or "she". As a decision must be taken as to the appropriate target-language translation of "il" and "elle", these are included in the set of source-language pronouns for the French->English sub-task.</p>
<p>You should *always* predict either a word token or "OTHER". See prediction class lists below for a list of word tokens to predict for each sub-task.</p>
<H4>English-to-French</H4>
<p>This sub-task will concentrate on the translation of subject position "it" and "they" from English into French. The following prediction classes exist for this sub-task:</p>
<table>
<tr><td><b>ce</b></td><td>The French pronoun ce (sometimes with elided vowel as c') as in the expression c'est "it is"</td></tr>
<tr><td><b>elle</b></td><td>Feminine singular subject pronoun</td></tr>
<tr><td><b>elles</b></td><td>Feminine plural subject pronoun</td></tr>
<tr><td><b>il</b></td><td>Masculine singular subject pronoun</td></tr>
<tr><td><b>ils</b></td><td>Masculine plural subject pronoun</td></tr>
<tr><td><b>cela</b></td><td>Demonstrative pronouns. Includes "cela", "ça", the misspelling "ca", and the rare elided form "ç' "</td></tr>
<tr><td><b>on</b></td><td>Indefinite pronoun</td></tr>
<tr><td><b>OTHER</b></td><td>Some other word, or nothing at all, should be inserted</td></tr>
</table>
<H4>French-to-English</H4>
<p>This sub-task will concentrate on the translation of subject position "elle", "elles", "il", and "ils" from French into English. The following prediction classes exist for this sub-task:</p>
<table>
<tr><td><b>he</b></td><td>Masculine singular subject pronoun</td></tr>
<tr><td><b>she</b></td><td>Feminine singular subject pronoun</td></tr>
<tr><td><b>it</b></td><td>Non-gendered singular subject pronoun</td></tr>
<tr><td><b>they</b></td><td>Non-gendered plural subject pronoun</td></tr>
<tr><td><b>this</b></td><td>Demonstrative pronouns (singular). Includes both "this" and "that"</td></tr>
<tr><td><b>these</b></td><td>Demonstrative pronouns (plural). Includes both "these" and "those"</td></tr>
<tr><td><b>there</b></td><td>Existential "there"</td></tr>
<tr><td><b>OTHER</b></td><td>Some other word, or nothing at all, should be inserted</td></tr>
</table>
<H4>English-to-German</H4>
<p>This sub-task will concentrate on the translation of subject position "it" and "they" from English into German. The following prediction classes exist for this sub-task:</p>
<table>
<tr><td><b>er</b></td><td>Masculine singular subject pronoun</td></tr>
<tr><td><b>sie</b></td><td>Feminine singular subject pronoun</td></tr>
<tr><td><b>es</b></td><td>Neuter singular subject pronoun</td></tr>
<tr><td><b>man</b></td><td>Indefinite pronoun</td></tr>
<tr><td><b>OTHER</b></td><td>Some other word, or nothing at all, should be inserted</td></tr>
</table>
<H4>German-to-English</H4>
<p>This sub-task will concentrate on the translation of subject position "er", "sie" and "es" from German into English. The following prediction classes exist for this sub-task:</p>
<table>
<tr><td><b>he</b></td><td>Masculine singular subject pronoun</td></tr>
<tr><td><b>she</b></td><td>Feminine singular subject pronoun</td></tr>
<tr><td><b>it</b></td><td>Non-gendered singular subject pronoun</td></tr>
<tr><td><b>they</b></td><td>Non-gendered plural subject pronoun</td></tr>
<tr><td><b>you</b></td><td>Second person pronoun (with both generic or deictic uses)</td></tr>
<tr><td><b>this</b></td><td>Demonstrative pronouns (singular). Includes both "this" and "that"</td></tr>
<tr><td><b>these</b></td><td>Demonstrative pronouns (plural). Includes both "these" and "those"</td></tr>
<tr><td><b>there</b></td><td>Existential "there"</td></tr>
<tr><td><b>OTHER</b></td><td>Some other word, or nothing at all, should be inserted</td></tr>
</table>
<H3>DISCUSSION GROUP</H3>
<p>If you are interested in participating in the shared task, we recommend that you sign up to our discussion group to make sure you don't miss any important information. Feel free to ask any questions you may have about the shared task!</p>
<p><a href="https://groups.google.com/forum/#!forum/wmt-2016-cross-lingual-pronoun-prediction-shared-task">https://groups.google.com/forum/#!forum/wmt-2016-cross-lingual-pronoun-prediction-shared-task</a></p>
<H3>DATA FILTERING</H3>
<p>The task is to predict the translation of subject position pronouns for all sub-tasks. In order to ensure fair and accurate evaluation of system performance, <b>filtering has been applied to the source-language texts</b> of the training, development and test datasets, to select only those pronoun instances that are relevant to each sub-task. For example, in the case of English-to-French translation, which focusses on the translation of the English subject position pronouns "it" and "they", only subject position instances of "it" will be included in the development and test datasets.</p>
<H4>Training and development data</H4>
<p>Automatic filtering has been applied to the source-language texts of the development datasets to remove non-subject position instances of English "it" and the German pronouns "sie" and "es".</p>
<H4>Test data</H4>
<p>The test data files have been sentence aligned automatically, and the alignments have been checked manually. The same automatic filtering of non-subject position pronouns that was applied to the source-language texts of the development datasets, has also been applied to the source-language texts of the test dataset. In addition to this, manual checks have been made to ensure that no non-subject position pronouns remain following the automatic filtering. We had originally proposed manually checking those word-alignments that affected pronouns, however, this will not ne carried out. After considerable discussion we believe that this task is extremely difficult due to the presence of many borderline cases. We therefor believe that it would be difficult to apply manual corrections or exclusions of incorrectly aligned pronoun instances in a consistent manner.</p>
<H3>TRAINING DATA</H3>
<p>The training and development datasets can be downloaded from the following locations:</p>
<p><a href="http://data.statmt.org/wmt16/pronoun-task/">http://data.statmt.org/wmt16/pronoun-task/</a></p>
<p>Download alternative 1: <a href="http://stp.lingfil.uu.se/~joerg/DiscoMT2016/">http://stp.lingfil.uu.se/~joerg/DiscoMT2016/</a></p>
<p>Download alternative 2: <a href="http://opus.lingfil.uu.se/DiscoMT2016/">http://opus.lingfil.uu.se/DiscoMT2016/</a></p>
<p>The download folder contains many files. See the list below:</p>
<p><i>Classification data files:</i> *.data.gz</p>
<p><i>Filtered classification data files:</i> *.filtered.data.gz</p>
<p><i>List of prediction classes and their frequencies:</i> *.classes</p>
<p><i>Document ids (the document to which each sentence belongs):</i> *.doc-ids.gz</p>
<p>Please note that:</p>
<ul>
<li>An alignment issue was reported for the filtered data. This has now been resolved. If you downloaded the training data prior to 17:00 CET on 25th February, please re-download the latest filtered data files.</li>
<li>For the French-to-English task, filtered training data is not provided. This is because the French pronouns selected for the task ("elle", "elles", "il", and "ils") may only be used as subject-position pronouns.</li>
<li>A single doc-ids file exists for each language pair. The same file should be used for both translation directions. For example, the file TEDdev.de-en.doc-ids.gz should be used for both the German-to-English and English-to-German sub-tasks.</li>
</ul>
<H3>TOOLS</H3>
<p>Participants are encouraged to use any type of information that can be extracted from the source and target-language text. You may use any tool you like, but committee members have found the following ones useful:</p>
<ul>
<li><a href="http://nlp.stanford.edu/software/lex-parser.shtml">Stanford CoreNLP</a> (combines various NLP modules, including English coreference resolution) <b>[English; some support for parsing French, German]</b></li>
<li><a href="https://code.google.com/p/mate-tools/wiki/ParserAndModels">MATE parser</a> (with joint tagging+parsing model) <b>[English, French and German]</b></li>
<li><a href="https://github.com/rsennrich/parzu">Zurich Dependency Parser</a> <b>[German]</b></li>
<li><a href="http://alpage.inria.fr/statgram/frdep/fr_stat_dep_parsing.html">The BONSAI suite of models</a> (uses an external lexicon to improve parsing) <b>[French]</b></li>
<li><a href="https://code.google.com/p/nada-nonref-pronoun-detector/">NADA</a> (Non-anaphoric "it" detector) <b>[English]</b></li>
</ul>
<p>Adventurous people could also try using Sebastian Martschat's <a href="https://github.com/smartschat/cort">CORT</a>, which has an option to perform coreference resolution on unannotated text (using the Stanford tools for preprocessing). <b>[Disclaimer: the organisers have not used CORT themselves for that purpose]</b></p>
<p>In addition to the tools listed above, classification baselines are provided for each sub-task. See section below.</p>
<H3>CLASSIFICATION BASELINE</H3>
<p>We provide baseline models for each sub-task. Each baseline looks only at the language model scores from the relevant language model.</p>
<p>To use the baselines you will need to:
<ul>
<li>Download the <b>discomt_baseline.py</b> python script and <b>YAML</b> files from <a href="https://bitbucket.org/yannick/discomt_baseline/src">here</a></li>
<li>Download the relevant KenLM binary format file (<b>mono+para.5.xx.lemma.trie.kenlm</b>) from the <a href="http://data.statmt.org/wmt16/pronoun-task/">training data location</a></li>
<li>Install <a href="https://github.com/kpu/kenlm">KenLM</a></li>
<li>Install the KenLM python module</li>
</ul>
<p>Installing the KenLM python module: If you have pip installed, run <b>pip install https://github.com/kpu/kenlm/archive/master.zip</b>. Alternatively, after downloading KenLM, run <b>python setup.py install</b></p>
<p>You can get predictions for the baseline model by running, e.g.:</p>
<p><b>python discomt_baseline.py --fmt=replace --removepos --conf en-fr.yml TEDdev.data.en-fr.gz</b></p>
<p>These are in the format that the scorer requires, with predictions in the first column, the word it predicted in the second column (which is <b>always</b> ignored by the scorer, so don't worry if your system doesn't predict words), etc.</p>
<p>If you're interested in just using the marginal probabilities for each filler from the language model, you can also use:</p>
<p><b>python discomt_baseline --fmt=scores --removepos --conf en-fr.yml TEDdev.data.en-fr.gz</b></p>
<p>which will give you, for each input line, one with TEXT in the second column giving you the source/target text, and zero or more lines with ITEM 0, ITEM 1 etc. giving you a (partial) probability distribution over the fillers for each "REPLACE" position.</p>
<p>Other flags:
<ul>
<li>--lm LANGUAGE_MODEL: use another language model (otherwise it assumes the default name and the current directory)</li>
<li>--null-penalty PENALTY: use this penalty for predicting no filler at all (which counts as OTHER)</li>
</ul>
</p>
<p>Sample results with <b>default options on *filtered* en-fr TEDdev</b> (same data as tst2010):<br>
<pre>
ce : P = 129/ 182 = 70.88% R = 129/ 151 = 85.43% F1 = 77.48%
elle : P = 6/ 27 = 22.22% R = 6/ 25 = 24.00% F1 = 23.08%
elles : P = 0/ 0 = 0.00% R = 0/ 15 = 0.00% F1 = 0.00%
il : P = 38/ 138 = 27.54% R = 38/ 57 = 66.67% F1 = 38.97%
ils : P = 0/ 0 = 0.00% R = 0/ 140 = 0.00% F1 = 0.00%
cela : P = 28/ 40 = 70.00% R = 28/ 63 = 44.44% F1 = 54.37%
on : P = 3/ 37 = 8.11% R = 3/ 10 = 30.00% F1 = 12.77%
OTHER : P = 76/ 139 = 54.68% R = 76/ 102 = 74.51% F1 = 63.07%
</pre>
or a macro-averaged R of <b>40.63%</b>
</p>
<H3>DISCOMT 2015 RESOURCES</H3>
<p>You may also find the resources from the DiscoMT 2015 shared task on English-to-French cross-lingual pronoun prediction useful. These include annotations over the English source-language side of the test set, as well as raw training and development data. The data from the DiscoMT 2015 shared task can be downloaded from <a href="http://hdl.handle.net/11372/LRT-1611">LINDAT</a>. Please note that there are differences between the DiscoMT 2015 shared task and this year's task at WMT, namely the introduction of lemmatised + POS-tagged target-language data for this year's task.</p>
<H3>EVALUATION</H3>
<p>The predicted pronoun class labels will be <b>automatically</b> evaluated against the gold standard translations from the test set (see the example for the classification baseline below). The current version of the scorer is available here: <a href="http://data.statmt.org/wmt16/pronoun-task/WMT16_CLPP_scorer.pl">WMT16_CLPP_scorer.pl</a></p>
<p>The <b>WMT16_CLPP_scorer.pl</b> script contains instructions detailing how it should be used.</p>
<p>The script computes macro-averaged R scores. This is in contrast to the evaluation script for the pronoun prediction task at DiscoMT 2015, which computed macro-averaged F1. The justification for computing macro-averaged R, is that unlike with macro-averaged F1, each error is counted only once. With macro-averaged F1, the same error will be counted twice: once for precision and once again for recall, given that it averages over all prediction classes.</p>
<H3>SUBMISSION INSTRUCTIONS</H3>
<p>We will provide the input data in the same format as the training data, but with the first two columns empty. Your predictions should be submitted in the format recognised by the official scorer, see above for details. Please e-mail the file with the predictions, labelled with the name of your system, to <b>liane [at] cis.uni-muenchen.de</b> no later than <b>22nd April 2016</b> (any time zone). The following <a href="http://www.timeanddate.com/time/zones/aoe">website</a> provides the time <b>Anywhere on Earth (AOE)</b> which you may find useful.</p>
<p>Please note that each participant may submit up to two systems, per task. If you submit more than one system for a given task, please indicate which system is the <b>primary system</b></p>
<H3>TEST DATA</H3>
<p>The test data is now available for download from the following locations:</p>
<p><a href="http://data.statmt.org/wmt16/pronoun-task/">http://data.statmt.org/wmt16/pronoun-task/</a></p>
<p>Download alternative 1: <a href="http://stp.lingfil.uu.se/~joerg/DiscoMT2016/">http://stp.lingfil.uu.se/~joerg/DiscoMT2016/</a></p>
<p>Download alternative 2: <a href="http://opus.lingfil.uu.se/DiscoMT2016/">http://opus.lingfil.uu.se/DiscoMT2016/</a></p>
<H3>EVALUATION RESULTS</H3>
<p>We are pleased to announce the results of the shared task. Please click on the following links for a <a href="http://data.statmt.org/wmt16/pronoun-task/results/WMT2016_pronoun_task.pdf">PDF containing an overview of the results</a> and <a href="http://data.statmt.org/wmt16/pronoun-task/results/WMT2016-submissions-scored.zip">an archive containing the individual submissions and their scores</a>.</p>
<p>The archive contains a subfolder labelled "_gold" which contains a "solution" file for each language pair. These files may be useful in analysing the performance of your systems.</p>
<H3>SYSTEM DESCRIPTION PAPERS</H3>
<p>We would like to invite all groups who participated in the WMT2016 task on cross-lingual pronoun prediction to submit a system description paper detailing the design of their systems. Per the instructions on the WMT website, system description papers should be 4-6 pages in length and should conform to the ACL official style guidelines. These ACL guidelines are contained in the style files which can be downloaded from:
http://acl2016.org/files/acl2016.zip</p>
<p>System description papers are subject to review and may be rejected if the quality of the description is insufficient. However, the scores or ranking achieved in the shared task evaluation have no bearing on the acceptance decision. We strongly recommend that you write a system description paper and present your system(s) at the workshop no matter how successful your approach was in the evaluation.</p>
<p>At minimum, a system paper should:</p>
<ul>
<li>Contain a detailed description of the design of the system(s)</li>
<li>Describe the motivation behind the design of each system</li>
<li>Contain sufficient detail to make the results reproducible</li>
<li>Describe data sets and tools not included in the official data release</li>
</ul>
<p>In addition, we strongly recommend that the system paper contains:</p>
<ul>
<li>Any additional contrastive results that you feel are appropriate</li>
<li>A critical discussion of the system performance</li>
<li>Error analyses detailing where the system provided the results that you hoped for, where it didn't, and possible reasons why. This could be provided for the test set, or the development set.</li>
</ul>
<p>System papers do not need to provide a detailed description of the task itself or the data sets provided by the organisers. Instead, they may refer to the shared task overview paper, the bibliographic details of which will be announced prior to the camera-ready deadline. Unlike regular long and short papers, system description papers need not be anonymised.</p>
<p>We aim to release the results of the evaluation prior to the paper submission deadline on 15th May.</p>
<p>System description papers should be submitted electronically via the WMT START system: https://www.softconf.com/acl2016/WMT16/</p>
</p>
<H3>IMPORTANT DATES</H3>
<table>
<tr><td>Release of training data</td><td>2nd February 2016</td></tr>
<tr><td>Test data released</td><td>6th April 2016</td></tr>
<tr><td>System submission deadline</td><td>22nd April 2016</td></tr>
<tr><td>Paper submission deadline</td><td>15th May 2016</td></tr>
<tr><td>Notification of acceptance </td><td>5th June 2016</td></tr>
<tr><td>Camera-ready deadline</td><td>22nd June 2016</td></tr>
</table>
<H3>ORGANISERS</H3>
Liane Guillou (University of Edinburgh)
<br>
Christian Hardmeier (Uppsala University)
<br>
Preslav Nakov (Qatar Computing Research Institute)
<br>
Andrei Popescu-Belis (Idiap Research Institute)
<br>
Sara Stymne (Uppsala University)
<br>
Jörg Tiedemann (University of Helsinki)
<br>
Yannick Versley (University of Heidelberg)
<br>
Bonnie Webber (University of Edinburgh)
<br>
<H3>ACKNOWLEDGEMENTS</H3>
<p>The organisation of this task has received support from the following project:</p>
<ul>
<li>Discourse-Oriented Statistical Machine Translation funded by the Swedish Research Council (2012-916)</li>
</ul>
</body>
</HTML>