This repository contains all primary submissions for the WMT24 general mt task, and the human evaluations. The submissions are in the following directories
xml
: One xml file for each language pair, containing source, reference(s) and hypothesestxt-ts
: Sources, references and hypotheses in text files, including test suitestxt
: Same, but without test suites
Tools for extracting the raw text from the XML can be found here.
The human evaluations are in the humaneval
directory.
The scripts and data are self-contained and should be run from within the humeval
repository. For example:
cd humeval
# system clusters, produces output in humeval/tables
python3 calculate_clusters.py
The complete compiled data with this script can also be found in public JSONL. Note that this format is experimental and bound to change in the future. Each line in the data looks as follows:
{
"langs": "en-es",
"line_id": 731,
"src": "Initial applications to Harvard for a psychology masters were rejected, but was eventually admitted. The initial setbacks were due to Milgramm not taking any undergraduate courses in psychology at Queens College. In 1961, Milgram received a PhD in social psychology. He became an assistant professor at Yale around the same time.",
"tgt": "Las solicitudes iniciales para una maestría en psicología en Harvard fueron rechazadas, pero finalmente fueron admitidas. Los reveses iniciales se debieron a que Milgramm no tomó ningún curso universitario de psicología en Queens College. En 1961, Milgram recibió un doctorado en psicología social. Se convirtió en profesor asistente en Yale casi al mismo tiempo.",
"doc_id": "test-en-speech_8bbVFeTIIg8_000",
"domain": "speech",
"esa_spans": [
{
"start_i": 31,
"end_i": 42,
"severity": "minor",
"error_type": null
},
# omitted for brevity
],
"esa_score": "66",
"system": "ONLINE-B",
"annotator": "engspa7832",
"speech_info": {
"file": "_8bbVFeTIIg8_000.mp4",
"youtube": "https://www.youtube.com/watch?v=8bbVFeTIIg8"
}
}
Note that until November-29-2024, the ESA data used different indexing (off-by-one) than the MQM data.
This has been fixed. The new files are humeval/esa_generalMT2024_wave*.csv
and the old data are humeval/old/wave*.csv
.