Tools for handling the XML-formatted test sets and hypothesis files used in the WMT news task.
Requires Python >= 3.6.
pip install git+https://github.com/wmt-conference/wmt-format-tools.git
- Download the XML file containing the source (e.g. newsdev2021.ha-en.source.xml)
- Extract text from the source
wmt-unwrap -o newsdev2021.ha-en < newsdev2021.ha-en.source.xml
- Translate the text to give (e.g.) newsdev2021.ha-en.hypo.en
- Wrap the translation in XML, including the team name
wmt-wrap -s newsdev2021.ha-en.source.xml -t newsdev2021.ha-en.hypo.en -n UEDIN -l en > newsdev2021.ha-en.hypo.en.xml
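The two commands above can also be assembled from a script, which helps when processing several language pairs. The following is a minimal sketch only: the helper names `unwrap_command` and `wrap_command` are hypothetical and not part of wmt-format-tools, and the sketch uses only the flags shown in the steps above (`-o`, `-s`, `-t`, `-n`, `-l`).

```python
# Hypothetical helpers (not part of wmt-format-tools) that build the
# argument lists for the two commands shown in the steps above.

def unwrap_command(prefix):
    """Argument list for wmt-unwrap; the source XML is supplied on stdin."""
    return ["wmt-unwrap", "-o", prefix]


def wrap_command(source_xml, hypothesis, team, lang):
    """Argument list for wmt-wrap; the wrapped XML is written to stdout."""
    return ["wmt-wrap", "-s", source_xml, "-t", hypothesis, "-n", team, "-l", lang]


prefix = "newsdev2021.ha-en"
unwrap_argv = unwrap_command(prefix)
wrap_argv = wrap_command(f"{prefix}.source.xml", f"{prefix}.hypo.en", "UEDIN", "en")

# Print the equivalent shell commands, including the redirections.
print(" ".join(unwrap_argv), "<", f"{prefix}.source.xml")
print(" ".join(wrap_argv), ">", f"{prefix}.hypo.en.xml")
```

To actually run the tools, each argument list could be passed to `subprocess.run`, with `stdin` opened on the source XML for the unwrap step and `stdout` opened on the output XML for the wrap step.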
You can also use the tools via the API. See test/test-wrap-unwrap.py for a sample.
- Added an optional field, supplemental, which may contain any data collected with tests. For example, type=clean_source for human-cleaned sources.