wmt-conference/wmt-format-tools

WMT Format Tools

Tools for handling the XML-formatted test sets and hypothesis files used in the WMT news task.

Installation

Requires Python >= 3.6.

pip install git+https://github.com/wmt-conference/wmt-format-tools.git

Preparing a WMT submission

  1. Download the XML file containing the source (e.g. newsdev2021.ha-en.source.xml).
  2. Extract the text from the source:
  wmt-unwrap -o newsdev2021.ha-en < newsdev2021.ha-en.source.xml
  3. Translate the text to produce a hypothesis file (e.g. newsdev2021.ha-en.hypo.en).
  4. Wrap the translation in XML, including the team name:
  wmt-wrap -s newsdev2021.ha-en.source.xml -t newsdev2021.ha-en.hypo.en -n UEDIN -l en > newsdev2021.ha-en.hypo.en.xml
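
If the translation step is already driven from Python, the two commands above can be scripted rather than typed by hand. The following is a minimal sketch, not part of this package: the helper names (unwrap_command, wrap_command, unwrap, wrap) are hypothetical, it assumes wmt-unwrap and wmt-wrap are on the PATH after installation, and it uses only the flags shown in the steps above.

```python
import subprocess


def unwrap_command(output_prefix):
    """Build the argument list for: wmt-unwrap -o <output_prefix>."""
    return ["wmt-unwrap", "-o", str(output_prefix)]


def wrap_command(source_xml, hypothesis_txt, team_name, language):
    """Build the argument list for: wmt-wrap -s ... -t ... -n ... -l ..."""
    return [
        "wmt-wrap",
        "-s", str(source_xml),
        "-t", str(hypothesis_txt),
        "-n", team_name,
        "-l", language,
    ]


def unwrap(source_xml, output_prefix):
    """Run wmt-unwrap with the source XML on standard input, as in step 2."""
    with open(source_xml, "rb") as source:
        subprocess.run(unwrap_command(output_prefix), stdin=source, check=True)


def wrap(source_xml, hypothesis_txt, team_name, language, output_xml):
    """Run wmt-wrap and redirect standard output to output_xml, as in step 4."""
    with open(output_xml, "wb") as wrapped:
        subprocess.run(
            wrap_command(source_xml, hypothesis_txt, team_name, language),
            stdout=wrapped,
            check=True,
        )
```

With these helpers, calling unwrap("newsdev2021.ha-en.source.xml", "newsdev2021.ha-en") followed by wrap("newsdev2021.ha-en.source.xml", "newsdev2021.ha-en.hypo.en", "UEDIN", "en", "newsdev2021.ha-en.hypo.en.xml") reproduces steps 2 and 4. Because check=True is set, a non-zero exit status from either tool raises subprocess.CalledProcessError instead of failing silently.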

API Usage

You can also use the tools via the API. See test/test-wrap-unwrap.py for a sample.

Changelog

Version 0.4

  • Added an optional field, supplemental, which may contain any data collected with the tests. For example, type=clean_source marks human-cleaned sources.
