This repository is the result of our participation in the shared task, where we went through the full cycle of building, analyzing, and improving a neural machine translation system.
Poster: link.
The shared task was for the Estonian-English language pair and involved working with ~19 million sentence pairs.
Shared task main page: link
Shared task on course page: link
The sections below summarize the key milestones we went through.
- Our baseline system was the default OpenNMT-py model, a 2-layer LSTM with 500 hidden units in both the encoder and the decoder.
- This baseline got 23.40 BLEU points on the shared dev set.
More details: Report 1
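Throughout the project we report BLEU scores on the shared dev set. A minimal sketch of how such a score can be computed with the sacrebleu Python package (the file names are hypothetical; we are not claiming this is the exact evaluation script of the shared task):

```python
import sacrebleu

# Hypothetical file names: system output and the shared dev reference.
with open("dev.hyp.en", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("dev.ref.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# corpus_bleu expects a list of reference streams (one per reference set).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")  # e.g. 23.40 for the baseline
```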
- We manually analyzed 40 baseline translations.
- Our main observation was that the errors fell into several types: wrong word senses, missing parts, and incorrect word forms.
- Take a look at these examples produced by the baseline system:
The first example:
Source: Ungaris leiti, et peaaegu 96% lampidest on ohtlikud.
Human: In Hungary, nearly 96% of the lights were found to be hazardous.
Baseline: In Hungary, almost 96% of sheep were found to be dangerous.
Result: as we can see, we got sheep instead of lights (a word-sense error).
The second example:
Source: 4.otsustamine, millal midagi vaadata ja mida vaadata.
Human: 4.deciding when to see something, and what to see.
Baseline: Four.
Result: all words after the dot are dropped.
More details: Report 2
- To address the translation issues found in our manual evaluation, we used the Sockeye implementation of the Transformer model to reduce word-sense errors, tried different beam sizes to find more appropriate words, and replaced sentence-internal dots with a special symbol to keep the words that follow a dot in the middle of a sentence (a sketch of this dot protection follows below).
- In the end we had two models: one with BPE segmentation and one with SentencePiece segmentation. The model with BPE shows a better BLEU score than the model with SentencePiece, perhaps due to the number of subword units (70,000 for BPE and 50,000 for SentencePiece); see the segmentation-training sketch below.
- The best trained system gave us 26.88 BLEU points on the shared dev set, a significant increase over the baseline.
More details: Report 3, Report 4
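A minimal sketch of the dot-protection trick mentioned above, assuming a @@DOT@@ placeholder token (both the token and the exact rule are illustrative assumptions, not our exact preprocessing):

```python
DOT = "@@DOT@@"  # hypothetical placeholder token

def protect_internal_dots(line: str) -> str:
    """Replace every dot except a sentence-final one with a placeholder,
    so the model does not treat it as the end of the sentence."""
    line = line.strip()
    if line.endswith("."):
        return line[:-1].replace(".", DOT) + "."
    return line.replace(".", DOT)

def restore_internal_dots(line: str) -> str:
    """Undo the protection on the translated output."""
    return line.replace(DOT, ".")

print(protect_internal_dots("4.otsustamine, millal midagi vaadata ja mida vaadata."))
# 4@@DOT@@otsustamine, millal midagi vaadata ja mida vaadata.
```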
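And a sketch of how the two segmentation models can be trained with the subword-nmt and sentencepiece Python packages (corpus file names and the model prefix are hypothetical; the unit counts are the ones we used):

```python
import sentencepiece as spm
from subword_nmt.learn_bpe import learn_bpe

# BPE: learn 70,000 merge operations from the tokenized training corpus.
with open("train.tok", encoding="utf-8") as infile, \
        open("bpe.codes", "w", encoding="utf-8") as outfile:
    learn_bpe(infile, outfile, num_symbols=70000)

# SentencePiece: train a 50,000-piece model directly on the raw corpus.
spm.SentencePieceTrainer.train(
    input="train.raw",
    model_prefix="sp_model",
    vocab_size=50000,
)
```

The training data is then segmented with the learned model before training, and the segmentation is reversed on the system output before scoring.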
- Generally speaking, the system seems to have gotten better. Looking at different sentences, fluency and word sense have improved, and some previously missing words are now translated.
- Let's now look at how our examples look with the final systems:
The first example:
Source: Ungaris leiti, et peaaegu 96% lampidest on ohtlikud.
Human: In Hungary, nearly 96% of the lights were found to be hazardous.
Baseline: In Hungary, almost 96% of sheep were found to be dangerous.
Model with SentencePiece: In Hungary, almost 96% of the lamps were found to be dangerous.
Model with BPE: In Hungary, almost 96% of sheep were found to be dangerous.
The second example:
Source: 4.otsustamine, millal midagi vaadata ja mida vaadata.
Human: 4.deciding when to see something, and what to see.
Baseline: Four.
Model with SentencePiece: 4. decide when to look at something and watch what.
Model with BPE: Deciding when to look at something and look at.
- As a result, you can see that the final system with SentencePiece better understood the sense of the words; both final systems translated the whole sentence, not only a part of it.
Do not forget to check our poster: poster
We also wanted to try back-translation to improve the quality and fluency of the translations, but this was not possible for schedule reasons (a conceptual sketch follows below). We also wanted to train a model with a larger number of word segments, but our own machines were not powerful enough and there was a huge queue for the HPC machines.
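For reference, a conceptual sketch of the back-translation idea (the reverse_model object and its translate method are hypothetical stand-ins for a trained English-to-Estonian system):

```python
def back_translate(mono_en, reverse_model):
    """Turn monolingual English sentences into synthetic (et, en) pairs
    by translating them back into Estonian with a reverse model."""
    pairs = []
    for en_sentence in mono_en:
        et_sentence = reverse_model.translate(en_sentence)  # hypothetical API
        pairs.append((et_sentence, en_sentence))
    return pairs

# The synthetic pairs would be mixed with the real parallel data
# and the Estonian-to-English model retrained on the combined set.
```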
- For the shared task we used the model with BPE and got a BLEU score of fill in.
- The new translations look better and are more interpretable than the translations from the baseline model.
- The main difficulties: a huge queue for the HPC machines, and human reference translations that were sometimes incorrect (see the example in Report 2).
- We built our models with Sockeye and OpenNMT-py, segmented the data with BPE and SentencePiece, and used the Moses scripts, so we learned a lot of new things.
Project board: Project board A