# Bytenet with masking

A TensorFlow implementation of machine translation, based on the paper *Neural Machine Translation in Linear Time*.

## Notes

- A few parts of the model structure differ from the paper:
  - I used the IWSLT 2016 de-en dataset; the dataset-processing code has been changed slightly from Kyubyung's original code.
  - I did not implement 'Dynamic Unfolding'.
  - I apply masking to all residual blocks to eliminate the influence of the pad embeddings (see the sketch below).
  - I apply dropout just before the summation in each residual block.
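
A minimal sketch of what the masking and dropout placement can look like, assuming the pad token has id 0 and using a placeholder `sub_block` for the dilated-convolution part of a residual block (both are assumptions, not the exact code in this repository):

```python
import tensorflow as tf

def masked_residual_block(inputs, token_ids, sub_block, keep_prob):
    """inputs: (batch, time, channels); token_ids: (batch, time), pad id assumed to be 0."""
    # 1.0 for real tokens, 0.0 for pad positions
    mask = tf.to_float(tf.sign(tf.abs(token_ids)))   # (batch, time)
    mask = tf.expand_dims(mask, axis=-1)             # (batch, time, 1)

    outputs = sub_block(inputs)                      # dilated-convolution sub-block
    outputs = tf.nn.dropout(outputs, keep_prob)      # dropout just before the summation
    outputs = outputs * mask                         # zero out contributions from pad embeddings
    return (inputs + outputs) * mask                 # masked residual connection
```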

## Requirements

- Tensorflow >= 1.0.0
- Numpy >= 1.11.1
- nltk > 3.2.2

## Steps

1. Download the IWSLT 2016 German–English parallel corpus and extract it to the data/ folder.
2. Run train.py with your chosen hyperparameters.
3. Run translate.py with the same hyperparameters as above.

## Results

I got a BLEU score of 8.44 after 20 epochs. However, I got a BLEU score of 44.69 on in-sample data with an embedding size of 512, which suggests that the model was trained well but overfitted. Therefore, I suggest running this model on a larger dataset.
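
The scores above can be reproduced with a corpus-level BLEU computation such as the one provided by nltk (already listed in the requirements). A minimal sketch with made-up, pre-tokenized sentences:

```python
from nltk.translate.bleu_score import corpus_bleu

# one list of reference translations per hypothesis, all pre-tokenized
references = [[["thank", "you", "very", "much"]],
              [["this", "is", "a", "test"]]]
hypotheses = [["thank", "you", "so", "much"],
              ["this", "is", "a", "test"]]

print(corpus_bleu(references, hypotheses) * 100)  # BLEU on a 0-100 scale
```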