# Neural Machine Translation: French to English

Code for the home project of IN9550 at UiO.

## Training

### Tokenizer

Run the following command to train a tokenizer:

```bash
python -m nmt.tokenizer --vocab_size <vocabulary size>
```
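For example, to train a tokenizer with a vocabulary of 16,000 tokens (the size here is only illustrative, not a recommended value):

```bash
# Train the tokenizer; 16000 is an illustrative vocabulary size
python -m nmt.tokenizer --vocab_size 16000
```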

### Seq2Seq Models

The training configuration is defined in a YAML file and can be overridden with command-line arguments (see the example after the commands below). Each time the command is run, the configuration is saved under `data/runs/{model_name}.meta.yaml`.

#### Transformer-based

```bash
python -m nmt.main -c config/transformer_default_config.yaml
```

#### RNN-based

```bash
python -m nmt.main -c config/rnn_default_config.yaml
```
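The exact names of the command-line overrides depend on the keys defined in the YAML configs, so the snippet below is only a sketch that assumes a `--<key> <value>` style and a hypothetical `batch_size` key; check `nmt/main.py` for the flags actually exposed:

```bash
# Hypothetical override: assumes the config defines a batch_size key and that
# the CLI accepts --<key> <value> overrides (verify against nmt/main.py)
python -m nmt.main -c config/transformer_default_config.yaml --batch_size 64
```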

## Evaluation

Evaluation is configured with a YAML file containing the following entries. An example is available here.

- `tokenizer`: Path to the tokenizer
- `test_data`: Path to the test dataset
- `model`: Path to the Transformer-based Seq2Seq model
- `output_file`: Path where the predictions are saved
- `debug`: Whether or not to run in debug mode
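
A minimal sketch of such a configuration file, using the entries above (every path is a placeholder, not a file shipped with this repository):

```yaml
# Sketch of an evaluation config; all paths below are placeholders
tokenizer: data/tokenizer.json          # path to the trained tokenizer
test_data: data/test.fr-en.tsv          # path to the test dataset
model: data/runs/transformer/model.pt   # path to the trained Seq2Seq model
output_file: data/predictions.txt       # where the predictions are saved
debug: false                            # whether to run in debug mode
```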

Run the following command to evaluate your Transformer-based model:

```bash
python -m nmt.evaluation.transformer_evaluation -c <config_file>
```

Run the following command to evaluate your RNN-based model:

```bash
python -m nmt.evaluation.rnn_evaluation -c <config_file>
```

## Note

The code in this repository is mainly based on PyTorch tutorials.