# Neural Machine Translation: French to English

Code for the home project of IN9550 at UiO.

## Training

### Tokenizer

Run the following command to train a tokenizer:

```bash
python -m nmt.tokenizer --vocab_size <vocabulary size>
```
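For example, to train a tokenizer with a vocabulary of 16,000 tokens (the size here is only illustrative, not a recommended value):

```bash
# Train the tokenizer; 16000 is an illustrative vocabulary size
python -m nmt.tokenizer --vocab_size 16000
```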

### Seq2Seq Models

The training configuration is defined in a YAML file and can be overridden with command-line arguments (see the example after the commands below). Each time the command is run, the configuration is saved under `data/runs/{model_name}.meta.yaml`.

#### Transformer-based

```bash
python -m nmt.main -c config/transformer_default_config.yaml
```

#### RNN-based

```bash
python -m nmt.main -c config/rnn_default_config.yaml
```
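The exact names of the command-line overrides depend on the keys defined in the YAML configs, so the snippet below is only a sketch that assumes a `--<key> <value>` style and a hypothetical `batch_size` key; check `nmt/main.py` for the flags actually exposed:

```bash
# Hypothetical override: assumes the config defines a batch_size key and that
# the CLI accepts --<key> <value> overrides (verify against nmt/main.py)
python -m nmt.main -c config/transformer_default_config.yaml --batch_size 64
```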

## Evaluation

Evaluation is configured with a YAML file containing the following entries. An example is available here.

- `tokenizer`: Path to the tokenizer
- `test_data`: Path to the test dataset
- `model`: Path to the Transformer-based Seq2Seq model
- `output_file`: Path where the predictions are saved
- `debug`: Whether or not to run in debug mode
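
A minimal sketch of such a configuration file, using the entries above (every path is a placeholder, not a file shipped with this repository):

```yaml
# Sketch of an evaluation config; all paths below are placeholders
tokenizer: data/tokenizer.json          # path to the trained tokenizer
test_data: data/test.fr-en.tsv          # path to the test dataset
model: data/runs/transformer/model.pt   # path to the trained Seq2Seq model
output_file: data/predictions.txt       # where the predictions are saved
debug: false                            # whether to run in debug mode
```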

Run the following command to evaluate your Transformer-based model:

```bash
python -m nmt.evaluation.transformer_evaluation -c <config_file>
```

Run the following command to evaluate your RNN-based model:

```bash
python -m nmt.evaluation.rnn_evaluation -c <config_file>
```

## Note

The code in this repository is mainly based on PyTorch tutorials.