Code for the home project of IN9550 at the University of Oslo (UiO).
Run the following command to train a tokenizer.
python -m nmt.tokenizer --vocab_size <vocabulary size>
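For example, to train a tokenizer with a vocabulary size of 8,000 (an arbitrary, illustrative value):

python -m nmt.tokenizer --vocab_size 8000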
The training configuration is defined in a YAML file and can be overridden using command-line arguments (see the example after the training commands below). Each time a training command is run, the configuration that was used is saved under data/runs/{model_name}.meta.yaml. Run one of the following commands to train a Transformer-based or an RNN-based model:
python -m nmt.main -c config/transformer_default_config.yaml
python -m nmt.main -c config/rnn_default_config.yaml
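For example, assuming individual settings can be overridden with flags named after the YAML keys (the flag and key names here are hypothetical and depend on the actual configuration file), an override might look like:

python -m nmt.main -c config/transformer_default_config.yaml --batch_size 64  # --batch_size is a hypothetical override flag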
The configuration file used to run evaluation is a YAML file with the following entries. An example is available here, and a minimal sketch is shown after the list.
- tokenizer: Path to the tokenizer
- test_data: Path to the test dataset
- model: Path to the trained Seq2Seq model (Transformer-based or RNN-based, depending on which evaluation script is run)
- output_file: Path to save the predictions
- debug: Whether to run in debug mode
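A minimal sketch of such an evaluation configuration, using the entries listed above (all paths are illustrative placeholders, not actual files in the repository):

# Hypothetical evaluation config sketch; replace the placeholder paths with your own.
tokenizer: data/tokenizer/tokenizer.json
test_data: data/test_data.txt
model: data/runs/transformer/model.pt
output_file: data/predictions.txt
debug: false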
Run the following command to evaluate your Transformer-based model:
python -m nmt.evaluation.transformer_evaluation -c <config_file>
Run the following command to evaluate your RNN-based model:
python -m nmt.evaluation.rnn_evaluation -c <config_file>
The code in this repository is mainly based on PyTorch tutorials.