This repo contains the code we used to reproduce the paper Prefix-Tuning: Optimizing Continuous Prompts for Generation.
The backbone of this code is the Prefix-Tuning repo published by the authors of the original paper, which we cloned and adapted into our own version here.
Some of the changes we made:
- adjusted some file paths
- added processed datasets for the low-data experiment settings
We recommend looking at the following notebooks, which cover all the steps, including cloning the repos, installing dependencies, running the training scripts, and running evaluation.
Alternatively, below are instructions to reproduce:
- Clone the following repos:
$ git clone https://github.com/sedrickkeh/PrefixTuning.git
$ git clone https://github.com/sedrickkeh/dart.git
$ git clone https://github.com/tuetschek/e2e-metrics.git
- Navigate into the transformers folder in PrefixTuning and install the following dependencies:
cd PrefixTuning/transformers
pip install -e .
pip install git+https://github.com/PyTorchLightning/pytorch-lightning
pip install gitpython
pip install rouge_score
pip install sacrebleu
pip install unidecode
- Run experiment code (example below):
python train_e2e.py --preseqlen 5 --learning_rate 0.00008 --seed 88 --epoch 5
- Run evaluation code (example below):
bash ./dart/evaluation/run_eval_on_webnlg.sh
For the experiment and evaluation steps above, note that you may need to modify some of the file paths.
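To make the path assumption concrete, here is a sketch of the layout implied by the clone and evaluation commands above. This is an assumption on our part rather than something the scripts enforce; adjust the paths if your layout differs.

# assumed layout after the clone step: three sibling folders in one working directory
ls
# PrefixTuning/  dart/  e2e-metrics/

# with this layout, the evaluation command is run from this working directory,
# since the script path is given relative to it
bash ./dart/evaluation/run_eval_on_webnlg.sh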
There are three table-to-text datasets here, namely E2E, WebNLG, and DART.
1. E2E
The script below automatically runs evaluation after training the model.
python train_e2e.py --preseqlen 5 --learning_rate 0.00007 --seed 22 --epoch 5 --notes earlystop
2. WebNLG
Training
python train_e2e.py --mode webnlg --preseqlen 5 --learning_rate 0.00005 --bsz 5 --seed 222 --epoch 5 --notes earlystop
Evaluation
bash ./dart/evaluation/run_eval_on_webnlg.sh
3. DART
Training
python train_e2e.py --mode triples --preseqlen 20 --seed 9 --bsz 5 --epoch 5 --learning_rate 0.00008
Evaluation
Evaluation takes quite a while and may require installing additional libraries. Please refer to the DART notebook.
4. XSUM
Training
cd seq2seq
python train_bart.py --mode xsum --preseqlen 200 --do_train yes --fp16 yes --bsz 2 --epoch 15 --gradient_accumulation_step 3 --learning_rate 0.00005 --mid_dim 800
Before running these experiments, first construct the low-data datasets using this script (an illustrative sketch of the subsetting step is given below).
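For illustration only, the subsetting step amounts to keeping N training examples per setting. The file names below are placeholders, and the actual script in this repo (and its sampling strategy) may differ.

# build low-data subsets of sizes 50, 100, 200, and 500 from a full training file
# (placeholder paths; assumes one example per line, keeping the first N for simplicity)
for n in 50 100 200 500; do
    head -n "$n" e2e_train_full.txt > "e2e_train_${n}.txt"
done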
Dataset size 50
python train_e2e.py --preseqlen 5 --learning_rate 8e-5 --seed 88 --bsz 10 --lowdata_token 'table-to-text-restaurant:' --epoch 100 --warmup_steps 300 --notes earlystoplowdata_88_50
Dataset size 100
python train_e2e.py --preseqlen 5 --learning_rate 7e-5 --seed 88 --bsz 10 --lowdata_token 'table-to-text-restaurant:' --epoch 100 --warmup_steps 100 --notes earlystoplowdata_88_100
Experiments were done for dataset sizes 50, 100, 200, and 500. Scripts for dataset sizes 200 and 500 are analogous to the ones above. Exact hyperparameters can be found in the appendix of our submitted report.
We conduct two ablation studies:
- Prefix Length
This builds on the experiments for DART; replace (prefix_length) with the desired prefix length (a sweep is sketched after this list).
python train_e2e.py --mode triples --preseqlen (prefix_length) --seed 9 --bsz 5 --epoch 5 --learning_rate 8e-5
- Prefix Initialization
This builds on the experiments for the low-data E2E settings; replace (insert_initialization_here) with the desired initialization token (also sketched after this list).
python train_e2e.py --preseqlen 5 --lowdata_token (insert_initialization_here) --learning_rate 7e-5 --seed 88 --bsz 10 --epoch 100 --warmup_steps 100 --notes earlystoplowdata_88_500
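Both ablations can be scripted as simple sweeps over the placeholder argument. A minimal sketch follows; the candidate prefix lengths and initialization tokens below are illustrative choices, not the exact grids we used (see the appendix of our report for those).

# prefix length ablation on DART (lengths are illustrative)
for plen in 1 5 10 20 40; do
    python train_e2e.py --mode triples --preseqlen "$plen" --seed 9 --bsz 5 --epoch 5 --learning_rate 8e-5
done

# prefix initialization ablation on low-data E2E (tokens are illustrative;
# you may want to give each run a distinct --notes tag)
for tok in 'summarize:' 'table-to-text:' 'table-to-text-restaurant:'; do
    python train_e2e.py --preseqlen 5 --lowdata_token "$tok" --learning_rate 7e-5 --seed 88 --bsz 10 --epoch 100 --warmup_steps 100 --notes earlystoplowdata_88_500
done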