Build a language model and test it on the GLUE dataset.
We recommend creating a new conda environment before running this experiment.

```shell
pip install -r requirements.txt
wandb login
source .env
```

- Copy `.env.template` to `.env` and edit it if necessary.
- We use `wandb` to track our running experiments. The program is still runnable without it.
- At the time we ran this experiment, we used `python=3.8`.
We provide several script options to perform LM pretraining. All scripts require a config file in `.yaml` format; please check the example files accordingly.
Run LM pretraining from scratch on a given dataset:

```shell
python -m newlm run_pretrain --config_file="examples/configs_gcloud/run-100-percent.bert-causal.yaml"
```
Run LM pretraining with a prebuilt tokenizer:

```shell
python -m newlm run_pretrain_model --config_file="examples/configs_gcloud/run-100-percent.bert-causal-r2l.yaml"
```
Run finetuning on the GLUE dataset:

```shell
python -m newlm run_glue --config_file="examples/configs_gcloud/run-ft.bert.yaml"
```
Run LM pretraining followed by finetuning on the GLUE dataset:

```shell
python -m newlm run_all --config_file="examples/configs/run.yaml"
```

Run GLUE prediction and ensembling:

```shell
python -m newlm run_glue_predict --config_file="examples/configs/run-predict-ensemble.yaml"
python -m newlm run_predict_ensemble --config_file="examples/configs/run-predict-ensemble.yaml"
python -m newlm run_ensemble --config_file="examples/configs/run-predict-ensemble.yaml" --l2r_r2l_ratio=[1,1]
```
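The `--l2r_r2l_ratio` flag weights the left-to-right and right-to-left models in the ensemble. Purely as an illustration (the repo's actual ensembling code is not shown here, so the function names and shapes below are assumptions), a ratio-weighted average of two models' logits could look like this:

```python
def ensemble_logits(l2r_logits, r2l_logits, ratio=(1, 1)):
    """Hypothetical sketch: per-class weighted average of L2R and R2L logits.

    ratio=(1, 1) weights both models equally; ratio=(3, 1) trusts the
    L2R model three times as much as the R2L model.
    """
    w_l2r, w_r2l = ratio
    total = w_l2r + w_r2l
    return [
        [(w_l2r * a + w_r2l * b) / total for a, b in zip(row_l2r, row_r2l)]
        for row_l2r, row_r2l in zip(l2r_logits, r2l_logits)
    ]

def argmax(row):
    """Index of the largest averaged logit -> predicted class."""
    return max(range(len(row)), key=row.__getitem__)

# Two examples, two classes each (illustrative numbers)
l2r = [[4.0, 0.0], [0.0, 1.0]]
r2l = [[0.0, 2.0], [0.0, 3.0]]
avg = ensemble_logits(l2r, r2l, ratio=(1, 1))   # [[2.0, 1.0], [0.0, 2.0]]
preds = [argmax(row) for row in avg]            # [0, 1]
```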
We use Hugging Face's `transformers` library as our base library, so most of the config follows it. However, we add some configuration options to ease our training process. Here are further details and explanations for the config file.
### Batch Size (Training)

To run with the desired total batch size, set `lm.hf_trainer.total_batch_size`. We automatically calculate `num_device * accum_step * batch_per_device` to match the `total_batch_size`.
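For illustration (the trainer's internal variable names may differ), deriving the accumulation step count from the other two factors works like this:

```python
def accumulation_steps(total_batch_size, num_devices, batch_per_device):
    """Derive gradient-accumulation steps so that
    num_devices * batch_per_device * accum_steps == total_batch_size."""
    per_step = num_devices * batch_per_device
    if total_batch_size % per_step != 0:
        raise ValueError(
            "total_batch_size must be divisible by num_devices * batch_per_device"
        )
    return total_batch_size // per_step

# A total batch of 512 on 4 GPUs with 16 examples per device
# needs 512 / (4 * 16) = 8 accumulation steps.
steps = accumulation_steps(512, num_devices=4, batch_per_device=16)  # 8
```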
### Resume Training

We prevent a pretrained model from being saved into a non-empty directory. To resume training from the latest checkpoint, set the following config:

```
lm.model.create_params.train_params.resume_from_checkpoint: latest
```
We also provide several scripts for performing grid search. Please adjust them accordingly.

Grid-search finetuning on GLUE:

```shell
./run_grid-bert-all.sh
```

Grid-search the BERT ensemble ratio (L2R:R2L):

```shell
./run_grid_ensemble.sh
```
TBD
### Predict on the GLUE Test Set

This part covers:

- Running prediction on the test dataset.
- Preparing the file to submit to the GLUE benchmark.

Steps:

- Prepare the directory that contains the `vocab.txt` file. You can use the one from the pretrained model, or one of the best model's checkpoints.
- Prepare your best models and put them under one directory. Expected structure:

  ```
  best_model_dir
  ├── cola
  │   └── checkpoint-123
  ├── mrpc
  │   ├── checkpoint-456
  │   └── checkpoint-789
  └── ...
  ```

- Open the script `run_predict_glue_test` and modify the necessary fields.
- Run the script and wait until it finishes. It will generate `submission.zip` under `output_dir/glue-predict`.
- Submit the file to the GLUE benchmark website.
### Run Grid Ensemble from Prediction Files

This part covers:

- Running a grid ensemble from pre-computed prediction files.
- It can be used for both the dev and test sets.

Steps:

- Create a folder with the following format (this example would ensemble 4 models):

  ```
  ensemble_dir
  ├── model_type (e.g. bert-causal)
  │   ├── 0
  │   │   └── glue-predict
  │   │       ├── cola
  │   │       ├── mnli
  │   │       ├── mrpc
  │   │       └── ...
  │   └── 1
  │       └── glue-predict
  └── model_type-2 (e.g. bert-causal-r2l)
      ├── 0
      └── 1
  ```

- Open `examples/configs/run-predict-ensemble-v2.yaml`. Modify `output_dir: ensemble_dir` ONLY!
- Run:

  ```shell
  ./run_grid_ensemble_v2.sh 100 ensemble_dir
  ```
For the GLUE submission (test set only):

- Open `script_glue_submission_ens.py` and modify the necessary fields.
- Run:

  ```shell
  python script_glue_submission_ens.py
  ```
### Predict on the GLUE Dev Set and Ensemble

This part covers:

- Running prediction on the dev dataset.
- Running the ensemble with multiple weight ratios.
- Summarizing the ensemble results.

Steps A: run prediction on the dev set. For all the models that you want to ensemble:

- Prepare the directory that contains the `vocab.txt` file. You can use the one from the pretrained model, or one of the best model's checkpoints.
- Prepare your best models and put them under one directory. Expected structure:

  ```
  best_model_dir
  ├── cola
  │   └── checkpoint-123
  ├── mrpc
  │   ├── checkpoint-456
  │   └── checkpoint-789
  └── ...
  ```

- Open the script `run_predict_glue_dev.sh` and modify the necessary fields.
- Run the script and wait until it finishes.
Steps B: run the ensemble for a combination you want to test.

- Open `run_grid_ensemble_from_prediction.sh` and modify the necessary fields.
- Run:

  ```shell
  ./run_grid_ensemble_from_prediction.sh
  ```
### Finetuning ELMo-BERT

- Open `example/configs_gcloud/run-ft.elmo-bert-causal-l2r-r2l-v1.yaml`.
- Modify:
  - `output_dir`
  - `tokenizer.pretrained`
  - `lm.pretrained_l2r`
  - `lm.pretrained_r2l`
If you want to run grid finetuning:

- Open `./run-grid-elmo-v1.sh`.
- Modify this line, changing `outputs/en.100-percent.elmo-bert-causal-v1-finetune` to your output dir:

  ```shell
  python summarize_tuning.py outputs/en.100-percent.elmo-bert-causal-v1-finetune
  ```

- Also check `./run-grid-elmo-v1-all.sh`.
If you want to run just some tasks or some hyperparameters:

- Run:

  ```shell
  CUDA_VISIBLE_DEVICES=_gpu_id_ python -m newlm run_glue \
      --config_file="examples/configs_gcloud/run-ft.elmo-bert-causal-l2r-r2l-v1.yaml" \
      --bs=_batch_size_ \
      --lr=_learning_rate_ \
      --seed=_seed_ \
      --tasks=[_task_name_]
  ```

- After it finishes, you can run this for a summary:

  ```shell
  python summarize_tuning.py _output_dir_ _task_name_
  ```
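`summarize_tuning.py` itself is not shown here; purely as an illustration of what a grid-search summary does (the function and result layout below are hypothetical), picking the best hyperparameter combination might look like this:

```python
def best_run(results):
    """Hypothetical sketch: pick the (lr, batch_size, seed) combination
    with the highest dev-set score from a grid-search run."""
    return max(results.items(), key=lambda item: item[1])

# Dev scores per hyperparameter combination (illustrative numbers)
runs = {
    (1e-5, 16, 42): 0.81,
    (3e-5, 32, 42): 0.84,
    (5e-5, 16, 42): 0.79,
}
(lr, bs, seed), score = best_run(runs)  # lr=3e-5, bs=32, seed=42, score=0.84
```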
For v4, change all `-v1` occurrences to `-v4`.