Paper accepted at SemEval-2020 (co-located with COLING 2020):
Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection, by Wenliang Dai*, Tiezheng Yu*, Zihan Liu, Pascale Fung.
[ACL Anthology][ArXiv][Semantic Scholar]
If your work is inspired by our paper, or you use any code snippets from this repo, please cite the paper. The BibTeX entry is shown below:
@article{Dai2020KungfupandaAS,
  title   = {Kungfupanda at SemEval-2020 Task 12: BERT-Based Multi-Task Learning for Offensive Language Detection},
  author  = {Wenliang Dai and Tiezheng Yu and Zihan Liu and Pascale Fung},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2004.13432}
}
Nowadays, offensive content in social media has become a serious problem. Transfer learning and multi-task learning are two techniques widely employed in machine learning. With transfer learning, we can effectively reuse the knowledge of a well pre-trained model for a related problem. Multi-task learning, in turn, provides additional supervision signals and better generalization ability. For the task of Multilingual Offensive Language Identification in Social Media, we propose to use both techniques to exploit pre-trained feature representations and to better leverage the information in the hierarchical dataset. Our contribution is two-fold. First, we propose a multi-task transfer learning model for this problem and provide an empirical analysis of why it is effective. Second, the model achieves a performance (91.51% F1) comparable to the first place (92.23% F1) in the competition, using only the OLID dataset.
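For concreteness, below is a minimal sketch (not the repository's actual code; class and head names are illustrative) of the kind of multi-task architecture described above: a shared BERT encoder with one lightweight classification head per OLID subtask.

```python
# Minimal sketch of a shared-encoder multi-task classifier for OLID subtasks A/B/C.
# Assumes PyTorch + Huggingface transformers; names here are illustrative.
import torch.nn as nn
from transformers import BertModel

class MultiTaskBert(nn.Module):
    def __init__(self, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)  # shared, pre-trained encoder
        hidden = self.encoder.config.hidden_size
        self.head_a = nn.Linear(hidden, 2)  # subtask A: offensive vs. not offensive
        self.head_b = nn.Linear(hidden, 2)  # subtask B: targeted vs. untargeted
        self.head_c = nn.Linear(hidden, 3)  # subtask C: individual / group / other

    def forward(self, input_ids, attention_mask):
        # transformers 2.x returns a tuple; index 1 is the pooled [CLS] representation
        pooled = self.encoder(input_ids=input_ids, attention_mask=attention_mask)[1]
        return self.head_a(pooled), self.head_b(pooled), self.head_c(pooled)
```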
- Python 3.5+
- PyTorch 1.3+
- Huggingface transformers 2.x
- We train on a single GTX 1080 Ti (a quick environment check is sketched after this list)
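A purely illustrative way to verify that the installed versions match the list above (not part of the repository):

```python
# Illustrative environment check; not part of the repository.
import sys
import torch
import transformers

assert sys.version_info >= (3, 5), "Python 3.5+ is required"
print("PyTorch:", torch.__version__)               # expect 1.3 or newer
print("Transformers:", transformers.__version__)   # expect a 2.x release
print("CUDA available:", torch.cuda.is_available())
```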
usage: train.py [-h] -bs BATCH_SIZE -lr LEARNING_RATE [-wd WEIGHT_DECAY]
                -ep EPOCHS [-tr TRUNCATE] [-pa PATIENCE] [-cu CUDA] -ta TASK
                -mo MODEL [-ms MODEL_SIZE] [-cl] [-fr FREEZE]
                [-lw LOSS_WEIGHTS [LOSS_WEIGHTS ...]] [-sc] [-se SEED]
                [--ckpt CKPT] [-ad ATTENTION_DROPOUT] [-hd HIDDEN_DROPOUT]
                [-dr DROPOUT] [-nl NUM_LAYERS] [-hs HIDDEN_SIZE]
                [-hcm HIDDEN_COMBINE_METHOD]
BERT-Based Multi-Task Learning for Offensive Language Detection
optional arguments:
-h, --help show this help message and exit
-bs BATCH_SIZE, --batch-size BATCH_SIZE
Batch size
-lr LEARNING_RATE, --learning-rate LEARNING_RATE
Learning rate
-wd WEIGHT_DECAY, --weight-decay WEIGHT_DECAY
Weight decay
-ep EPOCHS, --epochs EPOCHS
Number of epochs
-tr TRUNCATE, --truncate TRUNCATE
Truncate sequences to this maximum length
-pa PATIENCE, --patience PATIENCE
Patience to stop training
-cu CUDA, --cuda CUDA
CUDA device number
-ta TASK, --task TASK
Which subtask to run
-mo MODEL, --model MODEL
Which model to use
-ms MODEL_SIZE, --model-size MODEL_SIZE
Which size of model to use
-cl, --clip Apply gradient clipping
-fr FREEZE, --freeze FREEZE
Whether to freeze the embedding layer to use less GPU memory
-lw LOSS_WEIGHTS [LOSS_WEIGHTS ...], --loss-weights LOSS_WEIGHTS [LOSS_WEIGHTS ...]
Weights for all losses
-sc, --scheduler Use a learning rate scheduler with the optimizer
-se SEED, --seed SEED
Random seed
--ckpt CKPT
-ad ATTENTION_DROPOUT, --attention-dropout ATTENTION_DROPOUT
transformer attention dropout
-hd HIDDEN_DROPOUT, --hidden-dropout HIDDEN_DROPOUT
transformer hidden dropout
-dr DROPOUT, --dropout DROPOUT
dropout
-nl NUM_LAYERS, --num-layers NUM_LAYERS
number of LSTM layers
-hs HIDDEN_SIZE, --hidden-size HIDDEN_SIZE
hidden vector size of LSTM
-hcm HIDDEN_COMBINE_METHOD, --hidden-combine-method HIDDEN_COMBINE_METHOD
how to combine the hidden vectors of the LSTM
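The -ad/--attention-dropout and -hd/--hidden-dropout flags presumably override the transformer's dropout probabilities. Below is a minimal sketch of how such overrides are typically wired up with Huggingface transformers; the config fields `attention_probs_dropout_prob` and `hidden_dropout_prob` are the library's own, but mapping them to these flags is an assumption, not the repo's confirmed behavior.

```python
# Sketch only: applying -ad / -hd style dropout overrides to a BERT config.
from transformers import BertConfig, BertModel

attention_dropout = 0.1  # value that would come from -ad / --attention-dropout
hidden_dropout = 0.1     # value that would come from -hd / --hidden-dropout

config = BertConfig.from_pretrained(
    "bert-base-uncased",
    attention_probs_dropout_prob=attention_dropout,
    hidden_dropout_prob=hidden_dropout,
)
model = BertModel.from_pretrained("bert-base-uncased", config=config)
```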
Train a single-task model for subtask A
python train.py -bs=32 -lr=3e-6 -ep=20 -pa=3 --model=bert --task=a --clip --cuda=1
Train a multi-task model
python train.py -bs=32 -lr=3e-6 -ep=20 -pa=3 --model=bert --task=all --clip --loss-weights 0.4 0.3 0.3 --cuda=1
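The --loss-weights values (0.4 0.3 0.3 above) weight the three subtask losses during multi-task training. Below is an illustrative sketch of how such a weighted combination could be computed; it is not the repository's actual training loop.

```python
# Illustrative weighted multi-task loss, matching --loss-weights 0.4 0.3 0.3.
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
loss_weights = [0.4, 0.3, 0.3]  # subtasks A, B, C

def multi_task_loss(logits, labels):
    # logits and labels are 3-element sequences, one entry per subtask
    losses = [criterion(lg, lb) for lg, lb in zip(logits, labels)]
    return sum(w * l for w, l in zip(loss_weights, losses))
```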