BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers
Authors:
- Martin Fajčík
- Lukáš Burget
- Pavel Smrž
In case of any questions, please send an email to [email protected].
This is the official implementation we used in SemEval-2019 Task 7. Our publication is available here. All models were trained on a single RTX 2080 Ti (with 11 GB of memory).
```
@inproceedings{fajcik2019but,
title={BUT-FIT at SemEval-2019 Task 7: Determining the Rumour Stance with Pre-Trained Deep Bidirectional Transformers},
author={Fajcik, Martin and Smrz, Pavel and Burget, Lukas},
booktitle={Proceedings of the 13th International Workshop on Semantic Evaluation},
pages={1097--1104},
year={2019}
}
```
Since each trained model is saved in a checkpoint of size 1.3 GB, we do not provide the checkpoints online. To replicate the ensemble results from the paper, we instead provide a set of pre-calculated predictions from these trained models for the validation and test sets, saved as numpy arrays in the predictions folder.
Running replicate_ensemble_results.py replicates the ensemble results directly.
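For orientation, here is a minimal sketch of combining such pre-calculated predictions by averaging; the file pattern and array shapes are assumptions (each .npy file is taken to hold an (N, 4) array of class scores for the same N examples), and the actual ensembling logic lives in replicate_ensemble_results.py:

```python
import glob

import numpy as np

# Hypothetical sketch: combine pre-calculated per-model predictions by
# averaging. Assumes each .npy file holds an (N, 4) array of class scores
# for the same N test examples; see replicate_ensemble_results.py for the
# actual ensembling logic used in the paper.
prediction_files = sorted(glob.glob("predictions/*test*.npy"))
stacked = np.stack([np.load(f) for f in prediction_files])  # (models, N, 4)
ensemble_scores = stacked.mean(axis=0)                      # (N, 4)
predicted_classes = ensemble_scores.argmax(axis=1)          # (N,)
```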
- Make sure the value of "active_model" in configurations/config.json is set to "BERT_textonly" (see the sketch after this list)
- Run solver.py
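The switch can also be made programmatically; a minimal sketch, assuming "active_model" is a top-level key of the JSON object in config.json:

```python
import json

CONFIG_PATH = "configurations/config.json"

# Assumption: "active_model" is a top-level key of the JSON object.
with open(CONFIG_PATH) as f:
    config = json.load(f)

config["active_model"] = "BERT_textonly"  # or "self_att_with_bert_tokenizer"

with open(CONFIG_PATH, "w") as f:
    json.dump(config, f, indent=2)
```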
Note: BERT often gets stuck in poor local minima. In our experiments, we kept only runs that reached an F1 of 55 or better on the validation data.
For convenience, you may want to modify the last line of the method create_model in solutionsA.py to call modelframework.fit_multiple instead of modelframework.fit, which runs model training multiple times (a toy illustration follows).
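A toy illustration of the intended change; only the method names mirror this README, and the real signatures in solutionsA.py differ:

```python
# Toy illustration only: the real create_model / model framework
# signatures in solutionsA.py differ from these.
class ModelFramework:
    def fit(self):
        print("training once")

    def fit_multiple(self, runs=10):
        # Restart training several times so that at least one run escapes
        # the poor local minima BERT tends to get stuck in.
        for run in range(runs):
            print(f"starting run {run}")
            self.fit()


modelframework = ModelFramework()
# Last line of create_model: call fit_multiple instead of fit.
modelframework.fit_multiple(runs=3)
```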
Duration of one training run: ~30 minutes
- Change the value of "active_model" in configurations/config.json to "self_att_with_bert_tokenizer"
- Run solver.py
Duration of one training run: ~2.7 minutes
A tsv file containing predictions, ground truth, confidence, and model inputs of the trained BiLSTM+SelfAtt model is available HERE.
A tsv file containing predictions, ground truth, confidence, and model inputs of the TOP-N_s ensemble (our best published result) is available HERE.
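Either tsv file can be inspected with pandas; a minimal sketch, where the file name is hypothetical and the real column names are listed in the file's header row:

```python
import pandas as pd

# Hypothetical file name; check the header row for the actual columns
# (predictions, ground truth, confidence, model inputs).
df = pd.read_csv("bilstm_selfatt_predictions.tsv", sep="\t")
print(df.columns.tolist())
print(df.head())
```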
Images of multi-head attention from all heads and layers of the trained BERT model, for a fixed data point, are available for download HERE.
An xlsx file containing an attention visualisation for each input of the validation set in the trained BiLSTM+SelfAtt model is available HERE. The column description is shown in the first row. For each example, the column 'text' contains the numerical attention values, a visualisation of the average over all attention "heads", and the attention of each individual "head" (in this row order). Note that at the time attention is computed, the input has already been passed through a 1-layer BiLSTM (see the original paper for more details).
The following table shows the relative F1 difference per sample for each class, i.e., the increase in F1 score if one more example of that class is classified correctly.
| Class | F1 difference in % |
|---|---|
| Query | 0.219465 |
| Support | 0.1746285 |
| Deny | 0.2876426 |
| Comment | 0.0849897 |
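Such per-class sensitivities can be estimated by flipping one misclassified example of a class to the correct label and measuring the change in macro-averaged F1. A minimal sketch with synthetic labels follows; the table above was computed on the actual task data, which this toy example does not reproduce:

```python
import numpy as np
from sklearn.metrics import f1_score

CLASSES = ["Support", "Deny", "Query", "Comment"]

# Synthetic gold labels and noisy predictions (toy data only).
rng = np.random.default_rng(0)
gold = rng.integers(0, 4, size=1000)
pred = gold.copy()
flip = rng.random(1000) < 0.3
pred[flip] = rng.integers(0, 4, size=flip.sum())

base = f1_score(gold, pred, average="macro")
for c, name in enumerate(CLASSES):
    # Fix one currently misclassified example of class c and re-score.
    wrong = np.flatnonzero((gold == c) & (pred != c))
    if wrong.size == 0:
        continue
    fixed = pred.copy()
    fixed[wrong[0]] = c
    delta = f1_score(gold, fixed, average="macro") - base
    print(f"{name}: +{100 * delta:.4f}% macro F1 per corrected example")
```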