This repository contains implementations of baseline models on the MuMiN dataset, introduced in the paper by Nielsen and McConville, "MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset" (2021).
To run the baselines, we have centralised all the training scripts into the src/train.py script. This script can be called with many different parameters, of which the mandatory ones are the following:

- model_type: The type of model you want to benchmark. Can be 'claim', 'tweet', 'image' or 'graph'.
- size: The size of the MuMiN dataset on which to perform the benchmark.
- task: Only relevant if model_type == 'graph', in which case it determines whether the graph model is benchmarked on the claim classification task or the tweet classification task.

Call python src/train.py --help for a more detailed list of all the arguments that can be used.
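The rule that task only matters for the graph model is easy to get wrong when scripting many runs. Below is a minimal sketch of a helper that assembles one src/train.py invocation. It assumes the mandatory parameters are exposed as argparse-style flags named --model_type, --size and --task, that the graph tasks are spelled 'claim' and 'tweet', and that 'small' and 'medium' are valid sizes; all of these, and the helper name build_train_command, are hypothetical, so check python src/train.py --help for the real names and values.

```python
import shlex
from typing import List, Optional

MODEL_TYPES = {"claim", "tweet", "image", "graph"}
GRAPH_TASKS = {"claim", "tweet"}  # hypothetical spellings of the two graph tasks


def build_train_command(model_type: str, size: str, task: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return the argument list for one benchmark run.

    Assumes argparse-style flags --model_type, --size and --task.
    """
    if model_type not in MODEL_TYPES:
        raise ValueError(f"model_type must be one of {sorted(MODEL_TYPES)}, got {model_type!r}")
    cmd = ["python", "src/train.py", "--model_type", model_type, "--size", size]
    if model_type == "graph":
        # The task only matters for the graph model: claim vs. tweet classification.
        if task not in GRAPH_TASKS:
            raise ValueError(f"the graph model needs task in {sorted(GRAPH_TASKS)}, got {task!r}")
        cmd += ["--task", task]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("graph", "small", task="claim")
    print(shlex.join(cmd))
    # To actually launch training: import subprocess; subprocess.run(cmd, check=True)
```

For non-graph models the sketch silently drops any task value, mirroring the statement that the argument is only relevant for the graph model.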
The random and majority baselines are calculated based on the proportion of misinformation labels in the dataset. See the random_majority_macro_f1.ipynb notebook for details.
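To illustrate how such baselines follow from the label proportion alone, here is a minimal sketch of the expected macro-F1 for a binary task where p is the share of misinformation labels. It is not taken from the notebook: the function names are hypothetical, and it assumes the random baseline guesses each class with probability 0.5 and that the dataset is large enough for expected precision and recall to be used directly. A majority classifier always predicts the class with share q = max(p, 1 - p), giving that class an F1 of 2q/(1 + q) and the other class an F1 of 0.

```python
from typing import Sequence


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def majority_macro_f1(p: float) -> float:
    """Expected macro-F1 when always predicting the majority class."""
    q = max(p, 1 - p)
    # Majority class: precision q, recall 1. Minority class is never predicted: F1 = 0.
    return (f1(q, 1.0) + 0.0) / 2


def random_macro_f1(p: float) -> float:
    """Expected macro-F1 for uniform 50/50 guessing (an assumption)."""
    # Each class gets recall 0.5 and precision equal to its share of the labels.
    return (f1(p, 0.5) + f1(1 - p, 0.5)) / 2


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Exact binary macro-F1 from 0/1 label lists, used as a sanity check."""
    scores = []
    for cls in (0, 1):
        tp = sum(t == cls and y == cls for t, y in zip(y_true, y_pred))
        fp = sum(t != cls and y == cls for t, y in zip(y_true, y_pred))
        fn = sum(t == cls and y != cls for t, y in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(f1(precision, recall))
    return sum(scores) / 2


if __name__ == "__main__":
    p = 0.9  # illustrative share of misinformation labels, not a MuMiN statistic
    print(f"majority: {majority_macro_f1(p):.4f}, random: {random_macro_f1(p):.4f}")
```

If the random baseline is instead defined as guessing in proportion to the label distribution, each class has expected precision and recall equal to its share, so the expected macro-F1 is 0.5 regardless of p.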
- MuMiN website, the central place for the MuMiN ecosystem, containing tutorials, leaderboards and links to the paper and related repositories.
- MuMiN, containing the paper in PDF and LaTeX form.
- MuMiN-build, containing the scripts for the Python package mumin, used to compile the dataset and export it to various graph machine learning frameworks.
- MuMiN-trawl, containing all the scripts to build MuMiN from scratch.