Official data repository of English and Turkish misinformation detection datasets from the LREC-COLING 2024 paper "MiDe22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection".
The dataset comprises 10,348 tweets: 5,284 in English and 5,064 in Turkish. The tweets cover four topics: the Russia-Ukraine war, the COVID-19 pandemic, refugees, and additional miscellaneous events. Each tweet is annotated with one of three misinformation labels: True, False, or Other. To comply with Twitter's Terms and Conditions, we publish only tweet IDs, not the tweet content. The columns of the data file are as follows:
Column Name | Description |
---|---|
Topic | Topic of the tweet: Ukraine, Covid, Refugees or Misc |
Event | Event of the tweet: EN01-EN40 in English and TR01-TR40 in Turkish |
Label | Label of the tweet: True, False, or Other |
Tweet_id | Twitter ID of the tweet |
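As a minimal sketch of how these columns might be consumed, the snippet below counts tweets per topic and label and collects the tweet IDs. It assumes the data is comma-separated with a header row matching the column names above; the delimiter, the inline `SAMPLE` rows, and the tweet IDs in it are hypothetical placeholders, not part of the released files. Tweet IDs are kept as strings so that large IDs are never rounded by numeric parsing.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the documented column layout; the tweet IDs are made up.
SAMPLE = """Topic,Event,Label,Tweet_id
Ukraine,EN01,True,1500000000000000001
Ukraine,EN01,False,1500000000000000002
Covid,EN12,Other,1500000000000000003
Covid,EN12,False,1500000000000000004
"""


def count_labels(handle, delimiter=","):
    """Count tweets per (Topic, Label) pair from an open text file."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        counts[(row["Topic"], row["Label"])] += 1
    return counts


def tweet_ids(handle, delimiter=","):
    """Return tweet IDs as strings, in file order."""
    return [row["Tweet_id"] for row in csv.DictReader(handle, delimiter=delimiter)]


counts = count_labels(io.StringIO(SAMPLE))
ids = tweet_ids(io.StringIO(SAMPLE))
print(counts[("Covid", "False")])
print(ids[0])
```

To run this against a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="", encoding="utf-8")` and adjust `delimiter` if the file is not comma-separated.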
The distribution of tweet counts in the dataset is as follows:
Lang | Topic | True | False | Other | Total |
---|---|---|---|---|---|
EN | Ukraine | 320 | 393 | 618 | 1,331 |
EN | Covid | 167 | 514 | 663 | 1,344 |
EN | Refugees | 94 | 328 | 796 | 1,218 |
EN | Misc | 146 | 494 | 751 | 1,391 |
EN | Total | 727 | 1,729 | 2,828 | 5,284 |
TR | Ukraine | 129 | 338 | 477 | 944 |
TR | Covid | 190 | 558 | 816 | 1,564 |
TR | Refugees | 61 | 202 | 298 | 561 |
TR | Misc | 289 | 634 | 1,072 | 1,995 |
TR | Total | 669 | 1,732 | 2,663 | 5,064 |
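The totals in the distribution above can be cross-checked with a few lines of arithmetic. The sketch below copies the per-topic counts from the table and recomputes the per-label, per-topic, and overall totals; the names `COUNTS`, `label_totals`, and `topic_totals` are illustrative only.

```python
# Counts copied from the distribution table: topic -> (True, False, Other).
COUNTS = {
    "EN": {
        "Ukraine": (320, 393, 618),
        "Covid": (167, 514, 663),
        "Refugees": (94, 328, 796),
        "Misc": (146, 494, 751),
    },
    "TR": {
        "Ukraine": (129, 338, 477),
        "Covid": (190, 558, 816),
        "Refugees": (61, 202, 298),
        "Misc": (289, 634, 1072),
    },
}


def label_totals(lang):
    """Sum each label column (True, False, Other) over all topics."""
    return tuple(sum(column) for column in zip(*COUNTS[lang].values()))


def topic_totals(lang):
    """Sum the three label counts for each topic."""
    return {topic: sum(row) for topic, row in COUNTS[lang].items()}


en_total = sum(label_totals("EN"))
tr_total = sum(label_totals("TR"))
print(label_totals("EN"), en_total)
print(label_totals("TR"), tr_total)
print(en_total + tr_total)
```

The recomputed values match the "Total" rows and column of the table, and the two language totals add up to the 10,348 tweets stated in the description.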
If you use the datasets or code, please cite the following paper:
@inproceedings{toraman-etal-2024-mide22,
title = "{M}i{D}e22: An Annotated Multi-Event Tweet Dataset for Misinformation Detection",
author = "Toraman, Cagri and
Ozcelik, Oguzhan and
Sahinuc, Furkan and
Can, Fazli",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italia",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.986",
pages = "11283--11295"}