- The dataset can be found in `jsonl` format in the `dataset` folder.
- It contains three columns: `id`, `label`, and `source`.
  - `id` indicates the unique comment ID, which can be used to retrieve the text through the Reddit/YouTube API.
  - `label` indicates the final label assigned to each text.
  - `source` indicates whether the comment/text is from Reddit or YouTube.
- If the full dataset with the text is required, please email the corresponding author of the paper at [email protected]
- The code and configuration parameters for training the large language models `LLAMA2` and `MistralAI` are in the `llm.py` file under the `training` folder.
- The usage and experiments in their raw format are in the `notebooks` folder.
- If you use the dataset or code, please cite the following paper:
@inproceedings{chowdhury2024infrastructure,
title={Infrastructure Ombudsman: Mining Future Failure Concerns from Structural Disaster Response},
author={Chowdhury, Md Towhidul Absar and Datta, Soumyajit and Sharma, Naveen and KhudaBukhsh, Ashiqur R.},
booktitle={Proceedings of the ACM Web Conference 2024},
year={2024}
}
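
For readers who want to inspect the dataset described above, the following is a minimal sketch of reading a `jsonl` file with the `id`, `label`, and `source` columns using only the Python standard library. The file name `dataset/data.jsonl` and the helper names `load_records`, `ids_by_source`, and `label_counts` are assumptions made for illustration and are not taken from the repository; the actual file name in the `dataset` folder may differ.

```python
import json
from collections import Counter
from pathlib import Path


def load_records(path):
    """Read a JSON Lines file: one JSON object per non-empty line."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


def ids_by_source(records):
    """Group comment ids by the value of their `source` column."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["source"], []).append(record["id"])
    return grouped


def label_counts(records):
    """Count how many records carry each `label` value."""
    return Counter(record["label"] for record in records)


if __name__ == "__main__":
    # Hypothetical path: adjust to the actual file in the `dataset` folder.
    dataset_path = Path("dataset/data.jsonl")
    if dataset_path.exists():
        records = load_records(dataset_path)
        print(label_counts(records))
        for source, ids in ids_by_source(records).items():
            print(source, len(ids))
```

Grouping the ids by `source` is a convenient first step because the comment text has to be retrieved separately for each platform, through the Reddit or YouTube API respectively.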