On the Role of Images for Analyzing Claims in Social Media

The repository for the source code and the dataset used in the paper

Gullal S. Cheema, Sherzod Hakimov, Eric Müller-Budack and Ralph Ewerth “On the Role of Images for Analyzing Claims in Social Media“, Proceedings of the 2nd International Workshop on Cross-lingual Event-centric Open Analytics co-located with the 30th The Web Conference (WWW 2021).

The paper is available here: http://ceur-ws.org/Vol-2829/paper3.pdf

Extended dataset can be downloaded from here: https://zenodo.org/record/4592249

Setup

For SVM Training and BERT Fine-tuning

Cuda 10.2
conda env create -f environment.yml python=3.6.12
To install ThunderSVM on linux system,
pip install wheel https://github.com/Xtra-Computing/thundersvm/releases/download/v0.3.4/thundersvm_cuda10.1-0.3.4-cp36-cp36m-linux_x86_64.whl
Alternatively, ThunderSVM release for windows can be found here.

For VilBert Extraction and Fine-tuning

Install dependencies and vilbert in a different environment by following instructions here.
Download vilbert pretrained models either from here or here in folder named vilbert-multi-task/data/pretrained/.
Download detectron model in vilbert-multi-task/data/.
wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_model.pth
wget https://dl.fbaipublicfiles.com/vilbert-multi-task/detectron_config.yaml

Data and Feature Extraction

Download data from the zenodo repository
Extract each zip file in data/
Download pretrained places and hybrid models.
Extract Textual Features,
- English : python feature_extraction/extract_bert_en.py -d clef_en -m bertbase -p clean
- Arabic : python feature_extraction/extract_bert_ar.py -m arabert
Extract Visual Features
- Visual Sentiment : python feature_extraction/extract_sent_feats.py -d clef_en -m vgg19_finetuned_all -b 32
- Visual Scene : python feature_extraction/extract_visual_feats.py -v resnet152 -t imgnet -d clef_en
For VilBert
- Extract bottom-up faster-RCNN features from images:
  - python vilbert-multi-task/script/extract_features.py --model_file vilbert-multi-task/data/detectron_model.pth --config_file vilbert-multi-task/data/detectron_config.yaml --image_dir data/lesa/images/ --output_folder data/lesa/rcnn_feats/
  - python vilbert-multi-task/script/convert_to_lmdb.py --features_dir data/lesa/rcnn_feats/ --lmdb_file data/lesa/rcnn_lmdbs/
- Some images have no detectable objects. For those images, we take random crops and extract ResNet-152 last layer features. python vilbert_code/extract_missing_feats.py --dset lesa
- Extract VilBert features: python vilbert_code/extract_features.py --model multi_task --dset lesa
Use -h to see options

SVM Training and Evaluation

Text-based,
python svm_training/svm_textfeats.py --tfeat sumavg --tmodel bertbase --ttype clean --dset lesa --split 0
Image-based,
python svm_training/svm_imgfeats.py --vfeat feats --vmodel resnet152 --vtype imgnet --dset lesa --split 0
Image and Text based,
python svm_training/svm_imgText.py --tfeat sumavg --tmodel bertbase --ttype clean --vfeat feats --vmodel resnet152 --vtype imgnet --split 0
VilBert based, python svm_training/svm_vilbertfeats.py --normalize 1 --feat pooled --model multi_task --dset lesa

BERT Fine-tuning

For ClEF_En: python finetune_bert.py --btype bertweet --dset clef_en --bs 4
For LESA: python finetune_bert.py --btype bertweet --dset lesa
For MediaEval: python finetune_bert.py --btype covid_twitter --dset mediaeval
For CLEF_Ar: python finetune_bert.py --btype arabert --dset clef_ar

VilBert Fine-tuning

Using pooled token embeddings, python finetune_vilbert.py --model image_ret --dset lesa --split 0 --un_fr 2 --pool add
Using averaged token embeddings, python finetune_vilbert2.py --model image_ret --dset lesa --split 0 --un_fr 2

If you find the shared resources useful, please cite:

@inproceedings{DBLP:conf/www/CheemaHME21,
  author    = {Gullal S. Cheema and
               Sherzod Hakimov and
               Eric M{\"{u}}ller{-}Budack and
               Ralph Ewerth},
  title     = {On the Role of Images for Analyzing Claims in Social Media},
  booktitle = {Proceedings of the 2nd International Workshop on Cross-lingual Event-centric
               Open Analytics co-located with the 30th The Web Conference {(WWW}
               2021), Ljubljana, Slovenia, April 12, 2021 (online event due to {COVID-19}
               outbreak)},
  series    = {{CEUR} Workshop Proceedings},
  volume    = {2829},
  pages     = {32--46},
  publisher = {CEUR-WS.org},
  year      = {2021},
  url       = {http://ceur-ws.org/Vol-2829/paper3.pdf}
}

Name		Name	Last commit message	Last commit date
Latest commit History 43 Commits
arabert @ a98063f		arabert @ a98063f
feature_extraction		feature_extraction
svm_training		svm_training
vilbert-multi-task @ 9d64508		vilbert-multi-task @ 9d64508
vilbert_code		vilbert_code
.gitmodules		.gitmodules
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
finetune_bert.py		finetune_bert.py
translate_ar2en.py		translate_ar2en.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

On the Role of Images for Analyzing Claims in Social Media

Setup

For SVM Training and BERT Fine-tuning

For VilBert Extraction and Fine-tuning

Data and Feature Extraction

SVM Training and Evaluation

BERT Fine-tuning

VilBert Fine-tuning

About

Releases

Packages

Contributors 2

Languages

License

cleopatra-itn/image_text_claim_detection

Folders and files

Latest commit

History

Repository files navigation

On the Role of Images for Analyzing Claims in Social Media

Setup

For SVM Training and BERT Fine-tuning

For VilBert Extraction and Fine-tuning

Data and Feature Extraction

SVM Training and Evaluation

BERT Fine-tuning

VilBert Fine-tuning

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages