A Cross-Lingual Study of Homotransphobia on Twitter

Davide Locatelli · Greta Damo · Debora Nozza

This repository contains the data and code used in the paper A Cross-Lingual Study of Homotransphobia on Twitter.

Data

In accordance with Twitter's policy, we release only tweet IDs rather than tweet text. The data folder contains seven files, one per language: English, Italian, German, French, Spanish, Portuguese, and Norwegian.
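As a quick sanity check before hydration, the ID files can be inspected with a few lines of Python. This is a minimal sketch, assuming each file holds one numeric tweet ID per line; `load_tweet_ids` is a hypothetical helper and is not part of the repository code.

```python
from typing import Iterable, List


def load_tweet_ids(lines: Iterable[str]) -> List[str]:
    """Return the unique tweet IDs found in `lines`, in first-seen order.

    Assumption: one numeric ID per line. Blank or non-numeric lines are skipped.
    IDs are kept as strings so they can be written back out unchanged.
    """
    seen = set()
    ids = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id.isdigit():
            continue  # skip blank lines and anything that is not a plain ID
        if tweet_id not in seen:
            seen.add(tweet_id)
            ids.append(tweet_id)
    return ids


# Example usage (the path follows the data/LANG.txt pattern used in the Instructions):
# with open("data/NO.txt", encoding="utf-8") as handle:
#     print(len(load_tweet_ids(handle)), "unique tweet IDs")
```

Counting the IDs per language this way gives an upper bound on how many tweets hydration can return, since deleted or protected tweets are no longer retrievable.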

Code

The code consists of three files:

data.py - to process the data
topics.py - to run the contextualized topic modeling analysis
sentiment.py - to run the sentiment analysis

Instructions

To reproduce our study:

Retrieve the tweets. To do this, you will need Twitter API keys. Once you have those, you can use the twarc library as follows:

twarc hydrate data/LANG.txt > LANG.jsonl

Preprocess the data:

python data.py -l LANG

Run topic modeling analysis:

python topics.py -l LANG

Run sentiment analysis:

python sentiment.py -l LANG

Here, LANG is an ISO 639-1 language code, for example NO for Norwegian.
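The three script steps above can be chained for all seven languages with a small driver. This is a sketch under stated assumptions: the language codes are written in upper case, following the NO example, and the tweets for each language have already been hydrated. `LANGS`, `build_commands`, and `run_language` are hypothetical names that do not exist in the repository.

```python
import subprocess
import sys
from typing import List

# Assumption: upper-case ISO 639-1 codes, matching the "NO" example above.
LANGS = ["EN", "IT", "DE", "FR", "ES", "PT", "NO"]

# The three scripts, in the order given in the instructions.
SCRIPTS = ["data.py", "topics.py", "sentiment.py"]


def build_commands(lang: str) -> List[List[str]]:
    """Build one command per script for a single language."""
    return [[sys.executable, script, "-l", lang] for script in SCRIPTS]


def run_language(lang: str) -> None:
    """Run preprocessing, topic modeling, and sentiment analysis in sequence."""
    for command in build_commands(lang):
        # check=True stops the sequence as soon as one step fails.
        subprocess.run(command, check=True)


# Example usage, run from the repository root:
# for lang in LANGS:
#     run_language(lang)
```

Running the steps in a fixed order matters here because topic modeling and sentiment analysis both operate on the output of the preprocessing step.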

Pre-trained models

The following pre-trained models are used for the analysis:

CTM: distiluse-base-multilingual-cased-v1, distiluse-base-multilingual-cased-v2
Sentiment analysis: twitter-xlm-roberta-base-sentiment
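For orientation, the models listed above can be loaded roughly as follows. This is a hedged sketch, not the code in topics.py or sentiment.py: the organisation prefixes (`sentence-transformers/`, `cardiffnlp/`) are assumptions about where these checkpoints are hosted on the Hugging Face Hub, and the loader functions are hypothetical. The heavy imports are deferred so the module can be imported without the libraries installed.

```python
# Assumption: Hub identifiers with organisation prefixes; the README only
# gives the bare model names.
EMBEDDING_MODELS = [
    "sentence-transformers/distiluse-base-multilingual-cased-v1",
    "sentence-transformers/distiluse-base-multilingual-cased-v2",
]
SENTIMENT_MODEL = "cardiffnlp/twitter-xlm-roberta-base-sentiment"


def load_embedder(name: str = EMBEDDING_MODELS[0]):
    """Load a multilingual sentence encoder (requires sentence-transformers)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


def load_sentiment_classifier(name: str = SENTIMENT_MODEL):
    """Load a text-classification pipeline (requires transformers)."""
    from transformers import pipeline

    return pipeline("sentiment-analysis", model=name, tokenizer=name)


# Example usage (downloads the model weights on first call):
# classifier = load_sentiment_classifier()
# print(classifier("Example tweet text"))
```

Calling the classifier on a string returns a list of dictionaries, each with a predicted label and a score.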

Results

The results of the analysis will be stored in the results folder. There will be three files per language:

LANG_topics.txt - contains the results of the topic modeling analysis with the top words for 5, 10, 15, and 20 topics
LANG_topics.csv - contains the results of the topic modeling analysis with each tweet assigned to a topic
LANG_sentiment.csv - contains the results of the sentiment analysis with each tweet assigned to a sentiment class
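Once the result files exist, the per-language sentiment distribution can be summarized with the standard library. This is a minimal sketch: the column name `sentiment` is an assumption about the layout of LANG_sentiment.csv and should be adjusted to the actual header, and `sentiment_shares` is a hypothetical helper.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def sentiment_shares(rows: Iterable[str], column: str = "sentiment") -> Dict[str, float]:
    """Return the share of tweets per sentiment class.

    `rows` is any iterable of CSV lines with a header, such as an open file.
    Assumption: the class label is stored in a column named `column`.
    """
    reader = csv.DictReader(rows)
    counts = Counter(row[column] for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}  # header only, or empty input
    return {label: count / total for label, count in counts.items()}


# Example usage:
# with open("results/NO_sentiment.csv", newline="", encoding="utf-8") as handle:
#     print(sentiment_shares(handle))
```

The same pattern applies to LANG_topics.csv by passing the name of the topic column instead.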

Reference

If you use the data or code, please cite the following paper:

@inproceedings{locatelli-etal-2023-cross,
title = "A Cross-Lingual Study of Homotransphobia on {T}witter",
author = "Locatelli, Davide  and
  Damo, Greta  and
  Nozza, Debora",
booktitle = "Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP)",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.c3nlp-1.3",
pages = "16--24",
abstract = "We present a cross-lingual study of homotransphobia on Twitter, examining the prevalence and forms of homotransphobic content in tweets related to LGBT issues in seven languages. Our findings reveal that homotransphobia is a global problem that takes on distinct cultural expressions, influenced by factors such as misinformation, cultural prejudices, and religious beliefs. To aid the detection of hate speech, we also devise a taxonomy that classifies public discourse around LGBT issues. By contributing to the growing body of research on online hate speech, our study provides valuable insights for creating effective strategies to combat homotransphobia on social media.",
}