Project realized by Eleonora Mancini and Eleonora Misino as part of the Natural Language Processing exam of the Master's degree in Artificial Intelligence @ University of Bologna (A.A. 2019-2020).
The purpose of this project is to analyze labelling strategies aimed at identifying signs of depression in users’ tweets. The tweets used in this analysis date from a specific period of the COVID19 pandemic. In particular, the objective is to understand whether the strategies studied can identify evident signs of depression among users during the pandemic. Four different strategies were developed and analyzed. We did not arrive at a robust solution, but this project highlights some interesting aspects that could be the starting point for a more in-depth analysis.
- COVID19 Tweets
  - Exploratory Data Analysis
  - Preprocessing
  - Topic Modeling: Latent Dirichlet Allocation
  - Tweet Labelling through three strategies that we call TWINT, VADER and NRCLex
  - Labelling Comparison
  - Unsupervised Analysis (LSA and Clustering)
- CLPsych Dataset
  - Exploratory Data Analysis
  - Feature Extraction
  - Tweet Classification
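The actual labelling strategies are implemented in the notebooks. As a rough, hypothetical illustration of a VADER-style labelling and of one way to compare two labellings, the following sketch assumes each tweet already has a VADER compound score (for example from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` in the `vaderSentiment` package). It marks a tweet as a possible depression signal when the score is at or below -0.05, the conventional "negative" cut-off from the VADER documentation. This rule, the function names `label_from_compound` and `cohen_kappa`, and the choice of Cohen's kappa as the comparison metric are our assumptions, not necessarily what the project does.

```python
from typing import List, Sequence

# Conventional VADER cut-off for "negative" sentiment (compound <= -0.05).
# Using it as a depression signal is a hypothetical rule for illustration only.
NEGATIVE_THRESHOLD = -0.05


def label_from_compound(compound_scores: Sequence[float],
                        threshold: float = NEGATIVE_THRESHOLD) -> List[int]:
    """Hypothetical labelling: 1 if the compound score is <= threshold, else 0."""
    return [1 if score <= threshold else 0 for score in compound_scores]


def cohen_kappa(labels_a: Sequence[int], labels_b: Sequence[int]) -> float:
    """Cohen's kappa between two binary (0/1) labellings of the same tweets."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("labellings must be non-empty and of equal length")
    n = len(labels_a)
    # Fraction of tweets on which the two labellings agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labelling's positive rate.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        # Both labellings are the same constant; kappa is undefined,
        # so we return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    vader_labels = label_from_compound([-0.6, 0.3, -0.05, 0.0])
    other_labels = [1, 0, 0, 0]  # hypothetical labels from another strategy
    print(vader_labels)                              # [1, 0, 1, 0]
    print(cohen_kappa(vader_labels, other_labels))   # 0.5
```

Kappa is used here rather than raw agreement because it discounts the agreement two labellings would reach by chance, which matters when one class (non-depressive tweets) dominates.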
Please refer to the notebooks folder for a more detailed description.
To reproduce our results:
- Download the data (please note that the CLPsych Dataset is not publicly available)
- Download the notebooks from here
- Run the NLP_Project.ipynb notebook first, then the CLPsych.ipynb notebook
A detailed analysis of the results can be found here.
Eleonora Mancini, Eleonora Misino
This project is licensed under the MIT License - see the LICENSE file for details.