RedditQuarantineNLP

Investigating the effects of Reddit quarantine on user content and sentiment

The goal of this project is to use NLP techniques to understand the impact of a Reddit “quarantine”. The project focuses on the language used within quarantined subreddits before and after the quarantine took place.

Data

Our dataset was obtained using Google BigQuery. It contains 10,015,586 records, where each record is a comment posted in r/The_Donald (TD) in 2019. We refer to these comments as “posts”.

(see: https://www.reddit.com/r/bigquery/comments/3cej2b/17_billion_reddit_comments_loaded_on_bigquery/)

Analysis

We applied machine learning models to the text in order to produce meaningful summaries about the content of posts in TD and to determine whether there are any observable changes in the data around the time of quarantine. This is primarily an unsupervised learning problem since we do not have labels that are helpful in answering this question.
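As a rough illustration of the before/after comparison, the sketch below splits posts at a cutoff date and averages a per-post score on each side. The `created_utc` field name, the `polarity` key, the toy records, and the cutoff of 2019-06-26 are assumptions made for this example and are not taken from the project code.

```python
from datetime import datetime, timezone
from statistics import mean

# Assumption: TD was quarantined on 2019-06-26 (UTC); change this if your
# analysis uses a different cutoff.
QUARANTINE_UTC = datetime(2019, 6, 26, tzinfo=timezone.utc)


def split_by_quarantine(posts, cutoff=QUARANTINE_UTC):
    """Partition posts into (before, after) using epoch seconds in created_utc."""
    before, after = [], []
    for post in posts:
        created = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc)
        (before if created < cutoff else after).append(post)
    return before, after


def mean_score(posts, key="polarity"):
    """Average a numeric per-post score; returns None for an empty group."""
    return mean(p[key] for p in posts) if posts else None


# Hypothetical toy records, not real data from the dataset.
posts = [
    {"created_utc": 1559347200, "polarity": 0.2},   # 2019-06-01
    {"created_utc": 1560556800, "polarity": -0.4},  # 2019-06-15
    {"created_utc": 1561939200, "polarity": -0.6},  # 2019-07-01
]
before, after = split_by_quarantine(posts)
print(mean_score(before), mean_score(after))
```

The same split can be reused for any per-post quantity (topic weights, post length, and so on), which keeps the before/after comparison consistent across analyses.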

We use pretrained sentiment analysis models to examine the polarity of posts (positive, negative, or neutral). We then use topic modeling to examine themes within the posts. The outputs of the sentiment and topic analyses become inputs to our clustering approach, which attempts to find groups of similar posts along a variety of features.
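The hand-off from sentiment and topic outputs to clustering can be sketched as follows. This is a minimal illustration only: it assumes each post has a single polarity score and a vector of topic weights, and it uses a plain k-means loop as a stand-in, since this README does not state which clustering algorithm the project uses. The function names and toy values are hypothetical.

```python
def featurize(polarity, topic_weights):
    """Concatenate a polarity score with topic weights into one feature vector."""
    return [polarity, *topic_weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic init (the first k points as centroids)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical (polarity, topic weights) pairs for four posts.
scored_posts = [
    (0.8, [0.9, 0.1]),
    (0.7, [0.8, 0.2]),
    (-0.6, [0.1, 0.9]),
    (-0.7, [0.2, 0.8]),
]
points = [featurize(s, t) for s, t in scored_posts]
labels, centroids = kmeans(points, k=2)
print(labels)
```

In this toy input the two positive posts that lean toward the first topic end up in one cluster and the two negative posts in the other. With real data the features would likely need scaling so that no single column dominates the distance.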

Language Requirements: Python 3.7

Required Libraries: See requirements.txt

To create our environment:

Create a venv (the venv module must be available; replace PYTHON with your Python 3.7 interpreter): PYTHON -m venv env
Activate the environment: source env/bin/activate
Install packages from requirements.txt: pip install -r requirements.txt

Processed Data

The processed data for our project is stored on our university Box account.

Analysis Notebooks

To run our various models, see: core/analysis
