Skip to content
This repository has been archived by the owner on Sep 9, 2021. It is now read-only.

Vancouver-Datajam/project_5

Repository files navigation

Project 5: Plastics Crisis Twitter Sentiment Analysis

Project description

The problem

Since the middle of the 20th century, the rapidly increasing global production of plastics (322 million metric tons per year in 2016) has been accompanied by an unprecedented accumulation of plastic litter in our oceans (Jambeck et al. 2015). In response, there has been an increase in public concern regarding the plastics crisis, along with the growth of the zero waste movement and increase in new plastic-ban policy decisions (i.e. plastic bag and straw bans). These decisions are often met with mixed public opinion. Do people have positive, neutral, or negative opinions towards different sustainability initiatives? How do we find and measure these sentiments?

Dataset

We used Twitter data to conduct a sentiment analysis regarding different sustainability topics. Using Tweepy, the Python library for accessing the Twitter API, we scraped some Tweets that contained the following hashtags:

  • #noplastic
  • #plasticpollutes
  • #plasticpollution
  • #sustainability
  • #zerowaste

The folder "hashtags" contains all the datasets for this project. Each CSV file name corresponds to the Tweets' hashtag. The CSVs with the filename "updated" contain an aditional location column. Because of the API's restrictions, we were only able to scrape week-old data (from August 30, 2020 to September 10, 2020).

Files included

Data cleaning

  • stopwords_punctuation_removal_and_wordcloud.ipynb: Removes stop words and punctation from Tweet text to prepare for sentiment analysis. Creates a wordcloud of most common words for each hashtag group.
  • extract_english.ipynb: Extracts English-only Tweets.

Data analysis

  • sentiment_ratio.ipynb: Uses Textblob to calculate the positive and negative percentage and ratio based on Tweet texts. Finds bigrams and trigrams for each hashtag group.
  • workflow.ipynb: Combines all cleaning and analysis all into one workflow. Does not include wordclouds.

Tweet bot

  • tweet_generator.ipynb: A neural network (with Gated Recurrent Units) was trained on the obtained tweets. Given the starting word, it generates ‘new’ tweets! Click here to test it out!

Libraries used

  • pandas
  • numpy
  • textblob
  • collections
  • re
  • nltk
  • sklearn
  • emot
  • wordcloud
  • contractions
  • tensorflow
  • os

Results

In each hashtag group, there are generally more positive than negative sentiments in users' Tweets. The ratio of positive to negative sentiments are very high for each group.

The following wordcloud demonstrates the most common words of all Tweets containing the #noplastic hashtag.

Team members:

  • Jeanette Andrews (Team lead)
  • Madan Krishnan (Mentor)
  • Jerry Chen
  • Willy Deng
  • Sayemin Naheen
  • Juanita Palomar
  • Rita Zeng

Datajam Schedule

Time Description
8:30am Opening Ceremony
9:30am Official hack start time! Meet together as team and get to know each other
9:45am Project brainstorming & defining tasks
10:00am Optional Git workshop
10:30am Hack & work on tasks
12:30pm Check-in #1: meet back up as a team
1:00pm Hack & work on tasks
3:30pm Check-in #2: meet back up as a team
Debugging, prioritizing remaining tasks
5:00pm Final repository merging
Prepare demo or slides
6:30pm Project deadline & final Presentation!
7:30pm Career Panel & Q&A
8:30pm Awards Ceremony & Closing

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published