dean-krueger/2020-Election-Twitter-Sentiment
///////////////////////////////////////////////////////////////////
/                                                                 /
/                 2020 Election Twitter Sentiment                 /
/                          Dean Krueger                           /
/                                                                 /
///////////////////////////////////////////////////////////////////

Brief Description:
------------------
This project is based on an intro to computer science project I did
while in school at the University of Colorado. Its purpose is to use
the information available through the Twitter API to learn how
people in a certain region feel about a given set of topics. I
started this project near the end of the 2020 presidential election,
so I chose the two candidates (Donald Trump and Joe Biden) as the
subjects of my analysis, with the goal of learning how people felt
about them leading into the election.

I planned a very simple analysis, explained later in this README,
that assigns a score to each tweet made about either candidate and
then determines which state the tweet came from. Once the tweets
were sorted by state, the scores would be summed and a total score
assigned to each state for which there was data. Finally, a map
would be generated in the style of a political map, like you might
see on election night, showing the overall political slant of each
state based on the sum of all scores from within that state.

Analysis:
---------
The analysis performed was a very basic form of sentiment analysis.
It involved comparing tweets made about each candidate to a list of
words to which "sentiment scores" were assigned. The file I used for
scoring is the same one from my computer science project when I was
in school, so its origin is sadly lost to me. However, many similar
lists of words and sentiment scores are available, such as the
Sentiment Lexicon from the University of Pittsburgh.
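As a rough illustration of this kind of word-list scoring, the sketch
below loads word/score pairs into a dictionary and sums the scores of
the words in a tweet, with unknown words contributing nothing. It is
not the code from tweet_scorer.py: the two-column "word,score" layout,
the sample words and values, and the names load_sentiments and
score_text are all assumptions made for the example.

```python
import csv
import io
import re

# Made-up sample data in an assumed "word,score" layout; the real
# sentiments.csv may be structured differently.
SAMPLE_CSV = "great,0.8\nterrible,-0.9\nwin,0.5\n"


def load_sentiments(handle):
    """Read word,score rows into a dict so each lookup is a fast hash access."""
    return {
        row[0].lower(): float(row[1])
        for row in csv.reader(handle)
        if len(row) >= 2
    }


def score_text(text, sentiments):
    """Sum the scores of known words; words not in the dict add 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(sentiments.get(word, 0.0) for word in words)


sentiments = load_sentiments(io.StringIO(SAMPLE_CSV))
print(score_text("What a great win", sentiments))
```

Using dict.get with a default of 0.0 keeps the "word not found, score
unchanged" behavior in a single expression.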
Tweets from the stream were searched for a list of keywords designed
to pick out tweets made about each candidate (one list for each
candidate), and then each word in each relevant tweet was scored by
comparing it to the sentiment list. This was done by indexing a
dictionary (because of the speed of that operation), and if a word
was not found, the score for the tweet overall was unchanged.
Finally, because the map displays red colors on the negative scale
and blue on the positive, tweets for which the subject was Donald
Trump had the sign of their score flipped, so that scores favoring
the then-president displayed red on the map and those favoring Joe
Biden displayed blue. A consequence of this is that tweets about the
president with a negative sentiment added to Biden's score, and vice
versa (which is what we wanted).

Files:
------
twitter_listener.py: uses tweepy and the Twitter API to listen in on
the Twitter stream and write tweets from the United States to a
file. The contents of the tweet, the time the tweet was made, and
the lat/long geo data are all recorded in JSON format. Sadly, the
listener cannot filter tweets by keyword: if a geo filter and a
keyword filter are both applied, the stream returns tweets that
match either filter, not only those that match both.

tweet_scorer.py: reads in JSON files filled with tweets, as well as
a file containing sentiment scores. Filters the tweets by keyword,
then assigns a score to each tweet. Finally, creates and writes to a
file, storing the score of each tweet, the time it was tweeted, and
the lat/long.

map_maker.ipynb: Jupyter notebook which sorts the tweets by state,
then makes the map using folium.

sentiments.csv: the file which contains all the sentiment scores I
used in this project.

NOTE: Because of doxing worries, it felt irresponsible to upload the
files containing the tweets and their locations.
This also includes the scored tweets file. However, I have uploaded
the maps which resulted from them, and may in the future modify the
code to mask the location of the tweets and display only the state
in which each tweet was created. The tweets I gathered were made
between Oct 27, 2020 and Nov 6, 2020.

Future Plans:
-------------
Modify the code to mask the location of the tweets earlier (in the
scored tweets file) so that the file can be posted.

Find a more up-to-date sentiment dictionary with more certain
origins.

Modify the scoring algorithm to reflect more complex and modern
sentiment analysis techniques to improve accuracy.

Add features to the folium map to display each state's score, as
well as the margin by which the winner of the state won in the
election.
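To make the sign convention and per-state summation described under
Analysis concrete, here is a small sketch under stated assumptions.
The keyword sets, the sample tweets, the state codes, and the names
signed_score and totals_by_state are invented for illustration and
are not taken from the project files. Skipping tweets that match both
keyword lists or neither is likewise an assumption, since this README
does not say how such tweets were handled.

```python
from collections import defaultdict

# Hypothetical keyword lists; the project's real lists are not shown here.
TRUMP_KEYWORDS = {"trump", "donald"}
BIDEN_KEYWORDS = {"biden", "joe"}


def signed_score(text, raw_score):
    """Orient a raw sentiment score for the map (negative = red, positive = blue).

    Returns None when the tweet matches neither or both keyword lists
    (an assumption made for this sketch).
    """
    words = set(text.lower().split())
    about_trump = bool(words & TRUMP_KEYWORDS)
    about_biden = bool(words & BIDEN_KEYWORDS)
    if about_trump == about_biden:
        return None
    # Flip the sign for Trump tweets so pro-Trump sentiment lands on the red side.
    return -raw_score if about_trump else raw_score


def totals_by_state(scored_tweets):
    """Sum signed scores per state from (state, text, raw_score) tuples."""
    totals = defaultdict(float)
    for state, text, raw_score in scored_tweets:
        score = signed_score(text, raw_score)
        if score is not None:
            totals[state] += score
    return dict(totals)


# Made-up example tweets with already-computed raw scores.
tweets = [
    ("CO", "trump is great", 0.8),
    ("CO", "biden is great", 0.8),
    ("TX", "trump did terrible", -0.9),
    ("TX", "nice weather today", 0.4),
]
totals = totals_by_state(tweets)
print(totals)
```

In this example the negative tweet about Trump raises the Texas total,
which is the "adds to Biden's score" effect described above.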
About
Scores tweets made about the 2020 presidential candidates using word sentiment analysis, then maps the data in the style of a political map.