A Spark-based Twitter streaming application that was used to process Valentine's Day tweets from all over India during Valentine's week of 2016, to analyze which cities and states of India were in love!
This app live-streams tweets from Twitter and keeps only those from India. It then determines the city for each tweet and accumulates a tweet count for every city found. The resulting tuples of city name and tweet count are stored in a text file for further analysis.
LoveIsInTheAir-Stats-Plotter is used to map each city to a state in India and plot a pie chart of the tweet distribution across all Indian states.
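The filter, extract, and count steps described above can be sketched in a few lines of Scala. This is a minimal illustration, not the app's actual code: it uses plain Scala collections instead of a Spark stream so that it runs on its own, and the `Tweet` case class, the `CityCounts` object, and the `(city,count)` output format are all assumptions made for the example.

```scala
// Hypothetical, simplified stand-in for a tweet; the real app receives
// richer tweet objects from the Twitter streaming API.
final case class Tweet(text: String, countryCode: Option[String], city: Option[String])

object CityCounts {
  // Keep tweets tagged with India ("IN" is the ISO 3166-1 alpha-2 code),
  // pull out the city name where one is present, and count tweets per city.
  def countByCity(tweets: Seq[Tweet]): Map[String, Int] =
    tweets
      .filter(_.countryCode.contains("IN"))
      .flatMap(_.city)
      .groupBy(identity)
      .map { case (city, hits) => city -> hits.size }

  // Render one "(city,count)" tuple per line, highest count first,
  // in a form that could be written to a text file.
  def render(counts: Map[String, Int]): String =
    counts.toSeq
      .sortBy { case (city, n) => (-n, city) }
      .map { case (city, n) => s"($city,$n)" }
      .mkString("\n")
}
```

In a streaming setting the same per-city counting would typically be expressed as a keyed reduction over each batch, with the running totals merged across batches.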
- 27.20 million tweets from India were processed during Valentine's week of 2016.
- State-wise distribution of tweets:
- After normalizing tweet counts by state area, Delhi scores 89.704192.
- Only 0.0418588% of tweets from India had a location embedded.
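The exact formula and units behind the area-normalized score are not stated here. One plausible reading is tweet count divided by state area, which can be sketched as follows. The `AreaNormalization` object and its parameter names are hypothetical, and no real state data is used.

```scala
object AreaNormalization {
  // Assumed definition: score = tweets / area, for every state that has
  // a known, positive area. States without an area entry are skipped.
  def tweetsPerArea(
      tweets: Map[String, Long],
      areaKm2: Map[String, Double]
  ): Map[String, Double] =
    tweets.collect {
      case (state, count) if areaKm2.get(state).exists(_ > 0) =>
        state -> count / areaKm2(state)
    }
}
```

For example, with made-up inputs `Map("StateA" -> 1000L, "StateB" -> 500L)` and areas `Map("StateA" -> 50.0, "StateB" -> 200.0)`, the small, tweet-dense StateA scores far higher than the larger StateB, which is the kind of effect that would favor a compact state such as Delhi.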
- Install sbt.
- Clone LoveIsInTheAir:
  git clone https://github.com/sahilsareen/LoveIsInTheAir.git
- Create a new Twitter app and generate an access token.
- Run:
  cd LoveIsInTheAir && sbt package "run <consumer key> <consumer secret> <access token> <access token secret> [<twitter love filters>]"
- After collecting sufficient data, use LoveIsInTheAir-Stats-Plotter to visualize results.
- Generate a pull request, OR
- Email patches to sahil [DOT] sareen [AT] hotmail [DOT] com
- Stick to the Scala style guide
See License
- Sahil Sareen (sahil [DOT] sareen [AT] hotmail [DOT] com)