Update 2018.06.09: The demo website (http://purduetweets.azurewebsites.net) is no longer available because Microsoft Azure stopped providing a free MySQL database 😞
I tried to do some cool stuff 😈 with Twitter data. This is a dashboard-style Twitter analysis platform built with JavaScript and PHP, using tweets posted in West Lafayette, home of Purdue University, between 2014 and 2015.
There are two projects in the repository:
- Individual User Pattern Analysis
- Event Detection
Individual User Pattern Analysis tries to find the spatial, temporal and textual patterns of the most active users on campus.
Dashboard
- Group each user's tweets by hour, then apply DBSCAN to detect clusters.
- Analyze the probability that the user appears in each cluster, using a method similar to Huang's work, and calculate each cluster's center, radius, keywords and other metadata.
- Construct the tweeting frequency bar chart by gathering clusters of the same type, which reveals the structure of the user's activity. If "Frequency" dominates the bar chart, the user is likely a nerd who likes to stay at specific places like the office or apartment (like me 😂); if "Rarely" dominates, the user tends to appear at many different places, like a social butterfly~ (what I want to be!)
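The README does not show the clustering code, so here is a minimal JavaScript sketch of standard DBSCAN over tweet coordinates. It is not the repository's implementation. It assumes each tweet is a `{ lat, lon }` object and measures distance with the haversine formula in meters. The function names and any `eps`/`minPts` values you pass are illustrative assumptions, and the hourly grouping step is left out.

```javascript
// Great-circle distance in meters between two { lat, lon } points (haversine formula).
function haversineMeters(p, q) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(q.lat - p.lat);
  const dLon = toRad(q.lon - p.lon);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(p.lat)) * Math.cos(toRad(q.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Plain DBSCAN. eps is in meters; minPts counts the point itself.
// Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
function dbscan(points, eps, minPts) {
  const NOISE = -1;
  const labels = new Array(points.length).fill(undefined); // undefined = unvisited
  const regionQuery = (i) => {
    const out = [];
    for (let j = 0; j < points.length; j++) {
      if (haversineMeters(points[i], points[j]) <= eps) out.push(j);
    }
    return out;
  };

  let clusterId = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== undefined) continue;
    const seeds = regionQuery(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE; // may later be claimed as a border point
      continue;
    }
    labels[i] = clusterId;
    const queue = [...seeds];
    while (queue.length > 0) {
      const j = queue.shift();
      if (labels[j] === NOISE) labels[j] = clusterId; // border point joins, no expansion
      if (labels[j] !== undefined) continue;
      labels[j] = clusterId;
      const jNeighbors = regionQuery(j);
      if (jNeighbors.length >= minPts) queue.push(...jNeighbors); // core point: keep expanding
    }
    clusterId++;
  }
  return labels;
}
```

For campus-scale data, an `eps` of a few tens of meters is a plausible starting point, but that value is a guess and should be tuned. The neighbor search here is O(n²), which is fine for a single user's tweets.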
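Once a user's tweets carry cluster labels (one per tweet, with -1 for noise), per-cluster metadata can be derived. The sketch below makes several assumptions and is not Huang's method:

- The probability of appearing in a cluster is estimated simply as the share of the user's tweets that fall in it.
- The center is the mean coordinate.
- The radius is the farthest member's haversine distance from the center.
- The keywords are the most frequent words.

The `clusterSummaries` name and the `{ lat, lon, text }` tweet shape are hypothetical.

```javascript
// Great-circle distance in meters between two { lat, lon } points (haversine formula).
function haversineMeters(p, q) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(q.lat - p.lat);
  const dLon = toRad(q.lon - p.lon);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(p.lat)) * Math.cos(toRad(q.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// tweets: one user's tweets as { lat, lon, text }; labels: one cluster id per
// tweet, with -1 meaning noise. Returns one summary object per cluster.
function clusterSummaries(tweets, labels, topK = 3) {
  const groups = new Map(); // cluster id -> member tweets
  labels.forEach((label, i) => {
    if (label === -1) return; // noise belongs to no cluster
    if (!groups.has(label)) groups.set(label, []);
    groups.get(label).push(tweets[i]);
  });

  const summaries = [];
  for (const [id, members] of groups) {
    // Arithmetic mean of coordinates: adequate for small, campus-sized clusters.
    const center = {
      lat: members.reduce((sum, t) => sum + t.lat, 0) / members.length,
      lon: members.reduce((sum, t) => sum + t.lon, 0) / members.length,
    };
    const radius = Math.max(...members.map((t) => haversineMeters(center, t)));

    // Naive keywords: most frequent lowercase words (no stop-word filtering here).
    const counts = new Map();
    for (const t of members) {
      for (const w of t.text.toLowerCase().match(/[a-z']+/g) ?? []) {
        counts.set(w, (counts.get(w) ?? 0) + 1);
      }
    }
    const keywords = [...counts.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, topK)
      .map(([word]) => word);

    // Share of ALL the user's tweets (noise included) that fall in this cluster.
    const probability = members.length / tweets.length;
    summaries.push({ id, probability, center, radius, keywords });
  }
  return summaries;
}
```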
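The README does not say how a cluster's type ("Frequency" or "Rarely") is decided. The sketch below assumes a hypothetical rule: a cluster counts as "Frequency" when its share of the user's tweets reaches a threshold, and as "Rarely" otherwise. Each bar's height is the total share of tweets in clusters of that type. Rendering the bars, for example with CanvasJS, is left out.

```javascript
// clusters: objects with a `probability` field (share of the user's tweets in
// that cluster). HYPOTHETICAL rule: a cluster is "Frequency" if its share is at
// least `threshold`, otherwise "Rarely". The README does not specify the rule.
function frequencyBars(clusters, threshold = 0.1) {
  const bars = { Frequency: 0, Rarely: 0 }; // bar height = share of tweets per type
  for (const c of clusters) {
    const type = c.probability >= threshold ? "Frequency" : "Rarely";
    bars[type] += c.probability;
  }
  // The taller bar tells whether the user sticks to a few places or roams around.
  const dominant = bars.Frequency >= bars.Rarely ? "Frequency" : "Rarely";
  return { bars, dominant };
}
```

With this rule, a user who tweets mostly from one or two places gets a tall "Frequency" bar. A user spread thinly over many places accumulates height in "Rarely" instead.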
Sample Cluster
Frequency Bar Chart
Event Detection tries to detect events on campus. It is based on the following definition of an event:
A group of people at a place during a specific period, talking about something related to a topic (or topics)
Dashboard
- Group tweets by day, and generate line charts of the monthly number of tweets and users.
- Unlike the DBSCAN used for individual patterns, I apply ST-DBSCAN to cluster each day's tweets, which reveals their spatial and temporal patterns.
- Count word frequencies, and apply LDA to find potential topics in each cluster and analyze the structure of every tweet. Although some clusters contain only rambling words (even after filtering with a stop-word list), important events such as the Gunshot at Campus (1.21.2014), the Super Bowl (2.2.2014) and the Graduation Ceremony (5.16.2014 ~ 5.18.2014) stand out clearly in the textual information. The approach can also detect events that were not known in advance.
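The daily and monthly grouping step can be sketched as follows. This assumes each tweet is a `{ userId, createdAt }` object with `createdAt` as a JavaScript `Date`, and the function name is hypothetical. Days are taken from the UTC date via `toISOString()`, which is also an assumption. For local West Lafayette days, timestamps would first need converting to Eastern time.

```javascript
// tweets: [{ userId, createdAt: Date }]
// Returns tweets bucketed by UTC day, plus monthly tweet and distinct-user counts.
function dailyAndMonthlyCounts(tweets) {
  const byDay = new Map(); // "YYYY-MM-DD" -> tweets posted that day
  const byMonth = new Map(); // "YYYY-MM" -> { tweets: count, users: Set of ids }
  for (const t of tweets) {
    const day = t.createdAt.toISOString().slice(0, 10); // UTC calendar date
    const month = day.slice(0, 7);
    if (!byDay.has(day)) byDay.set(day, []);
    byDay.get(day).push(t);
    if (!byMonth.has(month)) byMonth.set(month, { tweets: 0, users: new Set() });
    const m = byMonth.get(month);
    m.tweets += 1;
    m.users.add(t.userId);
  }
  // Chronological series, ready to feed a line chart.
  const monthly = [...byMonth.entries()]
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([month, m]) => ({ month, tweets: m.tweets, users: m.users.size }));
  return { byDay, monthly };
}
```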
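ST-DBSCAN differs from DBSCAN mainly in its neighborhood test. Two tweets are neighbors only if they are within a spatial threshold and also within a temporal threshold. The sketch below is a simplified version, not the repository's code. It omits the original algorithm's extra density-difference check (Δε). The `{ lat, lon, time }` point shape (with `time` in milliseconds) and the function name are assumptions.

```javascript
// Great-circle distance in meters between two { lat, lon } points (haversine formula).
function haversineMeters(p, q) {
  const R = 6371000; // mean Earth radius in meters
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(q.lat - p.lat);
  const dLon = toRad(q.lon - p.lon);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(p.lat)) * Math.cos(toRad(q.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// Simplified ST-DBSCAN (no Δε check). points: [{ lat, lon, time }] with time in ms.
// Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
function stDbscan(points, epsMeters, epsMillis, minPts) {
  const NOISE = -1;
  const labels = new Array(points.length).fill(undefined); // undefined = unvisited
  // Neighbors must be close in BOTH space and time.
  const regionQuery = (i) => {
    const out = [];
    for (let j = 0; j < points.length; j++) {
      const near = haversineMeters(points[i], points[j]) <= epsMeters;
      const soon = Math.abs(points[i].time - points[j].time) <= epsMillis;
      if (near && soon) out.push(j);
    }
    return out;
  };

  let clusterId = 0;
  for (let i = 0; i < points.length; i++) {
    if (labels[i] !== undefined) continue;
    const seeds = regionQuery(i);
    if (seeds.length < minPts) {
      labels[i] = NOISE;
      continue;
    }
    labels[i] = clusterId;
    const queue = [...seeds];
    while (queue.length > 0) {
      const j = queue.shift();
      if (labels[j] === NOISE) labels[j] = clusterId; // border point
      if (labels[j] !== undefined) continue;
      labels[j] = clusterId;
      const jNeighbors = regionQuery(j);
      if (jNeighbors.length >= minPts) queue.push(...jNeighbors); // core point
    }
    clusterId++;
  }
  return labels;
}
```

Because of the temporal threshold, tweets from the same spot in the morning and in the evening end up in separate clusters, which plain DBSCAN would merge.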
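The word-frequency step can be sketched as below: tokenize, drop stop words, count, and sort in descending order. The tiny stop-word list and the tokenizer regex are hypothetical placeholders; a real stop-word list would be much longer. LDA itself is not sketched here, since it would normally come from a topic-modeling library.

```javascript
// HYPOTHETICAL, deliberately tiny stop-word list for illustration only.
const STOP_WORDS = new Set(["the", "a", "an", "at", "in", "on", "is", "to", "and", "of", "i", "my"]);

// Returns [word, count] pairs sorted by count (descending), ties alphabetically.
function wordFrequencies(texts, stopWords = STOP_WORDS) {
  const counts = new Map();
  for (const text of texts) {
    // Keep letters, digits, hashtags, mentions and apostrophes as word characters.
    const words = text.toLowerCase().match(/[a-z0-9#@']+/g) ?? [];
    for (const w of words) {
      if (stopWords.has(w)) continue;
      counts.set(w, (counts.get(w) ?? 0) + 1);
    }
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```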
Monthly Pattern
Pick a cluster
Its spatial pattern, shown as a heatmap
Its temporal pattern
Dynamic map of the spatial pattern in different periods (it is not playable on GitHub; I highly recommend having a look at the DEMO!!)
Word frequency in descending order
Sample original texts
LDA topics and structure of Tweets
Acknowledgement: Thanks to CanvasJS for providing the chart API.
Enjoy! 💥
PS: Fork and Star are reallllllllllly welcome ~~~