Skip to content

revature-scalawags/Project2-Group4

Repository files navigation

Twitter Batch & Streaming Analysis

Project Description

Analysis of historical and live tweets and hashtags from Twitter using Apache Spark.

Technologies Used

Scala / Spark

  • Scala - version 2.12
  • Apache Spark - version 3.1.0
    • Spark SQL
    • Spark Streaming
  • twitter4j-stream - version 4.0.7
  • JonSnowLabs NLP - version 2.7.1
  • scala-csv - 1.3.6

Python

  • Python - version 3.8.3
  • snscrape - version 0.3.5

Amazon Web Services

  • S3

Features

  • Collects up to the last 10,000 tweets for a given hashtag and stores them in a .tsv file.
  • Calculates the impact of the hashtag using the following metrics:
    • Total number of users using the hashtag.
    • Total combined follower count of all users of hashtag.
    • Average follower count per user of the hashtag.
    • Median follower count per user of the hashtag.
  • Analyzes tweets streaming from Twitter to determine positive versus negative sentiments in those tweets.
  • Determines top hashtags on Twitter at any given moment from a stream of tweets.
  • Writes analysis results to a .tsv file in an Amazon S3 bucket.

TODO:

  • Add sentiment analysis to historical hashtag data.
  • Add influence analysis to streaming tweet data.
  • Set up an Amazon EMR instance for running .jar files on Spark in the cloud.

Getting Started

(include git clone command) (include all environment setup steps)

Be sure to include BOTH Windows and Unix command
Be sure to mention if the commands only work on a specific platform (eg. AWS, GCP)

  • All the code required to get started
  • Images of what it should look like

Usage

Here, you instruct other people on how to use your project after they’ve installed it. This would also be a good place to include screenshots of your project in action.

Contributors

License

This project is licensed under the Apache License, Version 2.0.

About

Jeroen Wolfe, Andrew Pepin, Rastal

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 3

  •  
  •  
  •