revature-scalawags/Project2-Group2

Twitter Account Sentiment Analysis Program

Overview

A Scala-based program that reads both batch and streaming data from Twitter and runs sentiment analysis on it.

User Story

Almost all businesses today run social media accounts, hoping to win goodwill and popularity among users. Yet it remains an open question whether businesses actually benefit from running these accounts or only damage themselves. This program looks into businesses' Twitter accounts and runs sentiment analysis on each tweet and response generated by the businesses.

Technologies / Resources

  • sbt
  • Apache Spark
    • Spark SQL
    • Spark Streaming
  • Docker
  • Twitter API v2
  • Apache Parquet
  • Subjectivity Lexicon (Link)

Features

  • Read Twitter batch data of a selected business account for the past 7 days.
  • Load the batch data using Apache Spark and convert it into a Spark DataFrame.
  • Manipulate the converted DataFrame so that only tweet texts are fed to the sentiment analysis program.
  • Run sentiment analysis on each tweet and response generated by the business and return one of the following labels: Positive, Negative, or Mixed.
  • Read and process live Twitter-Stream data using Spark Structured Streaming in order to find the most popular topics of discussion on Twitter at a given moment.
  • Read Twitter streaming data of a selected business account in real time and save every 10 new lines as a CSV file in Datalake1.
  • Read newly generated CSV files in Datalake1 in real time using Spark Streaming, convert them into a DataFrame, extract only the tweet texts, and save them as Parquet files in Datalake2.
  • Load the Parquet files using Apache Spark and run sentiment analysis on each tweet and response generated in real time, returning one of the following labels: Positive, Negative, or Mixed.
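The lexicon-based labeling mentioned in the features above can be sketched in plain Scala, without Spark. This is a minimal illustration and not necessarily how this repo implements it. It assumes the Subjectivity Lexicon file has one entry per line in the form `type=... len=1 word1=<word> pos1=... stemmed1=... priorpolarity=<polarity>`, and it assumes a simple rule: more positive than negative words gives Positive, the reverse gives Negative, and anything else gives Mixed. The names `LexiconSentiment`, `parseLine`, `loadLexicon`, and `classify` are hypothetical.

```scala
object LexiconSentiment {
  // Parse one lexicon line into (word, polarity).
  // Assumed line format (not confirmed by this README):
  // "type=weaksubj len=1 word1=abandoned pos1=adj stemmed1=n priorpolarity=negative"
  // Lines missing word1 or priorpolarity yield None.
  def parseLine(line: String): Option[(String, String)] = {
    val fields = line.trim.split("\\s+").toSeq
      .map(_.split("=", 2))
      .collect { case Array(key, value) => key -> value }
      .toMap
    for {
      word     <- fields.get("word1")
      polarity <- fields.get("priorpolarity")
    } yield word.toLowerCase -> polarity
  }

  // Build a word -> polarity lookup, skipping malformed lines.
  // If a word appears more than once, the last entry wins.
  def loadLexicon(lines: Iterable[String]): Map[String, String] =
    lines.flatMap(line => parseLine(line).toList).toMap

  // Count positive and negative lexicon hits in the tweet text.
  // Assumed rule: a tie (including zero hits) is labeled "Mixed".
  def classify(text: String, lexicon: Map[String, String]): String = {
    val words     = text.toLowerCase.split("[^a-z']+").filter(_.nonEmpty)
    val positives = words.count(w => lexicon.get(w).contains("positive"))
    val negatives = words.count(w => lexicon.get(w).contains("negative"))
    if (positives > negatives) "Positive"
    else if (negatives > positives) "Negative"
    else "Mixed"
  }
}
```

In a Spark job, a function like `classify` could be applied to the tweet-text column after loading the lexicon file from wherever it was uploaded on the cluster, for example by reading its lines with `scala.io.Source.fromFile`.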

Getting Started / Usage

To run this program properly, you will need to meet the following prerequisites:

  • Create a Twitter API v2 key.
  • Download the Subjectivity Lexicon from the link above and upload it to the cluster where you will run your jar files.

Once the prerequisites above are met, clone this repo using the command below:

git clone https://github.com/spark131008/Twitter_Account_Sentiment_Analysis_Program.git

To create a jar file for each program, use the command below:

sbt package

Once all jar files are created, copy them from the /target/scala-2.12 directory to the machine or local cluster that will run them. If you are running your cluster in a Docker container, use the command below:

docker cp ./target/scala-2.12/<Name of the jar file>.jar spark-master:/<Name of the jar file>.jar

To run a jar file using Apache Spark in a Docker container, use the command below:

docker exec spark-master bash -c "./spark/bin/spark-submit --class "<Name of the class>" --master local[4] /<Name of the jar file>.jar"

If you want to run sentiment analysis on filtered Twitter streaming data, submit the jar files with spark-submit in the order below:

1. docker exec spark-master bash -c "./spark/bin/spark-submit --class "TwitterStreamingDataProcessing" --master local[4] /filtered_twitter_stream.jar"

2. docker exec spark-master bash -c "./spark/bin/spark-submit --class "SparkStreaming" --master local[4] /spark_streaming.jar"

3. docker exec spark-master bash -c "./spark/bin/spark-submit --class "SentimentAnalysis" --master local[4] /sentiment_analysis.jar"

Contributors

Sundoo, Chase, Trenton, Josh

Example Results

Wendys

Chick-Fil-A

Google Slides presentation

https://docs.google.com/presentation/d/1vG7IgBXfc0gUOD-RylH3TJpXcLts6diCY8UIR92Tm0M/edit?usp=sharing
