rohith28/MovieInsights

Data Engineering Project: Movie Insights

Analytics platform for Movie Investors and Movie Lovers

Motivation:

  • The motion picture industry is growing rapidly, likely driven by online and mobile distribution, lower admission prices, and government policy initiatives.
  • The industry is also rich in data, which makes it an attractive field for statisticians and analysts.
  • Stakeholders looking for a ‘magic formula’ to understand and predict box office success are turning to statisticians and data scientists for help.
  • To increase profits, producers and directors need to understand what sparks the curiosity of their target audience. This is where analytics can play an effective role.
  • Analyzing IMDb/Rotten Tomatoes data, monthly collection reports for similar genres, and star cast helps stakeholders decide where profit can be maximized.

Tools and technologies used:

  1. Amazon S3
  2. Apache Spark
  3. Amazon Redshift
  4. Node.js
  5. CanvasJS
  6. Apache Airflow

Data Extraction

  • Basic movie data such as release date, movie name, genre, and production company is scraped from IMDb/Rotten Tomatoes.
  • I used Scrapy to scrape the data and stored it in Amazon S3.
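
As an illustration of the step between scraping and storage, here is a minimal sketch of how scraped records might be cleaned and serialized before upload. It is not the project's actual code: the field names, the `raw/<source>/<date>/movies.jsonl` key layout, and the helper names are all assumptions, and it uses only the Python standard library. The upload itself (for example through an S3 client) is left out.

```python
import json
from datetime import date

# Hypothetical field names. The README only mentions release date, movie name,
# genre, and production company as examples of scraped fields.
FIELDS = ("movie_name", "release_date", "genre", "production_company")


def normalize_record(raw: dict) -> dict:
    """Keep only the expected fields and strip whitespace from string values.

    Fields missing from the scraped item are set to None so that every
    record has the same shape.
    """
    record = {}
    for field in FIELDS:
        value = raw.get(field)
        record[field] = value.strip() if isinstance(value, str) else value
    return record


def to_json_lines(records: list) -> str:
    """Serialize records as newline-delimited JSON, one movie per line."""
    return "\n".join(
        json.dumps(normalize_record(r), sort_keys=True) for r in records
    )


def s3_key(source: str, scrape_date: date) -> str:
    """Build a date-partitioned object key (this layout is an assumption)."""
    return f"raw/{source}/{scrape_date.isoformat()}/movies.jsonl"
```

Calling `to_json_lines` on a list of scraped items gives the object body, and `s3_key("imdb", date.today())` gives the key to store it under. Newline-delimited JSON is chosen here because Spark can read it directly, but any format the downstream jobs accept would work.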

Data Pipeline

[Figure: data pipeline diagram]

Database Schema

[Figure: database schema diagram]
