
revature-scalawags/Page-Project1


Movie Answers

A CLI application that uses Hive to query a YARN cluster running in a local container and answer four questions about popular and not-so-popular movies.



Description

GroupLens Research has collected ratings datasets and made them available at their MovieLens website. The MovieLens 25M stable benchmark dataset describes 5-star rating and free-text tagging activity: 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies by 162,541 users. It includes tag genome data with 15 million relevance scores across 1,129 tags. The data was generated between January 1995 and November 2019, and was released in November 2019.

This application uses Hive on top of a YARN cluster to query the MovieLens dataset and answer the following questions:

  • What are the most popular movies ever?
  • What are the 'worst' popular movies?
  • What are some good but unpopular movies?
  • What movies correlate closely to their tag descriptions?
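
To make the first three questions concrete, here is a minimal Python sketch of one way they could be framed. It is not the Hive queries this project runs. The function names, the sample data, and the thresholds (`min_popular`, `good`, `bad`) are all assumptions chosen for illustration; "popular" is taken here to mean "has many ratings".

```python
from collections import defaultdict


def summarize(ratings):
    """Aggregate (movie_id, rating) pairs into {movie_id: (count, mean)}."""
    totals = defaultdict(lambda: [0, 0.0])
    for movie_id, rating in ratings:
        totals[movie_id][0] += 1
        totals[movie_id][1] += rating
    return {m: (n, s / n) for m, (n, s) in totals.items()}


def classify(stats, min_popular=3, good=4.0, bad=2.5):
    """Split movies into three lists using hypothetical thresholds.

    - most_popular: every movie, ordered by rating count (descending)
    - worst_popular: at least min_popular ratings and mean <= bad
    - good_unpopular: fewer than min_popular ratings and mean >= good
    """
    most_popular = sorted(stats, key=lambda m: (-stats[m][0], m))
    worst_popular = sorted(
        m for m, (n, avg) in stats.items() if n >= min_popular and avg <= bad
    )
    good_unpopular = sorted(
        m for m, (n, avg) in stats.items() if n < min_popular and avg >= good
    )
    return most_popular, worst_popular, good_unpopular


# Tiny made-up sample; real input would come from ratings.csv.
sample = [
    (1, 5.0), (1, 4.0), (1, 4.5),
    (2, 1.0), (2, 2.0), (2, 1.5),
    (3, 5.0),
    (4, 3.0), (4, 3.0),
]
stats = summarize(sample)
most_popular, worst_popular, good_unpopular = classify(stats)
```

On the full 25M dataset the thresholds would need to be far higher than the toy values used here, and the same grouping would be expressed as a `GROUP BY` over the ratings table in Hive.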


Tech Used and Required

Usage

These datasets can be acquired from MovieLens.
The dataset used was their 25M Dataset. The README for this data can be viewed here.

Files needed in your HDFS:
  • ratings.csv
  • tags.csv
  • genome-scores.csv (required for question 4 only)
  • genome-tags.csv (required for question 4 only)

This application looks for your Hive cluster running on the default http://localhost:10000.
No username or password is required.
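
As a quick pre-flight check, the file requirements above can be written as a small lookup. This is a sketch only: `missing_files` is a hypothetical helper, not part of this application, and it assumes you pass in the file names you already see in HDFS (for example, from `hdfs dfs -ls`).

```python
# Files every question needs, and the extra files question 4 needs.
BASE_FILES = {"ratings.csv", "tags.csv"}
GENOME_FILES = {"genome-scores.csv", "genome-tags.csv"}


def missing_files(question, present):
    """Return the sorted list of required files not found in `present`.

    `question` is 1-4, matching the four questions listed earlier.
    `present` is any iterable of file names found in HDFS.
    """
    if question not in (1, 2, 3, 4):
        raise ValueError("question must be 1, 2, 3, or 4")
    required = BASE_FILES | (GENOME_FILES if question == 4 else set())
    return sorted(required - set(present))


# Example: only the two base files have been uploaded so far.
uploaded = ["ratings.csv", "tags.csv"]
need_for_q1 = missing_files(1, uploaded)
need_for_q4 = missing_files(4, uploaded)
```

An empty list means the chosen question has everything it needs.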

Project:

Repo
My GitHub
Email: [email protected]

Known Issues:

None known at the moment.
If any are discovered, please feel free to contact me. Cheers. 😄
