Big Data project: Designed and answered good data science questions for the Yelp unstructured big dataset.

a-khelifi/Yelp-dataset

Yelp-dataset

• Scope: designed and answered data science questions for the unstructured Yelp dataset (~8 GB).
• Packages used: Apache Spark/Hadoop (PySpark).
• Environment: Linux Bash.
• Challenge: accessing fields inside nested JSON objects.
• Solution: loading the records into dictionaries made indexing the nested fields simple.
• Results: gained useful insights about reviews, businesses, and users; e.g., the number of reviews is only weakly correlated with the number of fans.
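
The dictionary-based approach and the correlation check mentioned above could be sketched roughly as follows. This is a minimal illustration in plain Python, not the project's actual code: the sample records are invented, and the field names (`user_id`, `review_count`, `fans`, `compliments`) as well as the helpers `get_nested` and `pearson` are assumptions made for the example.

```python
import json
import math

# Invented sample records in JSON-lines form (one JSON object per line).
# The field names are assumptions for illustration only.
SAMPLE_LINES = [
    '{"user_id": "u1", "review_count": 10, "fans": 1, "compliments": {"hot": 2}}',
    '{"user_id": "u2", "review_count": 250, "fans": 3, "compliments": {"hot": 0}}',
    '{"user_id": "u3", "review_count": 40, "fans": 12, "compliments": {}}',
    '{"user_id": "u4", "review_count": 5, "fans": 0, "compliments": {"hot": 1}}',
]


def get_nested(record, path, default=None):
    """Follow a sequence of keys through nested dictionaries.

    Returns `default` if any key along the path is missing.
    """
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Parse each line into a dictionary so nested fields can be indexed by key.
users = [json.loads(line) for line in SAMPLE_LINES]

# Top-level fields are plain dictionary lookups.
review_counts = [u["review_count"] for u in users]
fans = [u["fans"] for u in users]

# Nested fields are reached by walking a key path, with a fallback value.
hot_compliments = [get_nested(u, ("compliments", "hot"), 0) for u in users]

r = pearson(review_counts, fans)
print(f"correlation(review_count, fans) on the sample = {r:.3f}")
```

In a PySpark job, the same per-line parsing could plausibly be applied with something like `sc.textFile(path).map(json.loads)`, where `path` is a placeholder for the location of the dataset file; the printed correlation here reflects only the invented sample, not the project's findings.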
