This repository contains an implementation of Spark's built-in implicit ALS matrix factorization, used to build an implicit recommender system on the Million Song Dataset (https://www.kaggle.com/c/msdchallenge). The 200+ GB dataset is housed on NYU's High Performance Computing (HPC) cluster, Peel, where all computation was performed. This work was completed for credit as the final project for DS-GA 1004 (Big Data) at NYU CDS.
The following files were run sequentially to obtain the final results from the ALS model (i.e., 500 recommendations per user):
- Build_Hash.py: creates a uniform integer hash key for the train, test, and validation sets; the key is then saved to HDFS
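The actual job runs on Spark, but the core idea of the hash key can be sketched in plain Python: every distinct string ID (user or track) is assigned a dense integer index, since ALS requires integer user and item columns. The function name and sample IDs below are illustrative, not taken from Build_Hash.py.

```python
def build_hash_key(ids):
    """Map each distinct string ID to a stable, dense integer index."""
    key = {}
    for raw_id in ids:
        if raw_id not in key:
            key[raw_id] = len(key)  # next unused integer, in first-seen order
    return key

# Toy stand-in for the raw Million Song Dataset user IDs.
users = ["u_abc", "u_def", "u_abc", "u_xyz"]
key = build_hash_key(users)
# key == {"u_abc": 0, "u_def": 1, "u_xyz": 2}
```

Building one key over train, test, and validation together (rather than per split) keeps the integer IDs consistent across all three datasets.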
- Parquet_Build.py: loads the uniform hash key from HDFS, applies it to each of the datasets, and writes the new files back out to HDFS as Parquet
- GridSearch_All.py: performs a grid search over the ALS model's hyperparameters
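A grid search over ALS typically enumerates the cross product of a few hyperparameter lists (commonly rank, regularization, and the implicit-feedback alpha), fits a model for each combination, and scores it on the validation set. The specific values below are placeholders, not the grid actually used in GridSearch_All.py:

```python
from itertools import product

# Hypothetical search space; the real grid values are not given in this README.
ranks = [10, 50, 100]
reg_params = [0.01, 0.1, 1.0]
alphas = [1.0, 10.0, 40.0]

# Each (rank, regParam, alpha) triple would be fit with ALS and evaluated on
# the validation set; the best-scoring triple feeds the final model run.
grid = list(product(ranks, reg_params, alphas))
```

With three values per parameter this yields 27 candidate models, which is why the search was run on the HPC cluster rather than locally.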
- GridSearchFinal: folder that contains the results of our grid search and the corresponding Jupyter notebook
- FinalModel.py: contains our final model run, with the optimal hyperparameters (run to a high number of iterations)
- Subsample.py: subsamples 0.5% of the train and test user/track/count data
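Subsample.py presumably relies on Spark's `DataFrame.sample(fraction=...)`; the same Bernoulli-style sampling idea can be illustrated in plain Python (the row contents here are invented for the example):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Toy stand-in for user/track/count rows.
rows = [("user_%d" % i, "track_%d" % (i % 97), 1) for i in range(10_000)]

# Keep each row independently with probability 0.005 (the 0.5% noted above);
# roughly 0.005 * 10_000 ≈ 50 rows survive.
fraction = 0.005
sample = [r for r in rows if random.random() < fraction]
```

Working on a 0.5% subsample keeps iteration fast during development before scaling back up to the full 200+ GB dataset on the cluster.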
- Lenskit_Extension.ipynb: extension results