Movie recommendation systems matter in everyday life because they provide a better entertainment experience. Such a system suggests movies to users based on their interests or on the movies' popularity. In this work, we focus on building a recommendation system with graph-based machine learning. We also analyze the MovieLens 100k dataset to uncover the hidden network structures of movies and users.
Python 3.6 for Matrix Factorization and Python 2.7 for Graph Convolutional Matrix Completion
Keras == 2.2.0
Pandas == 0.24.2
matplotlib == 3.1.2
seaborn == 0.9.0
Numpy == 1.14.0
Tensorflow == 1.4.0
h5py == 2.10.0
networkx == 2.4
wordcloud == 1.6.0
- Movie graph: nodes colored by modularity class, visualized with Gephi
- User graph: nodes colored by modularity class, visualized with Gephi
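The README does not say how the movie and user graphs were built. Here is one plausible construction, offered as an assumption rather than the authors' method:

- Link two movies whenever the same user rated both.
- Weight each edge by the number of such users.
- Group the movies by maximizing modularity.

Gephi's modularity statistic uses the Louvain method. This sketch swaps in networkx's greedy modularity maximization (`greedy_modularity_communities`), which is available in the pinned networkx 2.4. `build_movie_graph` and `modularity_classes` are hypothetical helper names.

```python
from collections import defaultdict
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity


def build_movie_graph(ratings):
    """Project (user_id, movie_id) pairs onto a weighted movie-movie graph.

    Assumed construction: two movies are linked when at least one user rated
    both, and the edge weight counts how many users did so.
    """
    movies_by_user = defaultdict(set)
    for user, movie in ratings:
        movies_by_user[user].add(movie)

    graph = nx.Graph()
    for movies in movies_by_user.values():
        graph.add_nodes_from(movies)
        for m1, m2 in combinations(sorted(movies), 2):
            if graph.has_edge(m1, m2):
                graph[m1][m2]["weight"] += 1
            else:
                graph.add_edge(m1, m2, weight=1)
    return graph


def modularity_classes(graph):
    """Greedy (Clauset-Newman-Moore) communities and their weighted modularity Q."""
    communities = list(greedy_modularity_communities(graph, weight="weight"))
    return communities, modularity(graph, communities, weight="weight")
```

To build the user graph, swap the roles of users and movies in the input pairs. Each community index can then serve as a node color, as in the Gephi figures.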
We have two recommendation systems (five models). Here are the steps to reproduce their results:
- Download the data through this Google Drive link and put it in recommenders/mf-dnn/data
- Download the trained models through this Google Drive link and put them in recommenders/mf-dnn/models
cd recommenders/mf-dnn
- Run one of the three scripts to get our testing results:
bash mf.sh
- run the testing code with the MF model (latent dimension = 16, with rating normalization)
bash dnn.sh
- run the testing code with the MF + DNN model (latent dimension = 64, no rating normalization, 3 layers with hidden size 256, dropout = 0.5)
bash dnn_w_info.sh
- run the testing code with the MF + DNN with features model (latent dimension = 64, no rating normalization, 3 layers with hidden size 256, dropout = 0.5)
- You can also train the three models from scratch:
python train.py --normal --dim 16
- train the Matrix Factorization model
python train.py --dim 64 --dnn
- train the Matrix Factorization + DNN model
python train.py --dim 64 --dnn_w_info
- train the Matrix Factorization + DNN with features model
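The repository trains these models with Keras, and the sketch below is not that code. It illustrates the matrix-factorization idea behind `--dim` (the latent dimension) in plain NumPy with stochastic gradient descent. It assumes that `--normal` means z-score normalization of the ratings before training. The names `train_mf`, `predict` and `rmse`, and the learning-rate and regularization values, are hypothetical. The DNN variants feed learned embeddings through dense layers instead of a plain dot product; that part is not shown.

```python
import numpy as np


def train_mf(triples, n_users, n_items, dim=16, normalize=True,
             lr=0.01, reg=0.02, epochs=50, seed=0):
    """Fit r_ui ~ offset + b_u + b_i + p_u . q_i with SGD.

    triples: sequence of (user_index, item_index, rating) with 0-based indices.
    """
    data = np.asarray(triples, dtype=np.float64)
    users, items, ratings = data[:, 0].astype(int), data[:, 1].astype(int), data[:, 2]
    # Optional z-score normalization (assumed meaning of the --normal flag).
    mean, std = (ratings.mean(), ratings.std()) if normalize else (0.0, 1.0)
    std = std or 1.0  # guard against a constant rating column
    targets = (ratings - mean) / std
    offset = targets.mean()  # global bias, about 0 when normalized

    rng = np.random.RandomState(seed)
    P = rng.normal(scale=0.1, size=(n_users, dim))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, dim))  # item latent factors
    b_u, b_i = np.zeros(n_users), np.zeros(n_items)

    for _ in range(epochs):
        for k in rng.permutation(len(targets)):
            u, i = users[k], items[k]
            err = targets[k] - (offset + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            # Both factor updates use the vectors from before this step.
            P[u], Q[i] = (P[u] + lr * (err * Q[i] - reg * P[u]),
                          Q[i] + lr * (err * P[u] - reg * Q[i]))
    return {"P": P, "Q": Q, "b_u": b_u, "b_i": b_i,
            "offset": offset, "mean": mean, "std": std}


def predict(model, u, i, low=1.0, high=5.0):
    """Undo the normalization and clip to the MovieLens 1-5 rating range."""
    z = (model["offset"] + model["b_u"][u] + model["b_i"][i]
         + model["P"][u] @ model["Q"][i])
    return float(np.clip(z * model["std"] + model["mean"], low, high))


def rmse(model, triples):
    errors = [predict(model, int(u), int(i)) - r for u, i, r in triples]
    return float(np.sqrt(np.mean(np.square(errors))))
```

Normalization is undone inside `predict`, so the RMSE is always reported on the original rating scale. This keeps runs with and without normalization comparable.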
- The data should be downloaded automatically when you run the training or testing script. If it is not, download the data through this Google Drive link and put the folder in gc-mc/data
- Download the trained models through this Google Drive link and put them in gc-mc/models
cd recommenders/gc-mc
- Run one of the two scripts to get our testing results:
bash test_no_features.sh
- run the testing code with the GC-MC model (no additional features)
bash test_with_features.sh
- run the testing code with the GC-MC model (with features)
- You can also train (and test) the two models from scratch:
bash train_test_no_features.sh
- train and test the GC-MC model (no additional features)
bash train_test_with_features.sh
- train and test the GC-MC model (with features)
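The README runs GC-MC only through shell scripts. To show what the model computes, here is a minimal, untrained NumPy sketch of a GC-MC-style forward pass, following the structure described in the GC-MC paper:

- one normalized adjacency matrix per rating level;
- a graph-convolution encoder that sums messages across levels (sum is one of the paper's accumulation options);
- a dense layer;
- a bilinear decoder with a softmax over rating levels.

This is not the repository's TensorFlow code. The function names, the symmetric normalization 1/sqrt(deg_u * deg_i), the layer sizes and the random weights are assumptions for illustration. Side features (the "with features" variant) are omitted.

```python
import numpy as np

RATING_LEVELS = (1, 2, 3, 4, 5)


def normalized_adjacency(ratings, n_users, n_items):
    """One user-item adjacency matrix per rating level, symmetrically normalized.

    ratings: iterable of (user_index, item_index, rating) with 0-based indices.
    """
    M = np.zeros((len(RATING_LEVELS), n_users, n_items))
    for u, i, r in ratings:
        M[RATING_LEVELS.index(r), u, i] = 1.0
    deg_u = np.maximum(M.sum(axis=(0, 2)), 1.0)  # ratings per user
    deg_i = np.maximum(M.sum(axis=(0, 1)), 1.0)  # ratings per item
    return M / np.sqrt(deg_u[None, :, None] * deg_i[None, None, :])


def _relu(x):
    return np.maximum(x, 0.0)


def gcmc_forward(adj, hidden=16, out=8, seed=0):
    """Untrained forward pass returning P(rating = level) for every user-item pair."""
    n_levels, n_users, n_items = adj.shape
    rng = np.random.RandomState(seed)
    # With one-hot node features, W_r x_j is just row j of a per-level weight matrix.
    W_item = rng.normal(scale=0.1, size=(n_levels, n_items, hidden))
    W_user = rng.normal(scale=0.1, size=(n_levels, n_users, hidden))
    # Graph convolution: messages from neighbors, summed over rating levels.
    h_user = _relu(sum(adj[r] @ W_item[r] for r in range(n_levels)))
    h_item = _relu(sum(adj[r].T @ W_user[r] for r in range(n_levels)))
    # Dense layer on top of the convolution output.
    U = _relu(h_user @ rng.normal(scale=0.1, size=(hidden, out)))
    V = _relu(h_item @ rng.normal(scale=0.1, size=(hidden, out)))
    # Bilinear decoder: one matrix Q_r per level, then a softmax over levels.
    Q = rng.normal(scale=0.1, size=(n_levels, out, out))
    logits = np.einsum("ud,rde,ie->uir", U, Q, V)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)


def expected_rating(probs):
    """Predicted rating as the probability-weighted mean of the rating levels."""
    return probs @ np.asarray(RATING_LEVELS, dtype=np.float64)
```

Training would fit the weights by minimizing the cross-entropy between these probabilities and the observed ratings. The sketch only demonstrates the shapes and the data flow.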
recommenders
|_ gc-mc
   |_ data/: folder for dataset files.
   |_ logs/: folder for log files.
   |_ models/: folder for models.
   |_ data_utils.py: data utility functions, such as downloading datasets from the internet.
   |_ initializations.py: different initialization methods for layers.
   |_ layers.py: handles the computations of the graph layers.
   |_ metrics.py: different metrics for model evaluation.
   |_ model.py: handles model-related tasks, such as saving and loading models.
   |_ plot_rmse.py: plots the history of training and validation RMSE.
   |_ preprocessing.py: preprocessing helper functions.
   |_ test_no_features.sh: script to run the testing code with the GC-MC model (no additional features).
   |_ test_with_features.sh: script to run the testing code with the GC-MC model (with features).
   |_ test.py: testing code for the GC-MC models.
   |_ train_test_no_features.sh: script to train and test the GC-MC model (no additional features).
   |_ train_test_with_features.sh: script to train and test the GC-MC model (with features).
   |_ train.py: experiment runner for the GC-MC models.
   |_ utils.py: utility function for constructing the feed dict for the TensorFlow model.
|_ mf-dnn
   |_ data/: folder for dataset files.
   |_ logs/: folder for log files.
   |_ models/: folder for models.
   |_ utils/: folder for utility function code.
   |_ dnn_w_info.sh: script to run the testing code with the MF + DNN with features model.
   |_ dnn.sh: script to run the testing code with the MF + DNN model.
   |_ mf.sh: script to run the testing code with the MF model.
   |_ model.py: builds the model and creates the history class.
   |_ plot_loss.py: plots the history of training and validation loss.
   |_ test.py: testing code for the MF-DNN models.
   |_ train.py: experiment runner for the MF-DNN models.
|_ parse_data.ipynb: parses the MovieLens 100k data for the mf-dnn code.
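The notebook itself is not reproduced here. As a minimal standard-library sketch, ratings can be parsed and remapped to 0-based indices as shown below. It assumes the standard MovieLens 100k layout, in which `ml-100k/u.data` holds tab-separated user id, item id, rating and timestamp columns. `parse_u_data` and `to_zero_based` are hypothetical names, not functions from this repository.

```python
def parse_u_data(lines):
    """Parse u.data rows: user id, item id, rating, timestamp (tab-separated)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        user, item, rating, timestamp = (int(field) for field in line.split("\t"))
        records.append((user, item, rating, timestamp))
    return records


def to_zero_based(records):
    """Map raw user/item ids to contiguous 0-based indices for embedding lookups."""
    user_index = {u: k for k, u in enumerate(sorted({r[0] for r in records}))}
    item_index = {i: k for k, i in enumerate(sorted({r[1] for r in records}))}
    triples = [(user_index[u], item_index[i], rating) for u, i, rating, _ in records]
    return triples, user_index, item_index
```

For example, `with open("ml-100k/u.data") as f: triples, users, items = to_zero_based(parse_u_data(f))` yields (user, item, rating) triples in the form used by embedding-based models.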
Kuan Tung | Chun-Hung Yeh | Hiroki Hayakawa | Jinhui Guo |
---|---|---|---|
dinotuku | yehchunhung | hirokihayakawa07 | eternalbetty233 |
- MovieLens 100k (paper)
- MovieLens 100k (dataset)
- Graph Convolutional Matrix Completion (paper)
- Graph Convolutional Matrix Completion (GitHub repository)
- Matrix Factorization (lecture given by Hung-yi Lee)
This project is licensed under the MIT License - see the LICENSE.md file for details.