Pyreclab: Recommendation lab for Python

Overview

Pyreclab is a recommendation library for training recommendation models through a friendly, easy-to-use interface, while maintaining good performance in terms of memory and CPU usage.

To achieve this, Pyreclab is exposed as a Python module that gives friendly access to its algorithms, while the library itself is written entirely in C++ to avoid the performance penalty of interpreted languages.

At this moment, the following recommendation algorithms are supported:

Rating Prediction

User Average
Item Average
Slope One
User Based KNN
Item Based KNN
Funk's SVD

Item Recommendation

Most Popular

Although Pyreclab can be compiled on most popular operating systems, it has been tested on the following distributions:

Operating System	Version
Ubuntu	16.04
CentOS	6.4
Mac OS X	10.11 ( El Capitan )
Mac OS X	10.12 ( Sierra )

Citations

If you use this library, please cite:

@inproceedings{1706.06291v2,
  author = {Gabriel Sepulveda and Vicente Dominguez and Denis Parra},
  title = {pyRecLab: A Software Library for Quick Prototyping of Recommender Systems},
  year = {2017},
  month = {August},
  eprint = {arXiv:1706.06291v2},
  keywords = {Recommender Systems, Software Development, Recommender Library, Python Library}
}

Check out our paper

Build and install

1.- Before starting, verify that libboost-dev and cmake are installed on your system. If not, install them through your distribution's package manager, as shown next.

Debian based OS's ( Ubuntu )

$ sudo apt-get install cmake
$ sudo apt-get install libboost-dev

RedHat based OS's ( CentOS )

$ yum install cmake
$ yum install boost-devel

MAC OS X

$ brew install cmake
$ brew install boost

2.- Clone the source code of Pyreclab in a local directory.

$ git clone https://github.com/gasevi/pyreclab.git

3.- Build the Python module ( default: Python 2.7 ).

$ cd pyreclab
$ cmake .
$ make

By default, PyRecLab will be compiled for Python 2.7. If you want to build it for Python 3.x, you can execute the following steps:

$ cd pyreclab
$ cmake -DCMAKE_PYTHON_VERSION=3.x .
$ make

4.- Install PyRecLab.

$ sudo make install
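
One quick way to confirm that the install step worked is to ask the interpreter whether it can locate the module. The sketch below uses only the standard library and assumes a Python 3 interpreter; the helper name `is_installed` is hypothetical and not part of Pyreclab.

```python
import importlib.util


def is_installed(module_name):
    """Return True if this interpreter can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Prints True only if the module was installed for this interpreter.
print(is_installed("pyreclab"))
```

If this prints False after a successful `make install`, a likely cause is that the module was built for a different Python version than the one running the check.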

API Documentation

Pyreclab provides the following classes, one for each of the recommendation algorithms currently supported:

pyreclab.UserAvg
pyreclab.ItemAvg
pyreclab.SlopeOne
pyreclab.UserKnn
pyreclab.ItemKnn
pyreclab.SVD
pyreclab.MostPopular

So, you can import any of them as follows:

>>> from pyreclab import <RecAlg>

or import the entire module, as you prefer:

>>> import pyreclab

After that, to create an instance of any of these classes, you must provide a dataset file with the training information, which must contain the fields user_id, item_id and rating.

The following example shows the generic format for creating one of these instances.

>>> obj = pyreclab.RecAlg( dataset = filename,
                           dlmchar = b'\t',
                           header = False,
                           usercol = 0,
                           itemcol = 1,
                           ratingcol = 2 )

Where RecAlg represents the recommendation algorithm chosen from the previous list, and its parameters are presented in the next table.

Parameter	Type	Default value	Description
dataset	mandatory	N.A.	Dataset filename with fields: userid, itemid and rating
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether dataset filename contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
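
To make the column parameters concrete, here is a minimal sketch of a dataset file that matches the defaults (tab-delimited, no header, user in column 0, item in column 1, rating in column 2). The helpers `write_dataset` and `read_dataset` are hypothetical and written for illustration only; Pyreclab parses the file itself, and its exact parsing rules are not shown here.

```python
import csv
import os
import tempfile


def write_dataset(path, rows, dlmchar="\t"):
    """Write (user_id, item_id, rating) rows, one per line, without a header."""
    with open(path, "w", newline="") as f:
        csv.writer(f, delimiter=dlmchar).writerows(rows)


def read_dataset(path, dlmchar="\t", header=False, usercol=0, itemcol=1, ratingcol=2):
    """Parse a ratings file using the column conventions from the table above."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=dlmchar)
        if header:
            next(reader, None)  # skip the header line
        return [(r[usercol], r[itemcol], float(r[ratingcol])) for r in reader]


rows = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0)]
path = os.path.join(tempfile.mkdtemp(), "ratings.tsv")
write_dataset(path, rows)
parsed = read_dataset(path)
print(parsed)
```

A file written this way could then be passed as the `dataset` argument of any of the classes listed above. Note that the constructor example passes the delimiter as bytes (`b'\t'`), whereas this sketch uses a plain string for the `csv` module.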

Because the algorithms differ in nature, their train methods can take different parameters. For this reason, the methods are described separately for each class below.

pyreclab.UserAvg

Training

>>> obj.train()

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier
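
Conceptually, a user-average predictor returns the mean of the ratings a user has given, whatever the item. The sketch below illustrates that idea in plain Python; it is not Pyreclab's C++ implementation, the function names are hypothetical, and the fallback to the global mean for unknown users is an assumption.

```python
from collections import defaultdict


def train_user_avg(ratings):
    """Return (per-user mean rating, global mean) from (user, item, rating) triples."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for user, _item, rating in ratings:
        totals[user] += rating
        counts[user] += 1
    user_mean = {u: totals[u] / counts[u] for u in totals}
    global_mean = sum(totals.values()) / sum(counts.values())
    return user_mean, global_mean


def predict_user_avg(user_mean, global_mean, user_id, item_id):
    """Predict the user's mean rating; item_id is ignored by this baseline."""
    # Assumption: users unseen during training fall back to the global mean.
    return user_mean.get(user_id, global_mean)


ratings = [("u1", "i1", 4.0), ("u1", "i2", 2.0), ("u2", "i1", 5.0)]
user_mean, global_mean = train_user_avg(ratings)
print(predict_user_avg(user_mean, global_mean, "u1", "i3"))  # 3.0
```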

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions
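
The `test` call returns the predictions together with `mae` and `rmse`. These names conventionally stand for mean absolute error and root mean squared error; the sketch below shows the standard definitions, assuming that is what Pyreclab reports.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.0, 3.0, 4.0, 3.0]
print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # square root of 1.5
```

RMSE penalizes large errors more heavily than MAE, which is why the single error of 2 in this example pushes it above the MAE.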

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.ItemAvg

Training

>>> obj.train()

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.SlopeOne

Training

>>> obj.train()

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier
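
For readers unfamiliar with Slope One, the sketch below shows the weighted variant of the scheme in plain Python: it averages the rating difference between each pair of co-rated items, then predicts from the user's own ratings plus those differences. This is an illustration under that assumption, not Pyreclab's C++ implementation, and the function names are hypothetical.

```python
from collections import defaultdict


def train_slope_one(ratings):
    """Compute average rating differences between every pair of co-rated items.

    ratings: dict mapping user -> {item: rating}.
    Returns (dev, freq), where dev[j][i] is the mean of (r_j - r_i) over the
    users who rated both items and freq[j][i] is the number of such users.
    """
    dev = defaultdict(lambda: defaultdict(float))
    freq = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    dev[j][i] += r_j - r_i
                    freq[j][i] += 1
    for j in dev:
        for i in dev[j]:
            dev[j][i] /= freq[j][i]
    return dev, freq


def predict_slope_one(ratings, dev, freq, user_id, item_id):
    """Weighted Slope One prediction, or None if no co-rated item exists."""
    num = 0.0
    den = 0
    for i, r_i in ratings.get(user_id, {}).items():
        n = freq.get(item_id, {}).get(i, 0)
        if n:
            # Weight each item's contribution by the number of co-raters.
            num += (dev[item_id][i] + r_i) * n
            den += n
    return num / den if den else None


ratings = {
    "john": {"A": 5.0, "B": 3.0, "C": 2.0},
    "mark": {"A": 3.0, "B": 4.0},
    "lucy": {"B": 2.0, "C": 5.0},
}
dev, freq = train_slope_one(ratings)
prediction = predict_slope_one(ratings, dev, freq, "lucy", "A")
print(prediction)  # 13/3, about 4.33
```

In this example, item A is rated on average 0.5 higher than B (two co-raters) and 3 higher than C (one co-rater), so the prediction for lucy is ((0.5 + 2) * 2 + (3 + 5) * 1) / 3.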

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.UserKnn

Training

>>> obj.train( knn, similarity )

Parameter	Type	Default value	Description
knn	optional	10	K nearest neighbors
similarity	optional	'pearson'	Similarity metric
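
To illustrate what the `knn` and `similarity` parameters control, the sketch below computes the Pearson correlation between two users over their co-rated items and then picks the `knn` most similar users. It is a plain-Python illustration with hypothetical function names; computing the means over co-rated items only and returning 0.0 when the correlation is undefined are assumptions, and Pyreclab's exact variant may differ.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    ratings_a, ratings_b: dicts mapping item -> rating.
    Returns 0.0 when fewer than two items overlap or a variance is zero.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    cov = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    var_a = sum((ratings_a[i] - mean_a) ** 2 for i in common)
    var_b = sum((ratings_b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def nearest_neighbors(ratings, user_id, knn=10):
    """Return up to knn other users, most similar to user_id first."""
    scores = [(pearson(ratings[user_id], other), u)
              for u, other in ratings.items() if u != user_id]
    return [u for _score, u in sorted(scores, reverse=True)[:knn]]


ratings = {
    "alice": {"i1": 1.0, "i2": 2.0, "i3": 3.0},
    "bob": {"i1": 2.0, "i2": 4.0, "i3": 6.0, "i4": 1.0},
    "carol": {"i1": 3.0, "i2": 2.0, "i3": 1.0},
}
print(pearson(ratings["alice"], ratings["bob"]))    # 1.0
print(pearson(ratings["alice"], ratings["carol"]))  # -1.0
print(nearest_neighbors(ratings, "alice", knn=1))   # ['bob']
```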

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.ItemKnn

Training

>>> obj.train( knn, similarity )

Parameter	Type	Default value	Description
knn	optional	10	K nearest neighbors
similarity	optional	'pearson'	Similarity metric

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.SVD

Training

>>> obj.train( factors = 1000, maxiter = 100, lr = 0.01, lamb = 0.1 )

Parameter	Type	Default value	Description
factors	optional	1000	Number of latent factors in matrix factorization
maxiter	optional	100	Maximum number of training iterations if convergence is not reached earlier
lr	optional	0.01	Learning rate
lamb	optional	0.1	Regularization parameter
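
To show how the four training parameters interact, here is a minimal plain-Python sketch of matrix factorization trained by stochastic gradient descent in the style of Funk's SVD. It is an illustration only, not Pyreclab's C++ implementation: the function names, the `seed` argument, the initialization range, and the absence of a convergence check are all assumptions, and `factors` defaults to 2 here because the toy dataset is tiny.

```python
import math
import random


def train_svd(ratings, factors=2, maxiter=100, lr=0.01, lamb=0.1, seed=0):
    """Fit user and item factor vectors by stochastic gradient descent.

    ratings: list of (user, item, rating) triples.
    Returns (P, Q): dicts mapping user/item -> list of `factors` floats.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0.0, 0.1) for _ in range(factors)] for u in users}
    Q = {i: [rng.uniform(0.0, 0.1) for _ in range(factors)] for i in items}
    for _ in range(maxiter):
        for u, i, r in ratings:
            err = r - sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            for f in range(factors):
                pf, qf = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qf - lamb * pf)
                Q[i][f] += lr * (err * pf - lamb * qf)
    return P, Q


def predict_svd(P, Q, user_id, item_id):
    """Predicted rating is the dot product of the user and item factors."""
    return sum(pf * qf for pf, qf in zip(P[user_id], Q[item_id]))


def training_rmse(P, Q, ratings):
    """Root mean squared error of the model on the training triples."""
    total = sum((r - predict_svd(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(total / len(ratings))


ratings = [("u1", "i1", 5.0), ("u1", "i2", 3.0), ("u2", "i1", 4.0),
           ("u2", "i3", 1.0), ("u3", "i2", 2.0), ("u3", "i3", 1.0)]
before = training_rmse(*train_svd(ratings, maxiter=0), ratings)
P, Q = train_svd(ratings)
after = training_rmse(P, Q, ratings)
print(before, after)
```

A larger `lr` speeds up learning but can make the updates unstable, while a larger `lamb` shrinks the factors and trades training accuracy for less overfitting.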

Rating prediction

>>> prediction = obj.predict( userId, itemId )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
itemId	mandatory	N.A.	Item identifier

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation

Testing and evaluation for prediction

>>> predictionList, mae, rmse = obj.test( input_file = testset,
                                          dlmchar = b'\t',
                                          header = False,
                                          usercol = 0,
                                          itemcol = 1,
                                          ratingcol = 2,
                                          output_file = 'predictions.csv' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
output_file	optional	N.A.	Output file to write predictions

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      itemcol = 1,
                                      ratingcol = 2,
                                      topn = 10,
                                      output_file = 'ranking.json' )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
itemcol	optional	1	Item column position in dataset file
ratingcol	optional	2	Rating column position in dataset file
topn	optional	10	Top N items to recommend
output_file	optional	N.A.	Output file to write rankings

pyreclab.MostPopular

Training

>>> obj.train()

Top-N item recommendation

>>> ranking = obj.recommend( userId, topN, includeRated )

Parameter	Type	Default value	Description
userId	mandatory	N.A.	User identifier
topN	optional	10	Top N items to recommend
includeRated	optional	False	Include rated items in ranking generation
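
The effect of `topN` and `includeRated` can be illustrated with a small plain-Python sketch of a most-popular recommender. It assumes that popularity means the number of ratings an item received, which may not match Pyreclab's definition, and the function names are hypothetical.

```python
from collections import Counter


def train_most_popular(ratings):
    """Count how many ratings each item received."""
    return Counter(item for _user, item, _rating in ratings)


def recommend_most_popular(popularity, rated_by_user, user_id, topN=10, includeRated=False):
    """Return up to topN item ids, most frequently rated first."""
    seen = set() if includeRated else rated_by_user.get(user_id, set())
    # Sort by descending count, then by item id so ties are deterministic.
    ranked = sorted(popularity, key=lambda item: (-popularity[item], item))
    return [item for item in ranked if item not in seen][:topN]


ratings = [("u1", "i1", 4.0), ("u2", "i1", 5.0), ("u3", "i1", 3.0),
           ("u1", "i2", 2.0), ("u2", "i2", 4.0), ("u3", "i3", 5.0)]
popularity = train_most_popular(ratings)
rated_by_user = {}
for user, item, _rating in ratings:
    rated_by_user.setdefault(user, set()).add(item)

print(recommend_most_popular(popularity, rated_by_user, "u3", topN=2))
# ['i2']
print(recommend_most_popular(popularity, rated_by_user, "u3", topN=2, includeRated=True))
# ['i1', 'i2']
```

With `includeRated` left at False, user u3 only receives i2, because i1 and i3 are already in that user's rating history.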

Testing for recommendation

>>> recommendationList = obj.testrec( input_file = testset,
                                      dlmchar = b'\t',
                                      header = False,
                                      usercol = 0,
                                      output_file = 'ranking.json',
                                      topN = 10 )

Parameter	Type	Default value	Description
input_file	mandatory	N.A.	Testset filename
dlmchar	optional	tab	Delimiter character between fields (userid, itemid, rating)
header	optional	False	Whether the test set file contains a header line to skip
usercol	optional	0	User column position in dataset file
output_file	optional	N.A.	Output file to write rankings
topN	optional	10	Top N items to recommend

On roadmap

Add ranking evaluation metrics.
Add Windows support.
