Adversarial Autoencoders for Recommendation Tasks
- torch
- numpy
- scipy
- sklearn
- gensim
- pandas
- joblib
If possible, numpy and scipy should be installed as system packages.
The dependencies gensim and sklearn can be installed via pip. For PyTorch, please refer to the official installation instructions, which depend on the Python/CUDA setup you are working with.
To use pretrained word embeddings, the word2vec embeddings trained on the Google News corpus should be downloaded.
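As a quick sanity check, the downloaded embeddings can be loaded with gensim's KeyedVectors. This is only a minimal sketch; the file path below is an assumption and should be adjusted to wherever you stored the download:

```python
from gensim.models import KeyedVectors

# Path to the downloaded Google News embeddings (hypothetical location, adjust to your setup)
W2V_PATH = "GoogleNews-vectors-negative300.bin.gz"

# Load the binary word2vec format; `limit` keeps this quick check light on memory
wv = KeyedVectors.load_word2vec_format(W2V_PATH, binary=True, limit=100_000)
print(wv.vector_size)  # 300 for the Google News vectors
```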
You can install this package and all necessary dependencies via pip.
pip install -e .
The main.py file is an executable to run an evaluation of the specified models on the PubMed or EconBiz dataset (see the Concrete datasets section below). The dataset and year are mandatory arguments. The dataset argument is expected to be a path to a TSV file whose format is described below.
The eval/aminer.py file is an executable to run an evaluation of the specified models on the AMiner datasets (see the Concrete datasets section below). The dataset and year are mandatory arguments. The dataset argument is expected to be either dblp or acm, and the DATA_PATH constant in the script needs to be set to the path to a folder which contains both datasets.
The eval/rcv.py file is an executable to run an evaluation of the specified models on the Reuters RCV1 dataset (see the Concrete datasets section below). The DATA_PATH constant in the script needs to be set to the path to a TSV file whose format is described below.
Further scripts in the eval folder were used to perform experiments for other datasets which we are not allowed to redistribute (see the Concrete datasets section below).
The expected dataset format is a tab-separated file with the following columns:
- owner: id of the document
- set: comma-separated list of items
- year: year of the document
- title: title of the document
The columns owner and set are expected to be the first two, since they are mandatory.
An arbitrary number of supplementary information columns can follow.
The current implementation, however, makes use of the year property for splitting the data into train and test sets. Also, title-enhanced recommendation models rely on the title property to be present.
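For illustration, a dataset file in this format could be loaded and split by year roughly as follows. This is a minimal sketch using pandas, not the loader the evaluation scripts actually use; the file name, the set of columns beyond the two mandatory ones, and the split year are assumptions:

```python
import pandas as pd

# Assumed file name and column layout: owner and set first, then supplementary columns
columns = ["owner", "set", "year", "title"]
df = pd.read_csv("dataset.tsv", sep="\t", names=columns)

# The 'set' column holds a comma-separated list of items per document
df["set"] = df["set"].str.split(",")

# Example train/test split on the year property (threshold chosen arbitrarily)
train = df[df["year"] <= 2015]
test = df[df["year"] > 2015]
print(len(train), "training documents,", len(test), "test documents")
```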
The format of the ACM and DBLP datasets is described here.
We worked with the PubMed citations dataset from CITREC. We converted the provided SQL dumps into the dataset format described above. The references in the CITREC TREC Genomics dataset are not disambiguated; therefore, we use only the PubMed dataset for citation recommendation.
For subject label recommendation, we used the economics dataset EconBiz, provided by ZBW.
The PubMed and EconBiz datasets are available here.
For EconBiz, only titles are available, and we are currently verifying that copyright issues do not prevent us from publishing the further metadata of the documents that we have used.
Further public datasets used were the DBLP-Citation-network V10 and ACM-Citation-network V9 datasets from the AMiner project, and the Reuters RCV1 corpus. We converted the provided XML dumps into the dataset format described above, using the parse_reuters.py script.
We also ran experiments with the Million Playlist Dataset (MPD), provided by Spotify, and IREON, provided by FIV, but we are not allowed to redistribute them. The MPD dataset was used only to participate in the RecSys Challenge 2018 (see more information here).
Please see our papers for additional information on the models implemented and the experiments conducted:
- Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels
- Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation
If you use our code in your own work, please cite one of these papers:
@article{Vagliano:2022,
author = {Iacopo Vagliano and
Lukas Galke and
Ansgar Scherp},
title = {Recommendations for Item Set Completion: On the Semantics of Item
Co-Occurrence With Data Sparsity, Input Size, and Input Modalities},
journal = {Information Retrieval Journal},
year = {2022},
publisher = {Springer Nature},
url = {https://doi.org/10.1007/s10791-022-09408-9},
doi = {10.1007/s10791-022-09408-9}
}
@inproceedings{Vagliano:2018,
author = {Vagliano, Iacopo and Galke, Lukas and Mai, Florian and Scherp, Ansgar},
title = {Using Adversarial Autoencoders for Multi-Modal Automatic Playlist Continuation},
booktitle = {Proceedings of the ACM Recommender Systems Challenge 2018},
series = {RecSys Challenge '18},
year = {2018},
isbn = {978-1-4503-6586-4},
location = {Vancouver, BC, Canada},
pages = {5:1--5:6},
articleno = {5},
numpages = {6},
url = {http://doi.acm.org/10.1145/3267471.3267476},
doi = {10.1145/3267471.3267476},
acmid = {3267476},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {adversarial autoencoders, automatic playlist continuation, multi-modal recommender, music recommender systems, neural networks},
}
@inproceedings{Galke:2018,
author = {Galke, Lukas and Mai, Florian and Vagliano, Iacopo and Scherp, Ansgar},
title = {Multi-Modal Adversarial Autoencoders for Recommendations of Citations and Subject Labels},
booktitle = {Proceedings of the 26th Conference on User Modeling, Adaptation and Personalization},
series = {UMAP '18},
year = {2018},
isbn = {978-1-4503-5589-6},
location = {Singapore, Singapore},
pages = {197--205},
numpages = {9},
url = {http://doi.acm.org/10.1145/3209219.3209236},
doi = {10.1145/3209219.3209236},
acmid = {3209236},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {adversarial autoencoders, multi-modal, neural networks, recommender systems, sparsity},
}