Mangoleaf


Welcome to MANGOLEAF, your ultimate guide to discovering the best books and manga tailored to your tastes. Whether you're a seasoned reader or just starting out, MANGOLEAF provides personalized recommendations to help you find your next favorite read.

Personal recommendations for books and manga, implemented using recommender systems based on popularity and on item-based & user-based collaborative filtering.


 
 

Project

The goal of this project was to familiarize ourselves with and develop different recommender systems within a limited time of 2.5 weeks, with a clearly defined deliverable, using agile methods. The recommender systems include item-popularity-based recommendations, item-based collaborative filtering, and user-based collaborative filtering.

The deliverable is a functional web app with user profiles for personalized recommendations, available to anyone. For the sake of demonstration, the datasets are limited to around 2000 items (around 1500 books and 500 manga), and the personalized recommendations are updated only at fixed intervals (every 24 hours).

To avoid spam and abuse in this demo project, user ratings are reset and user profiles are deleted every five days. To offset this limitation, user ratings can be exported and downloaded as a CSV file at any time.

Authors

Contributors

Recommender implementation

We trained and evaluated different recommenders for both the book and the manga dataset. Below, a user is an individual, an item refers to either a book or a manga, and a rating is a user's score for a user-item combination.

  1. Popularity recommender: The ratings of all users are queried from the database and aggregated by average and count, grouped by item. Given a threshold for the minimum number of ratings, the items with the best average ratings are selected as the most popular items. In order of their rating, they make up the popularity recommendation (see the sketch after this list).

  2. Item-based collaborative filtering recommender: A collaborative filtering model is trained using the item ratings and their similarity matrix. A k-nearest-neighbors (k-NN) inspired algorithm with baseline ratings showed the highest accuracy during model validation. For each item, the nearest neighbors are determined. These neighbors make up the item-based, "you-might-also-like" recommendation.

  3. User-based collaborative filtering recommender: Here, another baseline k-NN model is trained on the user ratings and their similarity matrix. For each user, the missing ratings constitute a test set. The items with the highest predicted ratings make up the user-based, personalized recommendation.
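
A minimal sketch of how these three recommenders could be implemented with pandas and scikit-surprise is shown below. The column names (user_id, item_id, rating), the 1–10 rating scale, the minimum rating count, and the neighborhood size are assumptions for illustration, not the exact values used in recommend.py.

import pandas as pd
from surprise import Dataset, KNNBaseline, Reader


def popularity_ranking(ratings: pd.DataFrame, min_count: int = 50) -> pd.DataFrame:
    """Recommender 1: average rating and rating count per item."""
    stats = ratings.groupby("item_id")["rating"].agg(mean="mean", count="count")
    # Only items with enough ratings qualify; rank them by average rating
    return stats[stats["count"] >= min_count].sort_values("mean", ascending=False)


def fit_knn(ratings: pd.DataFrame, user_based: bool) -> KNNBaseline:
    """Recommenders 2 and 3: baseline k-NN model on item or user similarities."""
    reader = Reader(rating_scale=(1, 10))  # assumed rating scale
    data = Dataset.load_from_df(ratings[["user_id", "item_id", "rating"]], reader)
    model = KNNBaseline(sim_options={"name": "pearson_baseline", "user_based": user_based})
    model.fit(data.build_full_trainset())
    return model


def similar_items(item_model: KNNBaseline, item_id, k: int = 10) -> list:
    """Item-based 'you might also like': nearest neighbors of a single item."""
    inner_id = item_model.trainset.to_inner_iid(item_id)
    return [item_model.trainset.to_raw_iid(i) for i in item_model.get_neighbors(inner_id, k=k)]


def personal_recommendations(user_model: KNNBaseline, user_id, n: int = 10) -> list:
    """User-based recommendation: highest predicted ratings among unrated items."""
    unrated = [t for t in user_model.trainset.build_anti_testset() if t[0] == user_id]
    predictions = sorted(user_model.test(unrated), key=lambda p: p.est, reverse=True)
    return [p.iid for p in predictions[:n]]

Here, similar_items expects a model trained with user_based=False and personal_recommendations one trained with user_based=True, mirroring the item-based and user-based recommenders described above.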

Each recommendation was subsequently filtered to remove the items a (logged-in) user has already rated, so that only novel, meaningful reading suggestions are displayed on the user interface.
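
This filtering step could look like the following, again assuming the hypothetical column names from the sketch above and a recommendation DataFrame with an item_id column.

def filter_rated(recommendations: pd.DataFrame, ratings: pd.DataFrame, user_id) -> pd.DataFrame:
    """Keep only items the given user has not rated yet."""
    already_rated = set(ratings.loc[ratings["user_id"] == user_id, "item_id"])
    return recommendations[~recommendations["item_id"].isin(already_rated)]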

Key learnings

  • Project planning and collaborative working using agile methods
  • Balancing limited time against a working product
  • Working with different datasets and bringing them into a consistent format
  • Deploying a Streamlit app online
  • Implementing and maintaining a PostgreSQL database
  • Implementing user authentication with hashed and salted passwords and base64-encoded, cropped user pictures (see the sketch after this list)
  • Automated scheduling with GitHub Actions workflows
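
The authentication point above can be illustrated with a small sketch using bcrypt and Pillow. The function names and the 128-pixel crop size are invented for this example; the actual logic lives in mangoleaf/authentication.py.

import base64
import io

import bcrypt
from PIL import Image


def hash_password(password: str) -> bytes:
    # bcrypt generates a random salt and embeds it in the resulting hash
    return bcrypt.hashpw(password.encode("utf-8"), bcrypt.gensalt())


def verify_password(password: str, hashed: bytes) -> bool:
    return bcrypt.checkpw(password.encode("utf-8"), hashed)


def encode_profile_picture(path: str, size: int = 128) -> str:
    # Crop to a centered square, downscale, and base64-encode for storage in the database
    img = Image.open(path)
    side = min(img.size)
    left, top = (img.width - side) // 2, (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((size, size))
    buffer = io.BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("ascii")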

Languages, tools, and libraries

  • scikit-surprise
  • streamlit
  • pandas
  • SQLAlchemy
  • bcrypt
  • pillow
  • PostgreSQL

See requirements.txt for all used Python packages.

Schedule

The project was implemented according to a well-devised schedule of two and a half weeks, using agile methods including daily stand-ups, iterative implementation of minimally working examples, and weekly sprints/milestones.

schedule

Database schema

The database structure is separated into static tables, dynamic tables, and semi-dynamic tables, for both books and manga.

  • The static tables (left and right: books and mangas) remain filled with the book and manga datasets. They are read-only.
  • The dynamic tables (center: users and user_data, *_ratings) are altered through user interactions.
  • The semi-dynamic tables (bottom row: *_popular, *_item_based, *_user_based) are updated through scheduled GitHub Actions and are otherwise read-only.
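
For illustration, a frontend query against one of the semi-dynamic tables might look like the sketch below; the connection string and the table name books_popular are placeholders, and the real connection handling lives in mangoleaf/connection.py and mangoleaf/query.py.

import pandas as pd
from sqlalchemy import create_engine

# Placeholder DSN; the deployed app reads its credentials from configuration/secrets
engine = create_engine("postgresql+psycopg2://user:password@host:5432/mangoleaf")

# Semi-dynamic tables are read-only for the app and refreshed by scheduled GitHub Actions
popular_books = pd.read_sql("SELECT * FROM books_popular LIMIT 10", engine)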

schema

Repository structure

The repository contains the exploratory data analysis, the implementation of the recommenders, the database schema and SQL operations, and the code of the Streamlit web application. The core code of the project is organized into a Python package mangoleaf.

├── mangoleaf/               <- Source code of the Python package
│   │
│   ├── connection.py        <- Connection and interface with the database
│   ├── query.py
│   │
│   ├── authentication.py    <- Authentication functions for the user accounts
│   │
│   ├── frontend.py          <- Functions for frontend components
│   │
│   └── recommend.py         <- Functions to predict the recommendations
│
├── notebooks/               <- Jupyter notebooks with EDA and initial recommenders
│
├── requirements.txt         <- Dependencies for reproducing the environment
│
├── .streamlit/              <- Streamlit configuration
│
├── Home.py                  <- Pages, CSS, and images for the Streamlit app
├── pages/
├── style/
├── images/
│
├── schema.sql               <- SQL scripts for creating and truncating the database structure
├── reset_dynamic_tables.sql
│
├── create_schema.py         <- Python scripts to create, update, and reset the database
├── reset_database.py
├── update_database.py
│
└── .github/workflows/       <- Scheduled GitHub Actions workflows to update/reset the database

Data sources

The datasets fueling the recommendations were modified from

The repository szapp/Mangoleaf is an adjacent implementation.