This repository was created as part of the CogSys Master's compulsory module (BM2: Intelligente Datenanalyse & Maschinelles Lernen I), which I took in the Summer Semester 2022 at the University of Potsdam, Germany.
The goal of this project is to build a classifier that labels emails as spam or non-spam, correctly identifying as many spam emails as possible while classifying no more than 0.02% of non-spam emails as spam (that is, a false-positive rate of at most 0.02%). The problem must be approached with machine learning methods discussed in the course.
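One way to read this goal is as a threshold-selection problem: given a model that outputs a spam score per email, pick the lowest decision threshold whose false-positive rate stays within the limit. The sketch below illustrates that idea in plain Python. It is not taken from the notebook; the function names `pick_threshold` and `rates`, the toy scores, and the convention "predict spam when score > threshold" are all assumptions made for illustration.

```python
import math


def pick_threshold(scores, labels, max_fpr=0.0002):
    """Return the lowest threshold t such that predicting spam when
    score > t keeps the false-positive rate on this data at or below
    max_fpr. Labels: 1 = spam, 0 = non-spam. (Hypothetical helper.)"""
    # Non-spam scores, highest first: these are the candidates for false positives.
    ham_scores = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not ham_scores:
        return -math.inf  # no non-spam examples, so nothing constrains the threshold
    # Number of non-spam emails we may misclassify without exceeding max_fpr.
    allowed_fp = math.floor(max_fpr * len(ham_scores))
    if allowed_fp >= len(ham_scores):
        return -math.inf
    # Every non-spam score strictly above this value belongs to the
    # allowed_fp highest non-spam scores, so at most allowed_fp errors occur.
    return ham_scores[allowed_fp]


def rates(scores, labels, threshold):
    """Return (spam recall, false-positive rate) for a given threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    n_spam = sum(1 for y in labels if y == 1)
    n_ham = len(labels) - n_spam
    recall = tp / n_spam if n_spam else 0.0
    fpr = fp / n_ham if n_ham else 0.0
    return recall, fpr


# Toy data (made up): four spam emails followed by four non-spam emails.
scores = [0.95, 0.90, 0.80, 0.40, 0.85, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

strict_t = pick_threshold(scores, labels)  # default limit of 0.02%
strict_recall, strict_fpr = rates(scores, labels, strict_t)
print(strict_t, strict_recall, strict_fpr)
```

With a limit of 0.02%, at least 5,000 non-spam emails are needed before even a single false positive is allowed, so on a small sample like the toy data the threshold ends up at the highest non-spam score. In practice the threshold would presumably be chosen on held-out validation data rather than on the training set.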
The whole project is contained in a single notebook; the data is not uploaded here. The notebook is divided into the following parts: problem setting, data exploration and preprocessing, model training (Naive Bayes (NB), decision tree (DT) and random forest (RF)), and summary.