The group is composed of: Alessandra Anna Griesi, Hannes Engelhardt and Federica Spoto.
In this assignment we solved two different tasks: in the first one we performed a clustering analysis of house announcements, collecting the data through web scraping; in the second one we defined two hash functions to check for the presence of duplicates in a list of passwords.
For the first task, we used data from the Immobiliare.it website, taking into account more than 10k announcements. For the second task, the passwords are given as input in the file passwords2.txt.
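To illustrate the idea behind the second task, here is a minimal Python sketch of duplicate detection with two hash functions. The functions `poly_hash` (order-sensitive polynomial hash) and `sorted_hash` (order-insensitive, so anagrams count as duplicates), as well as `count_duplicates`, are hypothetical choices made for this example and are not necessarily the ones defined in Homework_4.ipynb.

```python
def poly_hash(s: str, base: int = 31, mod: int = (1 << 61) - 1) -> int:
    """Order-sensitive polynomial hash: 'ab' and 'ba' map to different values."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def sorted_hash(s: str) -> int:
    """Order-insensitive hash: anagrams such as 'ab' and 'ba' collide by design."""
    return poly_hash("".join(sorted(s)))


def count_duplicates(passwords, hash_fn) -> int:
    """Count entries whose hash value was already seen earlier in the list.

    Collisions between distinct strings are also counted, so with poly_hash
    the result is an upper bound on the number of exact duplicates.
    """
    seen = set()
    duplicates = 0
    for pw in passwords:
        h = hash_fn(pw)
        if h in seen:
            duplicates += 1
        else:
            seen.add(h)
    return duplicates


if __name__ == "__main__":
    sample = ["abc", "cab", "abc", "xyz"]
    print(count_duplicates(sample, poly_hash))    # 1: only the repeated "abc"
    print(count_duplicates(sample, sorted_hash))  # 2: "cab" is an anagram of "abc"
```

Assuming passwords2.txt stores one password per line, the list could be loaded by stripping the trailing newline from each line of the file before calling `count_duplicates`.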
In this repository you will find:
Homework_4.ipynb: the Jupyter notebook containing all the work done to obtain the final results:
- Implementation and comments of Task 1;
- Implementation and comments of Task 2;
- Bonus step: implementation of the K-means algorithm from scratch.
function.py: the file contains all the functions created during the study and used in Homework_4.ipynb.
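For reference on the bonus step, a from-scratch K-means usually follows Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the centroids stop moving. The NumPy sketch below is a hypothetical illustration of that scheme (random initialisation from the data, squared Euclidean distance, empty clusters keep their previous centroid); it is not the exact implementation found in Homework_4.ipynb or function.py.

```python
import numpy as np


def _assign(X: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Return, for every row of X, the index of the closest centroid."""
    # Squared Euclidean distances, shape (n_points, k).
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X: np.ndarray, k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points drawn at random from X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = _assign(X, centroids)
        # Update step: each centroid moves to the mean of its assigned points;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    # Final assignment so labels always match the returned centroids.
    return _assign(X, centroids), centroids


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
    labels, centroids = kmeans(X, k=2)
    print(labels)
    print(centroids)
```

On the toy data above the two well-separated groups end up in different clusters; on real announcement features the result depends on the initialisation, which is why K-means is commonly run several times with different seeds.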