MaGiiK02/AirBnB_score_prediction

AirBnB Predictions

Introduction

This project was developed on data collected from InsideAirBnB, which is split into listings and reviews (comments on the listings).

Folder structure

  • data_analysis: contains notebooks with the analysis of the data.
  • dataset: contains the data and some custom classes to work with it.
    • listings: contains the listing data.
    • comments: contains the listings' reviews (comments).
  • embeddings: contains the processed data, stored as pickles, produced in the intermediate steps of the data processing.
  • model/models: contains the custom neural network models developed for the project.
  • processing: contains the notebooks used to preprocess the data, either cleaning it or generating embeddings with Sentence Models.
  • utils: contains various utilities to process the data. Of special note are the amenities, a special field of the listings that is processed using clusters.
  • visualization: contains custom modules to visualize the data.

Running

0. Requirements

This code was developed on Python 3.11.3; we recommend using the same version or a closely related one.
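A quick way to check the interpreter before installing anything is sketched below. The `is_compatible` helper is hypothetical, and treating "closely related" as "same major and minor version" is an assumption, not a rule stated by the project.

```python
import sys

# Version the README says the code was developed on.
DEVELOPED_ON = (3, 11, 3)


def is_compatible(version=None, developed_on=DEVELOPED_ON):
    """Return True when the major and minor numbers match the development version.

    Matching on (major, minor) is an assumption made for this sketch.
    """
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) == tuple(developed_on[:2])


if __name__ == "__main__":
    if not is_compatible():
        current = sys.version.split()[0]
        print(f"Warning: running Python {current}, the code was developed on 3.11.3")
```

If the check prints a warning, the notebooks may still run, but dependency versions pinned in requirements.txt or environment.yml might not resolve cleanly.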

1. Install the environment

Environment setup

  1. Using Pip
pip install -r requirements.txt
  2. Using Conda
conda env create -f environment.yml

3. Dataset - Setup

The dataset downloaded from InsideAirBnB must be placed inside the dataset folders.

  • The listings.csv file must be placed inside the dataset/listings folder. You can place more than one file: all the CSV files in the folder will be used (already in the folder).
  • The reviews.csv file must be placed inside the dataset/comments folder. You can place more than one file: all the CSV files in the folder will be used (already in the folder).
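Before running the notebooks, it can help to confirm that both folders actually contain CSV files. The following is a minimal sketch using only the standard library. The helper names `find_csv_files` and `missing_folders` are hypothetical, and the default `dataset` root assumes the script is run from the repository root.

```python
from pathlib import Path

# Subfolder names taken from the README; the default root path is an assumption.
EXPECTED_FOLDERS = ("listings", "comments")


def find_csv_files(dataset_root="dataset"):
    """Map each expected subfolder to the sorted list of CSV files it contains."""
    root = Path(dataset_root)
    return {
        name: sorted((root / name).glob("*.csv"))
        for name in EXPECTED_FOLDERS
    }


def missing_folders(dataset_root="dataset"):
    """Return the expected subfolders that contain no CSV file."""
    found = find_csv_files(dataset_root)
    return [name for name, files in found.items() if not files]


if __name__ == "__main__":
    for name, files in find_csv_files().items():
        print(f"{name}: {len(files)} CSV file(s)")
    for name in missing_folders():
        print(f"No CSV files found in dataset/{name}")
```

Because every CSV in a folder is used, listing the files this way also shows exactly which ones would be picked up.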

4. Data Processing

Run the processing steps in order:

  1. step1_merge_listings_comments.ipynb: connects the listings and reviews together.
  2. step2_process_columns.ipynb: generates the embeddings for the comments (reviews) of the listings.
  3. step3_process_comments.ipynb: generates the embeddings, as well as the processed ordinal and numeric data, e.g., prices or listing type (Apartment, Home, etc.).
  4. step4_extraction_of_test_set.ipynb: merges the embeddings and pre-processed data and generates the train and test dataset files.
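The four steps above can also be run without opening each notebook by hand. The sketch below builds one `jupyter nbconvert --execute` command per notebook and runs them in sequence. It assumes Jupyter is installed in the active environment and that the notebooks live in the `processing` folder; the helpers `build_commands` and `run_all` are hypothetical and not part of the repository.

```python
import subprocess
from pathlib import Path

# Notebook names come from the README; their location is an assumption.
STEPS = [
    "step1_merge_listings_comments.ipynb",
    "step2_process_columns.ipynb",
    "step3_process_comments.ipynb",
    "step4_extraction_of_test_set.ipynb",
]


def build_commands(folder="processing"):
    """Build one nbconvert command per notebook, in execution order."""
    return [
        [
            "jupyter", "nbconvert",
            "--to", "notebook",
            "--execute",
            "--inplace",  # write the executed notebook back to the same file
            str(Path(folder) / name),
        ]
        for name in STEPS
    ]


def run_all(folder="processing"):
    """Execute every step, stopping at the first notebook that fails."""
    for command in build_commands(folder):
        subprocess.run(command, check=True)
```

Calling `run_all()` from the repository root stops on the first failing notebook, which matters here because each step depends on the files written by the previous one.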

5. Data Analysis

Open any notebook in the data_analysis folder and run it. The analysis requires the embeddings to be computed, so you need to run the data processing first.

6. Experiments

You can run any notebook to evaluate our models on the provided data. The experiments are separated as follows:
