This project is a solution to the Kaggle competition "House Prices: Advanced Regression Techniques". It was developed collaboratively with fellow students under the guidance of Prof. Vered Aharonson and received a grade of 90%.
This notebook is a practical project for an introductory machine learning course. It predicts the sale price of each home in the Ames, Iowa dataset using 79 explanatory variables that describe various aspects of residential homes. The dataset is the Ames Housing dataset, compiled by Dean De Cock for use in data science education.
- Data Cleaning and Pre-processing: Handling missing values, encoding categorical variables, and feature scaling.
- Feature Engineering: Creating new features from existing ones to improve model performance.
- Model Selection: Implementing and comparing linear regression, K-Nearest Neighbours (KNN), Random Forest, and regularized linear models (Ridge and Lasso), including blends of their predictions.
- Model Evaluation: Using Root Mean Squared Error (RMSE) as the metric to evaluate and compare model performance.
- Visualization: Employing various visualization techniques to explore and present findings.
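
The workflow in the list above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `make_toy_data` generates a synthetic stand-in for the competition's training file (only the column names mirror the Ames data), the `TotalSF` feature, the hyperparameters, and the helper names are all assumptions, and RMSE is computed on the log of the sale price with 5-fold cross-validation.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for the real training data (hypothetical values)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "GrLivArea": rng.normal(1500, 400, n).clip(500),
        "TotalBsmtSF": rng.normal(1000, 300, n).clip(0),
        "LotFrontage": rng.normal(70, 20, n).clip(20),
        "Neighborhood": pd.Series(
            rng.choice(["NAmes", "CollgCr", "OldTown"], n), dtype=object
        ),
    })
    # Simulate missing values in one numeric column.
    df.loc[rng.choice(n, 20, replace=False), "LotFrontage"] = np.nan
    price = (50_000 + 80 * df["GrLivArea"] + 40 * df["TotalBsmtSF"]
             + rng.normal(0, 10_000, n))
    df["SalePrice"] = price.clip(lower=20_000)
    return df


def add_features(df):
    """Feature engineering: one illustrative derived feature (an assumption)."""
    out = df.copy()
    out["TotalSF"] = out["GrLivArea"] + out["TotalBsmtSF"]
    return out


def build_pipeline(model, numeric_cols, categorical_cols):
    """Impute, scale, and encode the inputs, then fit the given model."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    pre = ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])
    return Pipeline([("pre", pre), ("model", model)])


def compare_models(df, target="SalePrice"):
    """Return cross-validated RMSE (on log prices) per model, best first."""
    X = add_features(df.drop(columns=[target]))
    y = np.log1p(df[target])
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()
    models = {
        "linear": LinearRegression(),
        "knn": KNeighborsRegressor(n_neighbors=5),
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
        "ridge": Ridge(alpha=1.0),
        "lasso": Lasso(alpha=0.001, max_iter=10_000),
    }
    scores = {}
    for name, model in models.items():
        pipe = build_pipeline(model, numeric_cols, categorical_cols)
        neg_rmse = cross_val_score(
            pipe, X, y, cv=5, scoring="neg_root_mean_squared_error"
        )
        scores[name] = -neg_rmse.mean()  # flip sign: lower RMSE is better
    return pd.Series(scores).sort_values()


if __name__ == "__main__":
    print(compare_models(make_toy_data()))
```

To try this on the real data, one would replace `make_toy_data()` with a DataFrame loaded from the competition's training file. Keeping the preprocessing inside the pipeline means the imputation and scaling are fitted only on each training fold, which avoids leaking information into the validation folds.
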
- Python
- Jupyter Notebook
- Pandas
- NumPy
- Scikit-Learn
- Matplotlib
- Seaborn
- SciPy
- os (Python standard library module)
The final model ranked 550th on the Kaggle competition leaderboard at the time of submission, reflecting effective teamwork and a sound understanding and application of machine learning techniques.