This repository contains code for predicting house prices in Ames, Iowa, from the Kaggle competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques
Python scripts:
- boxcoxplot.py - Box-Cox transformation and visualization function used in feature engineering
- outliers.py - outlier-detection function used in feature engineering
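The two helpers above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual implementation: the function names mirror the script names, the Box-Cox transform uses `scipy.stats.boxcox`, and the outlier rule shown is Tukey's IQR fence (an assumption; the repo's detection rule may differ).

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

def boxcox_transform(series):
    """Box-Cox transform a strictly positive feature.

    Returns the transformed values and the fitted lambda,
    which is needed to invert the transform later."""
    transformed, lam = boxcox(series)
    return pd.Series(transformed, index=series.index), lam

def detect_outliers_iqr(series, k=1.5):
    """Flag values beyond k * IQR from the quartiles (Tukey's rule)."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

# toy sale prices: the last value is an obvious outlier
prices = pd.Series([120000.0, 135000.0, 150000.0, 160000.0, 900000.0])
mask = detect_outliers_iqr(prices)
transformed, lam = boxcox_transform(prices)
```

A typical workflow is to drop or cap the rows flagged by `detect_outliers_iqr` before fitting, and to Box-Cox skewed features (and the target) so the linear models see roughly normal distributions.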
The following is a description of each folder in the repository:
There are two feature engineering notebooks: the first uses one-hot encoding of the categorical variables, and the second uses label encoding.
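The difference between the two encoding strategies can be shown in a few lines of pandas. The column name `MSZoning` is one of the Ames dataset's categorical features; the snippet itself is a generic sketch, not code from either notebook.

```python
import pandas as pd

df = pd.DataFrame({"MSZoning": ["RL", "RM", "RL", "FV"]})

# Round 1 style: one-hot encoding - one binary indicator column per category
one_hot = pd.get_dummies(df, columns=["MSZoning"])

# Round 2 style: label encoding - a single integer code per category
label = df["MSZoning"].astype("category").cat.codes
```

One-hot encoding avoids imposing a spurious ordering on categories (important for linear models), while label encoding keeps the feature count low, which tree-based models tolerate well.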
The Data folder contains both feature-engineered data sets:
- train_x.csv, train_y.csv, and test_x.csv all correspond to the first feature engineering notebook: FeatureEngineeringRound1.ipynb
- train_x2.csv, train_y2.csv, and test_x2.csv all correspond to the second feature engineering notebook: FeatureEngineeringRound2.ipynb
The best-scoring Kaggle predictions are also in this folder:
- Final_Prediction_1.csv - prediction from the average of Elastic Net and SVR. Scored 0.1169 on the Public Leaderboard
- Final_PredictionBestScore.csv - prediction from a weighted average of Elastic Net, SVR, KRR, and GBR. Scored 0.1154 on the Public Leaderboard, which was in the top 9%
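The best-scoring submission is a weighted average of four models' predictions. A minimal sketch of that combination step, using hypothetical prediction arrays and assumed weights (the actual tuned weights are in Ensembling_Weighted_Averages.ipynb, not reproduced here):

```python
import numpy as np

# hypothetical per-model predictions for the same two test rows
preds = {
    "elastic_net": np.array([200000.0, 150000.0]),
    "svr":         np.array([210000.0, 145000.0]),
    "krr":         np.array([205000.0, 155000.0]),
    "gbr":         np.array([195000.0, 150000.0]),
}

# assumed weights summing to 1; the notebook's values may differ
weights = {"elastic_net": 0.4, "svr": 0.3, "krr": 0.2, "gbr": 0.1}

# final submission = element-wise weighted sum of the model outputs
final = sum(w * preds[name] for name, w in weights.items())
```

Weights are typically chosen by grid search against the public leaderboard or a held-out validation score, giving more mass to the stronger base models.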
In this folder, each model's predictions were run in a separate IPython notebook, listed by title:
- Lin_Reg_nb.ipynb - hyperparameter tuning and predictions for all linear regression models: Ridge, Elastic Net, Lasso, and Kernel Ridge Regression
- RandomForest.ipynb - random forest model tuning and prediction
- SVR.ipynb - support vector regression model hyperparameter tuning and predictions
- Boosting Models.ipynb - gradient boosting tree model hyperparameter tuning and predictions
- Ensembling_v1.ipynb - averaged and stacking ensembles for the first feature engineering notebook: FeatureEngineeringRound1.ipynb
- Ensembling_v2.ipynb - averaged and stacking ensembles for the second feature engineering notebook: FeatureEngineeringRound2.ipynb
- Ensembling_Weighted_Averages.ipynb - weighted-average ensembling ("Kaggle hacking") method
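The stacking ensembles in the notebooks above can be sketched with scikit-learn's `StackingRegressor`: base models are fit with cross-validation and a meta-learner is trained on their out-of-fold predictions. The base model names match the repo's list, but the hyperparameters, meta-learner choice, and synthetic data here are placeholders, not the notebooks' actual settings.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.kernel_ridge import KernelRidge
from sklearn.linear_model import ElasticNet, Lasso
from sklearn.svm import SVR

# synthetic stand-in for the feature-engineered training data
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

stack = StackingRegressor(
    estimators=[
        ("enet", ElasticNet(alpha=0.1)),
        ("svr", SVR(C=1.0)),
        ("krr", KernelRidge(alpha=1.0)),
        ("gbr", GradientBoostingRegressor(random_state=0)),
    ],
    # meta-learner fit on the base models' out-of-fold predictions
    final_estimator=Lasso(alpha=0.01),
    cv=5,
)
stack.fit(X, y)
preds = stack.predict(X)
```

Averaged ensembling simply replaces the meta-learner with a (weighted) mean of the base predictions, which is less flexible but harder to overfit.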
This folder contains additional EDA and data visualizations.