This project uses multiple regression to analyse the factors driving medical care costs. The data structure is described in the relational schema. The code is written in R and has four parts:
- Data cleaning and reshaping, feature extraction.
- Exploratory data analysis.
- Splitting the data into training and test datasets; fitting multiple regression, multiple regression with transformation, and AIC model selection using the training dataset.
- Model prediction on the test dataset.
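
The last two parts can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the data frame `claims`, its columns (`age`, `bmi`, `smoker`, `noise`, `cost`), the 80/20 split ratio, and the choice of a log transformation are all assumptions made for the example. Only base R functions (`lm`, `step`, `predict`) are used.

```r
# Synthetic stand-in for the medical cost data.
# All column names and coefficients here are hypothetical.
set.seed(42)
n <- 500
claims <- data.frame(
  age    = round(runif(n, 18, 64)),
  bmi    = rnorm(n, mean = 30, sd = 5),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.8, 0.2))),
  noise  = rnorm(n)  # irrelevant predictor, included so AIC has something to drop
)
claims$cost <- exp(7 + 0.03 * claims$age + 0.01 * claims$bmi +
                   1.2 * (claims$smoker == "yes") + rnorm(n, sd = 0.3))

# Split into training and test datasets (80/20 is an assumed ratio).
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- claims[train_idx, ]
test  <- claims[-train_idx, ]

# Multiple regression on the raw response.
fit_raw <- lm(cost ~ age + bmi + smoker + noise, data = train)

# Multiple regression with a transformed response (log is assumed here).
fit_log <- lm(log(cost) ~ age + bmi + smoker + noise, data = train)

# Backward AIC model selection, fitted on the training dataset only.
fit_aic <- step(fit_log, direction = "backward", trace = 0)

# Predict on the test dataset; exp() is a naive back-transform to the cost scale.
pred_cost <- exp(predict(fit_aic, newdata = test))
rmse <- sqrt(mean((test$cost - pred_cost)^2))
```

Comparing `summary(fit_raw)` with `summary(fit_log)` and inspecting `formula(fit_aic)` shows which predictors survive selection; `rmse` gives one possible measure of test-set error.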