For this lab, you will be using the CSV files provided in the files_for_lab
folder.
- Apply the Random Forest algorithm to predict `TARGET_B`. Please note that this column suffers from class imbalance. Fix the class imbalance using upsampling.
- Discuss the model predictions and their impact in the business scenario. Is the cost of a false positive equal to the cost of a false negative? How much money will the company fail to earn because of the misclassifications made by the model?
- Sklearn classification models report accuracy as their default score. However, another error metric will be more relevant here. Which one? Please check out `make_scorer` together with `GridSearchCV` to tune the model so that it maximizes the error metric of interest in this case.
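
The steps above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: it uses a synthetic dataset from `make_classification` as a stand-in for the lab CSV files, the feature names `f0`..`f9` and the parameter grid are made up, and recall on the positive class is only one candidate for the metric of interest (the lab leaves that choice to you).

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the lab data (assumption): roughly 5% positives.
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
data = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
data["TARGET_B"] = y

# Split first, so the test set keeps the original class distribution.
train, test = train_test_split(
    data, test_size=0.25, stratify=data["TARGET_B"], random_state=0
)

# Upsample the minority class (with replacement) in the training split only.
majority = train[train["TARGET_B"] == 0]
minority = train[train["TARGET_B"] == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
train_up = pd.concat([majority, minority_up]).sample(frac=1, random_state=0)

X_train = train_up.drop(columns="TARGET_B")
y_train = train_up["TARGET_B"]
X_test = test.drop(columns="TARGET_B")
y_test = test["TARGET_B"]

# Example scorer (assumption): recall on the positive class.
scorer = make_scorer(recall_score, pos_label=1)

# Hypothetical, deliberately small grid to keep the example fast.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"max_depth": [3, 6], "n_estimators": [50]},
    scoring=scorer,
    cv=3,
)
search.fit(X_train, y_train)

# Count each kind of error on the untouched test set.
pred = search.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, pred, labels=[0, 1]).ravel()
print("best params:", search.best_params_)
print("false positives:", fp, "false negatives:", fn)
```

The false positive and false negative counts are the starting point for the cost discussion: multiply each by the cost you assign to that kind of error. One caveat of this sketch is that upsampling before cross-validation lets duplicated rows appear in both the training and validation folds, so the cross-validated score is optimistic; the held-out test set is the more trustworthy check.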