X Education needs help to select the most promising leads,
i.e. the leads that are most likely to convert into paying customers.
The company requires us to build a model that
assigns a lead score to each lead, such that leads
with higher lead scores have a higher conversion chance and
leads with lower lead scores have a lower conversion chance.
The CEO, in particular, has given a ballpark figure of
around 80% for the target lead conversion rate.
This case study has two goals:
- Build a logistic regression model to assign a lead score between 0 and 100 to each lead, which the company can use to target potential leads. A higher score means that the lead is hot, i.e. most likely to convert, whereas a lower score means that the lead is cold and will most likely not convert.
- The company has presented some additional problems that the model should be able to adjust to if the company's requirements change in the future, so these need to be handled as well. These problems are provided in a separate doc file. Fill it in based on the logistic regression model from the first step, and include it in the final PPT along with the recommendations.
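As a rough illustration of the first goal, a lead score can be obtained by scaling the probability from a logistic regression model to the 0 to 100 range. The sketch below is only a minimal example: the function name `lead_score`, the three features, and the coefficient values are hypothetical and are not taken from the notebook.

```python
import math


def lead_score(features, weights, intercept):
    """Scale a logistic regression probability to a 0-100 lead score."""
    # Linear predictor: intercept plus the weighted sum of the features.
    z = intercept + sum(w * x for w, x in zip(weights, features))
    # The sigmoid maps the linear predictor to a conversion probability.
    probability = 1.0 / (1.0 + math.exp(-z))
    return round(probability * 100)


# Hypothetical coefficients, for illustration only.
weights = [1.2, 0.8, -0.5]
intercept = -1.0

hot_lead = [2.0, 1.0, 0.0]
cold_lead = [0.0, 0.0, 1.0]

hot_score = lead_score(hot_lead, weights, intercept)
cold_score = lead_score(cold_lead, weights, intercept)
print(hot_score, cold_score)
```

With a fitted model, the weights and intercept would come from the trained logistic regression rather than being written by hand.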
The analysis follows these steps:
- Reading Data
- Cleaning Data
- EDA
- Creating Dummy Variables
- Splitting data into train and test set
- Building Model
- Making Predictions
- Model Evaluation
- ROC Curve
- Prediction on test set
- Precision-Recall
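The evaluation steps above can be sketched in plain Python. One possible reading of the 80% target, which is an assumption here and not something the case study states, is that about 80% of the leads the model flags as hot should actually convert, i.e. a precision of 0.80. The example below computes precision and recall at a score cutoff and searches for the lowest cutoff that reaches the target. The labels and scores are made up for illustration, and the helper names are hypothetical.

```python
def precision_recall(y_true, scores, cutoff):
    """Precision and recall when leads scoring >= cutoff are predicted to convert."""
    predicted = [s >= cutoff for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if p and y == 0)
    fn = sum(1 for y, p in zip(y_true, predicted) if not p and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def lowest_cutoff_for_precision(y_true, scores, target=0.80):
    """Return (cutoff, precision, recall) for the lowest cutoff meeting the target."""
    for cutoff in range(0, 101):
        precision, recall = precision_recall(y_true, scores, cutoff)
        if precision >= target:
            return cutoff, precision, recall
    return None  # no cutoff reaches the target precision


# Made-up data: 1 = converted, 0 = not converted; scores are 0-100 lead scores.
y_true = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
scores = [95, 88, 80, 75, 70, 60, 45, 30, 20, 10]

result = lowest_cutoff_for_precision(y_true, scores)
print(result)
```

Choosing the lowest qualifying cutoff keeps recall as high as possible while still meeting the precision target. A different trade-off between precision and recall would call for a different selection rule.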
The project includes the following files:
- Lead Score Case Study.ipynb: The Python notebook containing the code and data analysis
- Assignment Subjective Questions.pdf: Some subjective questions answered
- Lead Score Case Study.pdf: Final Presentation
- Leads.csv: Data to work on
- Leads Data Dictionary.xlsx: Data Dictionary
- Summary.pdf: Summary of what is done in the entire notebook