This repository contains a churn prediction model tailored for the telecom industry. Churn prediction is a critical task for telecom companies to identify customers at risk of leaving their services. The model is built using logistic regression, a common technique for binary classification problems.
The dataset used for training and testing the model was obtained from Kaggle: https://www.kaggle.com/datasets/abhinav89/telecom-customer
The dataset consists of 100 variables and approximately 100,000 records. The variables describe customer attributes and other factors that telecom companies consider important when dealing with their customers. The target variable is churn, which indicates whether a customer churns.
The preprocessing steps for this model include:
- Dropping rows with missing values
- Label Encoding
- Standard scaling
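The three steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `preprocess` helper, the `target` argument, and the toy column names `months` and `area` are assumptions made for the example. Only the `churn` target comes from the dataset description.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler


def preprocess(df: pd.DataFrame, target: str = "churn"):
    """Hypothetical helper: drop incomplete rows, label-encode, then scale."""
    # 1. Drop every row that has at least one missing value.
    df = df.dropna().copy()

    # 2. Label-encode each non-numeric column with its own encoder.
    for col in df.select_dtypes(exclude="number").columns:
        df[col] = LabelEncoder().fit_transform(df[col].astype(str))

    # 3. Separate the target, then standardize the features
    #    to zero mean and unit variance.
    X = df.drop(columns=[target])
    y = df[target].to_numpy()
    X_scaled = StandardScaler().fit_transform(X)
    return X_scaled, y


# Tiny made-up frame standing in for the Kaggle data (assumed columns).
toy = pd.DataFrame({
    "months": [12, 24, np.nan, 36, 6],
    "area": ["north", "south", "north", None, "east"],
    "churn": [1, 0, 1, 0, 1],
})
X_toy, y_toy = preprocess(toy)
print(X_toy.shape)  # two rows contain a missing value and are dropped
```

For simplicity the sketch fits the scaler on all rows. When the data is split into training and test sets, fitting the scaler on the training portion only avoids leaking test statistics into training.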
We instantiated a logistic regression model with the following parameters:
- `random_state`: 42 (for reproducibility)
- `max_iter`: 1000 (maximum number of iterations for optimization)
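As a rough sketch, a model with these two parameters could be created and fitted with scikit-learn as shown here. The synthetic data from `make_classification` and the 80/20 split are assumptions used only to keep the example self-contained; they stand in for the preprocessed telecom features.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the preprocessed features and churn labels (assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# The 80/20 split ratio is an assumption; the README does not state it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Parameters as listed above; all other settings keep scikit-learn's defaults.
model = LogisticRegression(random_state=42, max_iter=1000)
model.fit(X_train, y_train)

# Predicted class labels for the held-out rows.
y_pred = model.predict(X_test)
```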
The model achieved an accuracy of 0.59 and a precision of 0.57.
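Both metrics are available in `sklearn.metrics`. The snippet below is a small illustration on made-up labels, not a reproduction of the reported scores: accuracy is the share of all predictions that are correct, while precision is the share of predicted churners that actually churned.

```python
from sklearn.metrics import accuracy_score, precision_score

# Made-up labels for illustration only (1 = churn, 0 = no churn).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]

# Accuracy: correct predictions divided by all predictions.
accuracy = accuracy_score(y_true, y_pred)

# Precision: true positives divided by all predicted positives.
precision = precision_score(y_true, y_pred)

print(f"accuracy={accuracy:.3f}, precision={precision:.3f}")
```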
The following Python libraries and modules are required to run this project:
- pandas: Used for data manipulation and analysis.
- scikit-learn: Provides tools for machine learning, including model selection and preprocessing.
- numpy: Used for numerical operations and data manipulation.
- matplotlib: Used for data visualization (if applicable).
You can install these dependencies using pip with the following command:
pip install pandas scikit-learn numpy matplotlib