This repository contains the data science notebooks and datasets for my internship.
The project is organized into the following main folders:
- Task_1: Contains the Jupyter notebooks with the admission prediction analysis and ML models, along with the dataset (CSV file) for Task 1.
- Task_2: Contains the Jupyter notebooks with the clustering analyses and clustering models, along with the dataset (CSV file) for Task 2.
- Task_3: Contains the Jupyter notebooks with the customer segmentation analysis and ML models, along with the dataset (CSV file) for Task 3.
- Notebook 1 - Admission Prediction using Linear Regression & Ridge Regression:
  - Filename: `t1.ipynb`
  - Description: This notebook implements linear regression and ridge regression models to predict admission outcomes based on various features. It covers data preprocessing, model training, and evaluation.
- Notebook 2 - Clustering Analysis with Different Algorithms on a Customer Dataset:
  - Filename: `t2.ipynb`
  - Description: This notebook explores clustering algorithms (k-means, agglomerative hierarchical clustering, spectral clustering, and mean shift) on a customer dataset. It includes data preprocessing, model implementation, and visualization of the clustering results.
- Notebook 3 - Customer Segmentation Analysis:
  - Filename: `t3.ipynb`
  - Description: This notebook focuses on customer segmentation analysis. It includes exploratory data analysis (EDA), feature engineering, and the application of clustering techniques to identify distinct customer segments.
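
As a rough illustration of the workflow described for Notebook 1 (preprocessing, training, and evaluation with linear and ridge regression), here is a minimal sketch. It does not reproduce the notebook: the data is synthetic rather than the admission dataset, and the feature weights, the `alpha=1.0` setting, and the 80/20 split are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the admission dataset):
# three features and a linear target with a little noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the scaler on the training split only, so that no test-set
# statistics leak into preprocessing.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# alpha=1.0 is an arbitrary illustrative regularization strength.
models = {"linear": LinearRegression(), "ridge": Ridge(alpha=1.0)}

scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test_s))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

On real data, the ridge penalty would typically be tuned (for example with cross-validation) rather than fixed in advance.
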
- Dataset 1 - Admission Predict Ver1.1:
  - Filename: `Admission_Predict_Ver1.1.csv`
  - Description: Dataset containing features related to admission predictions.
- Dataset 2 - Mall Customers:
  - Filename: `Mall_Customers.csv`
  - Description: Dataset used for clustering analysis to understand customer behavior in a mall.
- Dataset 3 - Segmentation Data:
  - Filename: `segmentation_data.csv`
  - Description: Dataset used for customer segmentation analysis to identify distinct customer segments.
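
The clustering comparison that Notebook 2 runs on the customer data can be sketched as follows. This is only an illustration under stated assumptions: it uses synthetic blobs from `make_blobs` instead of `Mall_Customers.csv`, fixes the cluster count at 4 for the algorithms that need one, and uses the silhouette score as the comparison metric, which the notebook may or may not do.

```python
import numpy as np
from sklearn.cluster import (
    AgglomerativeClustering,
    KMeans,
    MeanShift,
    SpectralClustering,
)
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the mall customer dataset).
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
X = StandardScaler().fit_transform(X)

# n_clusters=4 is an assumption for this example; mean shift
# estimates the number of clusters on its own.
models = {
    "k-means": KMeans(n_clusters=4, n_init=10, random_state=42),
    "agglomerative": AgglomerativeClustering(n_clusters=4),
    "spectral": SpectralClustering(n_clusters=4, random_state=42),
    "mean shift": MeanShift(),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    n_found = len(np.unique(labels))
    # The silhouette score is undefined when only one cluster is found.
    score = silhouette_score(X, labels) if n_found > 1 else float("nan")
    results[name] = (n_found, score)
    print(f"{name}: {n_found} clusters, silhouette = {score:.3f}")
```

For the real datasets, the number of clusters would normally be chosen from the data (for example with an elbow plot or silhouette analysis) rather than assumed.
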
This project is developed using the following technologies:
- Python 3.x: The primary programming language for data analysis and machine learning.
- Jupyter Notebook: Used for interactive and exploratory coding.
- Libraries: NumPy, pandas, Matplotlib, Seaborn, and scikit-learn for data manipulation, visualization, and machine learning tasks.