InternSavy

This repository contains the data science notebooks and datasets for my internship.

Project Structure

The project is organized into the following main folders:

  • Task_1: Contains the Jupyter notebooks for the admission prediction analysis and its ML models, plus the dataset as a CSV file.

  • Task_2: Contains the Jupyter notebooks for the comparison of clustering models, plus the dataset as a CSV file.

  • Task_3: Contains the Jupyter notebooks for the customer segmentation analysis and its ML models, plus the dataset as a CSV file.

Notebooks

  1. Notebook 1 - Admission Prediction using Linear Regression & Ridge Regression:

    • Filename: t1.ipynb
    • Description: This notebook implements linear regression and ridge regression models to predict admission outcomes based on various features. It covers data preprocessing, model training, and evaluation.
  2. Notebook 2 - Comparison of Clustering Algorithms on a Customer Dataset:

    • Filename: t2.ipynb
    • Description: This notebook compares the k-means, agglomerative hierarchical, spectral, and mean shift clustering algorithms on a customer dataset. It includes data preprocessing, model implementation, and visualization of the clustering results.
  3. Notebook 3 - Customer Segmentation Analysis:

    • Filename: t3.ipynb
    • Description: This notebook focuses on customer segmentation analysis. It includes exploratory data analysis (EDA), feature engineering, and the application of clustering techniques to identify distinct customer segments.
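
The modelling steps described above can be sketched in a few lines of scikit-learn. This is only an illustration under stated assumptions, not the code from the notebooks: it runs on synthetic data rather than the repository's CSV files, the function names `compare_regressors` and `compare_clusterers` are hypothetical, and the choices of R² and the silhouette score as metrics, `alpha=1.0`, and `n_clusters=3` are placeholders.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, MeanShift, SpectralClustering
from sklearn.datasets import make_blobs, make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score, silhouette_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def compare_regressors(X, y, alpha=1.0, seed=0):
    """Fit linear and ridge regression and return the test-set R^2 of each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Fit the scaler on the training split only to avoid leaking test data.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    models = {"linear": LinearRegression(), "ridge": Ridge(alpha=alpha)}
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = r2_score(y_test, model.predict(X_test))
    return scores


def compare_clusterers(X, n_clusters=3, seed=0):
    """Run the four clustering algorithms and return a silhouette score for each."""
    X = StandardScaler().fit_transform(X)
    models = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "agglomerative": AgglomerativeClustering(n_clusters=n_clusters),
        "spectral": SpectralClustering(n_clusters=n_clusters, random_state=seed),
        "mean shift": MeanShift(),  # chooses the number of clusters itself
    }
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        # The silhouette score is undefined when only one cluster is found.
        found = len(set(labels))
        scores[name] = silhouette_score(X, labels) if found > 1 else float("nan")
    return scores


if __name__ == "__main__":
    # Synthetic stand-ins for the real datasets (assumption, see lead-in).
    X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
    print(compare_regressors(X_reg, y_reg))

    X_blob, _ = make_blobs(
        n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]], cluster_std=1.0, random_state=0
    )
    print(compare_clusterers(X_blob))
```

To try this on the real data, replace the synthetic arrays with the numeric columns of the corresponding CSV file. The features are standardized first because ridge regression and the distance-based clustering algorithms are sensitive to feature scale.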

Datasets

  • Dataset 1 - Admission Predict Ver1.1:

    • Filename: Admission_Predict_Ver1.1.csv
    • Description: Dataset used to train and evaluate the admission prediction models in Notebook 1.
  • Dataset 2 - Mall Customers:

    • Filename: Mall_Customers.csv
    • Description: Dataset used for clustering analysis to understand customer behavior in a mall.
  • Dataset 3 - Segmentation Data:

    • Filename: segmentation_data.csv
    • Description: Dataset used for customer segmentation analysis to identify distinct customer segments.
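
A minimal sketch of loading these files with pandas follows. It assumes each CSV sits directly inside its task folder, as the Project Structure section suggests, but the exact paths are not confirmed here. The `DATASETS` mapping, its short keys, and the `load_dataset` helper are hypothetical names introduced for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumed locations: each CSV inside its task folder (not verified against the repo).
DATASETS = {
    "admission": Path("Task_1") / "Admission_Predict_Ver1.1.csv",
    "mall": Path("Task_2") / "Mall_Customers.csv",
    "segmentation": Path("Task_3") / "segmentation_data.csv",
}


def load_dataset(name, root="."):
    """Read one of the project CSV files into a DataFrame."""
    path = Path(root) / DATASETS[name]
    df = pd.read_csv(path)
    # Guard against stray whitespace around column names in the header row.
    df.columns = df.columns.str.strip()
    return df
```

From the repository root, `load_dataset("mall").describe()` would then give a quick numeric summary of the mall customers data, provided the file is at the assumed path.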

Tech Stack

This project is developed using the following technologies:

  • Python 3.x: The primary programming language for data analysis and machine learning.

  • Jupyter Notebook: Utilized for interactive and exploratory coding.

  • Libraries: NumPy and pandas for data manipulation, Matplotlib and Seaborn for visualization, and scikit-learn for machine learning.