Includes all the assignments and the project.
This repository contains Python code that demonstrates how to handle outliers in a dataset using two scaling methods: Standardization (Z-score normalization) and Min-Max scaling. The examples use one of Seaborn's built-in datasets.
Outliers in datasets can significantly impact statistical analyses and machine learning models. This repository provides methods to replace outliers using two common scaling techniques:
Standardization (Z-score normalization): Adjusts data to have a mean of 0 and a standard deviation of 1. Less sensitive to outliers compared to Min-Max scaling.
Min-Max Scaling: Scales data to a fixed range, typically [0, 1]. Preserves the shape of the original distribution but is sensitive to outliers. Both techniques are illustrated in the sketch below.
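As a quick illustration of how the two scalers behave, here is a minimal sketch using scikit-learn on a small synthetic column with one obvious outlier; the values are made up purely for demonstration and are not from the repository's dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Small synthetic column with one obvious outlier (values are made up).
x = np.array([[10.0], [12.0], [11.0], [13.0], [100.0]])

# Standardization: (x - mean) / std  ->  mean 0, standard deviation 1.
z = StandardScaler().fit_transform(x)

# Min-Max scaling: (x - min) / (max - min)  ->  range [0, 1].
m = MinMaxScaler().fit_transform(x)

print(z.ravel())  # the outlier becomes a large positive Z-score
print(m.ravel())  # the outlier maps to 1 and compresses the other values toward 0
```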
The code includes functions to visualize boxplots before and after outlier replacement using both methods.
Ensure you have the following Python libraries installed:
-> seaborn
-> pandas
-> numpy
-> matplotlib
-> scikit-learn
Handling outliers with standardization (Z-score) scales the data and caps values beyond a chosen threshold at that threshold, preserving the overall shape of the distribution. This improves the robustness of statistical analyses and machine learning models that are sensitive to outliers.
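A minimal sketch of this idea, assuming Seaborn's built-in `tips` dataset and a Z-score threshold of 3; the function name and column choice are illustrative and not necessarily the repository's actual code.

```python
import pandas as pd
import seaborn as sns

def replace_outliers_zscore(series: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Cap values whose Z-score exceeds the threshold at the threshold boundary."""
    mean, std = series.mean(), series.std()
    lower = mean - threshold * std
    upper = mean + threshold * std
    return series.clip(lower=lower, upper=upper)

tips = sns.load_dataset("tips")
tips["total_bill_capped"] = replace_outliers_zscore(tips["total_bill"], threshold=3.0)
```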
Before Scaling: Outliers are clearly visible beyond the whiskers of the boxplot, and the data shows high variance.
After Scaling: Outliers are capped within the specified threshold range, and the data is normalized, making it easier to compare features on a common scale.
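Continuing from the sketch above, the before/after comparison can be visualized with side-by-side boxplots (again, the variable names are illustrative):

```python
import matplotlib.pyplot as plt
import seaborn as sns

fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Before: raw values, outliers appear beyond the whiskers.
sns.boxplot(x=tips["total_bill"], ax=axes[0])
axes[0].set_title("Before outlier replacement")

# After: values capped at the Z-score threshold.
sns.boxplot(x=tips["total_bill_capped"], ax=axes[1])
axes[1].set_title("After outlier replacement")

plt.tight_layout()
plt.show()
```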
Min-Max Scaling brings all features onto a common [0, 1] range so that they contribute comparably to the analysis or machine learning model. Because it is sensitive to extreme values, capping outliers first (for example with the Z-score threshold above) keeps a single extreme point from compressing the rest of the data.
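A sketch of this step, applying scikit-learn's MinMaxScaler to the capped column from the earlier snippet; the column names are assumptions for illustration.

```python
from sklearn.preprocessing import MinMaxScaler

# Scale the capped column to [0, 1]; capping first keeps a single extreme
# value from squeezing the remaining data into a narrow band.
scaler = MinMaxScaler(feature_range=(0, 1))
tips["total_bill_scaled"] = scaler.fit_transform(tips[["total_bill_capped"]]).ravel()

print(tips[["total_bill", "total_bill_capped", "total_bill_scaled"]].describe())
```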