Merge pull request #676 from adi271001/ecommerce-trends
Ecommerce trends Analysis
Showing 26 changed files with 1,288 additions and 0 deletions.
1,001 changes: 1,001 additions & 0 deletions
E-Commerce Trends Analysis/Dataset/ecommerce_product_dataset.csv
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# E-commerce Product Sales Prediction | ||
|
||
# Models | ||
|
||
## Table of Contents | ||
|
||
``` | ||
* Goal | ||
* Dataset | ||
* Description | ||
* Libraries | ||
* Models and Results | ||
* Conclusion | ||
``` | ||
## Goal | ||
|
||
#### To predict sales of e-commerce products using various machine learning models. | ||
|
||
## Dataset | ||
|
||
#### Link: The dataset is provided within the notebook and contains various e-commerce product details. | ||
|
||
## Description | ||
|
||
#### * This folder contains the code and resources for predicting sales of e-commerce products using various machine learning models. | ||
|
||
#### * The prediction is based on product details such as product name, category, price, rating, number of reviews, stock quantity, discount, and sales. | ||
|
||
## Libraries Needed | ||
|
||
#### * pandas | ||
|
||
#### * numpy | ||
|
||
#### * matplotlib | ||
|
||
#### * seaborn | ||
|
||
#### * plotly | ||
|
||
#### * scikit-learn | ||
|
||
## Models and Results | ||
|
||
#### The project explores the following machine learning models to predict sales: | ||
|
||
### 1. Linear Regression | ||
|
||
#### Linear Regression is a basic and commonly used predictive analysis model. The model attempts to find the linear relationship between the input features and the target variable (sales). | ||
|
||
``` | ||
Results: RMSE: 593.23 R² Score: -0.0170 | ||
``` | ||
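The two metrics reported throughout this README can be sketched in a few lines. The following is a minimal illustration using made-up numbers (not the project's data); the helper names `rmse` and `r_squared` are hypothetical. It shows why an R² score of 0 corresponds to always predicting the mean, and why a negative score means the predictions are worse than that constant.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two 1-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # error of the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # error of the mean baseline
    return float(1.0 - ss_res / ss_tot)


# Toy target values, chosen only for illustration.
y_true = np.array([100.0, 200.0, 300.0, 400.0])

# Baseline: always predict the mean of the targets.
mean_pred = np.full_like(y_true, y_true.mean())

# A deliberately poor prediction (the targets in reverse order).
bad_pred = y_true[::-1]

baseline_r2 = r_squared(y_true, mean_pred)
bad_r2 = r_squared(y_true, bad_pred)

print("baseline RMSE:", rmse(y_true, mean_pred), "R2:", baseline_r2)
print("poor model RMSE:", rmse(y_true, bad_pred), "R2:", bad_r2)
```

Read this way, an R² score slightly below zero suggests a model that does about as well as predicting the average sales value.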
### 2. Decision Tree Regressor | ||
|
||
#### Decision Tree Regressor builds a model in the form of a tree structure. It splits the dataset into progressively smaller subsets while incrementally growing the associated decision tree. | ||
|
||
``` | ||
Results: RMSE: 855.12 R² Score: -1.1131 | ||
``` | ||
### 3. Random Forest Regressor | ||
|
||
#### Random Forest Regressor builds multiple decision trees and combines their predictions. Averaging across trees typically reduces overfitting compared with a single tree. | ||
|
||
``` | ||
Results: RMSE: 621.81 R² Score: -0.1173 | ||
``` | ||
### 4. Gradient Boosting Regressor | ||
|
||
#### Gradient Boosting Regressor builds an ensemble of trees in a sequential manner, where each tree attempts to correct the errors of the previous one. This model is powerful and effective for regression tasks. | ||
|
||
``` | ||
Results: RMSE: 609.70 R² Score: -0.0742 | ||
``` | ||
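The three tree-based models above can be compared in one loop. This is a minimal sketch on synthetic stand-in data, not the notebook's actual code; the feature matrix, the target, the split ratio and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the project's dataset): 300 rows, 4 features,
# with a target that depends on the first two features plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = 50.0 * X[:, 0] - 30.0 * X[:, 1] + rng.normal(scale=10.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hypothetical model settings; the notebook's settings are not shown in the README.
models = {
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
        "r2": float(r2_score(y_test, pred)),
    }
    print(f"{name}: RMSE={scores[name]['rmse']:.2f}, R2={scores[name]['r2']:.4f}")
```

Evaluating every model on the same held-out split keeps the RMSE and R² values comparable across models.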
### 5. Support Vector Regressor (SVR) | ||
|
||
#### Support Vector Regressor uses Support Vector Machines for regression tasks. It aims to fit the best line within a threshold value (epsilon) and is effective in high-dimensional spaces. | ||
|
||
``` | ||
Results: RMSE: 588.42 R² Score: -0.0006 | ||
``` | ||
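SVR is sensitive to the scale of its inputs, and the README does not say whether the notebook standardizes the features. The sketch below, on synthetic stand-in data, shows one common arrangement: a pipeline that scales features before fitting the SVR. The values of `C` and `epsilon` are illustrative assumptions, not the project's settings.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(0)
X = rng.uniform(0, 500, size=(300, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=20.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Scale features first, then fit an RBF-kernel SVR (hypothetical hyperparameters).
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.1))
svr.fit(X_train, y_train)

pred = svr.predict(X_test)
svr_rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
svr_r2 = float(r2_score(y_test, pred))
print(f"SVR: RMSE={svr_rmse:.2f}, R2={svr_r2:.4f}")
```

With a small `C` and a target measured in the hundreds, an SVR can end up predicting values close to a constant, which would be consistent with an R² score near zero; whether that is what happened here cannot be confirmed from the README alone.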
|
||
### 6. Logistic Regression | ||
|
||
#### Logistic Regression is a classification model and was included only for comparison. It is scored by accuracy rather than RMSE or R², so its result is not directly comparable with the regressors above, and it is not suitable for regression tasks like sales prediction. | ||
|
||
``` | ||
Results: Accuracy: 0.495 | ||
``` | ||
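The README does not say how the sales column was turned into class labels for Logistic Regression. One possible setup, assumed here purely for illustration, is a binary split at the median. Under that assumption, the sketch below computes the accuracy of the simplest possible classifier, one that always predicts the majority class, using synthetic stand-in values.

```python
import numpy as np

# Synthetic stand-in for a sales column (NOT the project's data).
rng = np.random.default_rng(0)
sales = rng.integers(0, 2000, size=1000)

# Assumed labeling: 1 if sales are above the median, else 0.
labels = (sales > np.median(sales)).astype(int)

# Majority-class baseline: always predict the most frequent label.
majority = int(np.bincount(labels).argmax())
baseline_acc = float(np.mean(labels == majority))

print("majority class:", majority)
print("baseline accuracy:", baseline_acc)
```

If the labels were built this way, an accuracy of 0.495 would be no better than this baseline. With a different labeling scheme (for example, several sales bands) the comparison would change.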
|
||
* ![Accuracy Comparison](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___42_0.png?raw=true) | ||
* ![Logistic Regression Accuracy](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___43_0.png?raw=true) | ||
|
||
|
||
## Conclusion | ||
|
||
#### * Among the models tested, the Support Vector Regressor (SVR) performed best, with the lowest RMSE (588.42) and the least negative R² score (-0.0006). Because every R² score is negative, however, none of the models outperformed a constant prediction equal to the mean sales value of the evaluation set. | ||
|
||
#### * Best Performing Model: Support Vector Regressor (SVR) | ||
|
||
#### * Next Steps: Consider tuning hyperparameters and exploring other algorithms for potentially better performance. |
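The hyperparameter tuning mentioned under Next Steps could be sketched with scikit-learn's `GridSearchCV`. The data below is synthetic, and the parameter grid, the 5-fold setting and the choice of SVR as the model to tune are assumptions for illustration rather than the project's plan.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (NOT the project's dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 100.0 * X[:, 0] + rng.normal(scale=10.0, size=200)

# Scaling lives inside the pipeline so each CV fold is scaled on its own training part.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Hypothetical search grid; step names prefix the parameter names.
param_grid = {"svr__C": [1.0, 10.0, 100.0], "svr__epsilon": [0.1, 1.0]}

search = GridSearchCV(
    pipe, param_grid, scoring="neg_root_mean_squared_error", cv=5
)
search.fit(X, y)

best_params = search.best_params_
best_cv_rmse = -search.best_score_  # the scorer is negated RMSE, so flip the sign
print("best parameters:", best_params)
print("cross-validated RMSE:", round(best_cv_rmse, 2))
```

Cross-validated scores are usually a steadier basis for choosing between settings than a single train/test split.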
1 change: 1 addition & 0 deletions
E-Commerce Trends Analysis/Models/ecommerce-trends-eda-models (2).ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,168 @@ | ||
# E-commerce Product Trends Analysis | ||
|
||
## Table of Contents | ||
|
||
``` | ||
* Goal | ||
* Dataset | ||
* Description | ||
* What I Had Done | ||
* Installation | ||
* Libraries | ||
* EDA Results | ||
* Models and Results | ||
* Conclusion | ||
* Contributing | ||
* Signature | ||
``` | ||
## Goal | ||
|
||
#### To analyze e-commerce product trends and predict sales using various machine learning models. | ||
|
||
## Dataset | ||
|
||
#### Link: https://www.kaggle.com/datasets/muhammadroshaanriaz/e-commerce-trends-a-guide-to-leveraging-dataset | ||
|
||
#### The dataset contains various e-commerce product details. | ||
|
||
## Description | ||
|
||
#### * This folder contains the code and resources for analyzing e-commerce product trends and predicting sales using various machine learning models. | ||
|
||
#### * The analysis is based on product details such as product name, category, price, rating, number of reviews, stock quantity, discount, and sales. | ||
|
||
## What I Had Done | ||
|
||
## Installation | ||
|
||
#### Clone the repository using the following command: | ||
|
||
``` | ||
git clone https://github.com/yourusername/ecommerce-product-trends.git | ||
cd ecommerce-product-trends | ||
``` | ||
#### To run the notebook and reproduce the results, you need Python and the required libraries. Install the libraries with the following command: | ||
|
||
``` | ||
pip install -r requirements.txt | ||
``` | ||
#### Run the Jupyter notebook: | ||
|
||
``` | ||
jupyter notebook ecommerce-trends-eda-models.ipynb | ||
``` | ||
## Libraries Needed | ||
|
||
#### * pandas | ||
|
||
#### * numpy | ||
|
||
#### * matplotlib | ||
|
||
#### * seaborn | ||
|
||
#### * plotly | ||
|
||
#### * scikit-learn | ||
|
||
|
||
## Exploratory Data Analysis Results | ||
|
||
#### * The dataset contains a wide range of product categories with varying prices, ratings, number of reviews, stock quantities, discounts, and sales. | ||
|
||
#### * Initial visualizations indicate significant trends and correlations among these features. | ||
|
||
### Graphs and Analysis | ||
|
||
#### 1. Relationship Graphs | ||
|
||
#### Insights: There are clear trends between price and sales, rating and sales, and discount and sales. | ||
|
||
#### 2. Cluster Graph | ||
|
||
#### Insights: Products are clustered into different groups based on their features, which helps in segmenting the data. | ||
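The README does not name the clustering method behind this graph. As one possibility, the sketch below groups products with k-means on two scaled features. The column meanings (`price`, `sales`), the number of clusters and the data itself are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two product features (NOT the project's data):
# column 0 plays the role of price, column 1 the role of sales.
rng = np.random.default_rng(7)
features = np.column_stack(
    [rng.uniform(10, 500, size=200), rng.integers(0, 2000, size=200)]
)

# Scale both columns so neither dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Hypothetical choice of three clusters.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=7)
cluster_labels = kmeans.fit_predict(scaled)

print("products per cluster:", np.bincount(cluster_labels))
```

The resulting labels can be used as a color key in a price-versus-sales scatter plot.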
|
||
#### 3. Correlation Matrix | ||
|
||
#### Pearson Correlation Matrix | ||
|
||
#### Insights: The Pearson correlation matrix shows the linear correlation between different features. | ||
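A Pearson correlation matrix of this kind can be computed with pandas. The small frame below is made up for illustration; the column names follow the features listed in the Description, but the values are not from the dataset.

```python
import pandas as pd

# Toy data with hypothetical values (NOT the project's dataset).
df = pd.DataFrame(
    {
        "price": [10.0, 20.0, 30.0, 40.0, 50.0],
        "rating": [4.5, 4.0, 3.5, 4.2, 3.9],
        "discount": [5, 10, 15, 20, 25],
        "sales": [500, 420, 610, 380, 450],
        "category": ["toys", "books", "toys", "home", "books"],
    }
)

# Keep only numeric columns, then compute pairwise Pearson correlations.
corr = df.select_dtypes("number").corr(method="pearson")
print(corr.round(2))
```

Each entry lies between -1 and 1; values near zero against the `sales` column would indicate little linear relationship with that feature.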
|
||
#### 4. Predictive Power Score | ||
|
||
#### Insights: This score helps in identifying the predictive power of different features for the target variable. | ||
|
||
#### 5. Line of Best Fit Graphs | ||
|
||
#### Insights: These graphs show the trends and best fit lines for key relationships in the data. | ||
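A line of best fit for one pair of features can be obtained by ordinary least squares, for example with `numpy.polyfit`. The numbers below are made up for illustration and do not come from the dataset.

```python
import numpy as np

# Toy price and sales values (hypothetical, NOT the project's data).
price = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sales = np.array([520.0, 480.0, 455.0, 410.0, 390.0])

# Degree-1 polynomial fit returns the slope first, then the intercept.
slope, intercept = np.polyfit(price, sales, deg=1)

# Points on the fitted line, ready to overlay on a scatter plot.
fitted = slope * price + intercept

print(f"sales = {slope:.2f} * price + {intercept:.2f}")
```

The sign and size of the slope summarize the direction and steepness of the linear trend between the two features.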
|
||
* ![description of dataset](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___7_1.png?raw=true) | ||
* ![Distribution of ratings](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___8_1.png?raw=true) | ||
* ![Distribution of other features](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___9_1.png?raw=true) | ||
* ![correlation matrix](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___13_0.png?raw=true) | ||
* ![Top 10 Products](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___15_0.png?raw=true) | ||
* ![price vs sales clustering graph](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___17_0.png?raw=true) | ||
* ![pairplot of numerical features](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___18_2.png?raw=true) | ||
* ![Word Cloud](https://github.com/adi271001/ML-Crate/blob/ecommerce-trends/E-Commerce%20Trends%20Analysis/Images/__results___31_1.png?raw=true) | ||
|
||
## Models and Results | ||
|
||
#### The project explores the following machine learning models to predict sales: | ||
|
||
### 1. Linear Regression | ||
|
||
``` | ||
Results: RMSE: 593.23 R² Score: -0.0170 | ||
``` | ||
### 2. Decision Tree Regressor | ||
|
||
``` | ||
Results: RMSE: 855.12 R² Score: -1.1131 | ||
``` | ||
### 3. Random Forest Regressor | ||
|
||
``` | ||
Results: RMSE: 621.81 R² Score: -0.1173 | ||
``` | ||
### 4. Gradient Boosting Regressor | ||
|
||
``` | ||
Results: RMSE: 609.70 R² Score: -0.0742 | ||
``` | ||
### 5. Support Vector Regressor (SVR) | ||
|
||
``` | ||
Results: RMSE: 588.42 R² Score: -0.0006 | ||
``` | ||
### 6. Logistic Regression | ||
|
||
``` | ||
Results: Accuracy: 0.495 | ||
``` | ||
## Conclusion | ||
|
||
#### * Based on the evaluation of various machine learning models, the Support Vector Regressor (SVR) emerged as the best-performing model, with the lowest RMSE (588.42) and the least negative R² score (-0.0006). Because every R² score is negative, however, none of the tested models outperformed a constant prediction equal to the mean sales value of the evaluation set. | ||
|
||
#### * Best Performing Model: Support Vector Regressor (SVR), which has the lowest RMSE and the highest (least negative) R² score. | ||
|
||
#### * Next Steps: Verify the data processing steps and re-evaluate the models to identify any issues. Consider tuning model hyperparameters and exploring other algorithms for improved performance. | ||
|
||
#### * Important Features: Features such as price, rating, and discount were expected to be influential in predicting sales. | ||
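The model comparison behind this conclusion can be reproduced from the results CSV included in this commit. The sketch below embeds that file's contents as a string and ranks the regression models by RMSE; the variable names are hypothetical.

```python
import csv
import io

# Contents of the results CSV from this commit, embedded for a self-contained example.
RESULTS_CSV = """Model,RMSE,R^2 Score,Accuracy
Linear Regression,593.2257019066986,-0.01700355866153469,
Decision Tree,855.1219971442671,-1.1131908015070708,
Random Forest,621.8057745031804,-0.11735726372963051,
Gradient Boosting,609.695323177347,-0.07425722239816679,
SVR,588.4236152648417,-0.0006051707827330333,
Logistic Regression,,,0.495
"""

rows = list(csv.DictReader(io.StringIO(RESULTS_CSV)))

# Logistic Regression has no RMSE, so it is excluded from the regression ranking.
regressors = [row for row in rows if row["RMSE"]]
ranked = sorted(regressors, key=lambda row: float(row["RMSE"]))

best = ranked[0]["Model"]
all_r2_negative = all(float(row["R^2 Score"]) < 0 for row in regressors)

for row in ranked:
    print(f"{row['Model']}: RMSE={float(row['RMSE']):.2f}, R2={float(row['R^2 Score']):.4f}")
print("lowest RMSE:", best)
print("all R2 scores negative:", all_r2_negative)
```

The ranking puts SVR first, and the final check confirms that every regression model has a negative R² score.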
|
||
## Contributing | ||
|
||
#### Contributions are welcome! Please read the contribution guidelines first. | ||
|
||
## Signature | ||
|
||
#### Aditya D | ||
|
||
#### Github: https://www.github.com/adi271001 | ||
|
||
#### LinkedIn: https://www.linkedin.com/in/aditya-d-23453a179/ | ||
|
||
#### Topmate: https://topmate.io/aditya_d/ | ||
|
||
#### Twitter: https://x.com/ADITYAD29257528 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
Model,RMSE,R^2 Score,Accuracy | ||
Linear Regression,593.2257019066986,-0.01700355866153469, | ||
Decision Tree,855.1219971442671,-1.1131908015070708, | ||
Random Forest,621.8057745031804,-0.11735726372963051, | ||
Gradient Boosting,609.695323177347,-0.07425722239816679, | ||
SVR,588.4236152648417,-0.0006051707827330333, | ||
Logistic Regression,,,0.495 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
pandas==1.3.3 | ||
numpy==1.21.2 | ||
matplotlib==3.4.3 | ||
seaborn==0.11.2 | ||
plotly==5.3.1 | ||
scikit-learn==0.24.2 | ||
|