Skip to content

pugalkmc/NM-AI-Project

Repository files navigation

Market Basket Insights: This project focuses on analyzing retail data and building a predictive model for pricing. The goal is to extract insights from the data and create a model that can predict prices based on various features. The project is divided into several phases, each with specific tasks and objectives.

Problem Statement The problem at hand is to analyze a retail dataset and build a predictive model to estimate prices. The dataset contains information about products, customers, and sales transactions. The main challenge is to preprocess the data, select the most suitable machine learning algorithm, train the model, and predict prices accurately.

Design Thinking Process:

Data Collection: The project begins with collecting the retail dataset, which includes information about products, customers, sales, and countries. Data Preprocessing: The data is cleaned and transformed to make it suitable for analysis. This includes handling missing values, converting data types, and one-hot encoding categorical variables. Machine Learning Model Selection: Several regression algorithms are considered for predicting prices, including Linear Regression, Ridge Regression, Lasso Regression, Decision Tree, and Random Forest. Model Training: The selected machine learning model, Random Forest Regression, is trained on the preprocessed data. Model Evaluation: The model's performance is evaluated using metrics like Mean Squared Error (MSE), Mean Absolute Error (MAE), and R-squared (R^2). Model Deployment: The trained Random Forest model is saved to a file for future use. Phases of Development

Phase 1: Data Collection In this phase, the retail dataset is collected, which includes information about products, customers, and sales transactions. The data is stored in an Excel file (ai_dataset.xlsx).

Phase 2: Data Preprocessing The data preprocessing phase focuses on cleaning and transforming the dataset. The 'preprocessing.py' script is used to read the data, convert date columns, and perform one-hot encoding on country information. The preprocessed data is saved to 'preprocessed_data.xlsx'.

Phase 3: Model Selection Several regression algorithms are considered for predicting prices. The 'model_selection.py' script is used to train these models and evaluate their performance. Random Forest Regression is selected as the most suitable algorithm.

Phase 4: Model Training The chosen Random Forest Regression model is trained on the preprocessed data. This phase also includes splitting the data into training and testing sets.

Phase 5: Model Deployment The trained Random Forest Regression model is saved to a file as 'random_forest_model.pkl'. The model can be loaded and used for making price predictions.

Dataset: The dataset is structured as follows

  • BillNo
  • Itemname
  • Quantity
  • Date
  • Price
  • CustomerID
  • Country (one-hot encoded with individual columns for each country)
  • Code Files

preprocessing.py: Handles data preprocessing, including date conversion and one-hot encoding. It saves the preprocessed data to 'preprocessed_data.xlsx'.

model_selection.py: Selects the most suitable machine learning algorithm for price prediction, trains the model, and evaluates its performance.

market_basket_analysis.py: Contains the code for training and saving the Random Forest Regression model.

Conclusion: The project concludes that Random Forest Regression is the most suitable algorithm for predicting prices. The trained model demonstrates good performance in estimating prices accurately. The 'random_forest_model.pkl' file can be used for price predictions.

For more details on the project phases, data preprocessing, model selection, and deployment, refer to the corresponding code files and this README.

About

Naan Mudhalvan project

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages