ML.NET version | API type | Status | App Type | Data type | Scenario | ML Task | Algorithms |
---|---|---|---|---|---|---|---|
v0.9 | Dynamic API | Up-to-date | ASP.NET Core web app and Console app | SQL Server and .csv files | Sales forecast | Regression | FastTreeTweedie Regression |
eShopDashboardML is a web app with Sales Forecast predictions (per product and per country) using Microsoft Machine Learning .NET (ML.NET).
This end-to-end sample app highlights the usage of ML.NET API by showing the following topics:
- How to train, build and generate ML models
  - Implemented as a console app using .NET Core.
- How to predict the next month of Sales Forecasts by using the trained ML model
  - Implemented as a single, monolithic web app using ASP.NET Core Razor.
The app also uses a SQL Server database for the regular product catalog and order data, as many typical web apps do. Because this is a sample, it uses a LocalDB SQL database by default, so there's no need to set up a real SQL Server. The LocalDB database is created, along with sample data, the first time you run the web app.
If you want to use a real SQL Server or Azure SQL Database, you just need to change the connection string in the app.
Here's a sample screenshot of the web app and one of the forecast predictions:
Learn how to set it up in Visual Studio plus further explanations on the code:
- Setting up eShopDashboard in Visual Studio and running the web app
- Create and Train your ML models
  - This step is optional, as the web app is already configured to use a pre-trained model. But you can create your own trained model and swap the pre-trained model with your own.
This problem is centered on forecasting sales per country and per product based on previous sales.
To solve this problem, you build two independent ML models that take the following datasets as input:
Data Set | columns |
---|---|
products stats | next, productId, year, month, units, avg, count, max, min, prev |
country stats | next, country, year, month, max, min, std, count, sales, med, prev |
ML task - Regression
The ML task for this sample is regression, a supervised machine learning task used to predict a numeric value for the next period (in this case, the sales prediction) from a set of related features/variables.
To solve this problem, you first build the ML models by training each one on existing data, then evaluate how good each model is, and finally consume the model to predict sales.
Note that the sample implements two independent models:
- Model to predict product's demand forecast for the next period (month)
- Model to predict country's sales forecast for the next period (month)
However, when learning/researching the sample, you can focus just on one of the scenarios/models.
STEP 1: Define the schema of the data in a class type and refer to that type when loading data using TextLoader. Here the class type is ProductData.
using Microsoft.ML.Data;

public class ProductData
{
    // Column order in the file: next,productId,year,month,units,avg,count,max,min,prev
    // The index passed to LoadColumn(int index) must match the position of the column in the file.
    [LoadColumn(0)]
    public float next;
    [LoadColumn(1)]
    public string productId;
    [LoadColumn(2)]
    public float year;
    [LoadColumn(3)]
    public float month;
    [LoadColumn(4)]
    public float units;
    [LoadColumn(5)]
    public float avg;
    [LoadColumn(6)]
    public float count;
    [LoadColumn(7)]
    public float max;
    [LoadColumn(8)]
    public float min;
    [LoadColumn(9)]
    public float prev;
}
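The second model (country sales) needs an equivalent schema class. The following is a minimal sketch that assumes the country stats file lists its columns in the order shown in the dataset table above and follows the same `LoadColumn` convention as `ProductData`. The class name `CountryData` and the column types are assumptions made for illustration.

```csharp
using Microsoft.ML.Data;

// Hypothetical schema for the country stats dataset, mirroring ProductData.
public class CountryData
{
    // Assumed column order: next,country,year,month,max,min,std,count,sales,med,prev
    [LoadColumn(0)]
    public float next;
    [LoadColumn(1)]
    public string country;   // categorical column, like productId in ProductData
    [LoadColumn(2)]
    public float year;
    [LoadColumn(3)]
    public float month;
    [LoadColumn(4)]
    public float max;
    [LoadColumn(5)]
    public float min;
    [LoadColumn(6)]
    public float std;
    [LoadColumn(7)]
    public float count;
    [LoadColumn(8)]
    public float sales;
    [LoadColumn(9)]
    public float med;
    [LoadColumn(10)]
    public float prev;
}
```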
Build the pipeline of transformations and specify which trainer/algorithm you are going to use. In this case, the pipeline applies the following transformations:
- Concatenate the current numeric features into a new column named NumFeatures
- Transform productId using one-hot encoding
- Concatenate all generated features into one column named 'Features'
- Copy the next column to a new column named "Label"
- Specify the "Fast Tree Tweedie" trainer as the algorithm to apply to the model
After designing the pipeline, you can load the dataset into the DataView. This step is only configuration: loading is lazy, so the data is not actually read until it is needed, when the model is evaluated and trained in the following steps.
// Declare the trainer first so it can be appended to the pipeline and referenced later (for example, when printing metrics).
var trainer = mlContext.Regression.Trainers.FastTreeTweedie("Label", "Features");

var trainingPipeline = mlContext.Transforms.Concatenate(outputColumn: "NumFeatures", "year", "month", "units", "avg", "count", "max", "min", "prev")
    .Append(mlContext.Transforms.Categorical.OneHotEncoding(inputColumn: "productId", outputColumn: "CatFeatures"))
    .Append(mlContext.Transforms.Concatenate(outputColumn: "Features", "NumFeatures", "CatFeatures"))
    .Append(mlContext.Transforms.CopyColumns("next", "Label"))
    .Append(trainer);

// Lazy: this only configures how the file will be read.
var trainingDataView = mlContext.Data.ReadFromTextFile<ProductData>(dataPath, hasHeader: true, separatorChar: ',');
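To make the one-hot encoding and concatenation steps more concrete, here is a small, library-free sketch of the idea. It is not how ML.NET implements the transform internally. The `OneHotSketch` class, the list of product IDs, and the three numeric values are hypothetical and chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OneHotSketch
{
    // Returns a vector with a 1 at the position of `value` in `categories`
    // and 0 elsewhere. An unseen value yields an all-zero vector.
    public static float[] Encode(IReadOnlyList<string> categories, string value)
    {
        var vector = new float[categories.Count];
        for (int i = 0; i < categories.Count; i++)
        {
            if (categories[i] == value)
            {
                vector[i] = 1f;
                break;
            }
        }
        return vector;
    }

    public static void Main()
    {
        // Hypothetical set of product IDs seen during training.
        var productIds = new[] { "263", "988", "1527" };

        // Stand-in for the NumFeatures column (year, month, units only).
        float[] numFeatures = { 2017f, 10f, 910f };

        // Stand-in for the CatFeatures column produced by one-hot encoding.
        float[] catFeatures = Encode(productIds, "988");

        // Stand-in for the final Features column: numeric + categorical.
        float[] features = numFeatures.Concat(catFeatures).ToArray();

        Console.WriteLine(string.Join(", ", features));
    }
}
```

The point of the sketch is that a string identifier such as productId becomes a fixed-length numeric vector, which can then sit next to the numeric columns in the single 'Features' vector that the trainer consumes.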
In this case, the model is evaluated before the final training by using a cross-validation approach, which gives you metrics that indicate how accurate the model is.
var crossValidationResults = mlContext.Regression.CrossValidate(trainingDataView, trainingPipeline, numFolds: 6, labelColumn: "Label");
ConsoleHelper.PrintRegressionFoldsAverageMetrics(trainer.ToString(), crossValidationResults);
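As a rough illustration of what k-fold cross-validation does with the data, the sketch below partitions row indices into folds, where each fold serves once as the test set while the remaining rows are used for training. It is a simplified, library-free sketch: assigning rows to folds by index modulo is an assumption made for readability and is not necessarily how ML.NET assigns rows. The `KFoldSketch` class and the row count are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldSketch
{
    // Yields one (train, test) pair of row indices per fold.
    // Row i belongs to the test set of fold (i % numFolds).
    public static IEnumerable<(int[] Train, int[] Test)> Split(int rowCount, int numFolds)
    {
        var indices = Enumerable.Range(0, rowCount).ToArray();
        for (int fold = 0; fold < numFolds; fold++)
        {
            int current = fold;
            int[] train = indices.Where(i => i % numFolds != current).ToArray();
            int[] test = indices.Where(i => i % numFolds == current).ToArray();
            yield return (train, test);
        }
    }

    public static void Main()
    {
        // 12 hypothetical rows split into 6 folds, matching numFolds: 6 above.
        foreach (var (train, test) in Split(rowCount: 12, numFolds: 6))
        {
            Console.WriteLine($"train rows: {train.Length}, test rows: {test.Length}");
        }
    }
}
```

A model is trained and scored once per fold, and the per-fold metrics are then averaged, which is what the averaged metrics printed by the sample's helper summarize.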
After building the pipeline, you train the forecast model by fitting the pipeline to the training data with the selected algorithm. In that step, the model is built, trained, and returned as an object:
var model = trainingPipeline.Fit(trainingDataView);
Once the model is created and evaluated, you can save it to a .ZIP file with the following code. Any end-user application can then consume that file:
using (var file = File.OpenWrite(outputModelPath))
model.SaveTo(mlContext, file);
To consume the model, you load it from the .ZIP file, create some sample data, create the "prediction function", and finally make a prediction.
// Load the trained model from the .ZIP file
ITransformer trainedModel;
using (var stream = File.OpenRead(outputModelPath))
{
    trainedModel = mlContext.Model.Load(stream);
}

// Create the prediction engine (the "prediction function") for single-row predictions
var predictionEngine = trainedModel.CreatePredictionEngine<ProductData, ProductUnitPrediction>(mlContext);

Console.WriteLine("** Testing Product 1 **");

// Build sample data
ProductData dataSample = new ProductData()
{
    productId = "263",
    month = 10,
    year = 2017,
    avg = 91,
    max = 370,
    min = 1,
    count = 10,
    prev = 1675,
    units = 910
};

// Predict the forecast for the period (month) following the one provided
ProductUnitPrediction prediction = predictionEngine.Predict(dataSample);
Console.WriteLine($"Product: {dataSample.productId}, month: {dataSample.month + 1}, year: {dataSample.year} - Real value (units): 551, Forecast Prediction (units): {prediction.Score}");
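The prediction code above relies on a `ProductUnitPrediction` type that is not shown in this walkthrough. A minimal sketch of such an output class follows, assuming it only needs to expose the `Score` value read in the last line of the snippet. Treat the exact shape as an assumption rather than the sample's definitive definition.

```csharp
// Hypothetical output type for the product forecast model.
public class ProductUnitPrediction
{
    // Receives the predicted value (forecast units for the next period),
    // which the snippet above reads as prediction.Score.
    public float Score;
}
```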
The eShopDashboardML dataset is based on a public Online Retail dataset from UCI: http://archive.ics.uci.edu/ml/datasets/online+retail
Daqing Chen, Sai Liang Sain, and Kun Guo, Data mining for the online retail industry: A case study of RFM model-based customer segmentation using data mining, Journal of Database Marketing and Customer Strategy Management, Vol. 19, No. 3, pp. 197–208, 2012 (Published online before print: 27 August 2012. doi: 10.1057/dbm.2012.17).