Changes to mlops #750

Merged
merged 4 commits into from
Apr 21, 2024
@@ -48,6 +48,87 @@
"\n",
"Moving Machine Learning models into production is as important as building them, sometimes even harder. Maintaining data quality and model accuracy over time are just a few of the challenges. To achieve end-to-end system productionization as a whole, the various components and designs need to be identified, from defining a problem to serving the model as a service.\n",
"\n",
"According to the Algorithmia statistics, 55% of businesses working on ML models have yet to get them into production. "
]
},
{
"cell_type": "markdown",
"id": "b13ac488",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%202020%20State%20of%20Enterprise%20ML%20by%20Algorithmia%20based%20on%20750%20businesses.png\n",
"---\n",
"name: the 2020 State of Enterprise ML by Algorithmia based on 750 businesses\n",
"---\n",
"the 2020 State of Enterprise ML by Algorithmia based on 750 businesses\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "294cbab7",
"metadata": {},
"source": [
"This is because there are many problems and challenges between the theoretical study of the model and the actual production deployment:\n",
"\n",
"(1)**POC** to production gap:\n",
"\n",
"There is a huge gap from Proof of Concept (POC) to actual final product or service deployment in production, with only a tiny fraction of the complete machine learning service model actually invested consisting of ML code, and the surrounding infrastructure required for this is large and complex. At the same time, this gap may also involve challenges in technology, resources, security, stability and other aspects.\n"
]
},
{
"cell_type": "markdown",
"id": "2bd5ba4a",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%20portion%20of%20ML%20code%20that%20is%20part%20of%20a%20complete%20ML%20system.jpg\n",
"---\n",
"name: the portion of ML code that is part of a complete ML system\n",
"---\n",
"the portion of ML code that is part of a complete ML system\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "92c348bc",
"metadata": {},
"source": [
"(2) data drift and concept drift:\n",
"\n",
"Models do not last forever and sometimes degrade over time, even if the data itself is of good quality. Sometimes, model performance degrades due to data quality degradation:\n",
"\n",
"**Data drift** usually means that the variable distribution of the input data (**x**) changes, and the trained model is not related to this new data, so the performance will decline. For example, an e-commerce platform sets up a predictive model to predict the purchase possibility of users to push personalized offers, but at the beginning, the training and application of the model are based on the user data of spontaneous paid search. When the e-commerce platform launches a new advertising campaign, the users attracted by the new influx of advertisements do not adapt to the model previously analyzed.\n"
]
},
{
"cell_type": "markdown",
"id": "23a0c31b",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/Continuous%20blue%20squares%20in%20the%20Data%20stream%20indicate%20the%20start%20of%20a%20Data%20Drift.jpg\n",
"---\n",
"name: Continuous blue squares in the Data stream indicate the start of a Data Drift\n",
"---\n",
"Continuous blue squares in the Data stream indicate the start of a Data Drift\n",
":::\n"
]
},
{
"cell_type": "markdown",
"id": "fb685489",
"metadata": {},
"source": [
"**Concept drift** usually means that the mapping between input and output changes (**x->y**), the pattern learned by the model is no longer valid, and what changes is not the data itself, but the statistical properties of the target domain have changed over time, that is, the so-called \"world has changed\". Sometimes these changes happen very quickly or even unexpectedly, as in the case of the COVID-19 outbreak, the Black Swan event, which dramatically increases the demand for gowns and masks in response to changes in government policies; Sometimes it is a slow change, for example, customers' online shopping preferences change with changes in personal interests, merchants' reputation, and service types.\n",
"\n",
"These data changes will affect the performance of the model and cause serious problems in the actual project landing process, so the model needs to be monitored and continuously deployed."
]
},
{
"cell_type": "markdown",
"id": "02964563",
"metadata": {},
"source": [
"This chapter combines the foundational concepts of Machine Learning with the functional expertise of modern software development and engineering to help you develop production-ready Machine Learning knowledge.\n",
"\n",
"Productionization of a Machine Learning solution is not a one-time thing. It is always under improving one-time through the iterative process continuously."