Merge pull request #750 from Aqwqqq/changes_to_mlops
Changes to mlops
Nicole-ying authored Apr 21, 2024
2 parents d4c9a90 + 7228982 commit 2af8bfe
Showing 1 changed file with 81 additions and 0 deletions.
@@ -48,6 +48,87 @@
"\n",
"Moving Machine Learning models into production is as important as building them, and sometimes even harder. Maintaining data quality and model accuracy over time are just a few of the challenges. To productionize a system end to end, its components and design need to be identified, from defining the problem to serving the model as a service.\n",
"\n",
"According to Algorithmia's 2020 State of Enterprise ML survey, 55% of businesses working on ML models have yet to get them into production."
]
},
{
"cell_type": "markdown",
"id": "b13ac488",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%202020%20State%20of%20Enterprise%20ML%20by%20Algorithmia%20based%20on%20750%20businesses.png\n",
"---\n",
"name: the 2020 State of Enterprise ML by Algorithmia based on 750 businesses\n",
"---\n",
"The 2020 State of Enterprise ML by Algorithmia, based on 750 businesses\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "294cbab7",
"metadata": {},
"source": [
"This is because many challenges lie between studying a model in theory and deploying it in production:\n",
"\n",
"(1) **POC** to production gap:\n",
"\n",
"There is a large gap between a Proof of Concept (POC) and a product or service deployed in production. Only a tiny fraction of a complete machine learning system consists of ML code; the surrounding infrastructure it requires is large and complex. Closing this gap can also involve challenges in technology, resources, security, stability, and other areas.\n"
]
},
{
"cell_type": "markdown",
"id": "2bd5ba4a",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%20portion%20of%20ML%20code%20that%20is%20part%20of%20a%20complete%20ML%20system.jpg\n",
"---\n",
"name: the portion of ML code that is part of a complete ML system\n",
"---\n",
"The portion of ML code that is part of a complete ML system\n",
":::"
]
},
{
"cell_type": "markdown",
"id": "92c348bc",
"metadata": {},
"source": [
"(2) Data drift and concept drift:\n",
"\n",
"Models do not last forever: their performance can degrade over time even when the data itself is of good quality. In other cases, performance degrades because the data itself changes:\n",
"\n",
"**Data drift** usually means that the distribution of the input variables (**x**) changes, so the trained model no longer fits the new data and its performance declines. For example, an e-commerce platform builds a model that predicts how likely users are to make a purchase, so that it can push personalized offers. Initially, the model is trained and applied on data from users who arrived of their own accord through paid search. When the platform launches a new advertising campaign, the influx of users it attracts differs from the users the model was built on, so the model no longer fits them.\n"
]
},
{
"cell_type": "markdown",
"id": "23a0c31b",
"metadata": {},
"source": [
":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/Continuous%20blue%20squares%20in%20the%20Data%20stream%20indicate%20the%20start%20of%20a%20Data%20Drift.jpg\n",
"---\n",
"name: Continuous blue squares in the Data stream indicate the start of a Data Drift\n",
"---\n",
"Continuous blue squares in the Data stream indicate the start of a Data Drift\n",
":::\n"
]
},
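As a rough illustration of the data drift described above, the sketch below compares a reference sample of one input feature with a newer sample using the two-sample Kolmogorov-Smirnov statistic, that is, the largest gap between their empirical distributions. The feature values, the sample sizes, and the 0.2 threshold are assumptions made for illustration; they do not come from the chapter, and a real monitor would choose them per feature.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest absolute gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # Both empirical CDFs only change at observed values, so checking
    # every observed value is enough to find the largest gap.
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def has_drifted(reference, current, threshold=0.2):
    """Flag drift when the gap exceeds an illustrative, hand-picked threshold."""
    return ks_statistic(reference, current) > threshold


# Hypothetical feature (for example, basket value) for three user groups.
rng = random.Random(0)
search_users = [rng.gauss(50.0, 10.0) for _ in range(1000)]    # training-time population
more_search = [rng.gauss(50.0, 10.0) for _ in range(1000)]     # same population, later
campaign_users = [rng.gauss(65.0, 10.0) for _ in range(1000)]  # new ad-campaign audience

print("same population drifted:", has_drifted(search_users, more_search))
print("campaign audience drifted:", has_drifted(search_users, campaign_users))
```

A check like this only looks at the inputs, so it can run before any labels arrive; it says nothing about whether the relationship between inputs and outputs has changed.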
{
"cell_type": "markdown",
"id": "fb685489",
"metadata": {},
"source": [
"**Concept drift** usually means that the mapping between input and output (**x->y**) changes, so the pattern learned by the model is no longer valid. What changes is not the input data itself but the statistical properties of the target domain over time, the so-called \"world has changed\" situation. Sometimes these changes happen very quickly or even unexpectedly, as with the COVID-19 outbreak, a Black Swan event that dramatically increased the demand for gowns and masks as government policies changed. Sometimes the change is slow: for example, customers' online shopping preferences shift with their personal interests, merchants' reputations, and service types.\n",
"\n",
"Both kinds of drift affect model performance and can cause serious problems when a project is put into production, so the model needs to be monitored and continuously deployed."
]
},
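One simple way to watch for the concept drift described above is to track the model's error rate on recent labelled outcomes and compare it with the error rate measured at training time. The sketch below assumes that true labels eventually become available; the class name `ErrorRateMonitor`, the window size, the tolerance, and the toy model are all hypothetical choices for illustration.

```python
from collections import deque


class ErrorRateMonitor:
    """Flags suspected concept drift when the recent error rate
    exceeds the training-time baseline by more than a tolerance."""

    def __init__(self, baseline_error, window_size=100, tolerance=0.10):
        self.baseline_error = baseline_error
        self.tolerance = tolerance
        # Keeps only the most recent `window_size` outcomes.
        self.outcomes = deque(maxlen=window_size)

    def update(self, prediction, actual):
        """Record one labelled outcome and return the current drift flag."""
        self.outcomes.append(prediction != actual)
        return self.drift_suspected()

    def drift_suspected(self):
        # Wait for a full window so a few early errors do not trigger a flag.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        recent_error = sum(self.outcomes) / len(self.outcomes)
        return recent_error > self.baseline_error + self.tolerance


def model(x):
    """Toy stand-in for a trained model: the pattern learned at training time."""
    return x > 0.5


monitor = ErrorRateMonitor(baseline_error=0.05, window_size=50)
inputs = [(i % 10) / 10 for i in range(300)]  # input distribution never changes
flags = []
for step, x in enumerate(inputs):
    # The x->y mapping changes at step 150, while x itself stays the same.
    actual = (x > 0.5) if step < 150 else (x > 0.2)
    flags.append(monitor.update(model(x), actual))

print("flag raised before the change:", any(flags[:150]))
print("flag raised after the change:", any(flags[150:]))
```

Because the inputs keep the same distribution throughout, an input-only check would not notice this change; only the comparison of predictions with actual outcomes reveals it.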
{
"cell_type": "markdown",
"id": "02964563",
"metadata": {},
"source": [
"This chapter combines the foundational concepts of Machine Learning with the functional expertise of modern software development and engineering to help you develop production-ready Machine Learning knowledge.\n",
"\n",
"Productionizing a Machine Learning solution is not a one-time task. It is improved continuously through an iterative process."