From 89242797cf82c9b483c9fcdb92eec8acface3902 Mon Sep 17 00:00:00 2001 From: Xu MR Date: Fri, 22 Mar 2024 14:01:54 +0800 Subject: [PATCH 1/4] comiit --- .../classification/applied-ml-build-a-web-app.ipynb | 1 + 1 file changed, 1 insertion(+) diff --git a/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb b/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb index 01b4c81b5d..be18aff867 100644 --- a/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb +++ b/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb @@ -17,6 +17,7 @@ "# Install the necessary dependencies\n", "\n", "import os\n", + "\n", "import sys \n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython\n" ] From b481ad213ac6d9124151fe5a28cedd0cb723e17d Mon Sep 17 00:00:00 2001 From: Xu MR Date: Fri, 22 Mar 2024 14:22:30 +0800 Subject: [PATCH 2/4] data --- .../machine-learning-productionization/data-engineering.ipynb | 1 + 1 file changed, 1 insertion(+) diff --git a/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb b/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb index 2f7dc3b178..351f980ec7 100644 --- a/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb +++ b/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb @@ -14,6 +14,7 @@ "# Install the necessary dependencies\n", "\n", "import os\n", + "\n", "import sys\n", "!{sys.executable} -m pip install --quiet seaborn pandas scikit-learn numpy matplotlib jupyterlab_myst ipython" ] From aac34c3e3fd87c515458a270afe8679fc947af27 Mon Sep 17 00:00:00 2001 From: Xu MR Date: Sat, 20 Apr 2024 17:31:42 +0800 Subject: [PATCH 3/4] commit --- .../overview.ipynb | 81 
+++++++++++++++++++ 1 file changed, 81 insertions(+) diff --git a/open-machine-learning-jupyter-book/machine-learning-productionization/overview.ipynb b/open-machine-learning-jupyter-book/machine-learning-productionization/overview.ipynb index 49fc1d7e5f..7901825bf9 100644 --- a/open-machine-learning-jupyter-book/machine-learning-productionization/overview.ipynb +++ b/open-machine-learning-jupyter-book/machine-learning-productionization/overview.ipynb @@ -48,6 +48,87 @@ "\n", "Moving Machine Learning models into production is as important as building them, sometimes even harder. Maintaining data quality and model accuracy over time are just a few of the challenges. To achieve end-to-end system productionization as a whole, the various components and designs need to be identified, from defining a problem to serving the model as a service.\n", "\n", + "According to Algorithmia's 2020 State of Enterprise ML survey of 750 businesses, 55% of businesses working on ML models have yet to get them into production." + ] + }, + { + "cell_type": "markdown", + "id": "b13ac488", + "metadata": {}, + "source": [ + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%202020%20State%20of%20Enterprise%20ML%20by%20Algorithmia%20based%20on%20750%20businesses.png\n", + "---\n", + "name: the 2020 State of Enterprise ML by Algorithmia based on 750 businesses\n", + "---\n", + "the 2020 State of Enterprise ML by Algorithmia based on 750 businesses\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "294cbab7", + "metadata": {}, + "source": [ + "This is because many challenges lie between studying a model in theory and deploying it in production:\n", + "\n", + "(1) **POC** to production gap:\n", + "\n", + "There is a huge gap between a Proof of Concept (POC) and the final product or service deployed in production: only a tiny fraction of a complete machine learning system consists of ML code, and the surrounding infrastructure 
required for this is large and complex. Closing this gap may also involve challenges in technology, resources, security, stability, and other areas.\n" + ] + }, + { + "cell_type": "markdown", + "id": "2bd5ba4a", + "metadata": {}, + "source": [ + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/the%20portion%20of%20ML%20code%20that%20is%20part%20of%20a%20complete%20ML%20system.jpg\n", + "---\n", + "name: the portion of ML code that is part of a complete ML system\n", + "---\n", + "the portion of ML code that is part of a complete ML system\n", + ":::" + ] + }, + { + "cell_type": "markdown", + "id": "92c348bc", + "metadata": {}, + "source": [ + "(2) Data drift and concept drift:\n", + "\n", + "Models do not last forever. Their performance can degrade over time even if the data itself is of good quality, and it can also degrade because data quality declines:\n", + "\n", + "**Data drift** usually means that the distribution of the input data (**x**) changes, so the trained model no longer fits the new data and its performance declines. For example, an e-commerce platform builds a model that predicts how likely users are to make a purchase so that it can push personalized offers, but the model is initially trained and applied on user data from spontaneous paid search. 
When the e-commerce platform launches a new advertising campaign, the new users attracted by the advertisements do not fit the patterns the model learned.\n" + ] + }, + { + "cell_type": "markdown", + "id": "23a0c31b", + "metadata": {}, + "source": [ + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/Continuous%20blue%20squares%20in%20the%20Data%20stream%20indicate%20the%20start%20of%20a%20Data%20Drift.jpg\n", + "---\n", + "name: Continuous blue squares in the Data stream indicate the start of a Data Drift\n", + "---\n", + "Continuous blue squares in the Data stream indicate the start of a Data Drift\n", + ":::\n" + ] + }, + { + "cell_type": "markdown", + "id": "fb685489", + "metadata": {}, + "source": [ + "**Concept drift** usually means that the mapping between input and output (**x->y**) changes, so the pattern learned by the model is no longer valid. What changes is not the input data itself but the statistical properties of the target domain over time; in other words, \"the world has changed\". Sometimes these changes happen very quickly or even unexpectedly, as with the COVID-19 outbreak, a black swan event that dramatically increased the demand for gowns and masks in response to changes in government policies. Sometimes the change is slow: for example, customers' online shopping preferences shift with changes in personal interests, merchants' reputations, and service types.\n", + "\n", + "These changes affect model performance and can cause serious problems when a project is actually put into production, so the model needs to be monitored and continuously deployed." 
+ ] + }, + { + "cell_type": "markdown", + "id": "02964563", + "metadata": {}, + "source": [ "This chapter combines the foundational concepts of Machine Learning with the functional expertise of modern software development and engineering to help you develop production-ready Machine Learning knowledge.\n", "\n", "Productionization of a Machine Learning solution is not a one-time thing. It is always under improving one-time through the iterative process continuously." From 7228982886dadda9abbf3815d52b8a379f28a369 Mon Sep 17 00:00:00 2001 From: Xu MR Date: Sat, 20 Apr 2024 20:08:42 +0800 Subject: [PATCH 4/4] commit --- .../machine-learning-productionization/data-engineering.ipynb | 1 - .../classification/applied-ml-build-a-web-app.ipynb | 1 - 2 files changed, 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb b/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb index 351f980ec7..2f7dc3b178 100644 --- a/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb +++ b/open-machine-learning-jupyter-book/machine-learning-productionization/data-engineering.ipynb @@ -14,7 +14,6 @@ "# Install the necessary dependencies\n", "\n", "import os\n", - "\n", "import sys\n", "!{sys.executable} -m pip install --quiet seaborn pandas scikit-learn numpy matplotlib jupyterlab_myst ipython" ] diff --git a/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb b/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb index be18aff867..01b4c81b5d 100644 --- a/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb +++ b/open-machine-learning-jupyter-book/ml-fundamentals/classification/applied-ml-build-a-web-app.ipynb @@ -17,7 +17,6 @@ "# Install the necessary dependencies\n", "\n", "import os\n", - "\n", "import sys 
\n", "!{sys.executable} -m pip install --quiet pandas scikit-learn numpy matplotlib jupyterlab_myst ipython\n" ]