Extend ML documentation #6828

Merged (2 commits) on Sep 11, 2024
47 changes: 42 additions & 5 deletions docs/integration/MachineLearning.md
@@ -4,9 +4,46 @@ sidebar_position: 5

# Machine Learning (Enterprise only)

Nussknacker can infer ML models using the Machine Learning enrichers. We support both in-process inference and inference
using a dedicated ML runtime.

Nussknacker ML enrichers can infer **in-process** models exported in the following formats:
- [ONNX](https://onnx.ai/)
- [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)
- [H2O](https://h2o.ai/)

Executing ML models in the same process as the Nussknacker scenario only requires configuring an ML enricher; no
additional components are needed to infer models. Invoking exported models through their native runtimes also
results in minimal latency and low resource usage. However, this approach has limitations: not every model, and
not every popular ML library, is covered by these formats, and the export process itself can be cumbersome.

Nussknacker can discover exported models from file-based registries (e.g. a local directory, NFS) or HTTP-based
registries. We also plan to support MLflow as a registry for models exported in these formats.
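As an illustration of file-based discovery, scanning a registry can be as simple as walking a directory tree for known model extensions. The `discover_models` helper below is hypothetical; Nussknacker's actual discovery mechanism is configured in the enricher, not hand-written like this.

```python
import tempfile
from pathlib import Path

def discover_models(registry_dir, extensions=(".onnx", ".pmml", ".zip")):
    """Return all exported model artifacts under a file-based registry (hypothetical helper)."""
    root = Path(registry_dir)
    return sorted(p.relative_to(root) for p in root.rglob("*") if p.suffix in extensions)

# Demo with a throwaway directory standing in for a local directory or NFS mount.
with tempfile.TemporaryDirectory() as registry:
    (Path(registry) / "fraud").mkdir()
    (Path(registry) / "fraud" / "model-v1.onnx").touch()
    (Path(registry) / "fraud" / "notes.txt").touch()
    models = discover_models(registry)
    print([str(m) for m in models])
```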

The diagram below shows the interactions for an ML enricher inferring an exported model:

![alt_text](img/mlExportedModels.png "Inferring ML exported models")

Because of these restrictions on exporting models, we also support inference of **any Python ML model** using our
*Nussknacker ML runtime* component. ML models are discovered and fetched from the [MLflow](https://mlflow.org/)
model registry, which also serves as an experiment tracking tool for data scientists. This way, data scientists do not
have to worry about exporting trained models to other formats; they simply log the models in the MLflow model registry.
They also retain the flexibility to use various Python ML libraries.

The Nussknacker ML runtime offers higher throughput at the cost of higher inference latency. Since the ML runtime
is a separate component of the Nussknacker installation, a GPU can also be harnessed to infer ML models.

The diagram below shows the interactions for the MLflow enricher and the ML runtime:

![alt_text](img/mlMlflowAndNussknackerMlRuntime.png "Inferring MLflow models with Nussknacker ML runtime")

A Kubernetes cluster is recommended for installing the Nussknacker ML runtime. However, the ML runtime can also be
installed as a standalone application.

You can read more about this approach in our [blog post](https://nussknacker.io/blog/ml-models-inference-in-fraud-detection/)
where we present a complete path from analysing a dataset to training a model and finally inferring the model
in a Nussknacker scenario.

The ML enrichers are Enterprise components of Nussknacker and require a separate license. Please contact
<[email protected]> for license terms and more details.

> **Member:** At the nussknacker.io page we point to the [email protected]
>
> **Contributor (author):** You're right. I have changed the email address on the relevant pages.
Binary file added docs/integration/img/mlExportedModels.png
20 changes: 12 additions & 8 deletions docs/scenarios_authoring/Enrichers.md
@@ -152,15 +152,19 @@ Similarly, information about field names and types returned by the OpenAPI servi
## ML enricher
**(Enterprise only)**

Similarly to the SQL and OpenAPI enrichers, the ML model's input and output vectors are known to the Designer: when you
double-click the ML Enricher node in a scenario, you will see the entry fields required by the selected model and version,
with data type hints and syntax error checking active.

From the scenario author's perspective, the ML Enricher is indistinguishable from an OpenAPI enricher: it takes
some input parameters and returns a value, e.g. a risk score.

![alt_text](img/mlEnricherForm.png "ML Enricher")

The following ML models are supported:
- [ONNX](https://onnx.ai/)
- [PMML](https://en.wikipedia.org/wiki/Predictive_Model_Markup_Language)
- [H2O](https://h2o.ai/)
- models fetched from the [MLflow](https://mlflow.org/) model registry

![alt_text](img/mlEnricherForm.png "ML Enricher")
Please check the [ML integration](../integration/MachineLearning.md) page for more details on how models are inferred by Nussknacker.
Binary file modified docs/scenarios_authoring/img/mlEnricherForm.png