doc: Add custom runtime with PVC example (#495)
#### Motivation I was very happy to quickly install `kserve modelmesh` by following the [quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md), but I ran into problems when I wanted to write a Python-based custom `ServingRuntime` for the model `mnist-svm.joblib`, which is also used in many guides and docs, and I also opened an [issue](#494) for it. However, I could hardly find a complete, publicly available walkthrough of the process, including in the community. So I pieced together the information to make it easier for users to do this. #### Modifications 1. Add a README describing the complete process. 2. Add a folder for the custom ServingRuntime, including the Python code, Dockerfile, and required library configuration file. #### Result The process for writing a Python-based custom `ServingRuntime` is documented end to end; I hope it can be an easy start for new users. --------- Signed-off-by: zhlsunshine <[email protected]>
1 parent 8cabb80, commit 70cdc3e
Showing 4 changed files with 378 additions and 0 deletions.
279 changes: 279 additions & 0 deletions
...s/python-custom-runtime/Python-Based-Custom-Runtime-with-Model-Stored-on-PVC.md
# The Python-Based Custom Runtime with Model Stored on Persistent Volume Claim

This document provides step-by-step instructions for writing a custom Python-based `ServingRuntime` that inherits from [MLServer's MLModel class](https://github.com/SeldonIO/MLServer/blob/master/mlserver/model.py) and for deploying a model stored on a persistent volume claim with it.

This example assumes that ModelMesh Serving was deployed using the [quickstart guide](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md).

# Deploy a model stored on a Persistent Volume Claim

Let's use the `modelmesh-serving` namespace here:

```shell
kubectl config set-context --current --namespace=modelmesh-serving
```

## 1. Create a PV and PVC for storing the model file

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-models-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: "/mnt/models"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-models-pvc"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
```

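Before continuing, you can check the state of the claim. The exact volume it binds to depends on your cluster's storage configuration, and with a `WaitForFirstConsumer` storage class it may stay `Pending` until the pod in the next step mounts it:

```shell
kubectl get pvc my-models-pvc
```
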
## 2. Create a pod to access the PVC

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: "pvc-access"
spec:
  containers:
    - name: main
      image: ubuntu
      command: ["/bin/sh", "-ec", "sleep 10000"]
      volumeMounts:
        - name: "my-pvc"
          mountPath: "/mnt/models"
  volumes:
    - name: "my-pvc"
      persistentVolumeClaim:
        claimName: "my-models-pvc"
EOF
```

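The copy in the next step requires the pod to be running, so you may want to wait for it explicitly:

```shell
kubectl wait --for=condition=Ready pod/pvc-access --timeout=2m
```
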
## 3. Store the model on this persistent volume

The sample model file we use in this doc is `sklearn/mnist-svm.joblib`:

```shell
curl -sOL https://github.com/kserve/modelmesh-minio-examples/raw/main/sklearn/mnist-svm.joblib
```

Copy this model file to the `pvc-access` pod:

```shell
kubectl cp mnist-svm.joblib pvc-access:/mnt/models/
```

Verify the model exists on the persistent volume:

```shell
kubectl exec -it pvc-access -- ls -alr /mnt/models/

# total 348
# -rw-rw-r-- 1 1000 1000 344817 Mar 19 08:37 mnist-svm.joblib
# drwxr-xr-x 1 root root   4096 Mar 19 08:34 ..
# drwxr-xr-x 2 root root   4096 Mar 19 08:37 .
```

## 4. Configure ModelMesh Serving to use the persistent volume claim

Create the `model-serving-config` ConfigMap with the setting `allowAnyPVC: true`:

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    allowAnyPVC: true
EOF
```

Verify the configuration setting:

```shell
kubectl get cm "model-serving-config" -o jsonpath="{.data['config\.yaml']}"
```
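
The command prints the contents of `config.yaml`, so the expected output is simply the setting applied above:

```shell
allowAnyPVC: true
```
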
# Implement the Python-based Custom Runtime on MLServer

All of the necessary resources are contained in [custom-model](./custom-model), including the model code and the Dockerfile.

## 1. Implement the API of MLModel

Both `load` and `predict` must be implemented to support this custom `ServingRuntime`. The code file [custom_model.py](./custom-model/custom_model.py) provides a simplified implementation of `CustomMLModel` for the model `mnist-svm.joblib`. You can read more about custom MLServer runtimes [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/mlserver_custom.md).
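
For orientation, the class in [custom_model.py](./custom-model/custom_model.py) follows this general shape (a trimmed sketch only; the full implementation lives in the linked file):

```python
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse
from mlserver.utils import get_model_uri


class CustomMLModel(MLModel):
    async def load(self) -> bool:
        # Resolve the model file that ModelMesh placed under MLSERVER_MODELS_DIR,
        # then load it (e.g. with joblib) into self._model.
        model_uri = await get_model_uri(self._settings)
        ...
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Decode payload.inputs, run self._model.predict(), and wrap the result
        # in an InferenceResponse containing one ResponseOutput per output tensor.
        ...
```
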
## 2. Build the custom ServingRuntime image

You can use [`mlserver`](https://mlserver.readthedocs.io/en/stable/examples/custom/README.html#building-a-custom-image) or `docker` to build the custom `ServingRuntime` image; the latter approach is used in the [Dockerfile](./custom-model/Dockerfile).

To build the image, execute the following command from within the [custom-model](./custom-model) directory:

```shell
docker build -t <DOCKER-HUB-ORG>/custom-model-server:0.1 .
```

> **Note**: Use `--build-arg` to pass an HTTP proxy if there is a proxy in your environment, for example:

```shell
docker build --build-arg HTTP_PROXY=http://<DOMAIN-OR-IP>:PORT --build-arg HTTPS_PROXY=http://<DOMAIN-OR-IP>:PORT -t <DOCKER-HUB-ORG>/custom-model-server:0.1 .
```
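
Alternatively, if the `mlserver` CLI is installed locally, its `build` subcommand (described in the MLServer docs linked above) can assemble a comparable image; a rough sketch, assuming the CLI is on your `PATH`:

```shell
# run from within the custom-model directory
mlserver build . -t <DOCKER-HUB-ORG>/custom-model-server:0.1
```

Note that the ModelMesh-oriented environment variables baked into the [Dockerfile](./custom-model/Dockerfile) (for example `MLSERVER_MODEL_IMPLEMENTATION`) are not set by that route, so the `docker build` above is the path this example follows.
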
## 3. Define and apply the custom ServingRuntime

Below, you will create a `ServingRuntime` using the image built above. You can learn more about the custom `ServingRuntime` template [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/mlserver_custom.md#custom-servingruntime-template).

```shell
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-model-0.x
spec:
  supportedModelFormats:
    - name: custom_model
      version: "1"
      autoSelect: true
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: <DOCKER-HUB-ORG>/custom-model-server:0.1
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "5"
          memory: 1Gi
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
EOF
```

Verify the available ServingRuntimes, including the custom one:

```shell
kubectl get servingruntimes

NAME                  DISABLED   MODELTYPE      CONTAINERS   AGE
mlserver-1.x                     sklearn        mlserver     10m
my-custom-model-0.x              custom_model   mlserver     10m
ovms-1.x                         openvino_ir    ovms         10m
torchserve-0.x                   pytorch-mar    torchserve   10m
triton-2.x                       keras          triton       10m
```

## 4. Deploy the InferenceService using the custom ServingRuntime

```shell
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-pvc-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom-model
      runtime: my-custom-model-0.x
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: mnist-svm.joblib
EOF
```

After a few seconds, the InferenceService named `sklearn-pvc-example` should be ready:

```shell
kubectl get isvc

NAME                  URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
sklearn-pvc-example   grpc://modelmesh-serving.modelmesh-serving:8033   True                                                                  69s
```

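If `READY` stays `False`, the events recorded on the InferenceService usually point at the cause (for example, a PVC that could not be mounted or an image pull error):

```shell
kubectl describe isvc sklearn-pvc-example
```
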
## 5. Run an inference request for this InferenceService

First, set up a port-forward to facilitate REST requests:

```shell
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 &

# [1] running kubectl port-forward in the background
# Forwarding from 0.0.0.0:8008 -> 8008
```

Perform an inference request to the SKLearn MNIST model via `curl`. Make sure the `MODEL_NAME` variable is set correctly:

```shell
MODEL_NAME="sklearn-pvc-example"
curl -s -X POST -k "http://localhost:8008/v2/models/${MODEL_NAME}/infer" -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}' | jq .
{
  "model_name": "sklearn-pvc-example__isvc-72fbffc584",
  "outputs": [
    {
      "name": "predict",
      "datatype": "INT64",
      "shape": [1],
      "data": [8]
    }
  ]
}
```

> **Note**: `jq` is optional; it is only used to format the output of the InferenceService.
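
The same request can also be sent from Python; below is a minimal sketch using the third-party `requests` package (an assumption of this sketch), with the port-forward from above still running:

```python
import requests

model_name = "sklearn-pvc-example"

# the same flattened 8x8 MNIST digit used in the curl example above
inference_request = {
    "inputs": [
        {
            "name": "predict",
            "shape": [1, 64],
            "datatype": "FP32",
            "data": [
                0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0,
                0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0,
                0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0,
                0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0,
                0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0,
                0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0,
                0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0,
                0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0,
            ],
        }
    ]
}

response = requests.post(
    f"http://localhost:8008/v2/models/{model_name}/infer",
    json=inference_request,
    timeout=30,
)
response.raise_for_status()
print(response.json())  # the "data" field of the single output should be [8]
```
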
To delete the resources created in this example, run the following commands:

```shell
kubectl delete isvc "sklearn-pvc-example"
kubectl delete pod "pvc-access"
kubectl delete pvc "my-models-pvc"
```
35 changes: 35 additions & 0 deletions
docs/examples/python-custom-runtime/custom-model/Dockerfile
FROM python:3.9.13
# ENV LANG C.UTF-8

COPY requirements.txt ./requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt

# The custom `MLModel` implementation should be on the Python search path
# instead of relying on the working directory of the image. If using a
# single-file module, this can be accomplished with:
COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt

# environment variables to be compatible with ModelMesh Serving
# these can also be set in the ServingRuntime, but this is recommended for
# consistency when building and testing
# reference: https://mlserver.readthedocs.io/en/latest/reference/settings.html
ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
    MLSERVER_GRPC_PORT=8001 \
    MLSERVER_HTTP_PORT=8002 \
    MLSERVER_METRICS_PORT=8082 \
    MLSERVER_LOAD_MODELS_AT_STARTUP=false \
    MLSERVER_DEBUG=false \
    MLSERVER_PARALLEL_WORKERS=1 \
    MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
    # https://github.com/SeldonIO/MLServer/pull/748
    MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
    MLSERVER_MODEL_NAME=dummy-model

# With this setting, the implementation field is not required in the model
# settings which eases integration by allowing the built-in adapter to generate
# a basic model settings file
ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel

CMD mlserver start $MLSERVER_MODELS_DIR
61 changes: 61 additions & 0 deletions
docs/examples/python-custom-runtime/custom-model/custom_model.py
import os
from os.path import exists
from typing import Dict, List
from mlserver import MLModel
from mlserver.utils import get_model_uri
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput, Parameters
from mlserver.codecs import DecodedParameterName
from joblib import load

import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    "inputs": {"__all__": {"parameters": {DecodedParameterName, "headers"}}},
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]


class CustomMLModel(MLModel):  # pylint:disable=c-extension-no-member
    async def load(self) -> bool:
        model_uri = await get_model_uri(self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES)
        logger.info(f"Model load URI: {model_uri}")
        if exists(model_uri):
            logger.info(f"Loading MNIST model from {model_uri}")
            self._model = load(model_uri)
            logger.info("Model loaded successfully")
        else:
            logger.info(f"Model file does not exist at {model_uri}")
            # raise FileNotFoundError(model_uri)
            self.ready = False
            return self.ready

        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        input_data = [inp.data for inp in payload.inputs]
        input_name = [inp.name for inp in payload.inputs]
        input_data_array = np.array(input_data)
        result = self._model.predict(input_data_array)
        predictions = np.array(result)

        logger.info(f"Predict result is: {result}")
        return InferenceResponse(
            id=payload.id,
            model_name=self.name,
            model_version=self.version,
            outputs=[
                ResponseOutput(
                    name=str(input_name[0]),
                    shape=predictions.shape,
                    datatype="INT64",
                    data=predictions.tolist(),
                )
            ],
        )
3 changes: 3 additions & 0 deletions
docs/examples/python-custom-runtime/custom-model/requirements.txt
mlserver==1.3.2
scikit-learn==0.24.2
joblib==1.0.1