doc: Add custom runtime with PVC example (#495)
#### Motivation

I was happy to quickly install `kserve modelmesh` by following the
[quickstart](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md),
but I ran into problems when writing a Python-based custom
`ServingRuntime` for the `mnist-svm.joblib` model, which is also used in many
guides and docs, and I opened an [issue](#494) for it.
However, I could hardly find a complete, end-to-end description of the process
available online, including in the community. So I pieced together the relevant
information to make it easier for users to do this.

#### Modifications

1. Add a README describing the complete process.
2. Add a folder for the custom ServingRuntime, including the Python code,
   Dockerfile, and required library configuration file.

#### Result

The process for writing a Python-based custom `ServingRuntime` is now
documented end to end; hopefully it provides an easy starting point for new users.

---------

Signed-off-by: zhlsunshine <[email protected]>
zhlsunshine authored Mar 22, 2024
1 parent 8cabb80 commit 70cdc3e
Showing 4 changed files with 378 additions and 0 deletions.
# Python-Based Custom Runtime with a Model Stored on a Persistent Volume Claim

This document provides step-by-step instructions for writing a custom Python-based `ServingRuntime` that inherits from [MLServer's MLModel class](https://github.com/SeldonIO/MLServer/blob/master/mlserver/model.py), and for using it to deploy a model stored on a persistent volume claim.

This example assumes that ModelMesh Serving was deployed using the [quickstart guide](https://github.com/kserve/modelmesh-serving/blob/main/docs/quickstart.md).

# Deploy a model stored on a Persistent Volume Claim

Let's use the `modelmesh-serving` namespace here:

```shell
kubectl config set-context --current --namespace=modelmesh-serving
```

## 1. Create a PV and PVC for storing the model file

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-models-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  hostPath:
    path: "/mnt/models"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: "my-models-pvc"
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
EOF
```

## 2. Create a pod to access the PVC

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: Pod
metadata:
  name: "pvc-access"
spec:
  containers:
    - name: main
      image: ubuntu
      command: ["/bin/sh", "-ec", "sleep 10000"]
      volumeMounts:
        - name: "my-pvc"
          mountPath: "/mnt/models"
  volumes:
    - name: "my-pvc"
      persistentVolumeClaim:
        claimName: "my-models-pvc"
EOF
```

## 3. Store the model on this persistent volume

The sample model file used in this doc is `sklearn/mnist-svm.joblib`. Download it with:

```shell
curl -sOL https://github.com/kserve/modelmesh-minio-examples/raw/main/sklearn/mnist-svm.joblib
```
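
Optionally, you can sanity-check the downloaded file locally before copying it into the cluster. The snippet below is a minimal sketch and assumes `joblib`, `numpy`, and a `scikit-learn` version close to the one pinned in [requirements.txt](./custom-model/requirements.txt) are installed on your workstation:

```python
# Optional local sanity check: confirm the joblib file loads and accepts the
# 64-feature input shape used by the inference request later in this doc.
import numpy as np
from joblib import load

model = load("mnist-svm.joblib")
sample = np.zeros((1, 64))    # one flattened 64-feature digit sample
print(type(model).__name__)   # the scikit-learn estimator class
print(model.predict(sample))  # a single predicted digit label
```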

Copy this model file to the `pvc-access` pod:

```shell
kubectl cp mnist-svm.joblib pvc-access:/mnt/models/
```

Verify the model exists on the persistent volume:

```shell
kubectl exec -it pvc-access -- ls -alr /mnt/models/

# total 348
# -rw-rw-r-- 1 1000 1000 344817 Mar 19 08:37 mnist-svm.joblib
# drwxr-xr-x 1 root root 4096 Mar 19 08:34 ..
# drwxr-xr-x 2 root root 4096 Mar 19 08:37 .
```

## 4. Configure ModelMesh Serving to use the persistent volume claim

Create the `model-serving-config` ConfigMap with the setting `allowAnyPVC: true`:

```shell
kubectl apply -f - <<EOF
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
data:
  config.yaml: |
    allowAnyPVC: true
EOF
```

Verify the configuration setting:

```shell
kubectl get cm "model-serving-config" -o jsonpath="{.data['config\.yaml']}"
```

# Implement the Python-based Custom Runtime on MLServer

All of the necessary resources are contained in the [custom-model](./custom-model) directory, including the model code and the Dockerfile.

## 1. Implement the API of MLModel

Both the `load` and `predict` methods must be implemented to support this custom `ServingRuntime`. The file [custom_model.py](./custom-model/custom_model.py) provides a simplified implementation of `CustomMLModel` for the `mnist-svm.joblib` model. You can read more about custom MLServer runtimes [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/mlserver_custom.md).
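
For orientation, the following is a minimal sketch of the structure such an implementation follows; it is simplified, and the complete, working version lives in [custom_model.py](./custom-model/custom_model.py):

```python
from joblib import load
import numpy as np
from mlserver import MLModel
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput
from mlserver.utils import get_model_uri


class CustomMLModel(MLModel):
    async def load(self) -> bool:
        # Resolve the model file that ModelMesh mounts into the runtime container.
        model_uri = await get_model_uri(
            self._settings, wellknown_filenames=["mnist-svm.joblib"]
        )
        self._model = load(model_uri)
        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Stack the request tensors into a numpy array and run the sklearn model.
        input_data = np.array([inp.data for inp in payload.inputs])
        predictions = self._model.predict(input_data)
        return InferenceResponse(
            id=payload.id,
            model_name=self.name,
            outputs=[
                ResponseOutput(
                    name=payload.inputs[0].name,
                    shape=list(predictions.shape),
                    datatype="INT64",
                    data=predictions.tolist(),
                )
            ],
        )
```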

## 2. Build the custom ServingRuntime image

You can use either [`mlserver`](https://mlserver.readthedocs.io/en/stable/examples/custom/README.html#building-a-custom-image) or `docker` to build the custom `ServingRuntime` image; the latter approach is used in the provided [Dockerfile](./custom-model/Dockerfile).

To build the image, execute the following command from within the [custom-model](./custom-model) directory.

```shell
docker build -t <DOCKER-HUB-ORG>/custom-model-server:0.1 .
```

> **Note**: If your environment requires an HTTP proxy, pass it to the build with `--build-arg`, for example:
```shell
docker build --build-arg HTTP_PROXY=http://<DOMAIN-OR-IP>:PORT --build-arg HTTPS_PROXY=http://<DOMAIN-OR-IP>:PORT -t <DOCKER-HUB-ORG>/custom-model-server:0.1 .
```

## 3. Define and apply the custom ServingRuntime

Below, you will create a `ServingRuntime` using the image built above. Make sure the image has been pushed to a registry that your cluster can pull from. You can learn more about the custom `ServingRuntime` template [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/mlserver_custom.md#custom-servingruntime-template).

```shell
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-model-0.x
spec:
  supportedModelFormats:
    - name: custom_model
      version: "1"
      autoSelect: true
  multiModel: true
  grpcDataEndpoint: port:8001
  grpcEndpoint: port:8085
  containers:
    - name: mlserver
      image: <DOCKER-HUB-ORG>/custom-model-server:0.1
      env:
        - name: MLSERVER_MODELS_DIR
          value: "/models/_mlserver_models/"
        - name: MLSERVER_GRPC_PORT
          value: "8001"
        - name: MLSERVER_HTTP_PORT
          value: "8002"
        - name: MLSERVER_LOAD_MODELS_AT_STARTUP
          value: "false"
        - name: MLSERVER_MODEL_NAME
          value: dummy-model
        - name: MLSERVER_HOST
          value: "127.0.0.1"
        - name: MLSERVER_GRPC_MAX_MESSAGE_LENGTH
          value: "-1"
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: "5"
          memory: 1Gi
  builtInAdapter:
    serverType: mlserver
    runtimeManagementPort: 8001
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
EOF
```

Verify the available ServingRuntimes, including the custom one:

```shell
kubectl get servingruntimes

NAME                  DISABLED   MODELTYPE      CONTAINERS   AGE
mlserver-1.x                     sklearn        mlserver     10m
my-custom-model-0.x              custom_model   mlserver     10m
ovms-1.x                         openvino_ir    ovms         10m
torchserve-0.x                   pytorch-mar    torchserve   10m
triton-2.x                       keras          triton       10m
```

## 4. Deploy the InferenceService using the custom ServingRuntime

```shell
kubectl apply -f - <<EOF
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: sklearn-pvc-example
  annotations:
    serving.kserve.io/deploymentMode: ModelMesh
spec:
  predictor:
    model:
      modelFormat:
        name: custom_model
      runtime: my-custom-model-0.x
      storage:
        parameters:
          type: pvc
          name: my-models-pvc
        path: mnist-svm.joblib
EOF
```

After a few seconds, this InferenceService named `sklearn-pvc-example` should be ready:

```shell
kubectl get isvc
NAME                  URL                                               READY   PREV   LATEST   PREVROLLEDOUTREVISION   LATESTREADYREVISION   AGE
sklearn-pvc-example   grpc://modelmesh-serving.modelmesh-serving:8033   True                                                                  69s
```

## 5. Run an inference request for this InferenceService

First, set up a port-forward to facilitate REST requests:

```shell
kubectl port-forward --address 0.0.0.0 service/modelmesh-serving 8008 &

# [1] running kubectl port-forward in the background
# Forwarding from 0.0.0.0:8008 -> 8008
```

Now send an inference request to the SKLearn MNIST model using `curl`. Make sure the `MODEL_NAME` variable is set to the name of the InferenceService.

```shell
MODEL_NAME="sklearn-pvc-example"
curl -s -X POST -k "http://localhost:8008/v2/models/${MODEL_NAME}/infer" -d '{"inputs": [{ "name": "predict", "shape": [1, 64], "datatype": "FP32", "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0, 12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0, 0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0, 16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0, 0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0, 13.0, 12.0, 1.0, 0.0]}]}' | jq .
{
  "model_name": "sklearn-pvc-example__isvc-72fbffc584",
  "outputs": [
    {
      "name": "predict",
      "datatype": "INT64",
      "shape": [1],
      "data": [8]
    }
  ]
}
```

> **Note**: `jq` is optional; it is only used to pretty-print the JSON response.
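
The same request can also be sent from Python. The sketch below assumes the `requests` package is installed and that the port-forward from the previous step is still running:

```python
# Minimal Python client for the same V2 inference call (assumes `requests` is
# installed and `kubectl port-forward ... 8008` is still active).
import requests

model_name = "sklearn-pvc-example"
payload = {
    "inputs": [
        {
            "name": "predict",
            "shape": [1, 64],
            "datatype": "FP32",
            # The same flattened digit sample used in the curl example above.
            "data": [0.0, 0.0, 1.0, 11.0, 14.0, 15.0, 3.0, 0.0, 0.0, 1.0, 13.0, 16.0,
                     12.0, 16.0, 8.0, 0.0, 0.0, 8.0, 16.0, 4.0, 6.0, 16.0, 5.0, 0.0,
                     0.0, 5.0, 15.0, 11.0, 13.0, 14.0, 0.0, 0.0, 0.0, 0.0, 2.0, 12.0,
                     16.0, 13.0, 0.0, 0.0, 0.0, 0.0, 0.0, 13.0, 16.0, 16.0, 6.0, 0.0,
                     0.0, 0.0, 0.0, 16.0, 16.0, 16.0, 7.0, 0.0, 0.0, 0.0, 0.0, 11.0,
                     13.0, 12.0, 1.0, 0.0],
        }
    ]
}

response = requests.post(
    f"http://localhost:8008/v2/models/{model_name}/infer", json=payload, timeout=30
)
response.raise_for_status()
print(response.json()["outputs"][0]["data"])  # expected: [8]
```
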
To delete the resources created in this example, run the following commands:

```shell
kubectl delete isvc "sklearn-pvc-example"
kubectl delete pod "pvc-access"
kubectl delete pvc "my-models-pvc"
```
`docs/examples/python-custom-runtime/custom-model/Dockerfile` (35 additions)
FROM python:3.9.13
# ENV LANG C.UTF-8

COPY requirements.txt ./requirements.txt
RUN pip3 install --no-cache-dir -r requirements.txt

# The custom `MLModel` implementation should be on the Python search path
# instead of relying on the working directory of the image. If using a
# single-file module, this can be accomplished with:
COPY --chown=${USER} ./custom_model.py /opt/custom_model.py
ENV PYTHONPATH=/opt/
WORKDIR /opt

# environment variables to be compatible with ModelMesh Serving
# these can also be set in the ServingRuntime, but this is recommended for
# consistency when building and testing
# reference: https://mlserver.readthedocs.io/en/latest/reference/settings.html
ENV MLSERVER_MODELS_DIR=/models/_mlserver_models \
    MLSERVER_GRPC_PORT=8001 \
    MLSERVER_HTTP_PORT=8002 \
    MLSERVER_METRICS_PORT=8082 \
    MLSERVER_LOAD_MODELS_AT_STARTUP=false \
    MLSERVER_DEBUG=false \
    MLSERVER_PARALLEL_WORKERS=1 \
    MLSERVER_GRPC_MAX_MESSAGE_LENGTH=33554432 \
    # https://github.com/SeldonIO/MLServer/pull/748
    MLSERVER__CUSTOM_GRPC_SERVER_SETTINGS='{"grpc.max_metadata_size": "32768"}' \
    MLSERVER_MODEL_NAME=dummy-model

# With this setting, the implementation field is not required in the model
# settings which eases integration by allowing the built-in adapter to generate
# a basic model settings file
ENV MLSERVER_MODEL_IMPLEMENTATION=custom_model.CustomMLModel

CMD mlserver start $MLSERVER_MODELS_DIR
`docs/examples/python-custom-runtime/custom-model/custom_model.py` (61 additions)
import os
from os.path import exists
from typing import Dict, List
from mlserver import MLModel
from mlserver.utils import get_model_uri
from mlserver.types import InferenceRequest, InferenceResponse, ResponseOutput, Parameters
from mlserver.codecs import DecodedParameterName
from joblib import load

import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

_to_exclude = {
    "parameters": {DecodedParameterName, "headers"},
    "inputs": {"__all__": {"parameters": {DecodedParameterName, "headers"}}},
}

WELLKNOWN_MODEL_FILENAMES = ["mnist-svm.joblib"]


class CustomMLModel(MLModel):  # pylint:disable=c-extension-no-member
    async def load(self) -> bool:
        # Resolve the model file from the storage path mounted into the container.
        model_uri = await get_model_uri(self._settings, wellknown_filenames=WELLKNOWN_MODEL_FILENAMES)
        logger.info(f"Model load URI: {model_uri}")
        if exists(model_uri):
            logger.info(f"Loading MNIST model from {model_uri}")
            self._model = load(model_uri)
            logger.info("Model loaded successfully")
        else:
            logger.info(f"Model file does not exist at {model_uri}")
            # raise FileNotFoundError(model_uri)
            self.ready = False
            return self.ready

        self.ready = True
        return self.ready

    async def predict(self, payload: InferenceRequest) -> InferenceResponse:
        # Collect the request tensors and run the scikit-learn model on them.
        input_data = [inp.data for inp in payload.inputs]
        input_name = [inp.name for inp in payload.inputs]
        input_data_array = np.array(input_data)
        result = self._model.predict(input_data_array)
        predictions = np.array(result)

        logger.info(f"Predict result is: {result}")
        return InferenceResponse(
            id=payload.id,
            model_name=self.name,
            model_version=self.version,
            outputs=[
                ResponseOutput(
                    name=str(input_name[0]),
                    shape=list(predictions.shape),
                    datatype="INT64",
                    data=predictions.tolist(),
                )
            ],
        )
`custom-model/requirements.txt` (3 additions)
mlserver==1.3.2
scikit-learn==0.24.2
joblib==1.0.1
