Monitor deployed tensorflow models with Prometheus and Grafana

Table of Contents

  1. Create and serve multiple models
  2. Monitoring with Prometheus and Visualization with Grafana using docker-compose
  3. Monitoring with Prometheus and Visualization with Grafana using Kubernetes
  4. References

1 Create and serve multiple models

  • We have to create and export the models that will be served. They will be stored in the models directory:

    python create_models.py
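
    After running the script, the models directory follows TensorFlow Serving's versioned layout, roughly like the sketch below (illustrative only; the exact model names and version numbers come from create_models.py):

    tree -L 2 models
    # models
    # ├── models.config
    # ├── monitoring.config
    # ├── half_plus_two
    # │   └── 1            <- SavedModel directory for version 1
    # └── half_plus_ten
    #     ├── 1
    #     └── 2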
  • We have to use the model server config file models/models.config, which specifies the locations of all exposed models:

    docker run -t --rm -p 8501:8501 -p 8500:8500 \
        --name=serving \
        -v "$(pwd)/models/:/models/" \
        tensorflow/serving:2.11.0 \
        --model_config_file=/models/models.config \
        --allow_version_labels_for_unavailable_models=true \
        --model_config_file_poll_wait_seconds=60

    The --model_config_file_poll_wait_seconds=60 option makes the server re-read the model config file every 60 seconds.
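
    For reference, this kind of model server config file is a model_config_list in protobuf text format. The sketch below is illustrative only; in particular, the version numbers behind the stable and canary labels are assumptions, so check models/models.config in the repository for the authoritative content:

    # inspect the config; its content should look roughly like the commented sketch
    cat models/models.config
    #
    # model_config_list {
    #   config {
    #     name: 'half_plus_two'
    #     base_path: '/models/half_plus_two/'
    #     model_platform: 'tensorflow'
    #   }
    #   config {
    #     name: 'half_plus_ten'
    #     base_path: '/models/half_plus_ten/'
    #     model_platform: 'tensorflow'
    #     model_version_policy { all {} }
    #     version_labels { key: 'stable' value: 1 }
    #     version_labels { key: 'canary' value: 2 }
    #   }
    # }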

  • Test the REST API:

    # model paths  
    # /v1/models/<model name>
    # /v1/models/<model name>/versions/<version number>
    # /v1/models/<model name>/labels/<version label>
    
    # choose one of the following model paths
    MODEL_PATH=half_plus_two/versions/1
    # MODEL_PATH=half_plus_ten/labels/stable
    # MODEL_PATH=half_plus_ten/labels/canary
    
    curl -X POST http://localhost:8501/v1/models/${MODEL_PATH}:predict \
        -H 'Content-type: application/json' \
        -d '{"signature_name": "serving_default", "instances": [{"x": [0, 1, 2]}]}'

2 Monitoring with Prometheus and Visualization with Grafana using docker-compose

  • We will use the models/monitoring.config file, which exposes a metrics path that can be scraped by Prometheus.
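
    The monitoring config is a small protobuf text file; its content should look roughly like the sketch below (illustrative only, the file in the repository is authoritative):

    cat models/monitoring.config
    #
    # prometheus_config {
    #   enable: true
    #   path: "/monitoring/prometheus/metrics"
    # }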

  • In addition, we will use the prometheus_docker_compose.yml file as the Prometheus configuration. Its scrape_configs.metrics_path matches the path exposed in models/monitoring.config.
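
    The relevant part of that file should look roughly as follows (a sketch; the job name, scrape interval and target host are assumptions that depend on the docker-compose service names):

    cat prometheus_docker_compose.yml
    #
    # scrape_configs:
    #   - job_name: 'tf-serving'
    #     scrape_interval: 5s
    #     metrics_path: /monitoring/prometheus/metrics
    #     static_configs:
    #       - targets: ['serving:8501']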

  • You can start the three services with docker-compose up.

  • From docker-compose.yml you can see that (a rough sketch of the whole file follows these points):

    • to read metrics from /monitoring/prometheus/metrics we had to set --rest_api_port=8501 and --monitoring_config_file=/models/monitoring.config. You can verify that the metrics are exposed by visiting http://localhost:8501/monitoring/prometheus/metrics.

    • you should be able to access the Prometheus web UI through localhost:9090. You can also execute the query :tensorflow:serving:request_count and inspect the request counts collected so far.

    • you should be able to access the Grafana web UI through localhost:3000. The username and password are admin and admin, respectively. In the web UI you can add a new datasource with the url http://prometheus:9090.

    • you can access the models with the same requests that you have used in the previous section.
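
    Putting these points together, the compose file is roughly structured as sketched below (illustrative only; image tags, service names, volumes and flags are assumptions, not the authoritative file from the repository):

    cat docker-compose.yml
    #
    # services:
    #   serving:
    #     image: tensorflow/serving:2.11.0
    #     ports: ["8500:8500", "8501:8501"]
    #     volumes: ["./models:/models"]
    #     command:
    #       - --model_config_file=/models/models.config
    #       - --monitoring_config_file=/models/monitoring.config
    #       - --rest_api_port=8501
    #   prometheus:
    #     image: prom/prometheus
    #     ports: ["9090:9090"]
    #     volumes:
    #       - ./prometheus_docker_compose.yml:/etc/prometheus/prometheus.yml
    #   grafana:
    #     image: grafana/grafana
    #     ports: ["3000:3000"]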

  • You can stop the three services with docker-compose down.

3 Monitoring with Prometheus and Visualization with Grafana using Kubernetes

The code snippets below were tested locally against the rancher-desktop context, but they should also work for docker-desktop users. We explicitly mention the places that were changed to make things work locally.

  • Tensorflow server

    • Ideally, we would mount a volume containing the models into the pod running the TensorFlow server. To make local testing easier, we instead extend the TensorFlow server image by adding the models to it (an illustrative Dockerfile sketch follows the commands below):

      docker build -t tf-server:1.0.0 -f Dockerfile .
      
      # Create the server
      kubectl create namespace tfmodels
      helm install --namespace tfmodels tf-serving-chart helm/tf-serving
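
      The Dockerfile referenced above is essentially a thin wrapper around the official image; an illustrative sketch (the repository's Dockerfile is authoritative):

      cat Dockerfile
      #
      # FROM tensorflow/serving:2.11.0
      # COPY models /models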
    • From the values.yaml in the helm chart for the tensorflow server you can see that:

      • we have enabled only the service component but not the ingress component. This has to be changed at some point.
      • we are pulling the locally stored tensorflow server image by setting pullPolicy: Never in tf-serving/values.yaml.
    • Besides values.yaml, Chart.yaml and templates/deployment.yaml we have not changed the default template.

    • Since we have used a service of type LoadBalancer we can make API calls to the service in the same way as we did before.
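
      For example (the exact service name and port come from the chart values; with rancher-desktop or docker-desktop the LoadBalancer external IP is typically localhost):

      # find the LoadBalancer service and its external IP / ports
      kubectl -n tfmodels get svc

      # then reuse the prediction request from section 1
      curl -X POST http://localhost:8501/v1/models/half_plus_two/versions/1:predict \
          -H 'Content-type: application/json' \
          -d '{"signature_name": "serving_default", "instances": [{"x": [0, 1, 2]}]}'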

  • Prometheus

    • Launch Prometheus with:

      # helm repo add bitnami https://charts.bitnami.com/bitnami
      
      kubectl create namespace monitoring
      helm install --namespace monitoring \
        prometheus-chart bitnami/kube-prometheus

      You should get a message telling you the DNS name under which Prometheus can be accessed from within the cluster:

      Prometheus can be accessed via port "9090" on the following DNS name from within your cluster:
      
        prometheus-chart-kube-prom-prometheus.monitoring.svc.cluster.local

      To access Prometheus from outside the cluster execute the following command:

      kubectl -n monitoring port-forward svc/prometheus-chart-kube-prom-prometheus 9090:9090
      # Browse to http://127.0.0.1:9090
    • To force Prometheus to monitor our service in the tfmodels namespace, create a new ServiceMonitor component:

      kubectl apply -f helm/prometheus/servicemonitor.yaml

      The labels under spec.selector.matchLabels should match the labels of the service whose metrics endpoint (/monitoring/prometheus/metrics) we want to monitor. In the Prometheus UI, under Status -> Targets, you should see that the new ServiceMonitor target has been discovered.
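
      The manifest has roughly the following shape (a sketch; the metadata, the selector labels and the endpoint port name are assumptions, so compare with helm/prometheus/servicemonitor.yaml in the repository):

      cat helm/prometheus/servicemonitor.yaml
      #
      # apiVersion: monitoring.coreos.com/v1
      # kind: ServiceMonitor
      # metadata:
      #   name: tf-serving
      #   namespace: monitoring
      # spec:
      #   namespaceSelector:
      #     matchNames: ["tfmodels"]
      #   selector:
      #     matchLabels:
      #       app.kubernetes.io/name: tf-serving
      #   endpoints:
      #     - path: /monitoring/prometheus/metrics
      #       port: http
      #       interval: 15s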

  • Grafana

    • Launch Grafana with:

      helm install --namespace monitoring \
        grafana-chart bitnami/grafana

      To access Grafana from outside the cluster execute the following command:

      kubectl -n monitoring port-forward svc/grafana-chart 8081:3000
      # Browse to http://127.0.0.1:8081 to access the service
    • You can add Prometheus as a datasource by using the previously obtained prometheus DNS name as a datasource URL: http://prometheus-chart-kube-prom-prometheus.monitoring.svc.cluster.local:9090.
    • You should be able to create your first dashboard by using the metric :tensorflow:serving:request_count and Prometheus as a data source.
    • You can make a few model predictions by executing the same curl snippets as before and then run the :tensorflow:serving:request_count query in the Prometheus UI, e.g. with the loop sketched below.
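
      For example, the following loop sends a handful of requests (the model path and port are the same assumptions as in the earlier curl examples):

      # generate some traffic so that the request counter moves
      for i in $(seq 1 20); do
        curl -s -X POST http://localhost:8501/v1/models/half_plus_two/versions/1:predict \
            -H 'Content-type: application/json' \
            -d '{"signature_name": "serving_default", "instances": [{"x": [0, 1, 2]}]}' > /dev/null
      done

      # afterwards, run  :tensorflow:serving:request_count  in the Prometheus UI
      # (http://127.0.0.1:9090 via the port-forward set up above)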

  • Remove all services:

    helm uninstall --namespace tfmodels tf-serving-chart
    helm uninstall --namespace monitoring prometheus-chart
    helm uninstall --namespace monitoring grafana-chart
    kubectl delete -f helm/prometheus/servicemonitor.yaml
    kubectl delete namespace tfmodels
    kubectl delete namespace monitoring

4 References