From a73ce58e2f19421442c2a60a5bed83a8b5b3db0b Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Thu, 28 Nov 2024 09:29:33 +0100 Subject: [PATCH 1/6] [TDX] Added basic documentation to enable TDX in ChatQnA - added README_tdx.md - described steps to run ChatQnA using helm and GMC Signed-off-by: Jakub Ledworowski --- ChatQnA/README.md | 4 + ChatQnA/kubernetes/intel/README_tdx.md | 167 +++++++++++++++++++++++++ 2 files changed, 171 insertions(+) create mode 100644 ChatQnA/kubernetes/intel/README_tdx.md diff --git a/ChatQnA/README.md b/ChatQnA/README.md index 6b7dd27ad..f98873af4 100644 --- a/ChatQnA/README.md +++ b/ChatQnA/README.md @@ -247,6 +247,10 @@ docker compose up -d Refer to the [NVIDIA GPU Guide](./docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source. +### Deploy ChatQnA into Kubernetes on Xeon with Intel TDX protection + +Refer to the [Kubernetes Guide](./kubernetes/intel/README_tdx.md) for instructions on deploying ChatQnA into Kubernetes on Xeon with services protected using Intel TDX. + ### Deploy ChatQnA into Kubernetes on Xeon & Gaudi with GMC Refer to the [Kubernetes Guide](./kubernetes/intel/README_gmc.md) for instructions on deploying ChatQnA into Kubernetes on Xeon & Gaudi with GMC. diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md new file mode 100644 index 000000000..0437540fc --- /dev/null +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -0,0 +1,167 @@ +# Deploy ChatQnA in Kubernetes Cluster on Xeon with Intel TDX + +This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on Intel Xeon server where the microservices are protected by [Intel TDX](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html). 
+The guide references the project [GenAIInfra](https://github.com/opea-project/GenAIInfra.git) to prepare the infrastructure.
+
+The deployment process is intended for users who want to deploy ChatQnA services:
+
+- with pods protected by Intel TDX,
+- on a single node in a cluster (acting as both master and worker) that is a 5th Gen Xeon platform or later,
+- running Ubuntu 24.04,
+- using images pushed to a public registry, such as quay.io or Docker Hub.
+
+It's split into 3 sections:
+
+1. [Cluster Configuration](#cluster-configuration) - steps required to prepare the cluster components needed to use Intel TDX.
+2. [Node configuration](#node-configuration) - additional steps to be performed on the node that are required to run heavy applications like OPEA ChatQnA.
+3. [ChatQnA Services Configuration and Deployment](#chatqna-services-configuration-and-deployment) - describes how to deploy ChatQnA services with Intel TDX protection.
+
+> [!NOTE]
+> Running TDX-protected services requires the user to define the pod's resource requests (CPU, memory).
+>
+> Because TDX does not support hotplugging, the assigned resources cannot be changed after the pod is scheduled, and they will not be shared with any other pod.
+>
+> This means that the total amount of resources assigned to all TDX-protected pods must be less than the total amount of resources available on the node, leaving room for the requests of non-TDX pods.
+
+
+## Cluster Configuration
+
+To prepare the cluster to run Intel TDX-protected workloads, follow the [Intel Confidential Computing Documentation](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/01/introduction/index.html).
+
+
+## Node Configuration
+
+
+### Kubelet Configuration
+
+To run a complex and heavy application like OPEA, the cluster administrator must increase the kubelet timeout for container creation; otherwise, pod creation may fail with a `Context deadline exceeded` timeout.
+This is required because container creation can take a long time due to the size of the pod images and the need to download the AI models.
+Run the following script on all nodes to increase the kubelet timeout to 30 minutes; it restarts the kubelet automatically if the setting was changed (sudo required):
+
+```bash
+echo "Setting up the environment..."
+kubelet_config="/var/lib/kubelet/config.yaml"
+# Save the current kubelet timeout setting
+previous=$(sudo grep runtimeRequestTimeout "${kubelet_config}")
+# Increase the kubelet timeout
+sudo sed -i 's/runtimeRequestTimeout: .*/runtimeRequestTimeout: 30m/' "${kubelet_config}"
+new=$(sudo grep runtimeRequestTimeout "${kubelet_config}")
+# Check if the kubelet timeout setting was updated
+if [[ "$previous" == "$new" ]]; then
+  echo "kubelet runtimeRequestTimeout setting was not updated."
+else
+  echo "kubelet runtimeRequestTimeout setting was updated."
+  echo "Updated kubelet runtimeRequestTimeout setting:"
+  sudo grep runtimeRequestTimeout "${kubelet_config}"
+  echo "Restarting kubelet..."
+  sudo systemctl daemon-reload && sudo systemctl restart kubelet
+  echo "Waiting 30s for kubelet to restart..."
+  sleep 30
+  echo "kubelet restarted."
+fi
+```
+
+> [!NOTE]
+> The script is prepared for a vanilla Kubernetes installation.
+> If you are using a different Kubernetes distribution, the kubelet configuration file may be located elsewhere or the setting may be managed differently.
+>
+> After the kubelet restart, some of the internal pods from the `kube-system` namespace might be reloaded automatically.
+
+
+## ChatQnA Services Configuration and Deployment
+
+To protect a single component with Intel TDX, the user must modify its manifest file.
+The process is described in detail in the [Demo Workload Deployment](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-protected-with-intel-tdx-and-quote-verified-using-intel-trust-authority).
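Before modifying any manifests, it is worth confirming that the cluster configuration above actually installed the TDX runtime class; `kata-qemu-tdx` is the class name used throughout this guide:

```bash
# List the RuntimeClass used for TDX-protected pods; an error here means the
# Confidential Containers / Kata setup from the Cluster Configuration section
# is not complete yet.
kubectl get runtimeclass kata-qemu-tdx
```

If the command fails, revisit the [Cluster Configuration](#cluster-configuration) section before proceeding.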
+ +As an example we will use the `llm-uservice` component from the ChatQnA pipeline and deploy it using helm charts. + +Steps: + +1. Export the address of KBS deployed in previous steps. + If the KBS was deployed in your cluster, you can get the address by running the following command: + + ```bash + export KBS_ADDRESS=http://$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}'):$(kubectl get svc kbs -n coco-tenant -o jsonpath='{.spec.ports[0].nodePort}'); \ + echo $KBS_ADDRESS + ``` + +2. Find the manifest for `llm-uservice` component (e.g.: GenAIInfra/microservices-connector/config/manifests/llm-uservice.yaml). +3. Add the following annotations to the manifest file and replace KBS_ADDRESS with actual value: + + ```yaml + apiVersion: apps/v1 + kind: Deployment + metadata: + name: llm-uservice + # (...) + spec: + selector: + matchLabels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: llm-uservice + # (...) + template: + metadata: + # (...) + annotations: + io.katacontainers.config.hypervisor.kernel_params: "agent.guest_components_rest_api=all agent.aa_kbc_params=cc_kbc::" # <<--- enable attestation through KBS and provide the KBS address to the pod + io.katacontainers.config.runtime.create_container_timeout: "600" # <<--- increase the timeout for container creation + spec: + runtimeClassName: kata-qemu-tdx # <<--- this is required to start the pod in Trust Domain (TD, virtual machine protected with Intel TDX) + initContainers: # <<--- this is required to perform attestation before the main container starts + - name: init-attestation + image: storytel/alpine-bash-curl:latest + command: ["/bin/sh","-c"] + args: + - | + echo starting; + (curl http://127.0.0.1:8006/aa/token\?token_type\=kbs | grep -iv "get token failed" | grep -iv "error" | grep -i token && echo "ATTESTATION COMPLETED SUCCESSFULLY") || (echo "ATTESTATION FAILED" && exit 1); + containers: + - name: llm-uservice + # (...) 
+ resources: # <<--- specify resources enough to run the service efficiently (memory must be at least 2x the image size) + limits: + cpu: "4" + memory: 4Gi + requests: + cpu: "4" + memory: 4Gi + ``` + + Note, that due to the nature of TDX, the resources assigned to the pod cannot be shared with any other pod. + +4. Deploy the GMC as usual using helm: + + ```bash + helm install -n system --create-namespace gmc . + ``` + +5. After the `gmc-controller` pod is running, deploy the chatqna: + + ```bash + kubectl create ns chatqa; \ + kubectl apply -f cpu/xeon/gmc/chatQnA_xeon.yaml + ``` + +6. After the services are up, you may verify that the `llm-uservice` is running in a Trust Domain by checking the pod's status: + + ```bash + # Find the pod name + POD_NAME=$(kubectl get pods -n chatqa | grep 'llm-svc-deployment-' | awk '{print $1}') + # Print the runtimeClassName + kubectl get pod $POD_NAME -n chatqa -o jsonpath='{.spec.runtimeClassName}' + echo "" + # Find the initContainer name + INIT_CONTAINER_NAME=$(kubectl get pod $POD_NAME -n chatqa -o jsonpath='{.spec.initContainers[0].name}') + # Print the logs of the initContainer + kubectl logs $POD_NAME -n chatqa -c $INIT_CONTAINER_NAME | grep -i attestation + ``` + + The output should contain the `kata-qemu-tdx` runtimeClassName and the `ATTESTATION COMPLETED SUCCESSFULLY` message. + + ```text + kata-qemu-tdx + ATTESTATION COMPLETED SUCCESSFULLY + ``` + +At this point you have successfully deployed the ChatQnA services with the `llm-uservice` component running in a Trust Domain protected by Intel TDX. 
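As a final sanity check, you can wait until every pod of the pipeline reports Ready before sending any traffic (a sketch; the `chatqa` namespace matches the one created above):

```bash
# Wait up to 30 minutes for all ChatQnA pods to become Ready; the first start
# is slow because the images are large and the AI models must be downloaded.
kubectl wait --for=condition=Ready pod --all -n chatqa --timeout=1800s
kubectl get pods -n chatqa
```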
From 635655816c38db05b751efdbeed766076513f1da Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Mon, 9 Dec 2024 16:19:13 +0100 Subject: [PATCH 2/6] [TDX] Improved TDX enabling guide - Removed deployment option with helm - Added sample chatqna_tdx.yaml - Generalized description but left ChatQnA as an example Signed-off-by: Jakub Ledworowski --- ChatQnA/kubernetes/intel/README_tdx.md | 197 +-- .../intel/cpu/xeon/manifest/chatqna_tdx.yaml | 1092 +++++++++++++++++ 2 files changed, 1199 insertions(+), 90 deletions(-) create mode 100644 ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md index 0437540fc..65146af0c 100644 --- a/ChatQnA/kubernetes/intel/README_tdx.md +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -1,9 +1,8 @@ -# Deploy ChatQnA in Kubernetes Cluster on Xeon with Intel TDX
+# Deploy an Example Application in Kubernetes Cluster on Xeon with Intel TDX
 
-This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on Intel Xeon server where the microservices are protected by [Intel TDX](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html).
-The guide references the project [GenAIInfra](https://github.com/opea-project/GenAIInfra.git) to prepare the infrastructure.
+This document outlines the deployment process for an example application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on an Intel Xeon server, where the microservices are protected by [Intel TDX](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html).
-The deployment process is intended for users who want to deploy ChatQnA services:
+The deployment process is intended for users who want to deploy an example application:
 
 - with pods protected by Intel TDX,
 - on a single node in a cluster (acting as both master and worker) that is a 5th Gen Xeon platform or later,
@@ -14,7 +13,7 @@ It's split into 3 sections:
 
 1. [Cluster Configuration](#cluster-configuration) - steps required to prepare the cluster components needed to use Intel TDX.
 2. [Node configuration](#node-configuration) - additional steps to be performed on the node that are required to run heavy applications like OPEA ChatQnA.
-3. [ChatQnA Services Configuration and Deployment](#chatqna-services-configuration-and-deployment) - describes how to deploy ChatQnA services with Intel TDX protection.
+3. [Deployment of services protected with Intel TDX](#deployment-of-services-protected-with-intel-tdx) - describes how to deploy an example application with services protected using Intel TDX.
 
 > [!NOTE]
 > Running TDX-protected services requires the user to define the pod's resource requests (CPU, memory).
@@ -31,6 +30,9 @@ To prepare cluster to run Intel TDX-protected workloads, follow [Intel Confident
 
 ## Node Configuration
 
+This section outlines the required changes to be performed on each node.
+These steps might be automated with configuration management tools such as Ansible, Puppet, or Chef.
+
 
 ### Kubelet Configuration
 
@@ -67,101 +69,116 @@ fi
 >
 > After the kubelet restart, some of the internal pods from the `kube-system` namespace might be reloaded automatically.
 
+All kubelet configuration options can be found in the [kubelet configuration documentation](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/).
+
+
+## Deployment of services protected with Intel TDX
+
+This section describes how to deploy an example application with services protected using Intel TDX:
-## ChatQnA Services Configuration and Deployment
+1. 
[Overview of the changes needed](#overview-of-the-changes-needed) - describes the changes required to protect a single component with Intel TDX.
+2. [Example deployment of ChatQnA with TDX protection](#example-deployment-of-chatqna-with-tdx-protection) - provides a quick start to run the ChatQnA example application with all services protected with Intel TDX.
+3. [Customization of deployment configuration](#customization-of-deployment-configuration) - describes how to manually modify the deployment configuration to protect a single component with Intel TDX.
+
+
+### Overview of the changes needed
 
 To protect a single component with Intel TDX, the user must modify its manifest file.
 The process is described in detail in the [Demo Workload Deployment](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-protected-with-intel-tdx-and-quote-verified-using-intel-trust-authority).
 
-As an example we will use the `llm-uservice` component from the ChatQnA pipeline and deploy it using helm charts.
-
-Steps:
-
-1. Export the address of KBS deployed in previous steps.
-   If the KBS was deployed in your cluster, you can get the address by running the following command:
-
-   ```bash
-   export KBS_ADDRESS=http://$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[0].address}'):$(kubectl get svc kbs -n coco-tenant -o jsonpath='{.spec.ports[0].nodePort}'); \
-   echo $KBS_ADDRESS
-   ```
-
-2. Find the manifest for `llm-uservice` component (e.g.: GenAIInfra/microservices-connector/config/manifests/llm-uservice.yaml).
-3. Add the following annotations to the manifest file and replace KBS_ADDRESS with actual value:
-
-   ```yaml
-   apiVersion: apps/v1
-   kind: Deployment
-   metadata:
-     name: llm-uservice
-     # (...)
-   spec:
-     selector:
-       matchLabels:
-         app.kubernetes.io/name: llm-uservice
-         app.kubernetes.io/instance: llm-uservice
-     # (...)
-     template:
-       metadata:
-         # (...)
- annotations: - io.katacontainers.config.hypervisor.kernel_params: "agent.guest_components_rest_api=all agent.aa_kbc_params=cc_kbc::" # <<--- enable attestation through KBS and provide the KBS address to the pod - io.katacontainers.config.runtime.create_container_timeout: "600" # <<--- increase the timeout for container creation - spec: - runtimeClassName: kata-qemu-tdx # <<--- this is required to start the pod in Trust Domain (TD, virtual machine protected with Intel TDX) - initContainers: # <<--- this is required to perform attestation before the main container starts - - name: init-attestation - image: storytel/alpine-bash-curl:latest - command: ["/bin/sh","-c"] - args: - - | - echo starting; - (curl http://127.0.0.1:8006/aa/token\?token_type\=kbs | grep -iv "get token failed" | grep -iv "error" | grep -i token && echo "ATTESTATION COMPLETED SUCCESSFULLY") || (echo "ATTESTATION FAILED" && exit 1); - containers: - - name: llm-uservice - # (...) - resources: # <<--- specify resources enough to run the service efficiently (memory must be at least 2x the image size) - limits: - cpu: "4" - memory: 4Gi - requests: - cpu: "4" - memory: 4Gi - ``` +Here, we describe the required changes on the example Deployment definition below: + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: llm-uservice + # (...) +spec: + selector: + matchLabels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: llm-uservice + # (...) + template: + metadata: + # (...) + annotations: + io.katacontainers.config.runtime.create_container_timeout: "600" # <<--- increase the timeout for container creation + spec: + runtimeClassName: kata-qemu-tdx # <<--- this is required to start the pod in Trust Domain (TD, virtual machine protected with Intel TDX) + containers: + - name: llm-uservice + # (...) 
+          resources: # <<--- specify resources sufficient to run the service efficiently (memory must be at least 2x the image size)
+            limits:
+              cpu: "4"
+              memory: 4Gi
+            requests:
+              cpu: "4"
+              memory: 4Gi
+```
+
+
+### Example deployment of ChatQnA with TDX protection
+
+As an example we will use the ChatQnA application.
+If you want to give it a try, simply run:
+
+```bash
+kubectl apply -f chatqna_tdx.yaml
+```
 
-   Note, that due to the nature of TDX, the resources assigned to the pod cannot be shared with any other pod.
+After a few minutes, the ChatQnA services should be up and running in the cluster, with all of them protected by Intel TDX.
+You may verify that the pods are running with TDX protection by checking the runtime class name, e.g.:
+
+```bash
+POD_NAME=$(kubectl get pods | grep 'chatqna-tgi' | awk '{print $1}')
+kubectl get pod $POD_NAME -o jsonpath='{.spec.runtimeClassName}'
+```
+
+In the output you should see:
+
+```text
+kata-qemu-tdx
+```
+
+This is a simple indicator that the pod is running in a Trust Domain protected by Intel TDX.
+However, for a production use case, the attestation process is crucial to verify the integrity of the pod.
+You may read more about how to enable attestation [here](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-protected-with-intel-tdx-and-quote-verified-using-intel-trust-authority).
 
-4. Deploy the GMC as usual using helm:
-
-   ```bash
-   helm install -n system --create-namespace gmc .
-   ```
-
-5. After the `gmc-controller` pod is running, deploy the chatqna:
+
+### Customization of deployment configuration
+
+If you want more control over what is protected with Intel TDX, or you use a different deployment file, you can manually modify the deployment configuration by following the steps below:
+
+1. 
Run the script below to modify the chosen services with the changes described in the [previous section](#overview-of-the-changes-needed):
 
    ```bash
-   kubectl create ns chatqa; \
-   kubectl apply -f cpu/xeon/gmc/chatQnA_xeon.yaml
+   SERVICES=("llm-uservice")
+   FILE=chatqna.yaml
+   for SERVICE in "${SERVICES[@]}"; do
+     yq eval '
+       (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.metadata.annotations."io.katacontainers.config.runtime.create_container_timeout") = "800"
+     ' "$FILE" -i;
+     yq eval '
+       (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.spec.runtimeClassName) = "kata-qemu-tdx"
+     ' "$FILE" -i;
+   done
    ```
 
-6. After the services are up, you may verify that the `llm-uservice` is running in a Trust Domain by checking the pod's status:
-
-   ```bash
-   # Find the pod name
-   POD_NAME=$(kubectl get pods -n chatqa | grep 'llm-svc-deployment-' | awk '{print $1}')
-   # Print the runtimeClassName
-   kubectl get pod $POD_NAME -n chatqa -o jsonpath='{.spec.runtimeClassName}'
-   echo ""
-   # Find the initContainer name
-   INIT_CONTAINER_NAME=$(kubectl get pod $POD_NAME -n chatqa -o jsonpath='{.spec.initContainers[0].name}')
-   # Print the logs of the initContainer
-   kubectl logs $POD_NAME -n chatqa -c $INIT_CONTAINER_NAME | grep -i attestation
-   ```
-
-   The output should contain the `kata-qemu-tdx` runtimeClassName and the `ATTESTATION COMPLETED SUCCESSFULLY` message.
-
-   ```text
-   kata-qemu-tdx
-   ATTESTATION COMPLETED SUCCESSFULLY
+2. For each service, define the resources that must be assigned to the pod to run the service efficiently.
+   The resources must be defined in the `resources` section of the pod's container definition.
+   The `memory` must be at least 2x the image size.
+   The `cpu` and `memory` resources must be defined at least in the `limits` section.
+   By default, the pod will be assigned 1 CPU and 2048 MiB of memory, but half of that memory will be used by the guest filesystem.
+
+3. 
Apply the changes to the deployment configuration:
+
+   ```bash
+   kubectl apply -f chatqna.yaml
+   ```
+
+### Troubleshooting
+
+In case of any problems with pod creation, refer to the [Troubleshooting guide](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/04/troubleshooting/).
diff --git a/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml b/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml
new file mode 100644
index 000000000..cf72af30f
--- /dev/null
+++ b/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml
@@ -0,0 +1,1092 @@
+---
+# Source: chatqna/charts/data-prep/templates/configmap.yaml
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: chatqna-data-prep-config
+  labels:
+    helm.sh/chart: data-prep-1.0.0
+    app.kubernetes.io/name: data-prep
+    app.kubernetes.io/instance: chatqna
+    app.kubernetes.io/version: "v1.0"
+    app.kubernetes.io/managed-by: Helm
+data:
+  TEI_ENDPOINT: "http://chatqna-tei"
+  EMBED_MODEL: ""
+  REDIS_URL: "redis://chatqna-redis-vector-db:6379"
+  INDEX_NAME: "rag-redis"
+  KEY_INDEX_NAME: "file-keys"
+  SEARCH_BATCH_SIZE: "10"
+  HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here"
+  HF_HOME: "/tmp/.cache/huggingface"
+  http_proxy: ""
+  https_proxy: ""
+  no_proxy: ""
+  LOGFLAG: ""
+---
+# Source: chatqna/charts/retriever-usvc/templates/configmap.yaml
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: chatqna-retriever-usvc-config
+  labels:
+    helm.sh/chart: retriever-usvc-1.0.0
+    app.kubernetes.io/name: retriever-usvc
+    app.kubernetes.io/instance: chatqna
+    app.kubernetes.io/version: "v1.0"
+    app.kubernetes.io/managed-by: Helm
+data:
+  TEI_EMBEDDING_ENDPOINT: 
"http://chatqna-tei" + EMBED_MODEL: "" + REDIS_URL: "redis://chatqna-redis-vector-db:6379" + INDEX_NAME: "rag-redis" + EASYOCR_MODULE_PATH: "/tmp/.EasyOCR" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HF_HOME: "/tmp/.cache/huggingface" + HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here" + LOGFLAG: "" +--- +# Source: chatqna/charts/tei/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-tei-config + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "BAAI/bge-base-en-v1.5" + PORT: "2081" + http_proxy: "" + https_proxy: "" + no_proxy: "" + NUMBA_CACHE_DIR: "/tmp" + TRANSFORMERS_CACHE: "/tmp/transformers_cache" + HF_HOME: "/tmp/.cache/huggingface" + MAX_WARMUP_SEQUENCE_LENGTH: "512" +--- +# Source: chatqna/charts/teirerank/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-teirerank-config + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "BAAI/bge-reranker-base" + PORT: "2082" + http_proxy: "" + https_proxy: "" + no_proxy: "" + NUMBA_CACHE_DIR: "/tmp" + TRANSFORMERS_CACHE: "/tmp/transformers_cache" + HF_HOME: "/tmp/.cache/huggingface" +--- +# Source: chatqna/charts/tgi/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-tgi-config + labels: + helm.sh/chart: tgi-1.0.0 + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm 
+data: + MODEL_ID: "Intel/neural-chat-7b-v3-3" + PORT: "2080" + HF_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HABANA_LOGS: "/tmp/habana_logs" + NUMBA_CACHE_DIR: "/tmp" + HF_HOME: "/tmp/.cache/huggingface" + CUDA_GRAPHS: "0" +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: v1 +data: + default.conf: |+ + # Copyright (C) 2024 Intel Corporation + # SPDX-License-Identifier: Apache-2.0 + + + server { + listen 80; + listen [::]:80; + + proxy_connect_timeout 600; + proxy_send_timeout 600; + proxy_read_timeout 600; + send_timeout 600; + + client_max_body_size 10G; + + location /home { + alias /usr/share/nginx/html/index.html; + } + + location / { + proxy_pass http://chatqna-chatqna-ui:5173; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/chatqna { + proxy_pass http://chatqna:8888; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep/get_file { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep/delete_file { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + 
} + +kind: ConfigMap +metadata: + name: chatqna-nginx-config +--- +# Source: chatqna/charts/chatqna-ui/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-chatqna-ui + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 5173 + targetPort: ui + protocol: TCP + name: ui + selector: + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/data-prep/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-data-prep + labels: + helm.sh/chart: data-prep-1.0.0 + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 6007 + targetPort: 6007 + protocol: TCP + name: data-prep + selector: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/redis-vector-db/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-redis-vector-db + labels: + helm.sh/chart: redis-vector-db-1.0.0 + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "7.2.0-v9" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 6379 + targetPort: 6379 + protocol: TCP + name: redis-service + - port: 8001 + targetPort: 8001 + protocol: TCP + name: redis-insight + selector: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna +--- +# Source: 
chatqna/charts/retriever-usvc/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-retriever-usvc + labels: + helm.sh/chart: retriever-usvc-1.0.0 + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 7000 + targetPort: 7000 + protocol: TCP + name: retriever-usvc + selector: + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/tei/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-tei + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2081 + protocol: TCP + name: tei + selector: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/teirerank/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-teirerank + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2082 + protocol: TCP + name: teirerank + selector: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/tgi/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-tgi + labels: + helm.sh/chart: tgi-1.0.0 + 
app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2080 + protocol: TCP + name: tgi + selector: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: v1 +kind: Service +metadata: + name: chatqna-nginx +spec: + ports: + - port: 80 + protocol: TCP + targetPort: 80 + selector: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + type: NodePort +--- +# Source: chatqna/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 8888 + targetPort: 8888 + protocol: TCP + name: chatqna + selector: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna +--- +# Source: chatqna/charts/chatqna-ui/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-chatqna-ui + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + annotations: + 
io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna-ui + securityContext: + {} + image: "opea/chatqna-ui:latest" + imagePullPolicy: Always + ports: + - name: ui + containerPort: 5173 + protocol: TCP + resources: + limits: + memory: "2Gi" + volumeMounts: + - mountPath: /tmp + name: tmp + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/data-prep/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-data-prep + labels: + helm.sh/chart: data-prep-1.0.0 + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna + envFrom: + - configMapRef: + name: chatqna-data-prep-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: false + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/dataprep-redis:latest" + imagePullPolicy: Always + ports: + - name: data-prep + containerPort: 6007 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + 
httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + memory: "9Gi" + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/redis-vector-db/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-redis-vector-db + labels: + helm.sh/chart: redis-vector-db-1.0.0 + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "7.2.0-v9" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: redis-vector-db + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "redis/redis-stack:7.2.0-v9" + imagePullPolicy: Always + volumeMounts: + - mountPath: /data + name: data-volume + - mountPath: /redisinsight + name: redisinsight-volume + - mountPath: /tmp + name: tmp + ports: + - name: redis-service + containerPort: 6379 + protocol: TCP + - name: redis-insight + containerPort: 8001 + protocol: TCP + startupProbe: + tcpSocket: + port: 6379 # Probe the Redis port + initialDelaySeconds: 5 + periodSeconds: 5 + failureThreshold: 120 + resources: + {} + volumes: + - name: data-volume + emptyDir: {} + - name: redisinsight-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/retriever-usvc/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: 
chatqna-retriever-usvc + labels: + helm.sh/chart: retriever-usvc-1.0.0 + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna + envFrom: + - configMapRef: + name: chatqna-retriever-usvc-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/retriever-redis:latest" + imagePullPolicy: Always + ports: + - name: retriever-usvc + containerPort: 7000 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: "7Gi" + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/tei/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-tei + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + # use 
explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: tei + envFrom: + - configMapRef: + name: chatqna-tei-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5" + imagePullPolicy: Always + args: + - "--auto-truncate" + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /dev/shm + name: shm + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2081 + protocol: TCP + livenessProbe: + failureThreshold: 24 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: "4Gi" + volumes: + - name: model-volume + emptyDir: {} + - name: shm + emptyDir: + medium: Memory + sizeLimit: 1Gi + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/teirerank/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-teirerank + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + # use explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: +
matchLabels: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: teirerank + envFrom: + - configMapRef: + name: chatqna-teirerank-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5" + imagePullPolicy: Always + args: + - "--auto-truncate" + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /dev/shm + name: shm + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2082 + protocol: TCP + livenessProbe: + failureThreshold: 24 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: 4Gi + volumes: + - name: model-volume + emptyDir: {} + - name: shm + emptyDir: + medium: Memory + sizeLimit: 1Gi + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/tgi/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-tgi + labels: + helm.sh/chart: tgi-1.0.0 + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm +spec: + # use explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + 
metadata: + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "800" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: tgi + envFrom: + - configMapRef: + name: chatqna-tgi-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu" + imagePullPolicy: Always + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2080 + protocol: TCP + livenessProbe: + failureThreshold: 24 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + readinessProbe: + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + startupProbe: + failureThreshold: 240 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + resources: + limits: + cpu: "8" + memory: "80Gi" + volumes: + - name: model-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Source: chatqna/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + app: chatqna +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + null + containers: + - name: chatqna + env: + - name: LLM_SERVER_HOST_IP + 
value: chatqna-tgi + - name: RERANK_SERVER_HOST_IP + value: chatqna-teirerank + - name: RETRIEVER_SERVICE_HOST_IP + value: chatqna-retriever-usvc + - name: EMBEDDING_SERVER_HOST_IP + value: chatqna-tei + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/chatqna:latest" + imagePullPolicy: Always + volumeMounts: + - mountPath: /tmp + name: tmp + ports: + - name: chatqna + containerPort: 8888 + protocol: TCP + resources: + null + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-nginx + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + app: chatqna-nginx +spec: + selector: + matchLabels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + template: + metadata: + labels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + spec: + runtimeClassName: kata-qemu-tdx + containers: + - image: nginx:1.27.1 + imagePullPolicy: Always + name: nginx + volumeMounts: + - mountPath: /etc/nginx/conf.d + name: nginx-config-volume + securityContext: {} + volumes: + - configMap: + defaultMode: 420 + name: chatqna-nginx-config + name: nginx-config-volume From 806d9936c17aab96eb90dccf50a96411fbe2e035 Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Mon, 9 Dec 2024 16:26:18 +0100 Subject: [PATCH 3/6] [TDX] Improve writing Signed-off-by: Jakub Ledworowski --- ChatQnA/kubernetes/intel/README_tdx.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md index 65146af0c..37df49300 100644 --- 
a/ChatQnA/kubernetes/intel/README_tdx.md +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -167,11 +167,12 @@ If you want to have more control over what is protected with Intel TDX or use a done ``` -2. For each service, define the resources that must be assigned to the pod to run the service efficiently. - The resources must be defined in the `resources` section of the pod's container definition. - The `memory` must be at least 2x the image size. - The `cpu` and `memory` resources must be defined at least in `limits` sections. - By default, the pod will be assigned 1 CPU and 2048 MiB of memory, but half of it will be used for filesystem. +2. For each service, edit the deployment file to define the resources that must be assigned to the pod to run the service efficiently: + + - The resources must be defined in the `resources` section of the pod's container definition. + - The `memory` must be at least 2x the image size. + - The `cpu` and `memory` resources must be defined at least in `limits` sections. + - By default, the pod will be assigned 1 CPU and 2048 MiB of memory, but half of it will be used for filesystem. 3. Apply the changes to the deployment configuration: From d7e377134a295f5dcdeeb3a3c8fff666099f811d Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Mon, 9 Dec 2024 16:41:02 +0100 Subject: [PATCH 4/6] [TDX] Fixed paths to chatqna.yaml Signed-off-by: Jakub Ledworowski --- ChatQnA/kubernetes/intel/README_tdx.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md index 37df49300..9eee72505 100644 --- a/ChatQnA/kubernetes/intel/README_tdx.md +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -126,7 +126,7 @@ As an example we will use the ChatQnA application. 
If you want to just give it a try, simply run: ```bash -kubectl apply -f chatqna_tdx.yaml +kubectl apply -f cpu/xeon/manifest/chatqna_tdx.yaml ``` After a few minutes, the ChatQnA services should be up and running in the cluster and all of them will be protected with Intel TDX. @@ -156,7 +156,7 @@ If you want to have more control over what is protected with Intel TDX or use a ```bash SERVICES=("llm-uservice") - FILE=chatqna.yaml + FILE=cpu/xeon/manifest/chatqna.yaml for SERVICE in "${SERVICES[@]}"; do yq eval ' (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.metadata.annotations."io.katacontainers.config.runtime.create_container_timeout") = "800" From 608e8699e45bf87cf5f978e8b1b19a77e86c3ae0 Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Thu, 12 Dec 2024 15:32:33 +0100 Subject: [PATCH 5/6] [TDX] Simplified the descriptions; added Getting Started Signed-off-by: Jakub Ledworowski --- ChatQnA/kubernetes/intel/README_tdx.md | 129 ++++++------------------- 1 file changed, 30 insertions(+), 99 deletions(-) diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md index 9eee72505..e32c04621 100644 --- a/ChatQnA/kubernetes/intel/README_tdx.md +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -5,86 +5,42 @@ This document outlines the deployment process for an example application utilizi The deployment process is intended for users who want to deploy an example application: - with pods protected by Intel TDX, -- on a single node in a cluster (acting as a master and worker) that is a Xeon 5th Gen platform or later, +- on a single node in a cluster (acting as a master and worker) that is a Xeon 4th Gen platform or later, - running Ubuntu 24.04, - using images pushed to public repository, like quay.io or docker hub. -It's split into 3 sections: -1. [Cluster Configuration](#cluster-configuration) - steps required to prepare components in the cluster required to use Intel TDX. -2. 
[Node configuration](#node-configuration) - additional steps to be performed on the node that are required to run heavy applications like OPEA ChatQnA. -3. [Deployment of services protected with Intel TDX](#deployment-of-services-protected-with-intel-tdx) - describes how to deploy an example application with services protected using Intel TDX. +## Getting Started -> [!NOTE] -> Running TDX-protected services requires the user to define the pod's resources request (cpu, memory). -> -> Due to lack of hotplugging feature in TDX, the assigned resources cannot be changed after the pod is scheduled and the resources will not be shared with any other pod. -> -> This means, that the total amount of resources assigned to all TDX-protected pods must be less than the total amount of resources available on the node, leaving room for the non-TDX pods requests. +Follow the below steps on the Xeon server node to deploy the example application: +1. [Install Ubuntu 24.04 and enable Intel TDX](https://github.com/canonical/tdx/blob/noble-24.04/README.md#setup-host-os) +2. [Install Kubernetes cluster](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) +3. [Install Confidential Containers Operator](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/02/infrastructure_setup/#install-confidential-containers-operator) +4. Increase the kubelet timeout: -## Cluster Configuration - -To prepare cluster to run Intel TDX-protected workloads, follow [Intel Confidential Computing Documentation](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/01/introduction/index.html). - - -## Node Configuration - -This section outlines required changes to be performed on each node. -These steps might be automated with various configuration management tools like Ansible, Puppet, Chef, etc. 
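Step 3 of the Getting Started list installs the Confidential Containers operator, which registers the `kata-qemu-tdx` runtime that every Deployment in this guide references via `runtimeClassName`. Conceptually it is a small RuntimeClass object; the sketch below is illustrative only — the operator creates the real object, so confirm with `kubectl get runtimeclass` rather than applying this by hand:

```yaml
# Illustrative sketch of the RuntimeClass registered by the Confidential Containers operator.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: kata-qemu-tdx
# The handler must match the containerd runtime the operator configures on the node.
handler: kata-qemu-tdx
```

Pods that set `runtimeClassName: kata-qemu-tdx` are then launched inside a TDX-protected Kata VM instead of a regular runc container.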
- - -### Kubelet Configuration - -To run a complex and heavy application like OPEA, the cluster administrator must increase the kubelet timeout for container creation, otherwise the pod creation may fail due to timeout `Context deadline exceeded`. -This is required because the container creation process can take a long time due to the size of pod images and the need to download the AI models. -Run the following script on all nodes to increase the kubelet timeout to 30 minutes and restart the kubelet automatically if the setting was applied (sudo required): - -```bash -echo "Setting up the environment..." -kubelet_config="/var/lib/kubelet/config.yaml" -# save the current kubelet timeout setting -previous=$(sudo grep runtimeRequestTimeout "${kubelet_config}") -# Increase kubelet timeout -sudo sed -i 's/runtimeRequestTimeout: .*/runtimeRequestTimeout: 30m/' "${kubelet_config}" -new=$(sudo grep runtimeRequestTimeout "${kubelet_config}") -# Check if the kubelet timeout setting was updated -if [[ "$previous" == "$new" ]]; then - echo "kubelet runtimeRequestTimeout setting was not updated." -else - echo "kubelet runtimeRequestTimeout setting was updated." - echo "Updated kubelet runtimeRequestTimeout setting:" - sudo grep runtimeRequestTimeout "${kubelet_config}" - echo "Restarting kubelet..." - sudo systemctl daemon-reload && sudo systemctl restart kubelet - echo "Waiting 30s for kubelet to restart..." - sleep 30 - echo "kubelet restarted." -fi -``` - -> [!NOTE] -> The script is prepared for vanilla kubernetes installation. -> If you are using a different kubernetes distribution, the kubelet configuration file location may differ or the setting could be managed otherwise. -> -> After kubelet restart, some of the internal pods from `kube-system` namespace might be reloaded automatically. - -All kubelet configuration options can be found [here](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/). 
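The removed script above boils down to a single `sed` substitution plus a kubelet restart. A minimal sketch of the same edit, exercised here against a throwaway copy of the config instead of the live `/var/lib/kubelet/config.yaml` (the path assumed for a vanilla kubeadm install; GNU sed assumed for `-i`):

```shell
# Exercise the runtimeRequestTimeout bump on a scratch copy of the kubelet config.
cfg="$(mktemp)"
printf 'kind: KubeletConfiguration\nruntimeRequestTimeout: 2m0s\n' > "$cfg"

# The same substitution the script above applies to /var/lib/kubelet/config.yaml.
sed -i 's/runtimeRequestTimeout: .*/runtimeRequestTimeout: 30m/' "$cfg"

grep runtimeRequestTimeout "$cfg"   # prints: runtimeRequestTimeout: 30m
rm -f "$cfg"
```

On a real node the edit targets `/var/lib/kubelet/config.yaml` itself and must be followed by `sudo systemctl daemon-reload && sudo systemctl restart kubelet` for the new timeout to take effect.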
- - -## Deployment of services protected with Intel TDX + ```bash + sudo sed -i 's/runtimeRequestTimeout: .*/runtimeRequestTimeout: 30m/' "/var/lib/kubelet/config.yaml" + sudo systemctl daemon-reload && sudo systemctl restart kubelet + ``` + +5. Deploy ChatQnA: -This section describes how to deploy an example application with services protected using Intel TDX: + ```bash + kubectl apply -f cpu/xeon/manifest/chatqna_tdx.yaml + ``` + +6. Verify all pods are running: -1. [Overview of the changes needed](#overview-of-the-changes-needed) - describes the changes required to protect a single component with Intel TDX. -2. [Example deployment of ChatQnA with TDX protection](#example-deployment-of-chatqna-with-tdx-protection) - provides a quick start to run ChatQnA example application with all services protected with Intel TDX. -3. [Customization of deployment configuration](#customization-of-deployment-configuration) - describes how to manually modify the deployment configuration to protect a single component with Intel TDX. + ```bash + kubectl get pods + ``` -### Overview of the changes needed +## Advanced configuration To protect a single component with Intel TDX, user must modify its manifest file. -The process is described in details in the [Demo Workload Deployment](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-protected-with-intel-tdx-and-quote-verified-using-intel-trust-authority). +The details are described in the [Demo Workload Deployment](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-and-protected-by-intel-tdx). Here, we describe the required changes on the example Deployment definition below: @@ -120,39 +76,11 @@ spec: ``` -### Example deployment of ChatQnA with TDX protection - -As an example we will use the ChatQnA application. 
-If you want to just give it a try, simply run: - -```bash -kubectl apply -f cpu/xeon/manifest/chatqna_tdx.yaml -``` - -After a few minutes, the ChatQnA services should be up and running in the cluster and all of them will be protected with Intel TDX. -You may verify, that the pods are running with the TDX-protection by checking the runtime class name, e.g.: - -```bash -POD_NAME=$(kubectl get pods | grep 'chatqna-tgi' | awk '{print $1}') -kubectl get pod $POD_NAME -o jsonpath='{.spec.runtimeClassName}' -``` - -In the output you should see: - -```text -kata-qemu-tdx -``` - -This is a simple indicator that the pod is running in a Trust Domain protected by Intel TDX. -However, for a production use-case, the attestation process is crucial to verify the integrity of the pod. -You may read more about how to enable attestation [here](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-protected-with-intel-tdx-and-quote-verified-using-intel-trust-authority). - - ### Customization of deployment configuration If you want to have more control over what is protected with Intel TDX or use a different deployment file, you can manually modify the deployment configuration, by following the steps below: -1. Run the script to modify the chosen services with the changes described in [previous section](#overview-of-the-changes-needed): +1. Run the script to apply changes only to the chosen `SERVICES` on the `FILE` of your choice: ```bash SERVICES=("llm-uservice") @@ -167,11 +95,10 @@ If you want to have more control over what is protected with Intel TDX or use a done ``` -2. For each service, edit the deployment file to define the resources that must be assigned to the pod to run the service efficiently: +2. 
For each service from `SERVICES`, edit the deployment `FILE` to define the resources that must be assigned to the pod to run the service efficiently: - The resources must be defined in the `resources` section of the pod's container definition. - The `memory` must be at least 2x the image size. - - The `cpu` and `memory` resources must be defined at least in `limits` sections. - By default, the pod will be assigned 1 CPU and 2048 MiB of memory, but half of it will be used for filesystem. 3. Apply the changes to the deployment configuration: @@ -180,6 +107,10 @@ If you want to have more control over what is protected with Intel TDX or use a kubectl apply -f chatqna.yaml ``` -### Troubleshoting +> [!IMPORTANT] +> Total amount of resources assigned to all TDX-protected pods must be less than the total amount of resources available on the node, leaving room for the non-TDX pods requests. + + +## Troubleshooting In case of any problems regarding pod creation, refer to [Troubleshooting guide](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/04/troubleshooting/). From c48d38d7e6fb940af1cdf063578a7977d65d99f3 Mon Sep 17 00:00:00 2001 From: Jakub Ledworowski Date: Fri, 13 Dec 2024 10:31:21 +0100 Subject: [PATCH 6/6] [TDX] Improved description and scripts after review Signed-off-by: Jakub Ledworowski --- ChatQnA/kubernetes/intel/README_tdx.md | 46 ++++++++++++++++++-------- 1 file changed, 33 insertions(+), 13 deletions(-) diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md index e32c04621..0e9914efb 100644 --- a/ChatQnA/kubernetes/intel/README_tdx.md +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -24,13 +24,19 @@ Follow the below steps on the Xeon server node to deploy the example application sudo systemctl daemon-reload && sudo systemctl restart kubelet ``` -5. Deploy ChatQnA: +5. 
Change directory: + + ```bash + cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest + ``` + +6. Deploy ChatQnA: + + ```bash + kubectl apply -f chatqna_tdx.yaml + ``` + -6. Verify all pods are running: +7. Verify all pods are running: ```bash kubectl get pods @@ -78,84 +84,29 @@ spec: ### Customization of deployment configuration -If you want to have more control over what is protected with Intel TDX or use a different deployment file, you can manually modify the deployment configuration, by following the steps below: +If you want to have more control over what is protected with Intel TDX or use a different deployment file, you can manually modify the deployment configuration, by following the steps below: -1. Run the script to apply changes only to the chosen `SERVICES` on the `FILE` of your choice: +1. Change directory: + + ```bash + cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest + ``` + +2. Define the services you want to protect with Intel TDX: ```bash SERVICES=("llm-uservice") - FILE=cpu/xeon/manifest/chatqna.yaml + ``` + +3. Define the pipeline you want to deploy: + + ```bash + FILE=chatqna.yaml + ``` + +4. Run the script to add `runtimeClassName` and the required annotation only to the chosen `SERVICES` in the `FILE` you defined above: + + ```bash for SERVICE in "${SERVICES[@]}"; do yq eval ' (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.metadata.annotations."io.katacontainers.config.runtime.create_container_timeout") = "800" @@ -95,16 +117,14 @@ If you want to have more control over what is protected with Intel TDX or use a done ``` -2. For each service from `SERVICES`, edit the deployment `FILE` to define the resources that must be assigned to the pod to run the service efficiently: - - The resources must be defined in the `resources` section of the pod's container definition. - - The `memory` must be at least 2x the image size.
- - By default, the pod will be assigned 1 CPU and 2048 MiB of memory, but half of it will be used for filesystem. +5. For each service from `SERVICES`, edit the deployment `FILE` to define the resources that must be assigned to the pod to run the service efficiently. + The `memory` must be at least 2x the image size. + By default, the pod will be assigned `1 CPU` and `2048 MiB` of memory, but half of it will be used for the filesystem. -3. Apply the changes to the deployment configuration: +6. Apply the changes to the deployment configuration: ```bash - kubectl apply -f chatqna.yaml + kubectl apply -f $FILE ``` > [!IMPORTANT]