diff --git a/ChatQnA/README.md b/ChatQnA/README.md index 6b7dd27ad..f98873af4 100644 --- a/ChatQnA/README.md +++ b/ChatQnA/README.md @@ -247,6 +247,10 @@ docker compose up -d Refer to the [NVIDIA GPU Guide](./docker_compose/nvidia/gpu/README.md) for more instructions on building docker images from source. +### Deploy ChatQnA into Kubernetes on Xeon with Intel TDX protection + +Refer to the [Kubernetes Guide](./kubernetes/intel/README_tdx.md) for instructions on deploying ChatQnA into Kubernetes on Xeon with services protected using Intel TDX. + ### Deploy ChatQnA into Kubernetes on Xeon & Gaudi with GMC Refer to the [Kubernetes Guide](./kubernetes/intel/README_gmc.md) for instructions on deploying ChatQnA into Kubernetes on Xeon & Gaudi with GMC. diff --git a/ChatQnA/kubernetes/intel/README_tdx.md b/ChatQnA/kubernetes/intel/README_tdx.md new file mode 100644 index 000000000..0e9914efb --- /dev/null +++ b/ChatQnA/kubernetes/intel/README_tdx.md @@ -0,0 +1,136 @@ +# Deploy example application in Kubernetes Cluster on Xeon with Intel TDX + +This document outlines the deployment process for an example application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline components on an Intel Xeon server where the microservices are protected by [Intel TDX](https://www.intel.com/content/www/us/en/developer/tools/trust-domain-extensions/overview.html). + +The deployment process is intended for users who want to deploy an example application: + +- with pods protected by Intel TDX, +- on a single node in a cluster (acting as both master and worker) that is a 4th Gen Xeon platform or later, +- running Ubuntu 24.04, +- using images pushed to a public repository, such as quay.io or Docker Hub. + + +## Getting Started + +Follow the steps below on the Xeon server node to deploy the example application: + +1. [Install Ubuntu 24.04 and enable Intel TDX](https://github.com/canonical/tdx/blob/noble-24.04/README.md#setup-host-os) +2. 
[Install Kubernetes cluster](https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/) +3. [Install Confidential Containers Operator](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/02/infrastructure_setup/#install-confidential-containers-operator) +4. Increase the kubelet timeout: + + ```bash + sudo sed -i 's/runtimeRequestTimeout: .*/runtimeRequestTimeout: 30m/' "/var/lib/kubelet/config.yaml" + sudo systemctl daemon-reload && sudo systemctl restart kubelet + ``` + +5. Change directory: + + ```bash + cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest + ``` + +6. Deploy ChatQnA: + + ```bash + kubectl apply -f chatqna_tdx.yaml + ``` + +7. Verify all pods are running: + + ```bash + kubectl get pods + ``` + + +## Advanced configuration + +To protect a single component with Intel TDX, you must modify its manifest file. +The details are described in the [Demo Workload Deployment](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/03/demo_workload_deployment/#pod-isolated-by-kata-containers-and-protected-by-intel-tdx) guide. + +The example Deployment definition below shows the required changes: + +```yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: llm-uservice + # (...) +spec: + selector: + matchLabels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: llm-uservice + # (...) + template: + metadata: + # (...) + annotations: + io.katacontainers.config.runtime.create_container_timeout: "600" # <<--- increase the timeout for container creation + spec: + runtimeClassName: kata-qemu-tdx # <<--- this is required to start the pod in Trust Domain (TD, virtual machine protected with Intel TDX) + containers: + - name: llm-uservice + # (...) 
+ resources: # <<--- specify enough resources to run the service efficiently (memory must be at least 2x the image size) + limits: + cpu: "4" + memory: 4Gi + requests: + cpu: "4" + memory: 4Gi +``` + + +### Customization of deployment configuration + +If you want more control over what is protected with Intel TDX, or want to use a different deployment file, you can manually modify the deployment configuration by following the steps below: + +1. Change directory: + + ```bash + cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest + ``` + +2. Define the services you want to protect with Intel TDX: + + ```bash + SERVICES=("llm-uservice") + ``` + +3. Define the pipeline you want to deploy: + + ```bash + FILE=chatqna.yaml + ``` + +4. Run the script to add `runtimeClassName` and the required annotation only to the chosen `SERVICES` in the `FILE` you defined above: + + ```bash + for SERVICE in "${SERVICES[@]}"; do + yq eval ' + (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.metadata.annotations."io.katacontainers.config.runtime.create_container_timeout") = "800" + ' "$FILE" -i; + yq eval ' + (select(.kind == "Deployment" and .metadata.name == "'"$SERVICE"'") | .spec.template.spec.runtimeClassName) = "kata-qemu-tdx" + ' "$FILE" -i; + done + ``` + +5. For each service from `SERVICES`, edit the deployment `FILE` to define the resources that must be assigned to the pod to run the service efficiently. + The `memory` must be at least 2x the image size. + By default, the pod is assigned `1 CPU` and `2048 MiB` of memory, but half of that memory is used for the filesystem. + +6. Apply the changes to the deployment configuration: + + ```bash + kubectl apply -f "$FILE" + ``` + +> [!IMPORTANT] +> The total amount of resources assigned to all TDX-protected pods must be less than the total amount of resources available on the node, leaving room for the requests of non-TDX pods. 
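
The "memory must be at least 2x the image size" rule from step 5 can be sketched as a small shell helper. This is only an illustration: the function name `min_memory_mib` is hypothetical and not part of this repository, and it assumes you already know the image size in bytes (for example, as reported by your container tooling).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of GenAIExamples): prints the smallest memory
# limit, in MiB, that satisfies the "at least 2x the image size" rule.
# Argument: image size in bytes.
min_memory_mib() {
  local image_bytes="$1"
  # Round the image size up to whole MiB, then double it.
  local image_mib=$(( (image_bytes + 1048575) / 1048576 ))
  echo $(( image_mib * 2 ))
}

# Example usage (assumed 1 GiB image):
#   min_memory_mib 1073741824
```

The result is a lower bound only; services that load large models at runtime will typically need more than this minimum.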
+ + +## Troubleshooting + +In case of any problems regarding pod creation, refer to the [Troubleshooting guide](https://cc-enabling.trustedservices.intel.com/intel-confidential-containers-guide/04/troubleshooting/). diff --git a/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml b/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml new file mode 100644 index 000000000..cf72af30f --- /dev/null +++ b/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna_tdx.yaml @@ -0,0 +1,1092 @@ +--- +# Source: chatqna/charts/data-prep/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-data-prep-config + labels: + helm.sh/chart: data-prep-1.0.0 + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +data: + TEI_ENDPOINT: "http://chatqna-tei" + EMBED_MODEL: "" + REDIS_URL: "redis://chatqna-redis-vector-db:6379" + INDEX_NAME: "rag-redis" + KEY_INDEX_NAME: "file-keys" + SEARCH_BATCH_SIZE: "10" + HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here" + HF_HOME: "/tmp/.cache/huggingface" + http_proxy: "" + https_proxy: "" + no_proxy: "" + LOGFLAG: "" +--- +# Source: chatqna/charts/retriever-usvc/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-retriever-usvc-config + labels: + helm.sh/chart: retriever-usvc-1.0.0 + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +data: + TEI_EMBEDDING_ENDPOINT: "http://chatqna-tei" + EMBED_MODEL: "" + REDIS_URL: "redis://chatqna-redis-vector-db:6379" + INDEX_NAME: "rag-redis" + EASYOCR_MODULE_PATH: "/tmp/.EasyOCR" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HF_HOME: "/tmp/.cache/huggingface" + 
HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here" + LOGFLAG: "" +--- +# Source: chatqna/charts/tei/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-tei-config + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "BAAI/bge-base-en-v1.5" + PORT: "2081" + http_proxy: "" + https_proxy: "" + no_proxy: "" + NUMBA_CACHE_DIR: "/tmp" + TRANSFORMERS_CACHE: "/tmp/transformers_cache" + HF_HOME: "/tmp/.cache/huggingface" + MAX_WARMUP_SEQUENCE_LENGTH: "512" +--- +# Source: chatqna/charts/teirerank/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-teirerank-config + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "BAAI/bge-reranker-base" + PORT: "2082" + http_proxy: "" + https_proxy: "" + no_proxy: "" + NUMBA_CACHE_DIR: "/tmp" + TRANSFORMERS_CACHE: "/tmp/transformers_cache" + HF_HOME: "/tmp/.cache/huggingface" +--- +# Source: chatqna/charts/tgi/templates/configmap.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-tgi-config + labels: + helm.sh/chart: tgi-1.0.0 + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "Intel/neural-chat-7b-v3-3" + PORT: "2080" + HF_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HABANA_LOGS: "/tmp/habana_logs" + NUMBA_CACHE_DIR: "/tmp" + HF_HOME: 
"/tmp/.cache/huggingface" + CUDA_GRAPHS: "0" +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: v1 +data: + default.conf: |+ + # Copyright (C) 2024 Intel Corporation + # SPDX-License-Identifier: Apache-2.0 + + + server { + listen 80; + listen [::]:80; + + proxy_connect_timeout 600; + proxy_send_timeout 600; + proxy_read_timeout 600; + send_timeout 600; + + client_max_body_size 10G; + + location /home { + alias /usr/share/nginx/html/index.html; + } + + location / { + proxy_pass http://chatqna-chatqna-ui:5173; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/chatqna { + proxy_pass http://chatqna:8888; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep/get_file { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/dataprep/delete_file { + proxy_pass http://chatqna-data-prep:6007; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + } + +kind: ConfigMap +metadata: + name: chatqna-nginx-config +--- +# Source: chatqna/charts/chatqna-ui/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: 
Service +metadata: + name: chatqna-chatqna-ui + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 5173 + targetPort: ui + protocol: TCP + name: ui + selector: + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/data-prep/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-data-prep + labels: + helm.sh/chart: data-prep-1.0.0 + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 6007 + targetPort: 6007 + protocol: TCP + name: data-prep + selector: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/redis-vector-db/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-redis-vector-db + labels: + helm.sh/chart: redis-vector-db-1.0.0 + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "7.2.0-v9" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 6379 + targetPort: 6379 + protocol: TCP + name: redis-service + - port: 8001 + targetPort: 8001 + protocol: TCP + name: redis-insight + selector: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/retriever-usvc/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-retriever-usvc + labels: + helm.sh/chart: retriever-usvc-1.0.0 + 
app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 7000 + targetPort: 7000 + protocol: TCP + name: retriever-usvc + selector: + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/tei/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-tei + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2081 + protocol: TCP + name: tei + selector: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/teirerank/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-teirerank + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2082 + protocol: TCP + name: teirerank + selector: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/charts/tgi/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna-tgi + labels: + helm.sh/chart: tgi-1.0.0 + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2080 + protocol: TCP + name: tgi + selector: + 
app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: v1 +kind: Service +metadata: + name: chatqna-nginx +spec: + ports: + - port: 80 + protocol: TCP + targetPort: 80 + selector: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + type: NodePort +--- +# Source: chatqna/templates/service.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: chatqna + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + type: ClusterIP + ports: + - port: 8888 + targetPort: 8888 + protocol: TCP + name: chatqna + selector: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna +--- +# Source: chatqna/charts/chatqna-ui/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-chatqna-ui + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + helm.sh/chart: chatqna-ui-1.0.0 + app.kubernetes.io/name: chatqna-ui + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + annotations: + io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna-ui + securityContext: + {} + image: "opea/chatqna-ui:latest" + imagePullPolicy: Always + ports: + - name: ui + 
containerPort: 5173 + protocol: TCP + resources: + limits: + memory: "2Gi" + volumeMounts: + - mountPath: /tmp + name: tmp + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/data-prep/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-data-prep + labels: + helm.sh/chart: data-prep-1.0.0 + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: data-prep + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna + envFrom: + - configMapRef: + name: chatqna-data-prep-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: false + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/dataprep-redis:latest" + imagePullPolicy: Always + ports: + - name: data-prep + containerPort: 6007 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: v1/health_check + port: data-prep + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + memory: "9Gi" + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/redis-vector-db/templates/deployment.yaml +# Copyright (C) 2024 Intel 
Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-redis-vector-db + labels: + helm.sh/chart: redis-vector-db-1.0.0 + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "7.2.0-v9" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: redis-vector-db + app.kubernetes.io/instance: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: redis-vector-db + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "redis/redis-stack:7.2.0-v9" + imagePullPolicy: Always + volumeMounts: + - mountPath: /data + name: data-volume + - mountPath: /redisinsight + name: redisinsight-volume + - mountPath: /tmp + name: tmp + ports: + - name: redis-service + containerPort: 6379 + protocol: TCP + - name: redis-insight + containerPort: 8001 + protocol: TCP + startupProbe: + tcpSocket: + port: 6379 # Probe the Redis port + initialDelaySeconds: 5 + periodSeconds: 5 + failureThreshold: 120 + resources: + {} + volumes: + - name: data-volume + emptyDir: {} + - name: redisinsight-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/retriever-usvc/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-retriever-usvc + labels: + helm.sh/chart: retriever-usvc-1.0.0 + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm +spec: + replicas: 1 + selector: + matchLabels: + 
app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: retriever-usvc + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "360" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: chatqna + envFrom: + - configMapRef: + name: chatqna-retriever-usvc-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/retriever-redis:latest" + imagePullPolicy: Always + ports: + - name: retriever-usvc + containerPort: 7000 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: v1/health_check + port: retriever-usvc + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: "7Gi" + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/tei/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-tei + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + # use explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna 
+ spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: tei + envFrom: + - configMapRef: + name: chatqna-tei-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5" + imagePullPolicy: Always + args: + - "--auto-truncate" + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /dev/shm + name: shm + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2081 + protocol: TCP + livenessProbe: + failureThreshold: 24 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: "4Gi" + volumes: + - name: model-volume + emptyDir: {} + - name: shm + emptyDir: + medium: Memory + sizeLimit: 1Gi + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/teirerank/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-teirerank + labels: + helm.sh/chart: teirerank-1.0.0 + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +spec: + # use explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: teirerank + app.kubernetes.io/instance: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - 
name: teirerank + envFrom: + - configMapRef: + name: chatqna-teirerank-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-embeddings-inference:cpu-1.5" + imagePullPolicy: Always + args: + - "--auto-truncate" + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /dev/shm + name: shm + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2082 + protocol: TCP + livenessProbe: + failureThreshold: 24 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: /health + port: http + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + limits: + cpu: "2" + memory: 4Gi + volumes: + - name: model-volume + emptyDir: {} + - name: shm + emptyDir: + medium: Memory + sizeLimit: 1Gi + - name: tmp + emptyDir: {} +--- +# Source: chatqna/charts/tgi/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-tgi + labels: + helm.sh/chart: tgi-1.0.0 + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "2.1.0" + app.kubernetes.io/managed-by: Helm +spec: + # use explicit replica counts only if HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: chatqna + annotations: + io.katacontainers.config.runtime.create_container_timeout: "800" + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + {} + containers: + - name: tgi + envFrom: + - 
configMapRef: + name: chatqna-tgi-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu" + imagePullPolicy: Always + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2080 + protocol: TCP + livenessProbe: + failureThreshold: 24 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + readinessProbe: + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + startupProbe: + failureThreshold: 240 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + resources: + limits: + cpu: "8" + memory: "80Gi" + volumes: + - name: model-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Source: chatqna/templates/deployment.yaml +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + app: chatqna +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna + template: + metadata: + labels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna + spec: + runtimeClassName: kata-qemu-tdx + securityContext: + null + containers: + - name: chatqna + env: + - name: LLM_SERVER_HOST_IP + value: chatqna-tgi + - name: RERANK_SERVER_HOST_IP + value: chatqna-teirerank + - name: RETRIEVER_SERVICE_HOST_IP + value: chatqna-retriever-usvc + - name: EMBEDDING_SERVER_HOST_IP + value: chatqna-tei + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - 
ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/chatqna:latest" + imagePullPolicy: Always + volumeMounts: + - mountPath: /tmp + name: tmp + ports: + - name: chatqna + containerPort: 8888 + protocol: TCP + resources: + null + volumes: + - name: tmp + emptyDir: {} +--- +# Source: chatqna/templates/nginx-deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: chatqna-nginx + labels: + helm.sh/chart: chatqna-1.0.0 + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "v1.0" + app.kubernetes.io/managed-by: Helm + app: chatqna-nginx +spec: + selector: + matchLabels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + template: + metadata: + labels: + app.kubernetes.io/name: chatqna + app.kubernetes.io/instance: chatqna + app: chatqna-nginx + spec: + runtimeClassName: kata-qemu-tdx + containers: + - image: nginx:1.27.1 + imagePullPolicy: Always + name: nginx + volumeMounts: + - mountPath: /etc/nginx/conf.d + name: nginx-config-volume + securityContext: {} + volumes: + - configMap: + defaultMode: 420 + name: chatqna-nginx-config + name: nginx-config-volume