From 44d9c2833ee8fba70c5cb9cecf37abbd27e58653 Mon Sep 17 00:00:00 2001 From: devpramod Date: Wed, 25 Sep 2024 15:57:29 +0000 Subject: [PATCH 01/19] add k8s docs for getting started and helm Signed-off-by: devpramod --- .../ChatQnA/deploy/k8s_getting_started.md | 76 +++ examples/ChatQnA/deploy/k8s_helm.md | 572 ++++++++++++++++++ 2 files changed, 648 insertions(+) create mode 100644 examples/ChatQnA/deploy/k8s_getting_started.md create mode 100644 examples/ChatQnA/deploy/k8s_helm.md diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md new file mode 100644 index 00000000..64766ca2 --- /dev/null +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -0,0 +1,76 @@ +# Getting Started + +## Introduction +Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences: + +- **Using GMC ( GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. +- **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC). +- **Using Helm Charts**: Facilitates deployment through Helm, which manages Kubernetes applications through packages of pre-configured Kubernetes resources. + +This guide will provide detailed instructions on using these resources. If you're already familiar with Kubernetes, feel free to skip ahead to (**Deploy using Helm**) + +### Kubernetes Cluster and Development Environment + +**Setting Up the Kubernetes Cluster:** Before beginning deployment for the ChatQnA application, ensure that a Kubernetes cluster is ready. 
For guidance on setting up your Kubernetes cluster, please refer to the comprehensive setup instructions available on the [Opea Project deployment guide](https://opea-project.github.io/latest/deploy/index.html). + +**Development Pre-requisites:** To prepare for the deployment, familiarize yourself with the necessary development tools and configurations by visiting the [GenAI Infrastructure development page](https://opea-project.github.io/latest/GenAIInfra/DEVELOPMENT.html). This page covers all the essential tools and settings needed for effective development within the Kubernetes environment. + + +**Understanding Kubernetes Deployment Tools and Resources:** + +- **kubectl**: This command-line tool allows you to deploy applications, inspect and manage cluster resources, and view logs. For instance, `kubectl apply -f chatqna.yaml` would be used to deploy resources defined in a manifest file. + +- **Pods**: Pods are the smallest deployable units created and managed by Kubernetes. A pod typically encapsulates one or more containers where your application runs. + +**Verifying Kubernetes Cluster Access with kubectl** +```bash +kubectl get nodes +``` + +This command lists all the nodes in the cluster, verifying that `kubectl` is correctly configured and has the necessary permissions to interact with the cluster. + +Some commonly used kubectl commands and their functions that will help deploy ChatQnA successfully: +|Command |Function | +|------------------------------- |-----------------------------| +|kubectl describe pod [`pod-name`] | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. | +|kubectl delete deployments --all | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. 
| +|kubectl get pods -o wide | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. | +|kubectl logs [`pod-name`] | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. | +|kubectl get svc | Lists all services in the current namespace, providing a quick overview of the network services and their status. + +#### Create and Set Namespace +A Kubernetes namespace is a logical division within a cluster that is used to isolate different environments, teams, or projects, allowing for finer control over resources and access management. To create a namespace called `chatqa`, use: +```bash +kubectl create ns chatqa +``` +When deploying resources (like pods, services, etc.) into your specific namespace, use the `--namespace` flag with `kubectl` commands, or specify the namespace in your resource configuration files. + +To deploy a pod in the `chatqa` namespace: +```bash +kubectl apply -f your-pod-config.yaml --namespace=chatqa +``` +If you want to avoid specifying the namespace with every command, you can set the default namespace for your current context: +```bash +kubectl config set-context --current --namespace=chatqa +``` + +### Using Helm Charts to Deploy + +**What is Helm?** Helm is a package manager for Kubernetes, similar to how apt is for Ubuntu. It simplifies deploying and managing Kubernetes applications through Helm charts, which are packages of pre-configured Kubernetes resources. + +**Key Components of a Helm Chart:** + +- **Chart.yaml**: This file contains metadata about the chart such as name, version, and description. +- **values.yaml**: Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. 
+- **deployment.yaml**: Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. + +**Update Dependencies:** + +- A script called **./update_dependency.sh** is provided which is used to update chart dependencies, ensuring all nested charts are at their latest versions. +- The command `helm dependency update chatqna` updates the dependencies for the `chatqna` chart based on the versions specified in `Chart.yaml`. + +**Helm Install Command:** + +- `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. + +For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md new file mode 100644 index 00000000..d88bb907 --- /dev/null +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -0,0 +1,572 @@ +# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm Charts + +This deployment section covers multi-node on-prem deployment of the ChatQnA +example with OPEA comps to deploy using the TGI service. There are several +slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will +be covering one option of doing it for convenience : we will be showcasing how +to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, +deployed on a Kubernetes cluster. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, +please follow the instructions here (*** ### Kubernetes Cluster and Development Environment***). 
+For a quick introduction on Helm Charts, visit the helm section in (**getting started**) + +## Overview + +There are several ways to setup a ChatQnA use case. Here in this tutorial, we +will walk through how to enable the below list of microservices from OPEA +GenAIComps to deploy a multi-node TGI megaservice solution. +> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. + +1. Data Prep +2. Embedding +3. Retriever +4. Reranking +5. LLM with TGI + +## Prerequisites + +### Install Helm +First, ensure that Helm (version >= 3.15) is installed on your system. Helm is an essential tool for managing Kubernetes applications. It simplifies the deployment and management of Kubernetes applications using Helm charts. +For detailed installation instructions, please refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) + +### Clone Repository +First step is to clone the GenAIInfra which is the containerization and cloud native suite for OPEA, including artifacts to deploy ChatQnA in a cloud native way. + +```bash +git clone https://github.com/opea-project/GenAIInfra.git +``` +Checkout the release tag +``` +cd GenAIInfra/helm-charts/ +git checkout tags/v1.0 +``` +### HF Token +The example can utilize model weights from HuggingFace and langchain. + +Setup your [HuggingFace](https://huggingface.co/) account and generate +[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). + +Setup the HuggingFace token +``` +export HF_TOKEN="Your_Huggingface_API_Token" +``` + +### Proxy Settings +Make sure to setup Proxies if you are behind a firewall. +For services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services, proxy settings can be essential. 
These settings ensure services can download necessary content from the internet, especially when behind a corporate firewall. +Proxies can be set in the `values.yaml` file. +Open the `values.yaml` file using an editor +```bash +vi GenAIInfra/helm-charts/chatqna/values.yaml +``` +Update the following section and save the file: +```yaml +global: + http_proxy: "http://your-proxy-address:port" + https_proxy: "http://your-proxy-address:port" + no_proxy: "localhost,127.0.0.1,localaddress,.localdomain.com" +``` +## Use Case Setup +The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Chart.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as Docker image sources and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). 
+ +This use case employs a tailored combination of Helm charts and `values.yaml` configurations to deploy the following components and tools: +|use case components | Tools | Model | Service Type | +|---------------- |--------------|-----------------------------|-------| +|Data Prep | LangChain | NA |OPEA Microservice | +|VectorDB | Redis | NA |Open source service| +|Embedding | TEI | BAAI/bge-base-en-v1.5 |OPEA Microservice | +|Reranking | TEI | BAAI/bge-reranker-base | OPEA Microservice | +|LLM | TGI |Intel/neural-chat-7b-v3-3 |OPEA Microservice | +|UI | | NA | Gateway Service | +Tools and models mentioned in the table are configurable either through the +environment variable or `values.yaml` + +Set a new [namespace](#create-and-set-namespace) and switch to it if needed + +To enable UI, uncomment the lines `54-58` in `GenAIInfra/helm-charts/chatqna/values.yaml`: +```bash +chatqna-ui: + image: + repository: "opea/chatqna-ui" + tag: "latest" + containerPort: "5173" +``` + + +Next, we will update the dependencies for all Helm charts in the specified directory and ensure the `chatqna` Helm chart is ready for deployment by updating its dependencies as defined in the `Chart.yaml` file. + +```bash +# all Helm charts in the specified directory have their +# dependencies up-to-date, facilitating consistent deployments. +./update_dependency.sh + +# "chatqna" here refers to the directory name that contains the Helm +# chart for the ChatQnA application +helm dependency update chatqna +``` + +Set the necessary environment variables to setup the use case +```bash +export MODELDIR="/mnt/opea-models" #export MODELDIR="null" if you don't want to cache the model. +export MODELNAME="Intel/neural-chat-7b-v3-3" +export EMBEDDING_MODELNAME="BAAI/bge-base-en-v1.5" +export RERANKER_MODELNAME="BAAI/bge-reranker-base" +``` + +## Deploy the use case +In this tutorial, we will be deploying using Helm with the provided chart. 
The Helm install commands will initiate all the aforementioned services as Kubernetes pods. + +```bash +helm install chatqna chatqna \ + --set global.HUGGINGFACEHUB_API_TOKEN=${HF_TOKEN} \ + --set global.modelUseHostPath=${MODELDIR} \ + --set tgi.LLM_MODEL_ID=${MODELNAME} \ + --set tei.EMBEDDING_MODEL_ID=${EMBEDDING_MODELNAME} \ + --set teirerank.RERANK_MODEL_ID=${RERANKER_MODELNAME} +``` + +**OUTPUT:** +```bash +NAME: chatqna +LAST DEPLOYED: Thu Sep 5 13:40:20 2024 +NAMESPACE: chatqa +STATUS: deployed +REVISION: 1 + +``` + + +### Validate microservice +#### Check the pod status +Check if all the pods launched via Helm have started. + +For example, the ChatQnA deployment starts 12 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'. To perform a quick sanity check, use the command `kubectl get pods` to see if all the pods are active. +``` +NAME READY STATUS RESTARTS AGE +chatqna-5cd6b44f98-7tdnk 1/1 Running 0 15m +chatqna-chatqna-ui-b9984f596-4pckn 1/1 Running 0 15m +chatqna-data-prep-7496bcf74-gj2fm 1/1 Running 0 15m +chatqna-embedding-usvc-79c9795545-5zpk5 1/1 Running 0 15m +chatqna-llm-uservice-564c497d65-kw6b2 1/1 Running 0 15m +chatqna-nginx-67fc749576-krmxs 1/1 Running 0 15m +chatqna-redis-vector-db-798f474769-5g7bh 1/1 Running 0 15m +chatqna-reranking-usvc-767545c6ff-966w2 1/1 Running 0 15m +chatqna-retriever-usvc-5ccf966546-446dd 1/1 Running 0 15m +chatqna-tei-7b987585c9-nwncb 1/1 Running 0 15m +chatqna-teirerank-fd745dcd5-md2l5 1/1 Running 0 15m +chatqna-tgi-675c4d79f6-cf4pq 1/1 Running 0 15m + + +``` +> **Note:** Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on + + +When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: +1. 
**Checking Logs**: To view the logs of a specific pod, which can provide insight into what the application is doing and any errors it might be encountering, use: + ```bash + kubectl logs [pod-name] + ``` +2. **Describing Pods**: For a detailed view of the pod's current state, its configuration, and its operational events, run: + ```bash + kubectl describe pod [pod-name] + ``` +For example, if the status of the TGI service does not show 'Running', describe the pod using the name from the above table: +```bash +kubectl describe pod chatqna-tgi-675c4d79f6-cf4pq +``` +or check logs using: +```bash +kubectl logs chatqna-tgi-675c4d79f6-cf4pq +``` + +## Interacting with ChatQnA deployment +This section walks you through the different ways to interact with +the deployed microservices. + +Before starting the validation of microservices, check the network configuration of services using: +```bash + kubectl get svc + ``` + This command will display a list of services along with their network-related details such as cluster IP and ports. 
+ ``` + NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +chatqna ClusterIP 100.XX.XXX.92 8888/TCP 37m +chatqna-chatqna-ui ClusterIP 100.XX.XX.87 5174/TCP 37m +chatqna-data-prep ClusterIP 100.XX.XXX.62 6007/TCP 37m +chatqna-embedding-usvc ClusterIP 100.XX.XX.77 6000/TCP 37m +chatqna-llm-uservice ClusterIP 100.XX.XXX.133 9000/TCP 37m +chatqna-nginx NodePort 100.XX.XX.173 80:30700/TCP 37m +chatqna-redis-vector-db ClusterIP 100.XX.X.126 6379/TCP,8001/TCP 37m +chatqna-reranking-usvc ClusterIP 100.XX.XXX.82 8000/TCP 37m +chatqna-retriever-usvc ClusterIP 100.XX.XXX.157 7000/TCP 37m +chatqna-tei ClusterIP 100.XX.XX.143 80/TCP 37m +chatqna-teirerank ClusterIP 100.XX.XXX.120 80/TCP 37m +chatqna-tgi ClusterIP 100.XX.XX.133 80/TCP 37m + + ``` + To begin port forwarding, which maps a service's port from the cluster to localhost for testing, use: + ```bash + kubectl port-forward svc/[service-name] [local-port]:[service-port] + ``` + Replace `[service-name]`, `[local-port]`, and `[service-port]` with the appropriate values from your services list (as shown in the output given by `kubectl get svc`). This setup enables interaction with the microservice directly from the local machine. In another terminal, use `curl` commands to test the functionality and response of the service. + +Use `ctrl+c` to end the port-forwarding to test other services. + +### Dataprep Microservice (Optional) +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-data-prep 6007:6007 +``` +Follow the below steps in a different terminal. + +If you want to add/update the default knowledge base, you can use the following +commands. The dataprep microservice extracts text from a variety of data +sources, chunks the data, embeds each chunk using the embedding microservice, and +stores the embedded vectors in the Redis vector database. 
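The chunk, embed, and store flow described above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the OPEA dataprep implementation: `chunk_text`, `fake_embed`, and `ingest` are hypothetical names, the hash-based vector stands in for the embedding microservice, and a plain dictionary stands in for the Redis vector database.

```python
import hashlib


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character windows that overlap slightly."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def fake_embed(chunk, dim=8):
    """Hypothetical stand-in for the embedding service: a deterministic pseudo-vector."""
    digest = hashlib.sha256(chunk.encode("utf-8")).digest()
    return [b / 255.0 for b in digest[:dim]]


def ingest(text, store):
    """Chunk the text, embed each chunk, and store both under a positional key."""
    for i, chunk in enumerate(chunk_text(text)):
        store[f"doc:{i}"] = {"text": chunk, "embedding": fake_embed(chunk)}
    return store


# A dict plays the role of the vector database in this sketch.
vector_store = ingest("Deep learning is a subset of machine learning. " * 3, {})
print(len(vector_store), "chunks stored")
```

In the real deployment these steps happen inside the dataprep pod; the sketch only shows the order of operations.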
+ +Local File `nke-10k-2023.pdf` Upload: + +``` +curl -X POST "http://localhost:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./nke-10k-2023.pdf" +``` + +This command updates a knowledge base by uploading a local file for processing. +Update the file path according to your environment. + +Add Knowledge Base via HTTP Links: + +``` +curl -X POST "http://localhost:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` + +This command updates a knowledge base by submitting a list of HTTP links for processing. + +Also, you are able to get the file list that you uploaded: + +``` +curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" + +``` + +To delete the file/link you uploaded you can use the following commands: + +#### Delete link +``` +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` + +#### Delete file + +``` +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "nke-10k-2023.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +``` +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` + +### TEI Embedding Service +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-tei 6006:80 +``` +Follow the below steps in a different terminal. + +The TEI embedding service takes in a string as input, embeds the string into a +vector of a specific length determined by the embedding model and returns this +embedded vector. 
+ +``` +curl http://localhost:6006/embed \ + -X POST \ + -d '{"inputs":"What is Deep Learning?"}' \ + -H 'Content-Type: application/json' +``` + +In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the curl command is an embedded vector of +length 768. + +### Embedding Microservice +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-embedding-usvc 6000:6000 +``` +Follow the below steps in a different terminal. + +The embedding microservice depends on the TEI embedding service. In terms of +input parameters, it takes in a string, embeds it into a vector using the TEI +embedding service, adds the default parameters that are required for the +retrieval microservice, and returns the result. +``` +curl http://localhost:6000/v1/embeddings \ + -X POST \ + -d '{"text":"hello"}' \ + -H 'Content-Type: application/json' +``` + +### Retriever Microservice +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-retriever-usvc 7000:7000 +``` +Follow the below steps in a different terminal. + +To consume the retriever microservice, you need to generate a mock embedding +vector with a Python script. The length of the embedding vector is determined by the +embedding model. Here we use the +model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", whose vector size is 768. + +Check the vector dimension of your embedding model and set +`your_embedding` dimension equal to it. 
+ +``` +export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") + +curl http://localhost:7000/v1/retrieval \ + -X POST \ + -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ + -H 'Content-Type: application/json' + +``` +The output of the retriever microservice comprises a unique id for the +request, the initial query (the input to the retrieval microservice), a list of the top +`n` retrieved documents relevant to the input query, and `top_n`, where n refers to +the number of documents to be returned. + +The output is the retrieved text that is relevant to the input data: +``` +{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. 
For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } + +``` +### TEI Reranking Service + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-teirerank 8808:80 +``` +Follow the below steps in a different terminal. + +The TEI Reranking Service reranks the documents returned by the retrieval +service. It consumes the query and list of documents and returns the document +index based on decreasing order of the similarity score. The document +corresponding to the returned index with the highest score is the most relevant +document for the input query. +``` +curl http://localhost:8808/rerank \ + -X POST \ + -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \ + -H 'Content-Type: application/json' +``` + +Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` + +### Reranking Microservice +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-reranking-usvc 8000:8000 +``` +Follow the below steps in a different terminal. + +The reranking microservice consumes the TEI Reranking service and pads the +response with default parameters required for the llm microservice. 
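As a rough illustration of how the index and score pairs returned by the TEI reranking service map back to the input documents, here is a small Python sketch. The helper name `order_by_score` is hypothetical; the sample values are the ones from the `/rerank` example above, and the real reranking microservice may select documents differently.

```python
def order_by_score(texts, rerank_result):
    """Pair each document with its score, most relevant first."""
    ranked = sorted(rerank_result, key=lambda r: r["score"], reverse=True)
    return [(texts[r["index"]], r["score"]) for r in ranked]


# Inputs and output copied from the TEI reranking example in this guide.
texts = ["Deep Learning is not...", "Deep learning is..."]
rerank_result = [
    {"index": 1, "score": 0.9988041},
    {"index": 0, "score": 0.022948774},
]

best_text, best_score = order_by_score(texts, rerank_result)[0]
print(best_text)
```

Each `index` refers to a position in the `texts` list that was sent with the request, so the highest-scoring entry identifies the most relevant document.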
+ +``` +curl http://localhost:8000/v1/reranking \ + -X POST \ + -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": + [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ + -H 'Content-Type: application/json' +``` + +The input to the microservice is the `initial_query` and a list of retrieved +documents, and it outputs the most relevant document to the initial query along +with other default parameters such as temperature, `repetition_penalty`, +`chat_template` and so on. We can also get top n documents by setting `top_n` as one +of the input parameters. For example: + +``` +curl http://localhost:8000/v1/reranking \ + -X POST \ + -d '{"initial_query":"What is Deep Learning?" ,"top_n":2, "retrieved_docs": + [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ + -H 'Content-Type: application/json' +``` + +Here is the output: + +``` +{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} + +``` +You may notice that the reranking microservice response carries state (an `id` and other metadata), +while the TEI reranking service response does not. + +### vLLM and TGI Service + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-tgi 9009:80 +``` +Follow the below steps in a different terminal. + +``` +curl http://localhost:9009/generate \ + -X POST \ + -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ + -H 'Content-Type: application/json' + +``` + +The TGI service generates text for the input prompt. Here is the expected result from TGI: + +``` +{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. 
It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} +``` + +**NOTE**: After launching TGI, it takes a few minutes for the TGI server to load the LLM model and warm up. + +If you get + +``` +curl: (7) Failed to connect to localhost port 9009 after 0 ms: Connection refused +``` + +and the log shows the model warming up, please wait a while and try again later. + +``` +2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set +2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit. +2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3 +2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model +``` + +### LLM Microservice + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-llm-uservice 9000:9000 +``` +Follow the below steps in a different terminal. 
+ +``` +curl http://localhost:9000/v1/chat/completions \ + -X POST \ + -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95, + "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ + -H 'Content-Type: application/json' + +``` + +You will get generated text from the LLM: + +``` +data: b'\n' +data: b'\n' +data: b'Deep' +data: b' learning' +data: b' is' +data: b' a' +data: b' subset' +data: b' of' +data: b' machine' +data: b' learning' +data: b' that' +data: b' uses' +data: b' algorithms' +data: b' to' +data: b' learn' +data: b' from' +data: b' data' +data: [DONE] +``` +### MegaService + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna 8888:8888 +``` +Follow the below steps in a different terminal. + +``` +curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "model": "Intel/neural-chat-7b-v3-3", + "messages": "What is the revenue of Nike in 2023?" + }' + +``` + +Here is the output for your reference: + +``` +data: b'\n' +data: b'An' +data: b'swer' +data: b':' +data: b' In' +data: b' fiscal' +data: b' ' +data: b'2' +data: b'0' +data: b'2' +data: b'3' +data: b',' +data: b' N' +data: b'I' +data: b'KE' +data: b',' +data: b' Inc' +data: b'.' +data: b' achieved' +data: b' record' +data: b' Rev' +data: b'en' +data: b'ues' +data: b' of' +data: b' $' +data: b'5' +data: b'1' +data: b'.' +data: b'2' +data: b' billion' +data: b'.' +data: b'' +data: [DONE] +``` +## Launch UI +### Basic UI +To access the frontend, open the following URL in your browser: +`http://{k8s-node-ip-address}:${port}` +You can find the NGINX port using the following command: +```bash +export port=$(kubectl get service chatqna-nginx --output='jsonpath={.spec.ports[0].nodePort}') +echo $port +``` +Open a browser to access `http://{k8s-node-ip-address}:${port}` + + By default, the UI runs on port 5173 internally. 
If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `GenAIInfra/helm-charts/chatqna/values.yaml` file as shown below: +``` +chatqna-ui: + image: + repository: "opea/chatqna-ui" + tag: "latest" + containerPort: "5173" +``` +### Stop the services +Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +``` +helm uninstall chatqna +``` \ No newline at end of file From 8e7f1386c12be37529a5526d6a5bad7f98aa77cd Mon Sep 17 00:00:00 2001 From: devpramod Date: Fri, 27 Sep 2024 15:30:30 +0000 Subject: [PATCH 02/19] fix formatting issues Signed-off-by: devpramod Signed-off-by: devpramod --- examples/ChatQnA/deploy/index.rst | 3 ++- .../ChatQnA/deploy/k8s_getting_started.md | 24 +++++++++---------- examples/ChatQnA/deploy/k8s_helm.md | 15 ++++++------ 3 files changed, 22 insertions(+), 20 deletions(-) diff --git a/examples/ChatQnA/deploy/index.rst b/examples/ChatQnA/deploy/index.rst index 0c3c3e55..31d9cc03 100644 --- a/examples/ChatQnA/deploy/index.rst +++ b/examples/ChatQnA/deploy/index.rst @@ -19,9 +19,10 @@ Single Node Kubernetes ********** +* Getting Started +* Using Helm Charts * Xeon & Gaudi with GMC * Xeon & Gaudi without GMC -* Using Helm Charts Cloud Native ************ diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index 64766ca2..c25f5c1e 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -1,11 +1,11 @@ -# Getting Started +# Getting Started with Kubernetes for ChatQnA ## Introduction Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. 
Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences: -- **Using GMC ( GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. -- **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC). -- **Using Helm Charts**: Facilitates deployment through Helm, which manages Kubernetes applications through packages of pre-configured Kubernetes resources. +- **Using GMC ( GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. +- **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC). +- **Using Helm Charts**: Facilitates deployment through Helm, which manages Kubernetes applications through packages of pre-configured Kubernetes resources. This guide will provide detailed instructions on using these resources. If you're already familiar with Kubernetes, feel free to skip ahead to (**Deploy using Helm**) @@ -18,9 +18,9 @@ This guide will provide detailed instructions on using these resources. If you'r **Understanding Kubernetes Deployment Tools and Resources:** -- **kubectl**: This command-line tool allows you to deploy applications, inspect and manage cluster resources, and view logs. For instance, `kubectl apply -f chatqna.yaml` would be used to deploy resources defined in a manifest file. +- **kubectl**: This command-line tool allows you to deploy applications, inspect and manage cluster resources, and view logs. For instance, `kubectl apply -f chatqna.yaml` would be used to deploy resources defined in a manifest file. -- **Pods**: Pods are the smallest deployable units created and managed by Kubernetes. 
A pod typically encapsulates one or more containers where your application runs. +- **Pods**: Pods are the smallest deployable units created and managed by Kubernetes. A pod typically encapsulates one or more containers where your application runs. **Verifying Kubernetes Cluster Access with kubectl** ```bash @@ -60,17 +60,17 @@ kubectl config set-context --current --namespace=chatqa **Key Components of a Helm Chart:** -- **Chart.yaml**: This file contains metadata about the chart such as name, version, and description. -- **values.yaml**: Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. -- **deployment.yaml**: Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. +- **Chart.yaml**: This file contains metadata about the chart such as name, version, and description. +- **values.yaml**: Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. +- **deployment.yaml**: Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. **Update Dependencies:** -- A script called **./update_dependency.sh** is provided which is used to update chart dependencies, ensuring all nested charts are at their latest versions. -- The command `helm dependency update chatqna` updates the dependencies for the `chatqna` chart based on the versions specified in `Chart.yaml`. +- A script called **./update_dependency.sh** is provided which is used to update chart dependencies, ensuring all nested charts are at their latest versions. +- The command `helm dependency update chatqna` updates the dependencies for the `chatqna` chart based on the versions specified in `Chart.yaml`. 
**Helm Install Command:** -- `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. +- `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index d88bb907..23b264c6 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -7,14 +7,13 @@ be covering one option of doing it for convenience : we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, please follow the instructions here (*** ### Kubernetes Cluster and Development Environment***). -For a quick introduction on Helm Charts, visit the helm section in (**getting started**) +For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md) ## Overview There are several ways to setup a ChatQnA use case. Here in this tutorial, we will walk through how to enable the below list of microservices from OPEA GenAIComps to deploy a multi-node TGI megaservice solution. -> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. 1. Data Prep 2. Embedding @@ -22,14 +21,16 @@ GenAIComps to deploy a multi-node TGI megaservice solution. 4. Reranking 5. 
LLM with TGI +> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. + ## Prerequisites ### Install Helm First, ensure that Helm (version >= 3.15) is installed on your system. Helm is an essential tool for managing Kubernetes applications. It simplifies the deployment and management of Kubernetes applications using Helm charts. -For detailed installation instructions, please refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) +For detailed installation instructions, refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) ### Clone Repository -First step is to clone the GenAIInfra which is the containerization and cloud native suite for OPEA, including artifacts to deploy ChatQnA in a cloud native way. +Next step is to clone the GenAIInfra which is the containerization and cloud native suite for OPEA, including artifacts to deploy ChatQnA in a cloud native way. ```bash git clone https://github.com/opea-project/GenAIInfra.git @@ -51,12 +52,12 @@ export HF_TOKEN="Your_Huggingface_API_Token" ``` ### Proxy Settings -Make sure to setup Proxies if you are behind a firewall. + For services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services, proxy settings can be essential. These settings ensure services can download necessary content from the internet, especially when behind a corporate firewall. 
Proxy can be set in the `values.yaml` file, like so: Open the `values.yaml` file using an editor ```bash -vi GenAIInfra/helm-charts/chatqna/values.yaml +vi chatqna/values.yaml ``` Update the following section and save file: ```yaml @@ -569,4 +570,4 @@ chatqna-ui: Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ``` helm uninstall chatqna -``` \ No newline at end of file +``` From fcf8851822f9cf83c90e4bc7b33d87eaf0b1612d Mon Sep 17 00:00:00 2001 From: devpramod Date: Fri, 27 Sep 2024 21:14:26 +0000 Subject: [PATCH 03/19] update toctree Signed-off-by: devpramod --- examples/ChatQnA/deploy/index.rst | 8 ++++++-- examples/ChatQnA/deploy/k8s_helm.md | 1 + 2 files changed, 7 insertions(+), 2 deletions(-) diff --git a/examples/ChatQnA/deploy/index.rst b/examples/ChatQnA/deploy/index.rst index 31d9cc03..4673631d 100644 --- a/examples/ChatQnA/deploy/index.rst +++ b/examples/ChatQnA/deploy/index.rst @@ -19,8 +19,12 @@ Single Node Kubernetes ********** -* Getting Started -* Using Helm Charts +.. 
toctree:: + :maxdepth: 1 + + K8s Getting Started + TGI on Xeon with Helm Charts + * Xeon & Gaudi with GMC * Xeon & Gaudi without GMC diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 23b264c6..e65fa407 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -78,6 +78,7 @@ This use case employs a tailored combination of Helm charts and `values.yaml` co |Reranking | TEI | BAAI/bge-reranker-base | OPEA Microservice | |LLM | TGI |Intel/neural-chat-7b-v3-3 |OPEA Microservice | |UI | | NA | Gateway Service | + Tools and models mentioned in the table are configurable either through the environment variable or `values.yaml` From 12e2e4fc2c5b59e2bf677595a076d1b041c13cd0 Mon Sep 17 00:00:00 2001 From: devpramod Date: Tue, 15 Oct 2024 13:52:38 +0000 Subject: [PATCH 04/19] upddate both docs Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_getting_started.md | 14 +++++++------- examples/ChatQnA/deploy/k8s_helm.md | 11 ++++++----- 2 files changed, 13 insertions(+), 12 deletions(-) diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index c25f5c1e..f98ad808 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -3,11 +3,11 @@ ## Introduction Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences: -- **Using GMC ( GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. 
+- **Using GMC (GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. - **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC). - **Using Helm Charts**: Facilitates deployment through Helm, which manages Kubernetes applications through packages of pre-configured Kubernetes resources. -This guide will provide detailed instructions on using these resources. If you're already familiar with Kubernetes, feel free to skip ahead to (**Deploy using Helm**) +This guide will provide detailed instructions on using these resources. If you're already familiar with Kubernetes, feel free to skip ahead to [Helm Deployment](./k8s_helm.md) ### Kubernetes Cluster and Development Environment @@ -32,11 +32,11 @@ This command lists all the nodes in the cluster, verifying that `kubectl` is cor Some commonly used kubectl commands and their functions that will help deploy ChatQnA successfully: |Command |Function | |------------------------------- |-----------------------------| -|kubectl describe pod [`pod-name`] | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. | -|kubectl delete deployments --all | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. | -|kubectl get pods -o wide | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. | -|kubectl logs [`pod-name`] | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. | -|kubectl get svc | Lists all services in the current namespace, providing a quick overview of the network services and their status. 
+|`kubectl describe pod ` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. | +|`kubectl delete deployments --all` | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. | +|`kubectl get pods -o wide` | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. | +|`kubectl logs ` | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. | +|`kubectl get svc` | Lists all services in the current namespace, providing a quick overview of the network services and their status. #### Create and Set Namespace A Kubernetes namespace is a logical division within a cluster that is used to isolate different environments, teams, or projects, allowing for finer control over resources and access management. To create a namespace called `chatqa`, use: diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index e65fa407..07f854c1 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -1,11 +1,11 @@ -# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm Charts +# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm Charts This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience : we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, -deployed on a Kubernetes cluster. 
For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, +deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, please follow the instructions here (*** ### Kubernetes Cluster and Development Environment***). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md) @@ -115,7 +115,7 @@ export RERANKER_MODELNAME="BAAI/bge-reranker-base" ``` ## Deploy the use case -In this tutorial, we will be deploying using Helm with the provided chart. The Helm install commands will initiate all the aforementioned services as Kubernetes pods. +The `helm install` command will initiate all the aforementioned services such as Kubernetes pods. ```bash helm install chatqna chatqna \ @@ -159,7 +159,8 @@ chatqna-tgi-675c4d79f6-cf4pq 1/1 Running 0 ``` -> **Note:** Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on +> [!NOTE] +> Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: @@ -187,7 +188,7 @@ the microservices deployed Before starting the validation of microservices, check the network configuration of services using: ```bash kubectl get svc - ``` +``` This command will display a list of services along with their network-related details such as cluster IP and ports. 
``` NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE From de87a0acea2cfa0a91a3e87b62e8105900ed2eee Mon Sep 17 00:00:00 2001 From: devpramod Date: Tue, 19 Nov 2024 16:16:42 +0000 Subject: [PATCH 05/19] add k8s mainfest and add to getting started Signed-off-by: devpramod --- examples/ChatQnA/deploy/index.rst | 4 +- .../ChatQnA/deploy/k8s_getting_started.md | 22 +- examples/ChatQnA/deploy/k8s_helm.md | 16 +- examples/ChatQnA/deploy/k8s_manifest.md | 480 ++++++++++++++++++ 4 files changed, 504 insertions(+), 18 deletions(-) create mode 100644 examples/ChatQnA/deploy/k8s_manifest.md diff --git a/examples/ChatQnA/deploy/index.rst b/examples/ChatQnA/deploy/index.rst index 4673631d..5eba8254 100644 --- a/examples/ChatQnA/deploy/index.rst +++ b/examples/ChatQnA/deploy/index.rst @@ -24,9 +24,7 @@ Kubernetes K8s Getting Started TGI on Xeon with Helm Charts - -* Xeon & Gaudi with GMC -* Xeon & Gaudi without GMC + TGI on Xeon with Kubernetes Manifest Cloud Native ************ diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index f98ad808..f24aa273 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -58,11 +58,13 @@ kubectl config set-context --current --namespace=chatqa **What is Helm?** Helm is a package manager for Kubernetes, similar to how apt is for Ubuntu. It simplifies deploying and managing Kubernetes applications through Helm charts, which are packages of pre-configured Kubernetes resources. -**Key Components of a Helm Chart:** +#### Key Components of a Helm Chart -- **Chart.yaml**: This file contains metadata about the chart such as name, version, and description. -- **values.yaml**: Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. 
-- **deployment.yaml**: Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. +| Component |Description | +| --- | --- | +| `Chart.yaml` | This file contains metadata about the chart such as name, version, and description. | +| `values.yaml` | Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. | +| `deployment.yaml` | Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. | **Update Dependencies:** @@ -74,3 +76,15 @@ kubectl config set-context --current --namespace=chatqa - `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). + +### Using Kubernetes Manifest to Deploy +Manifest files in YAML format define the Kubernetes resources you want to manage. The main components in a manifest file include: + +- **ConfigMap**: Stores configuration data that can be used by pods, allowing you to keep containerized applications portable without embedding configuration data directly within the application's images. For example, a ConfigMap might store the database URL and credentials that your application needs to connect to a database. + +- **Services**: Defines a logical set of Pods and a policy by which to access them. This resource abstracts the way you expose an application running on a set of Pods as a network service. + +- **Deployment**: Manages the state of replicated application instances. It automatically replaces instances that fail or are deleted, maintaining the desired state of the application. 
+ + +For more detailed examples, you can view the [ChatQnA manifest file](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml) which includes definitions for services, deployments, and other resources essential for running the ChatQnA application. This file is a reference for understanding how Kubernetes resources for ChatQnA are defined and orchestrated. \ No newline at end of file diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 07f854c1..187a08d4 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -1,13 +1,6 @@ -# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm Charts +# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm -This deployment section covers multi-node on-prem deployment of the ChatQnA -example with OPEA comps to deploy using the TGI service. There are several -slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will -be covering one option of doing it for convenience : we will be showcasing how -to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, -deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, -please follow the instructions here (*** ### Kubernetes Cluster and Development Environment***). -For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md) +This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. 
There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md). ## Overview @@ -61,6 +54,7 @@ vi chatqna/values.yaml ``` Update the following section and save file: ```yaml +# chatqna/values.yaml global: http_proxy: "http://your-proxy-address:port" https_proxy: "http://your-proxy-address:port" @@ -166,11 +160,11 @@ chatqna-tgi-675c4d79f6-cf4pq 1/1 Running 0 When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: 1. **Checking Logs**: To view the logs of a specific pod, which can provide insight into what the application is doing and any errors it might be encountering, use: ```bash - kubectl logs [pod-name] + kubectl logs ``` 2. 
**Describing Pods**: For a detailed view of the pod's current state, its configuration, and its operational events, run: ```bash - kubectl describe pod [pod-name] + kubectl describe pod <pod-name> ``` For example, if the status of the TGI service does not show 'Running', describe the pod using the name from the above table: ```bash diff --git a/examples/ChatQnA/deploy/k8s_manifest.md b/examples/ChatQnA/deploy/k8s_manifest.md new file mode 100644 index 00000000..f6677573 --- /dev/null +++ b/examples/ChatQnA/deploy/k8s_manifest.md @@ -0,0 +1,480 @@ +# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Kubernetes manifests + +This section covers a multi-node on-prem deployment of the ChatQnA example using OPEA components and the TGI service. There are several ways to enable RAG with a vector database and LLM models; for convenience, this guide covers one of them: an end-to-end ChatQnA built with Redis VectorDB and the neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Kubernetes manifests. For more information on how to set up a Xeon-based Kubernetes cluster along with the development prerequisites, follow the instructions in [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). +For a quick introduction to deploying with manifest files, visit the `Using Kubernetes Manifest to Deploy` section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md). + +## Overview + +There are several ways to set up a ChatQnA use case. In this tutorial, we +will walk through how to enable the following microservices from OPEA +[GenAIComps](https://github.com/opea-project/GenAIComps) to deploy a multi-node TGI megaservice solution. + +1. Data Prep +2. Embedding +3. Retriever +4. Reranking +5.
LLM with TGI + +> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. + +## Prerequisites + +### Clone Repository +To set up the workspace for deploying ChatQnA via Kubernetes, start by cloning the `GenAIExamples` repository and navigating to the ChatQnA Kubernetes manifests directory: + +```bash +git clone https://github.com/opea-project/GenAIExamples.git +``` +Check out the release tag +``` +cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest +git checkout tags/v1.1 +``` +### Bfloat16 Inference Optimization +We recommend using newer CPUs, such as 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) and later, that support the bfloat16 data type. If your hardware includes such CPUs and your model is compatible with bfloat16, adding the `--dtype bfloat16` argument to the HuggingFace `text-generation-inference` server can reduce memory usage by roughly half and provide a moderate speed boost. This change has already been configured in the `chatqna_bf16.yaml` file. To use it, follow these steps: + +Run `kubectl get nodes` to identify the nodes in your cluster with BFloat16 support, then label those nodes so that the service is scheduled on them automatically: + +```bash +kubectl label node <node-name> node-type=node-bfloat16 +``` + +>**Note:** The manifest folder has several configuration pipelines that can be deployed for ChatQnA. In this example, we'll use the `chatqna_bf16.yaml` configuration. You can use `chatqna.yaml` instead if you don't have BFloat16 support in your nodes. + +### HF Token +The example can utilize model weights from HuggingFace and langchain. + +Set up your [HuggingFace](https://huggingface.co/) account and generate a +[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).
+ +Add the HuggingFace token to the manifest +```bash +export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" +# write the token to appropriate places in the manifest file +sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna_bf16.yaml +``` + +### Proxy Settings +If you are behind a corporate VPN, proxy settings must be added for services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services. +Proxy settings can be set in the `ConfigMap` section across the manifest file. One example for `chatqna-tei-config` is shown below. + +To configure proxy settings for the Text Embedding Inference (TEI) microservice using a `ConfigMap` in the `chatqna_bf16.yaml` manifest, open `chatqna_bf16.yaml` in an editor and populate the `http_proxy`, `https_proxy` and `no_proxy` fields under `data` marked by `#1`, `#2` and `#3` as follows: + +```bash +vi chatqna_bf16.yaml +``` +```yaml +apiVersion: v1 +kind: ConfigMap +metadata: + name: chatqna-tei-config + labels: + helm.sh/chart: tei-1.0.0 + app.kubernetes.io/name: tei + app.kubernetes.io/instance: chatqna + app.kubernetes.io/version: "cpu-1.5" + app.kubernetes.io/managed-by: Helm +data: + MODEL_ID: "BAAI/bge-base-en-v1.5" + PORT: "2081" + http_proxy: "http://your-proxy-address:port" #1 + https_proxy: "http://your-proxy-address:port" #2 + no_proxy: "localhost,127.0.0.1,localaddress,.localdomain.com" #3 + NUMBA_CACHE_DIR: "/tmp" + TRANSFORMERS_CACHE: "/tmp/transformers_cache" + HF_HOME: "/tmp/.cache/huggingface" + MAX_WARMUP_SEQUENCE_LENGTH: "512" +``` +## Use Case Setup + +As mentioned the use case will use the following combination of the [GenAIComps](https://github.com/opea-project/GenAIComps) with the tools: + +|use case components | Tools | Model | Service Type | +|---------------- |--------------|-----------------------------|-------| +|Data Prep | LangChain | NA |OPEA Microservice | +|VectorDB | Redis | NA |Open source 
service| +|Embedding | TEI | BAAI/bge-base-en-v1.5 |OPEA Microservice | +|Reranking | TEI | BAAI/bge-reranker-base | OPEA Microservice | +|LLM | TGI |Intel/neural-chat-7b-v3-3 |OPEA Microservice | +|UI | | NA | Gateway Service | + +Tools and models mentioned in the table are configurable either through the +`chatqna_bf16.yaml` + + +## Deploy the use case + +Set a new [namespace](#create-and-set-namespace) and switch to it if needed and run: + + ```bash + kubectl apply -f chatqna_bf16.yaml +``` + +It takes a few minutes for all the microservices to be up and running. Go to the next section which is [Validate Microservices](#validate-microservices) to verify that the deployment is successful. + + + +### Validate microservice +#### Check the pod status +To check if all the pods have started, run: + +```bash +kubectl get pods +``` +You should expect a similar output as below: +``` +NAME READY STATUS RESTARTS AGE +chatqna-chatqna-ui-77dbdfc949-6dtms 1/1 Running 0 5m7s +chatqna-data-prep-798f59f447-4frqt 1/1 Running 0 5m7s +chatqna-df57cc766-t6lkg 1/1 Running 0 5m7s +chatqna-nginx-5dd47bfc7d-54x96 1/1 Running 0 5m7s +chatqna-redis-vector-db-7f489b6bb6-mvzbw 1/1 Running 0 5m7s +chatqna-retriever-usvc-6695979d67-z5jgx 1/1 Running 0 5m7s +chatqna-tei-769dc796c-gh5vx 1/1 Running 0 5m7s +chatqna-teirerank-54f58c596c-76xqz 1/1 Running 0 5m7s +chatqna-tgi-7b5556d46d-pnzph 1/1 Running 0 5m7s +``` +> [!NOTE] +> Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on + +The ChatQnA deployment starts 9 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'. + +When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: +1. 
**Checking Logs**: To view the logs of a specific pod, which can provide insight into what the application is doing and any errors it might be encountering, use: + ```bash + kubectl logs <pod-name> + ``` +2. **Describing Pods**: For a detailed view of the pod's current state, its configuration, and its operational events, run: + ```bash + kubectl describe pod <pod-name> + ``` +For example, if the status of the TGI service does not show 'Running', describe the pod using the name from the output above: +```bash +kubectl describe pod chatqna-tgi-7b5556d46d-pnzph +``` +or check logs using: +```bash +kubectl logs chatqna-tgi-7b5556d46d-pnzph +``` + +## Interacting with ChatQnA deployment +This section walks you through the different ways to interact with +the deployed microservices. + +Before starting the validation of microservices, check the network configuration of services using: +```bash +kubectl get svc +``` + This command will display a list of services along with their network-related details such as cluster IP and ports.
+ ``` +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +chatqna ClusterIP 10.108.186.198 <none> 8888/TCP 8m16s +chatqna-chatqna-ui ClusterIP 10.102.80.123 <none> 5173/TCP 8m16s +chatqna-data-prep ClusterIP 10.110.143.212 <none> 6007/TCP 8m16s +chatqna-nginx NodePort 10.100.224.12 <none> 80:30304/TCP 8m16s +chatqna-redis-vector-db ClusterIP 10.205.9.19 <none> 6379/TCP,8001/TCP 8m16s +chatqna-retriever-usvc ClusterIP 10.202.3.15 <none> 7000/TCP 8m16s +chatqna-tei ClusterIP 10.105.204.12 <none> 80/TCP 8m16s +chatqna-teirerank ClusterIP 10.115.146.21 <none> 80/TCP 8m16s +chatqna-tgi ClusterIP 10.108.195.244 <none> 80/TCP 8m16s +kubernetes ClusterIP 10.92.0.100 <none> 443/TCP 11d + ``` + To begin port forwarding, which maps a service's port from the cluster to the local host for testing, use: + ```bash + kubectl port-forward svc/[service-name] [local-port]:[service-port] + ``` + Replace `[service-name]`, `[local-port]`, and `[service-port]` with the appropriate values from your services list (as shown in the output given by `kubectl get svc`). This setup enables interaction with the microservice directly from the local machine. In another terminal, use `curl` commands to test the functionality and response of the service. + +Use `Ctrl+C` to end the port forwarding before testing other services. + +### MegaService Before RAG Dataprep + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna 8888:8888 +``` +Follow the below steps in a different terminal. + +``` +curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "model": "Intel/neural-chat-7b-v3-3", + "messages": "What is OPEA?"
+ }' + +``` +Here is the output for your reference: +```bash +data: b' O', data: b'PE', data: b'A', data: b' stands', data: b' for', data: b' Organization', data: b' of', data: b' Public', data: b' Em', data: b'ploy', data: b'ees', data: b' of', data: b' Alabama', data: b'.', data: b' It', data: b' is', data: b' a', data: b' labor', data: b' union', data: b' representing', data: b' public', data: b' employees', data: b' in', data: b' the', data: b' state', data: b' of', data: b' Alabama', data: b',', data: b' working', data: b' to', data: b' protect', data: b' their', data: b' rights', data: b' and', data: b' interests', data: b'.', data: b'', data: b'', data: [DONE] +``` +which is essentially the following sentence: +``` +OPEA stands for Organization of Public Employees of Alabama. It is a labor union representing public employees in the state of Alabama, working to protect their rights and interests. +``` +In the upcoming sections we will see how this answer can be improved with RAG. + +### Dataprep Microservice +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-data-prep 6007:6007 +``` +Follow the below steps in a different terminal. + +If you want to add to or update the default knowledge base, you can use the following +commands. The dataprep microservice extracts text from a variety of data +sources, chunks the data, embeds each chunk using the embedding microservice, and +stores the embedded vectors in the Redis vector database. + +This example leverages the OPEA document for its RAG-based content. You can download the [OPEA document](https://opea-project.github.io/latest/_downloads/41c91aec1d47f20ca22350daa8c2cadc/what_is_opea.pdf) and upload it using the UI.
+ + +Local File `what_is_opea.pdf` Upload: + +``` +curl -X POST "http://localhost:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F "files=@./what_is_opea.pdf" +``` + +This command updates a knowledge base by uploading a local file for processing. +Update the file path according to your environment. + +You should see the following output after successful execution: +``` +{"status":200,"message":"Data preparation succeeded"} +``` +For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29) + +### MegaService After RAG Dataprep + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna 8888:8888 +``` +Similarly, follow the below steps in a different terminal. + +``` +curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "model": "Intel/neural-chat-7b-v3-3", + "messages": "What is OPEA?" + }' + +``` +After uploading the pdf with information about OPEA, we can see that the pdf is being used as a context to answer the question correctly: + +```bash +data: b' O', data: b'PE', data: b'A', data: b' (', data: b'Open', data: b' Platform', data: b' for', data: b' Enterprise', data: b' AI', data: b')', data: b' is', data: b' a', data: b' framework', data: b' that', data: b' focuses', data: b' on', data: b' creating', data: b' and', data: b' evalu', data: b'ating', data: b' open', data: b',', data: b' multi', data: b'-', data: b'provider', data: b',', data: b' robust', data: b',', data: b' and', data: b' compos', data: b'able', data: b' gener', data: b'ative', data: b' AI', data: b' (', data: b'Gen', data: b'AI', data: b')', data: b' solutions', data: b'.', data: b' It', data: b' aims', data: b' to', data: b' facilitate', data: b' the', data: b' implementation', data: b' of', data: b' enterprise', data: b'-', data: b'grade', data: b' composite', data: b' Gen', data: b'AI', data: b' 
solutions', data: b',', data: b' particularly', data: b' Ret', data: b'riev', data: b'al', data: b' Aug', data: b'ment', data: b'ed', data: b' Gener', data: b'ative', data: b' AI', data: b' (', data: b'R', data: b'AG', data: b'),', data: b' by', data: b' simpl', data: b'ifying', data: b' the', data: b' integration', data: b' of', data: b' secure', data: b',', data: b' perform', data: b'ant', data: b',', data: b' and', data: b' cost', data: b'-', data: b'effective', data: b' Gen', data: b'AI', data: b' work', data: b'fl', data: b'ows', data: b' into', data: b' business', data: b' systems', data: b'.', data: b'', data: b'', data: [DONE]
+```
+Joined into a single sentence, the output above shows that the LLM picked up the right context to answer the question correctly after the document upload:
+```
+OPEA (Open Platform for Enterprise AI) is a framework that focuses on creating and evaluating open, multi-provider, robust, and composable generative AI (GenAI) solutions. It aims to facilitate the implementation of enterprise-grade composite GenAI solutions, particularly Retrieval Augmented Generative AI (RAG), by simplifying the integration of secure, performant, and cost-effective GenAI workflows into business systems.
+```
+
+### TEI Embedding Service
+Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster:
+```bash
+kubectl port-forward svc/chatqna-tei 6006:80
+```
+Run the following command in a different terminal.
+
+The TEI embedding service takes in a string as input, embeds the string into a
+vector of a specific length determined by the embedding model and returns this
+embedded vector.
+
+```
+curl http://localhost:6006/embed \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?"}' \
+    -H 'Content-Type: application/json'
+```
+
+In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768.
So the output of the curl command is an embedded vector of
+length 768.
+
+
+### Retriever Microservice
+Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster:
+```bash
+kubectl port-forward svc/chatqna-retriever-usvc 7000:7000
+```
+Run the following commands in a different terminal.
+
+To consume the retriever microservice, you need to generate a mock embedding
+vector with a Python script. The length of the embedding vector is determined by the
+embedding model. Here we use the
+model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", whose vector size is 768.
+
+Check the vector dimension of your embedding model and set
+the `your_embedding` dimension equal to it.
+
+```
+export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)")
+
+curl http://localhost:7000/v1/retrieval \
+    -X POST \
+    -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \
+    -H 'Content-Type: application/json'
+
+```
+The output of the retriever microservice comprises a unique id for the
+request, the initial query (the input to the retrieval microservice), a list of the top
+`n` retrieved documents relevant to the input query, and `top_n`, where n refers to
+the number of documents to be returned.
+
+The output is retrieved text that is relevant to the input data:
+```
+{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance.
Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } + +``` +### TEI Reranking Service + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna-teirerank 8808:80 +``` +Follow the below steps in a different terminal. + +The TEI Reranking Service reranks the documents returned by the retrieval +service. It consumes the query and list of documents and returns the document +index based on decreasing order of the similarity score. The document +corresponding to the returned index with the highest score is the most relevant +document for the input query. 
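Because the reranking service returns its entries ordered by decreasing score, a client only needs the first `index` to find the most relevant document. The following is a minimal sketch under that assumption: `top_index` is a hypothetical helper, and the sample response is copied from the example output in this guide rather than fetched from a running service.

```shell
#!/bin/sh
# Sample response in the shape returned by /rerank (copied from this guide).
response='[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]'

# Hypothetical helper: print the index of the first (highest-scoring) entry.
# grep -o emits one '"index":N' match per line; the first one is the best match.
top_index() {
  printf '%s\n' "$1" | grep -o '"index":[0-9]*' | head -n 1 | cut -d: -f2
}

top_index "$response"   # prints 1
```

Text matching like this is only adequate for a quick check in a terminal; a JSON-aware tool would be the safer choice for anything beyond that.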
+```
+curl http://localhost:8808/rerank \
+    -X POST \
+    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
+    -H 'Content-Type: application/json'
+```
+
+Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]`
+
+
+### TGI Service
+
+Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster:
+```bash
+kubectl port-forward svc/chatqna-tgi 9009:80
+```
+Run the following command in a different terminal.
+
+```
+curl http://localhost:9009/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
+    -H 'Content-Type: application/json'
+
+```
+
+The TGI service generates text for the input prompt. Here is an example result from TGI (with `do_sample` enabled, your output will differ):
+
+```
+{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"}
+```
+
+**NOTE**: After TGI is launched, it takes a few minutes for the TGI server to load the LLM model and warm up.
+
+If you get
+
+```
+curl: (7) Failed to connect to localhost port 9009 after 0 ms: Connection refused
+```
+
+and the log shows that the model is warming up, wait a while and try again.
+
+```
+2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set
+2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit.
+2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3 +2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model +``` + +### Dataprep Microservice (Advanced) + +Add Knowledge Base via HTTP Links: + +``` +curl -X POST "http://localhost:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' +``` + +This command updates a knowledge base by submitting a list of HTTP links for processing. + +Also, you are able to get the file list that you uploaded: + +``` +curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" + +``` + +To delete the file/link you uploaded you can use the following commands: + +#### Delete link +``` +# The dataprep service will add a .txt postfix for link file + +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" +``` + +#### Delete file + +``` +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "what_is_opea.pdf"}' \ + -H "Content-Type: application/json" +``` + +#### Delete all uploaded files and links + +``` +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" +``` + + + +## Launch UI +### Basic UI +To access the frontend, open the following URL in your browser: +`http://{k8s-node-ip-address}:${port}` +You can find the NGINX port using the following command: +```bash +kubectl get service chatqna-nginx +``` +Which shows the Nginx port as follows: +``` +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +chatqna-nginx NodePort 10.201.220.120 80:30304/TCP 16h +``` +We can see that it is serving at port `30304` based on this configuration via a 
NodePort.
+
+The next step is to get the `{k8s-node-ip-address}` by running:
+```bash
+kubectl get nodes -o wide
+```
+The command shows internal IPs for all the nodes in the cluster:
+```
+NAME       STATUS   ROLES           AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
+minikube   Ready    control-plane   11d   v1.31.0   190.128.49.1   <none>        Ubuntu 22.04.4 LTS   5.15.0-124-generic   docker://27.2.0
+```
+When using a NodePort, all the nodes in the cluster will be listening at the specified port, which is `30304` in this example. The `{k8s-node-ip-address}` can be found under INTERNAL-IP. Here it is `190.128.49.1`.
+
+Open a browser to access `http://{k8s-node-ip-address}:${port}`.
+From the configuration shown above, it would be `http://190.128.49.1:30304`
+
+Alternatively, you can use port forwarding, as shown previously:
+```bash
+kubectl port-forward service/chatqna-nginx 8080:80
+```
+and open a browser to access `http://localhost:8080`
+
+ Visit this [link](https://opea-project.github.io/latest/getting-started/README.html#:~:text=tei%2Dembedding%2Dserver%20%20%20%20%20%20%20%20%20%7C-,Interact%20with%20ChatQnA,-%C2%B6) to see how to interact with the UI.
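The two lookups above can be combined into a small script that assembles the UI address. The following is only a sketch: `node_port` and `ui_url` are hypothetical helpers, and the sample values are the ones from the example output in this guide, not values read from a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: extract the NodePort from a PORT(S) value
# such as "80:30304/TCP" (service port, then node port, then protocol).
node_port() {
  _p=${1#*:}                  # drop everything up to the first ':'
  printf '%s\n' "${_p%%/*}"   # drop the "/TCP" suffix
}

# Hypothetical helper: build the UI URL from a node IP and a port.
ui_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Values taken from the sample kubectl output in this guide.
ui_url "190.128.49.1" "$(node_port "80:30304/TCP")"   # prints http://190.128.49.1:30304
```

On a real cluster the port can be read directly instead of being parsed from the table, for instance with `kubectl get service chatqna-nginx --output='jsonpath={.spec.ports[0].nodePort}'`.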
+### Stop the services +Once you are done with the entire pipeline and wish to stop and remove all the pods, use the command below: +``` +kubectl delete deployments --all +``` \ No newline at end of file From ccf1240fa2854bb907d5ce7db2ec085f9c3c9451 Mon Sep 17 00:00:00 2001 From: devpramod Date: Tue, 19 Nov 2024 19:33:27 +0000 Subject: [PATCH 06/19] update helm Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 358 +++++++++++----------------- 1 file changed, 143 insertions(+), 215 deletions(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 187a08d4..15c1a86e 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -31,7 +31,7 @@ git clone https://github.com/opea-project/GenAIInfra.git Checkout the release tag ``` cd GenAIInfra/helm-charts/ -git checkout tags/v1.0 +git checkout tags/v1.1 ``` ### HF Token The example can utilize model weights from HuggingFace and langchain. @@ -46,8 +46,8 @@ export HF_TOKEN="Your_Huggingface_API_Token" ### Proxy Settings -For services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services, proxy settings can be essential. These settings ensure services can download necessary content from the internet, especially when behind a corporate firewall. -Proxy can be set in the `values.yaml` file, like so: +If you are behind a corporate VPN, proxy settings must be added for services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services. +Proxy can be set in the `values.yaml`. 
Open the `values.yaml` file using an editor ```bash vi chatqna/values.yaml @@ -61,6 +61,7 @@ global: no_proxy: "localhost,127.0.0.1,localaddress,.localdomain.com" ``` ## Use Case Setup + The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Charts.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as Docker image sources and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). This use case employs a tailored combination of Helm charts and `values.yaml` configurations to deploy the following components and tools: @@ -78,7 +79,7 @@ environment variable or `values.yaml` Set a new [namespace](#create-and-set-namespace) and switch to it if needed -To enable UI, uncomment the lines `54-58` in `GenAIInfra/helm-charts/chatqna/values.yaml`: +To enable UI, uncomment the lines `56-62` in `GenAIInfra/helm-charts/chatqna/values.yaml`: ```bash chatqna-ui: image: @@ -87,7 +88,6 @@ chatqna-ui: containerPort: "5173" ``` - Next, we will update the dependencies for all Helm charts in the specified directory and ensure the `chatqna` Helm chart is ready for deployment by updating its dependencies as defined in the `Chart.yaml` file. ```bash @@ -127,35 +127,33 @@ LAST DEPLOYED: Thu Sep 5 13:40:20 2024 NAMESPACE: chatqa STATUS: deployed REVISION: 1 - ``` +It takes a few minutes for all the microservices to be up and running. Go to the next section which is [Validate Microservices](#validate-microservices) to verify that the deployment is successful. 
### Validate microservice #### Check the pod status -Check if all the pods launched via Helm have started. +To check if all the pods have started, run: -For example, the ChatQnA deployment starts 12 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'. To perform a quick sanity check, use the command `kubectl get pods` to see if all the pods are active. +```bash +kubectl get pods +``` +You should expect a similar output as below: ``` -NAME READY STATUS RESTARTS AGE -chatqna-5cd6b44f98-7tdnk 1/1 Running 0 15m -chatqna-chatqna-ui-b9984f596-4pckn 1/1 Running 0 15m -chatqna-data-prep-7496bcf74-gj2fm 1/1 Running 0 15m -chatqna-embedding-usvc-79c9795545-5zpk5 1/1 Running 0 15m -chatqna-llm-uservice-564c497d65-kw6b2 1/1 Running 0 15m -chatqna-nginx-67fc749576-krmxs 1/1 Running 0 15m -chatqna-redis-vector-db-798f474769-5g7bh 1/1 Running 0 15m -chatqna-reranking-usvc-767545c6ff-966w2 1/1 Running 0 15m -chatqna-retriever-usvc-5ccf966546-446dd 1/1 Running 0 15m -chatqna-tei-7b987585c9-nwncb 1/1 Running 0 15m -chatqna-teirerank-fd745dcd5-md2l5 1/1 Running 0 15m -chatqna-tgi-675c4d79f6-cf4pq 1/1 Running 0 15m - - +NAME READY STATUS RESTARTS AGE +chatqna-chatqna-ui-77dbdfc949-6dtms 1/1 Running 0 5m7s +chatqna-data-prep-798f59f447-4frqt 1/1 Running 0 5m7s +chatqna-df57cc766-t6lkg 1/1 Running 0 5m7s +chatqna-nginx-5dd47bfc7d-54x96 1/1 Running 0 5m7s +chatqna-redis-vector-db-7f489b6bb6-mvzbw 1/1 Running 0 5m7s +chatqna-retriever-usvc-6695979d67-z5jgx 1/1 Running 0 5m7s +chatqna-tei-769dc796c-gh5vx 1/1 Running 0 5m7s +chatqna-teirerank-54f58c596c-76xqz 1/1 Running 0 5m7s +chatqna-tgi-7b5556d46d-pnzph 1/1 Running 0 5m7s ``` -> [!NOTE] -> Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on +>**Note:** Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on +For example, the ChatQnA deployment starts 9 Kubernetes services. 
Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'. To perform a quick sanity check, use the command `kubectl get pods` to see if all the pods are active. When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: 1. **Checking Logs**: To view the logs of a specific pod, which can provide insight into what the application is doing and any errors it might be encountering, use: @@ -185,20 +183,17 @@ Before starting the validation of microservices, check the network configuration ``` This command will display a list of services along with their network-related details such as cluster IP and ports. ``` - NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -chatqna ClusterIP 100.XX.XXX.92 8888/TCP 37m -chatqna-chatqna-ui ClusterIP 100.XX.XX.87 5174/TCP 37m -chatqna-data-prep ClusterIP 100.XX.XXX.62 6007/TCP 37m -chatqna-embedding-usvc ClusterIP 100.XX.XX.77 6000/TCP 37m -chatqna-llm-uservice ClusterIP 100.XX.XXX.133 9000/TCP 37m -chatqna-nginx NodePort 100.XX.XX.173 80:30700/TCP 37m -chatqna-redis-vector-db ClusterIP 100.XX.X.126 6379/TCP,8001/TCP 37m -chatqna-reranking-usvc ClusterIP 100.XX.XXX.82 8000/TCP 37m -chatqna-retriever-usvc ClusterIP 100.XX.XXX.157 7000/TCP 37m -chatqna-tei ClusterIP 100.XX.XX.143 80/TCP 37m -chatqna-teirerank ClusterIP 100.XX.XXX.120 80/TCP 37m -chatqna-tgi ClusterIP 100.XX.XX.133 80/TCP 37m - +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +chatqna ClusterIP 10.108.186.198 8888/TCP 8m16s +chatqna-chatqna-ui ClusterIP 10.102.80.123 5173/TCP 8m16s +chatqna-data-prep ClusterIP 10.110.143.212 6007/TCP 8m16s +chatqna-nginx NodePort 10.100.224.12 80:30304/TCP 8m16s +chatqna-redis-vector-db ClusterIP 10.205.9.19 6379/TCP,8001/TCP 8m16s +chatqna-retriever-usvc ClusterIP 10.202.3.15 7000/TCP 8m16s +chatqna-tei ClusterIP 10.105.204.12 80/TCP 8m16s +chatqna-teirerank ClusterIP 10.115.146.21 80/TCP 8m16s +chatqna-tgi ClusterIP 
10.108.195.244 80/TCP 8m16s +kubernetes ClusterIP 10.92.0.100 443/TCP 11d ``` To begin port forwarding, which maps a service's port from the cluster to local host for testing, use: ```bash @@ -208,7 +203,33 @@ chatqna-tgi ClusterIP 100.XX.XX.133 80/TCP Use `ctrl+c` to end the port-forwarding to test other services. -### Dataprep Microservice(Optional) + +### MegaService Before RAG Dataprep + +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna 8888:8888 +``` +Follow the below steps in a different terminal. + +``` +curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "model": "Intel/neural-chat-7b-v3-3", + "messages": "What is OPEA?" + }' + +``` +Here is the output for your reference: +```bash +data: b' O', data: b'PE', data: b'A', data: b' stands', data: b' for', data: b' Organization', data: b' of', data: b' Public', data: b' Em', data: b'ploy', data: b'ees', data: b' of', data: b' Alabama', data: b'.', data: b' It', data: b' is', data: b' a', data: b' labor', data: b' union', data: b' representing', data: b' public', data: b' employees', data: b' in', data: b' the', data: b' state', data: b' of', data: b' Alabama', data: b',', data: b' working', data: b' to', data: b' protect', data: b' their', data: b' rights', data: b' and', data: b' interests', data: b'.', data: b'', data: b'', data: [DONE] +``` +which is essentially the following sentence: +``` +OPEA stands for Organization of Public Employees of Alabama. It is a labor union representing public employees in the state of Alabama, working to protect their rights and interests. +``` +In the upcoming sections we will see how this answer can be improved with RAG.
+ +### Dataprep Microservice Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: ```bash kubectl port-forward svc/chatqna-data-prep 6007:6007 @@ -220,60 +241,49 @@ commands. The dataprep microservice extracts the texts from variety of data sources, chunks the data, embeds each chunk using embedding microservice and store the embedded vectors in the redis vector database. -Local File `nke-10k-2023.pdf` Upload: +this example leverages the OPEA document for its RAG based content. You can download the [OPEA document](https://opea-project.github.io/latest/_downloads/41c91aec1d47f20ca22350daa8c2cadc/what_is_opea.pdf) and upload it using the UI. + + +Local File `what_is_opea.pdf` Upload: ``` curl -X POST "http://localhost:6007/v1/dataprep" \ -H "Content-Type: multipart/form-data" \ - -F "files=@./nke-10k-2023.pdf" + -F "files=@./what_is_opea.pdf" ``` This command updates a knowledge base by uploading a local file for processing. Update the file path according to your environment. -Add Knowledge Base via HTTP Links: - +You should see the following output after successful execution: ``` -curl -X POST "http://localhost:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' +{"status":200,"message":"Data preparation succeeded"} ``` +For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29) -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -``` -curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" +### MegaService After RAG Dataprep +Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: +```bash +kubectl port-forward svc/chatqna 8888:8888 ``` +Similarly, follow the below steps in a different terminal. 
-To delete the file/link you uploaded you can use the following commands: - -#### Delete link ``` -# The dataprep service will add a .txt postfix for link file +curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ + "model": "Intel/neural-chat-7b-v3-3", + "messages": "What is OPEA?" + }' -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" ``` +After uploading the pdf with information about OPEA, we can see that the pdf is being used as a context to answer the question correctly: -#### Delete file - +```bash +data: b' O', data: b'PE', data: b'A', data: b' (', data: b'Open', data: b' Platform', data: b' for', data: b' Enterprise', data: b' AI', data: b')', data: b' is', data: b' a', data: b' framework', data: b' that', data: b' focuses', data: b' on', data: b' creating', data: b' and', data: b' evalu', data: b'ating', data: b' open', data: b',', data: b' multi', data: b'-', data: b'provider', data: b',', data: b' robust', data: b',', data: b' and', data: b' compos', data: b'able', data: b' gener', data: b'ative', data: b' AI', data: b' (', data: b'Gen', data: b'AI', data: b')', data: b' solutions', data: b'.', data: b' It', data: b' aims', data: b' to', data: b' facilitate', data: b' the', data: b' implementation', data: b' of', data: b' enterprise', data: b'-', data: b'grade', data: b' composite', data: b' Gen', data: b'AI', data: b' solutions', data: b',', data: b' particularly', data: b' Ret', data: b'riev', data: b'al', data: b' Aug', data: b'ment', data: b'ed', data: b' Gener', data: b'ative', data: b' AI', data: b' (', data: b'R', data: b'AG', data: b'),', data: b' by', data: b' simpl', data: b'ifying', data: b' the', data: b' integration', data: b' of', data: b' secure', data: b',', data: b' perform', data: b'ant', data: b',', data: b' and', data: b' cost', data: b'-', data: b'effective', data: b' Gen', data: b'AI', data: b' work', data: 
b'fl', data: b'ows', data: b' into', data: b' business', data: b' systems', data: b'.', data: b'', data: b'', data: [DONE] ``` -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "nke-10k-2023.pdf"}' \ - -H "Content-Type: application/json" +The above output has been parsed into the below sentence which shows how the LLM has picked up the right context to answer the question correctly after the document upload: ``` - -#### Delete all uploaded files and links - -``` -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" +OPEA (Open Platform for Enterprise AI) is a framework that focuses on creating and evaluating open, multi-provider, robust, and composable generative AI (GenAI) solutions. It aims to facilitate the implementation of enterprise-grade composite GenAI solutions, particularly Retrieval Augmented Generative AI (RAG), by simplifying the integration of secure, performant, and cost-effective GenAI workflows into business systems. ``` ### TEI Embedding Service @@ -297,23 +307,6 @@ curl http://localhost:6006/embed \ In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the curl command is a embedded vector of length 768. -### Embedding Microservice -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-embedding-usvc 6000:6000 -``` -Follow the below steps in a different terminal. - -The embedding microservice depends on the TEI embedding service. In terms of -input parameters, it takes in a string, embeds it into a vector using the TEI -embedding service and pads other default parameters that are required for the -retrieval microservice and returns it.
-``` -curl http://localhost:6000/v1/embeddings\ - -X POST \ - -d '{"text":"hello"}' \ - -H 'Content-Type: application/json' -``` ### Retriever Microservice Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: @@ -371,48 +364,8 @@ curl http://localhost:8808/rerank \ Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` -### Reranking Microservice -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-reranking-usvc 8000:8000 -``` -Follow the below steps in a different terminal. - -The reranking microservice consumes the TEI Reranking service and pads the -response with default parameters required for the llm microservice. - -``` -curl http://localhost:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?", "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -The input to the microservice is the `initial_query` and a list of retrieved -documents and it outputs the most relevant document to the initial query along -with other default parameter such as temperature, `repetition_penalty`, -`chat_template` and so on. We can also get top n documents by setting `top_n` as one -of the input parameters. For example: - -``` -curl http://localhost:8000/v1/reranking\ - -X POST \ - -d '{"initial_query":"What is Deep Learning?" 
,"top_n":2, "retrieved_docs": \ - [{"text":"Deep Learning is not..."}, {"text":"Deep learning is..."}]}' \ - -H 'Content-Type: application/json' -``` - -Here is the output: - -``` -{"id":"e1eb0e44f56059fc01aa0334b1dac313","query":"Human: Answer the question based only on the following context:\n Deep learning is...\n Question: What is Deep Learning?","max_new_tokens":1024,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true} - -``` -You may notice reranking microservice are with state ('ID' and other meta data), -while reranking service are not. -### vLLM and TGI Service +### TGI Service Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: ```bash @@ -451,117 +404,92 @@ and the log shows model warm up, please wait for a while and try it later. 2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model ``` -### LLM Microservice +### Dataprep Microservice (Advanced) -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-llm-uservice 9000:9000 -``` -Follow the below steps in a different terminal. +Add Knowledge Base via HTTP Links: ``` -curl http://localhost:9000/v1/chat/completions\ - -X POST \ - -d '{"query":"What is Deep Learning?","max_new_tokens":17,"top_k":10,"top_p":0.95,\ - "typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \ - -H 'Content-Type: application/json' - +curl -X POST "http://localhost:6007/v1/dataprep" \ + -H "Content-Type: multipart/form-data" \ + -F 'link_list=["https://opea.dev"]' ``` -You will get generated text from LLM: +This command updates a knowledge base by submitting a list of HTTP links for processing. 
+ +Also, you are able to get the file list that you uploaded: ``` -data: b'\n' -data: b'\n' -data: b'Deep' -data: b' learning' -data: b' is' -data: b' a' -data: b' subset' -data: b' of' -data: b' machine' -data: b' learning' -data: b' that' -data: b' uses' -data: b' algorithms' -data: b' to' -data: b' learn' -data: b' from' -data: b' data' -data: [DONE] +curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ + -H "Content-Type: application/json" + ``` -### MegaService -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kkubectl port-forward svc/chatqna 8888:8888 +To delete the file/link you uploaded you can use the following commands: + +#### Delete link ``` -Follow the below steps in a different terminal. +# The dataprep service will add a .txt postfix for link file +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "https://opea.dev.txt"}' \ + -H "Content-Type: application/json" ``` -curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "model": "Intel/neural-chat-7b-v3-3", - "messages": "What is the revenue of Nike in 2023?" - }' + +#### Delete file ``` +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "what_is_opea.pdf"}' \ + -H "Content-Type: application/json" +``` -Here is the output for your reference: +#### Delete all uploaded files and links ``` -data: b'\n' -data: b'An' -data: b'swer' -data: b':' -data: b' In' -data: b' fiscal' -data: b' ' -data: b'2' -data: b'0' -data: b'2' -data: b'3' -data: b',' -data: b' N' -data: b'I' -data: b'KE' -data: b',' -data: b' Inc' -data: b'.' -data: b' achieved' -data: b' record' -data: b' Rev' -data: b'en' -data: b'ues' -data: b' of' -data: b' $' -data: b'5' -data: b'1' -data: b'.' -data: b'2' -data: b' billion' -data: b'.' 
-data: b'' -data: [DONE] +curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ + -d '{"file_path": "all"}' \ + -H "Content-Type: application/json" ``` + + + ## Launch UI ### Basic UI To access the frontend, open the following URL in your browser: `http://{k8s-node-ip-address}:${port}` You can find the NGINX port using the following command: ```bash -export port=$(kubectl get service chatqna-nginx --output='jsonpath={.spec.ports[0].nodePort}') -echo $port +kubectl get service chatqna-nginx ``` -Open a browser to access `http://:${port}` - - By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `GenAIInfra/helm-charts/chatqna/values.yaml` file as shown below: +Which shows the Nginx port as follows: ``` -chatqna-ui: - image: - repository: "opea/chatqna-ui" - tag: "latest" - containerPort: "5173" +NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE +chatqna-nginx NodePort 10.201.220.120 <none> 80:30304/TCP 16h ``` +We can see that it is serving at port `30304` based on this configuration via a NodePort. + +Next step is to get the `<k8s-node-ip-address>` by running: +```bash +kubectl get nodes -o wide +``` +The command shows internal IPs for all the nodes in the cluster: +``` +NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME +minikube Ready control-plane 11d v1.31.0 190.128.49.1 <none> Ubuntu 22.04.4 LTS 5.15.0-124-generic docker://27.2.0 +``` +When using a NodePort, all the nodes in the cluster will be listening at the specified port, which is `30304` in this example. The `<k8s-node-ip-address>` can be found under INTERNAL-IP. Here it is `190.128.49.1`. + +Open a browser to access `http://<k8s-node-ip-address>:${port}`.
+From the configuration shown above, it would be `http://190.128.49.1:30304` + +Alternatively, You can also choose to use port forwarding as shown previously using: +```bash +kubectl port-forward service/chatqna-nginx 8080:80 +``` +and open a browser to access `http://localhost:8080` + + Visit this [link](https://opea-project.github.io/latest/getting-started/README.html#:~:text=tei%2Dembedding%2Dserver%20%20%20%20%20%20%20%20%20%7C-,Interact%20with%20ChatQnA,-%C2%B6) to see how to interact with the UI. + ### Stop the services Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: ``` From 301aeab007192aeedf7d6dd5f5543975365cfc2a Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 21:55:52 +0000 Subject: [PATCH 07/19] remove manifest, keep helm, address minor changes Signed-off-by: devpramod --- .../ChatQnA/deploy/k8s_getting_started.md | 50 +- examples/ChatQnA/deploy/k8s_helm.md | 35 +- examples/ChatQnA/deploy/k8s_manifest.md | 480 ------------------ 3 files changed, 36 insertions(+), 529 deletions(-) delete mode 100644 examples/ChatQnA/deploy/k8s_manifest.md diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index f24aa273..38c84c68 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -1,17 +1,15 @@ # Getting Started with Kubernetes for ChatQnA ## Introduction -Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. 
Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences: -- **Using GMC (GenAI Microservices Connector)**: GMC can be used to compose and adjust GenAI pipelines dynamically on kubernetes for enhanced service connectivity and management. -- **Using Manifests**: This involves deploying directly using Kubernetes manifest files without the GenAI Microservices Connector (GMC). -- **Using Helm Charts**: Facilitates deployment through Helm, which manages Kubernetes applications through packages of pre-configured Kubernetes resources. +Kubernetes is an orchestration platform for managing containerized applications, ideal for deploying microservices based architectures like ChatQnA. It offers robust mechanisms for automating deployment, scaling, and operations of application containers across clusters of hosts. Kubernetes supports different deployment modes for ChatQnA, which cater to various operational preferences. We will see how ChatQnA can be deployed via `Helm`, a package manager for Kubernetes that simplifies the deployment, management, and versioning of Kubernetes applications using pre-configured templates called charts. -This guide will provide detailed instructions on using these resources. If you're already familiar with Kubernetes, feel free to skip ahead to [Helm Deployment](./k8s_helm.md) + +This guide will provide detailed instructions on using Kubernetes and Helm. If you're already familiar with Kubernetes, feel free to skip ahead to [Helm Deployment](./k8s_helm.md) ### Kubernetes Cluster and Development Environment -**Setting Up the Kubernetes Cluster:** Before beginning deployment for the ChatQnA application, ensure that a Kubernetes cluster is ready. For guidance on setting up your Kubernetes cluster, please refer to the comprehensive setup instructions available on the [Opea Project deployment guide](https://opea-project.github.io/latest/deploy/index.html). 
+**Setting Up the Kubernetes Cluster:** Before beginning deployment for the ChatQnA application, ensure that a Kubernetes cluster is ready. For guidance on setting up your Kubernetes cluster, please refer to the comprehensive setup instructions available at [Kubernetes Installation Options](https://opea-project.github.io/latest/guide/installation/k8s_install/README.html). **Development Pre-requisites:** To prepare for the deployment, familiarize yourself with the necessary development tools and configurations by visiting the [GenAI Infrastructure development page](https://opea-project.github.io/latest/GenAIInfra/DEVELOPMENT.html). This page covers all the essential tools and settings needed for effective development within the Kubernetes environment. @@ -27,17 +25,6 @@ This guide will provide detailed instructions on using these resources. If you'r kubectl get nodes ``` -This command lists all the nodes in the cluster, verifying that `kubectl` is correctly configured and has the necessary permissions to interact with the cluster. - -Some commonly used kubectl commands and their functions that will help deploy ChatQnA successfully: -|Command |Function | -|------------------------------- |-----------------------------| -|`kubectl describe pod ` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. | -|`kubectl delete deployments --all` | Deletes all deployments in the current namespace, which effectively removes all the managed pods and associated resources. | -|`kubectl get pods -o wide` | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. | -|`kubectl logs ` | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. 
| -|`kubectl get svc` | Lists all services in the current namespace, providing a quick overview of the network services and their status. - #### Create and Set Namespace A Kubernetes namespace is a logical division within a cluster that is used to isolate different environments, teams, or projects, allowing for finer control over resources and access management. To create a namespace called `chatqa`, use: ```bash @@ -54,6 +41,19 @@ If you want to avoid specifying the namespace with every command, you can set th kubectl config set-context --current --namespace=chatqa ``` +This command lists all the nodes in the cluster, verifying that `kubectl` is correctly configured and has the necessary permissions to interact with the cluster. + +Some commonly used kubectl commands and their functions that will help deploy ChatQnA successfully: +|Command |Function | +|------------------------------- |-----------------------------| +|`kubectl describe pod <pod-name>` | Provides detailed information about a specific pod, including its current state, recent events, and configuration details. | +|`kubectl delete -f ` | Deletes the resources defined in the given file, which removes the pods and associated resources they manage. | +|`kubectl get pods -o wide` | Retrieves a detailed list of all pods in the current namespace, including additional information like IP addresses and the nodes they are running on. | +|`kubectl logs <pod-name>` | Fetches the logs generated by a container in a specific pod, useful for debugging and monitoring application behavior. | +|`kubectl get svc` | Lists all services in the current namespace, providing a quick overview of the network services and their status. | + + + ### Using Helm Charts to Deploy **What is Helm?** Helm is a package manager for Kubernetes, similar to how apt is for Ubuntu. It simplifies deploying and managing Kubernetes applications through Helm charts, which are packages of pre-configured Kubernetes resources.
@@ -63,7 +63,7 @@ kubectl config set-context --current --namespace=chatqa | Component |Description | | --- | --- | | `Chart.yaml` | This file contains metadata about the chart such as name, version, and description. | -| `values.yaml` | Stores configuration values that can be customized depending on the deployment environment. These values override defaults set in the chart templates. | +| `values.yaml` | Overridable configuration values for the Helm chart deployment, used in the chart k8s object templates. | | `deployment.yaml` | Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. | **Update Dependencies:** @@ -75,16 +75,4 @@ kubectl config set-context --current --namespace=chatqa - `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. -For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). - -### Using Kubernetes Manifest to Deploy -Manifest files in YAML format define the Kubernetes resources you want to manage. The main components in a manifest file include: - -- **ConfigMap**: Stores configuration data that can be used by pods, allowing you to keep containerized applications portable without embedding configuration data directly within the application's images. For example, a ConfigMap might store the database URL and credentials that your application needs to connect to a database. - -- **Services**: Defines a logical set of Pods and a policy by which to access them. This resource abstracts the way you expose an application running on a set of Pods as a network service. - -- **Deployment**: Manages the state of replicated application instances. 
It automatically replaces instances that fail or are deleted, maintaining the desired state of the application. - - -For more detailed examples, you can view the [ChatQnA manifest file](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml) which includes definitions for services, deployments, and other resources essential for running the ChatQnA application. This file is a reference for understanding how Kubernetes resources for ChatQnA are defined and orchestrated. \ No newline at end of file +For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). \ No newline at end of file diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 15c1a86e..ce3b7260 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -1,6 +1,8 @@ # Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm -This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md). +This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. 
There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. + +For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md). ## Overview @@ -14,7 +16,7 @@ GenAIComps to deploy a multi-node TGI megaservice solution. 4. Reranking 5. LLM with TGI -> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. +> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node and it has resources (memory) for running all of them. ## Prerequisites @@ -62,7 +64,7 @@ global: ``` ## Use Case Setup -The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Charts.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as Docker image sources and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). 
+The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Chart.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as the container image name and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). This use case employs a tailored combination of Helm charts and `values.yaml` configurations to deploy the following components and tools: |use case components | Tools | Model | Service Type | @@ -79,7 +81,7 @@ environment variable or `values.yaml` Set a new [namespace](#create-and-set-namespace) and switch to it if needed -To enable UI, uncomment the lines `56-62` in `GenAIInfra/helm-charts/chatqna/values.yaml`: +To enable the UI, uncomment the following lines in `GenAIInfra/helm-charts/chatqna/values.yaml`: ```bash chatqna-ui: image: @@ -215,9 +217,8 @@ Follow the below steps in a different terminal. ``` curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "model": "Intel/neural-chat-7b-v3-3", - "messages": "What is the revenue of Nike in 2023?" + "messages": "What is OPEA?"
}' - ``` Here is the output for your reference: ```bash @@ -259,7 +260,7 @@ You should see the following output after successful execution: ``` {"status":200,"message":"Data preparation succeeded"} ``` -For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29) +For advanced usage of the dataprep microservice, see [here](#dataprep-microservice-advanced) ### MegaService After RAG Dataprep @@ -274,7 +275,6 @@ curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ "model": "Intel/neural-chat-7b-v3-3", "messages": "What is OPEA?" }' - ``` After uploading the pdf with information about OPEA, we can see that the pdf is being used as a context to answer the question correctly: @@ -304,7 +304,7 @@ curl http://localhost:6006/embed \ -H 'Content-Type: application/json' ``` -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the curl command is a embedded vector of +In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. So the output of the `curl` command is an embedded vector of length 768. @@ -330,7 +330,6 @@ curl http://localhost:7000/v1/retrieval \ -X POST \ -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ -H 'Content-Type: application/json' - ``` The output of the retriever microservice comprises of the a unique id for the request, initial query or the input to the retrieval microservice, a list of top @@ -378,16 +377,15 @@ curl http://localhost:9009/generate \ -X POST \ -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ -H 'Content-Type: application/json' - ``` -TGI service generate text for the input prompt. Here is the expected result from TGI: +The TGI service generates text for the input prompt. Here is the expected result from TGI: ``` {"generated_text":"We have all heard the buzzword, but our understanding of it is still growing.
It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} ``` -**NOTE**: After launch the TGI, it takes few minutes for TGI server to load LLM model and warm up. +**NOTE**: After the TGI service is started, it takes a few minutes to load an LLM model and warm up before reaching the `Ready` state. If you get @@ -416,12 +414,11 @@ curl -X POST "http://localhost:6007/v1/dataprep" \ This command updates a knowledge base by submitting a list of HTTP links for processing. -Also, you are able to get the file list that you uploaded: +To get the list of uploaded files: ``` curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ -H "Content-Type: application/json" - ``` To delete the file/link you uploaded you can use the following commands: @@ -454,7 +451,7 @@ curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ ## Launch UI -### Basic UI +### Basic UI via NodePort To access the frontend, open the following URL in your browser: `http://{k8s-node-ip-address}:${port}` You can find the NGINX port using the following command: @@ -482,16 +479,18 @@ When using a NodePort, all the nodes in the cluster will be listening at the spe Open a browser to access `http://<k8s-node-ip-address>:${port}`. From the configuration shown above, it would be `http://190.128.49.1:30304` +### Basic UI via Port Forwarding + Alternatively, You can also choose to use port forwarding as shown previously using: ```bash kubectl port-forward service/chatqna-nginx 8080:80 ``` and open a browser to access `http://localhost:8080` - Visit this [link](https://opea-project.github.io/latest/getting-started/README.html#:~:text=tei%2Dembedding%2Dserver%20%20%20%20%20%20%20%20%20%7C-,Interact%20with%20ChatQnA,-%C2%B6) to see how to interact with the UI.
+ Visit this [link](https://opea-project.github.io/latest/getting-started/README.html#interact-with-chatqna) to see how to interact with the UI. ### Stop the services -Once you are done with the entire pipeline and wish to stop and remove all the containers, use the command below: +Once you are done with the entire pipeline and wish to stop and remove all the resources, use the command below: ``` helm uninstall chatqna ``` diff --git a/examples/ChatQnA/deploy/k8s_manifest.md b/examples/ChatQnA/deploy/k8s_manifest.md deleted file mode 100644 index f6677573..00000000 --- a/examples/ChatQnA/deploy/k8s_manifest.md +++ /dev/null @@ -1,480 +0,0 @@ -# Multi-node on-prem deployment with TGI on Xeon Scalable processors on a K8s cluster using Helm Charts - in helm update - -This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). -For a quick introduction on deploying using Mainfest files, visit the `Using Kubernetes Manifest to Deploy` section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md) - -## Overview - -There are several ways to setup a ChatQnA use case. Here in this tutorial, we -will walk through how to enable the below list of microservices from OPEA -[GenAIComps](https://github.com/opea-project/GenAIComps) to deploy a multi-node TGI megaservice solution. - -1. Data Prep -2. Embedding -3. Retriever -4. Reranking -5. 
LLM with TGI - -> **Note:** ChatQnA can also be deployed on a single node using Kubernetes, provided that all pods are configured to run on the same node. - -## Prerequisites - -### Clone Repository -To set up the workspace for deploying ChatQnA via Kubernetes, start by cloning the `GenAIExamples` repository and navigate to the ChatQnA Kubernetes manifests directory: - -```bash -https://github.com/opea-project/GenAIExamples.git -``` -Checkout the release tag -``` -cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest -git checkout tags/v1.1 -``` -### Bfloat16 Inference Optimization -We recommend using newer CPUs, such as 4th Gen Intel Xeon Scalable processors (code-named Sapphire Rapids) and later, that support the bfloat16 data type. If your hardware includes such CPUs and your model is compatible with bfloat16, adding the `--dtype bfloat16` argument to the HuggingFace `text-generation-inference` server can significantly reduce memory usage by half and provide a moderate speed boost. This change has already been configured in the `chatqna_bf16.yaml` file. To use it, follow these steps: - -Run `kubectl get nodes` and identify the nodes in your cluster with BFloat16 support and label the nodes to schedule the service on it automatically: - -```bash -kubectl label node node-type=node-bfloat16 -``` - ->**Note:** The manifest folder has several configuration pipelines that can be deployed for ChatQnA. In this example, we'll use the `chatqna_bf16.yaml` configuration. You can use `chatqna.yaml` instead if you don't have BFloat16 support in your nodes. - -### HF Token -The example can utilize model weights from HuggingFace and langchain. - -Setup your [HuggingFace](https://huggingface.co/) account and generate -[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token). 
- -Add the HuggingFace token to the manifest -```bash -export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" -# write the token to appropriate places in the manifest file -sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna_bf16.yaml -``` - -### Proxy Settings -If you are behind a corporate VPN, proxy settings must be added for services requiring internet access, such as the LLM microservice, embedding service, reranking service, and other backend services. -Proxy settings can be set in the `ConfigMap` section across the manifest file. One example for `chatqna-tei-config` is shown below. - -To configure proxy settings for the Text Embedding Inference (TEI) microservice using a `ConfigMap` in the `chatqna_bf16.yaml` manifest, open `chatqna_bf16.yaml` in an editor and populate the `http_proxy`, `https_proxy` and `no_proxy` fields under `data` marked by `#1`, `#2` and `#3` as follows: - -```bash -vi chatqna_bf16.yaml -``` -```yaml -apiVersion: v1 -kind: ConfigMap -metadata: - name: chatqna-tei-config - labels: - helm.sh/chart: tei-1.0.0 - app.kubernetes.io/name: tei - app.kubernetes.io/instance: chatqna - app.kubernetes.io/version: "cpu-1.5" - app.kubernetes.io/managed-by: Helm -data: - MODEL_ID: "BAAI/bge-base-en-v1.5" - PORT: "2081" - http_proxy: "http://your-proxy-address:port" #1 - https_proxy: "http://your-proxy-address:port" #2 - no_proxy: "localhost,127.0.0.1,localaddress,.localdomain.com" #3 - NUMBA_CACHE_DIR: "/tmp" - TRANSFORMERS_CACHE: "/tmp/transformers_cache" - HF_HOME: "/tmp/.cache/huggingface" - MAX_WARMUP_SEQUENCE_LENGTH: "512" -``` -## Use Case Setup - -As mentioned the use case will use the following combination of the [GenAIComps](https://github.com/opea-project/GenAIComps) with the tools: - -|use case components | Tools | Model | Service Type | -|---------------- |--------------|-----------------------------|-------| -|Data Prep | LangChain | NA |OPEA Microservice | -|VectorDB | Redis | NA |Open source 
service| -|Embedding | TEI | BAAI/bge-base-en-v1.5 |OPEA Microservice | -|Reranking | TEI | BAAI/bge-reranker-base | OPEA Microservice | -|LLM | TGI |Intel/neural-chat-7b-v3-3 |OPEA Microservice | -|UI | | NA | Gateway Service | - -Tools and models mentioned in the table are configurable either through the -`chatqna_bf16.yaml` - - -## Deploy the use case - -Set a new [namespace](#create-and-set-namespace) and switch to it if needed and run: - - ```bash - kubectl apply -f chatqna_bf16.yaml -``` - -It takes a few minutes for all the microservices to be up and running. Go to the next section which is [Validate Microservices](#validate-microservices) to verify that the deployment is successful. - - - -### Validate microservice -#### Check the pod status -To check if all the pods have started, run: - -```bash -kubectl get pods -``` -You should expect a similar output as below: -``` -NAME READY STATUS RESTARTS AGE -chatqna-chatqna-ui-77dbdfc949-6dtms 1/1 Running 0 5m7s -chatqna-data-prep-798f59f447-4frqt 1/1 Running 0 5m7s -chatqna-df57cc766-t6lkg 1/1 Running 0 5m7s -chatqna-nginx-5dd47bfc7d-54x96 1/1 Running 0 5m7s -chatqna-redis-vector-db-7f489b6bb6-mvzbw 1/1 Running 0 5m7s -chatqna-retriever-usvc-6695979d67-z5jgx 1/1 Running 0 5m7s -chatqna-tei-769dc796c-gh5vx 1/1 Running 0 5m7s -chatqna-teirerank-54f58c596c-76xqz 1/1 Running 0 5m7s -chatqna-tgi-7b5556d46d-pnzph 1/1 Running 0 5m7s -``` -> [!NOTE] -> Use `kubectl get pods -o wide` to check the nodes that the respective pods are running on - -The ChatQnA deployment starts 9 Kubernetes services. Ensure that all associated pods are running, i.e., all the pods' statuses are 'Running'. - -When issues are encountered with a pod in the Kubernetes deployment, there are two primary commands to diagnose and potentially resolve problems: -1. 
**Checking Logs**: To view the logs of a specific pod, which can provide insight into what the application is doing and any errors it might be encountering, use: - ```bash - kubectl logs <pod-name> - ``` -2. **Describing Pods**: For a detailed view of the pod's current state, its configuration, and its operational events, run: - ```bash - kubectl describe pod <pod-name> - ``` -For example, if the status of the TGI service does not show 'Running', describe the pod using the name from the above table: -```bash -kubectl describe pod chatqna-tgi-778bb6598f-cv5cg -``` -or check logs using: -```bash -kubectl logs chatqna-tgi-778bb6598f-cv5cg -``` - -## Interacting with ChatQnA deployment -This section will walk you through what are the different ways to interact with -the microservices deployed - -Before starting the validation of microservices, check the network configuration of services using: -```bash -kubectl get svc -``` - This command will display a list of services along with their network-related details such as cluster IP and ports.
- ``` -NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -chatqna ClusterIP 10.108.186.198 <none> 8888/TCP 8m16s -chatqna-chatqna-ui ClusterIP 10.102.80.123 <none> 5173/TCP 8m16s -chatqna-data-prep ClusterIP 10.110.143.212 <none> 6007/TCP 8m16s -chatqna-nginx NodePort 10.100.224.12 <none> 80:30304/TCP 8m16s -chatqna-redis-vector-db ClusterIP 10.205.9.19 <none> 6379/TCP,8001/TCP 8m16s -chatqna-retriever-usvc ClusterIP 10.202.3.15 <none> 7000/TCP 8m16s -chatqna-tei ClusterIP 10.105.204.12 <none> 80/TCP 8m16s -chatqna-teirerank ClusterIP 10.115.146.21 <none> 80/TCP 8m16s -chatqna-tgi ClusterIP 10.108.195.244 <none> 80/TCP 8m16s -kubernetes ClusterIP 10.92.0.100 <none> 443/TCP 11d - ``` - To begin port forwarding, which maps a service's port from the cluster to local host for testing, use: - ```bash - kubectl port-forward svc/[service-name] [local-port]:[service-port] - ``` - Replace `[service-name]`, `[local-port]`, and `[service-port]` with the appropriate values from your services list (as shown in the output given by `kubectl get svc`). This setup enables interaction with the microservice directly from the local machine. In another terminal, use `curl` commands to test the functionality and response of the service. - -Use `ctrl+c` to end the port-forwarding to test other services. - -### MegaService Before RAG Dataprep - -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna 8888:8888 -``` -Follow the below steps in a different terminal. - -``` -curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "model": "Intel/neural-chat-7b-v3-3", - "messages": "What is the revenue of Nike in 2023?"
- }' - -``` -Here is the output for your reference: -```bash -data: b' O', data: b'PE', data: b'A', data: b' stands', data: b' for', data: b' Organization', data: b' of', data: b' Public', data: b' Em', data: b'ploy', data: b'ees', data: b' of', data: b' Alabama', data: b'.', data: b' It', data: b' is', data: b' a', data: b' labor', data: b' union', data: b' representing', data: b' public', data: b' employees', data: b' in', data: b' the', data: b' state', data: b' of', data: b' Alabama', data: b',', data: b' working', data: b' to', data: b' protect', data: b' their', data: b' rights', data: b' and', data: b' interests', data: b'.', data: b'', data: b'', data: [DONE] -``` -which is essentially the following sentence: -``` -OPEA stands for Organization of Public Employees of Alabama. It is a labor union representing public employees in the state of Alabama, working to protect their rights and interests. -``` -In the upcoming sections we will see how this answer can be improved with RAG. - -### Dataprep Microservice -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-data-prep 6007:6007 -``` -Follow the below steps in a different terminal. - -If you want to add/update the default knowledge base, you can use the following -commands. The dataprep microservice extracts the texts from variety of data -sources, chunks the data, embeds each chunk using embedding microservice and -store the embedded vectors in the redis vector database. - -this example leverages the OPEA document for its RAG based content. You can download the [OPEA document](https://opea-project.github.io/latest/_downloads/41c91aec1d47f20ca22350daa8c2cadc/what_is_opea.pdf) and upload it using the UI. 
- - -Local File `what_is_opea.pdf` Upload: - -``` -curl -X POST "http://localhost:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F "files=@./what_is_opea.pdf" -``` - -This command updates a knowledge base by uploading a local file for processing. -Update the file path according to your environment. - -You should see the following output after successful execution: -``` -{"status":200,"message":"Data preparation succeeded"} -``` -For advanced usage of the dataprep microservice refer [here](#dataprep-microservice-%28advanced%29) - -### MegaService After RAG Dataprep - -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna 8888:8888 -``` -Similarly, follow the below steps in a different terminal. - -``` -curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ - "model": "Intel/neural-chat-7b-v3-3", - "messages": "What is OPEA?" - }' - -``` -After uploading the pdf with information about OPEA, we can see that the pdf is being used as a context to answer the question correctly: - -```bash -data: b' O', data: b'PE', data: b'A', data: b' (', data: b'Open', data: b' Platform', data: b' for', data: b' Enterprise', data: b' AI', data: b')', data: b' is', data: b' a', data: b' framework', data: b' that', data: b' focuses', data: b' on', data: b' creating', data: b' and', data: b' evalu', data: b'ating', data: b' open', data: b',', data: b' multi', data: b'-', data: b'provider', data: b',', data: b' robust', data: b',', data: b' and', data: b' compos', data: b'able', data: b' gener', data: b'ative', data: b' AI', data: b' (', data: b'Gen', data: b'AI', data: b')', data: b' solutions', data: b'.', data: b' It', data: b' aims', data: b' to', data: b' facilitate', data: b' the', data: b' implementation', data: b' of', data: b' enterprise', data: b'-', data: b'grade', data: b' composite', data: b' Gen', data: b'AI', data: b' 
solutions', data: b',', data: b' particularly', data: b' Ret', data: b'riev', data: b'al', data: b' Aug', data: b'ment', data: b'ed', data: b' Gener', data: b'ative', data: b' AI', data: b' (', data: b'R', data: b'AG', data: b'),', data: b' by', data: b' simpl', data: b'ifying', data: b' the', data: b' integration', data: b' of', data: b' secure', data: b',', data: b' perform', data: b'ant', data: b',', data: b' and', data: b' cost', data: b'-', data: b'effective', data: b' Gen', data: b'AI', data: b' work', data: b'fl', data: b'ows', data: b' into', data: b' business', data: b' systems', data: b'.', data: b'', data: b'', data: [DONE] -``` -The above output has been parsed into the below sentence which shows how the LLM has picked up the right context to answer the question correctly after the document upload: -``` -OPEN Platform for Enterprise AI (Open Platform for Enterprise AI) is a framework that focuses on creating and evaluating open, multi-provider, robust, and composable generative AI (GenAI) solutions. It aims to facilitate the implementation of enterprise-grade composite GenAI solutions, particularly Retrieval Augmented Generative AI (RAG), by simplifying the integration of secure, performant, and cost-effective GenAI workflows into business systems. -``` - -### TEI Embedding Service -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-tei 6006:80 -``` -Follow the below steps in a different terminal. - -The TEI embedding service takes in a string as input, embeds the string into a -vector of a specific length determined by the embedding model and returns this -embedded vector. - -``` -curl http://localhost:6006/embed \ - -X POST \ - -d '{"inputs":"What is Deep Learning?"}' \ - -H 'Content-Type: application/json' -``` - -In this example the embedding model used is "BAAI/bge-base-en-v1.5", which has a vector size of 768. 
So the output of the curl command is a embedded vector of -length 768. - - -### Retriever Microservice -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-retriever-usvc 7000:7000 -``` -Follow the below steps in a different terminal. - -To consume the retriever microservice, you need to generate a mock embedding -vector by Python script. The length of embedding vector is determined by the -embedding model. Here we use the -model EMBEDDING_MODEL_ID="BAAI/bge-base-en-v1.5", which vector size is 768. - -Check the vector dimension of your embedding model and set -`your_embedding` dimension equal to it. - -``` -export your_embedding=$(python3 -c "import random; embedding = [random.uniform(-1, 1) for _ in range(768)]; print(embedding)") - -curl http://localhost:7000/v1/retrieval \ - -X POST \ - -d "{\"text\":\"test\",\"embedding\":${your_embedding}}" \ - -H 'Content-Type: application/json' - -``` -The output of the retriever microservice comprises of the a unique id for the -request, initial query or the input to the retrieval microservice, a list of top -`n` retrieved documents relevant to the input query, and top_n where n refers to -the number of documents to be returned. - -The output is retrieved text that relevant to the input data: -``` -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. 
Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... } - -``` -### TEI Reranking Service - -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-teirerank 8808:80 -``` -Follow the below steps in a different terminal. - -The TEI Reranking Service reranks the documents returned by the retrieval -service. It consumes the query and list of documents and returns the document -index based on decreasing order of the similarity score. The document -corresponding to the returned index with the highest score is the most relevant -document for the input query. 
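Because the reranker returns entries ordered by decreasing score, the first entry's `index` points at the most relevant document. The following is a minimal sketch of reading that index in the shell. It assumes the response is compact JSON in the shape of the sample output in this section; `top_rerank_index` is a hypothetical helper name, not part of OPEA or TEI.

```bash
#!/usr/bin/env bash
# Sketch only: top_rerank_index is a hypothetical helper, not an OPEA/TEI command.
# Assumes a compact JSON array sorted by decreasing score, for example:
#   [{"index":1,"score":0.99},{"index":0,"score":0.02}]
top_rerank_index() {
  # Capture the "index" value of the first array element and print it.
  printf '%s' "$1" | sed -E 's/^\[\{"index":([0-9]+).*/\1/'
}

# Example input copied from the sample rerank output in this section.
response='[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]'
top_rerank_index "$response"
```

For responses that are pretty-printed or use a different key order, a real JSON parser would be a safer choice than `sed`.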
-``` -curl http://localhost:8808/rerank \ - -X POST \ - -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \ - -H 'Content-Type: application/json' -``` - -Output is: `[{"index":1,"score":0.9988041},{"index":0,"score":0.022948774}]` - - -### TGI Service - -Use the following command to forward traffic from your local machine to the service running in the Kubernetes cluster: -```bash -kubectl port-forward svc/chatqna-tgi 9009:80 -``` -Follow the below steps in a different terminal. - -``` -curl http://localhost:9009/generate \ - -X POST \ - -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \ - -H 'Content-Type: application/json' - -``` - -The TGI service generates text for the input prompt. Here is the expected result from TGI: - -``` -{"generated_text":"We have all heard the buzzword, but our understanding of it is still growing. It’s a sub-field of Machine Learning, and it’s the cornerstone of today’s Machine Learning breakthroughs.\n\nDeep Learning makes machines act more like humans through their ability to generalize from very large"} -``` - -**NOTE**: After TGI launches, it takes a few minutes for the TGI server to load the LLM and warm up. - -If you get - -``` -curl: (7) Failed to connect to localhost port 9009 after 0 ms: Connection refused -``` - -and the log shows that the model is still warming up, wait a while and try again. - -``` -2024-06-05T05:45:27.707509646Z 2024-06-05T05:45:27.707361Z WARN text_generation_router: router/src/main.rs:357: `--revision` is not set -2024-06-05T05:45:27.707539740Z 2024-06-05T05:45:27.707379Z WARN text_generation_router: router/src/main.rs:358: We strongly advise to set it to a known supported commit.
-2024-06-05T05:45:27.852525522Z 2024-06-05T05:45:27.852437Z INFO text_generation_router: router/src/main.rs:379: Serving revision bdd31cf498d13782cc7497cba5896996ce429f91 of model Intel/neural-chat-7b-v3-3 -2024-06-05T05:45:27.867833811Z 2024-06-05T05:45:27.867759Z INFO text_generation_router: router/src/main.rs:221: Warming up model -``` - -### Dataprep Microservice (Advanced) - -Add Knowledge Base via HTTP Links: - -``` -curl -X POST "http://localhost:6007/v1/dataprep" \ - -H "Content-Type: multipart/form-data" \ - -F 'link_list=["https://opea.dev"]' -``` - -This command updates a knowledge base by submitting a list of HTTP links for processing. - -Also, you are able to get the file list that you uploaded: - -``` -curl -X POST "http://localhost:6007/v1/dataprep/get_file" \ - -H "Content-Type: application/json" - -``` - -To delete the file/link you uploaded you can use the following commands: - -#### Delete link -``` -# The dataprep service will add a .txt postfix for link file - -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "https://opea.dev.txt"}' \ - -H "Content-Type: application/json" -``` - -#### Delete file - -``` -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "what_is_opea.pdf"}' \ - -H "Content-Type: application/json" -``` - -#### Delete all uploaded files and links - -``` -curl -X POST "http://localhost:6007/v1/dataprep/delete_file" \ - -d '{"file_path": "all"}' \ - -H "Content-Type: application/json" -``` - - - -## Launch UI -### Basic UI -To access the frontend, open the following URL in your browser: -`http://{k8s-node-ip-address}:${port}` -You can find the NGINX port using the following command: -```bash -kubectl get service chatqna-nginx -``` -Which shows the Nginx port as follows: -``` -NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE -chatqna-nginx NodePort 10.201.220.120 80:30304/TCP 16h -``` -We can see that it is serving at port `30304` based on this configuration via a 
NodePort. - -The next step is to get the `{k8s-node-ip-address}` by running: -```bash -kubectl get nodes -o wide -``` -The command shows internal IPs for all the nodes in the cluster: -``` -NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME -minikube Ready control-plane 11d v1.31.0 190.128.49.1 <none> Ubuntu 22.04.4 LTS 5.15.0-124-generic docker://27.2.0 -``` -When using a NodePort, all the nodes in the cluster will be listening at the specified port, which is `30304` in this example. The `{k8s-node-ip-address}` can be found under INTERNAL-IP. Here it is `190.128.49.1`. - -Open a browser to access `http://{k8s-node-ip-address}:${port}`. -From the configuration shown above, it would be `http://190.128.49.1:30304` - -Alternatively, you can use port forwarding as shown previously: -```bash -kubectl port-forward service/chatqna-nginx 8080:80 -``` -and open a browser to access `http://localhost:8080` - - Visit this [link](https://opea-project.github.io/latest/getting-started/README.html#:~:text=tei%2Dembedding%2Dserver%20%20%20%20%20%20%20%20%20%7C-,Interact%20with%20ChatQnA,-%C2%B6) to see how to interact with the UI.
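The NodePort lookup described in this section can also be scripted. The following is a minimal sketch, assuming the `chatqna-nginx` service name used above and a single-node cluster such as the minikube example. `ui_url` is a hypothetical helper name, and the commented `kubectl` queries use its standard JSONPath output but have not been run against this deployment.

```bash
#!/usr/bin/env bash
# Sketch only: ui_url is a hypothetical helper that joins a node IP and a NodePort.
ui_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# With cluster access, the two values could be looked up instead of copied by hand
# (assumption: the first node's InternalIP is reachable from your browser):
#   NODE_IP=$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')
#   NODE_PORT=$(kubectl get service chatqna-nginx -o jsonpath='{.spec.ports[0].nodePort}')
#   ui_url "$NODE_IP" "$NODE_PORT"

# Values taken from the example output in this section.
ui_url "190.128.49.1" "30304"
```

Because a NodePort service listens on every node, any node's internal IP should work in place of the first one.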
-### Stop the services -Once you are done with the entire pipeline and wish to stop and remove all the pods, use the command below: -``` -kubectl delete deployments --all -``` \ No newline at end of file From d1e61a8a7e4528a3ff626ccba76ba97c5a3f8aec Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 22:05:19 +0000 Subject: [PATCH 08/19] general container name instead of docker Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index ce3b7260..bf97eedc 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -64,7 +64,7 @@ global: ``` ## Use Case Setup -The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Charts.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as container image name and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). +The `GenAIInfra` repository utilizes a structured Helm chart approach, comprising a primary `Charts.yaml` and individual sub-charts for components like the LLM Service, Embedding Service, and Reranking Service. Each sub-chart includes its own `values.yaml` file, enabling specific configurations such as container image name/version and deployment parameters. This modular design facilitates flexible, scalable deployment and easy management of the GenAI application suite within Kubernetes environments. 
For detailed configurations and common components, visit the [GenAIInfra common components directory](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/common). This use case employs a tailored combination of Helm charts and `values.yaml` configurations to deploy the following components and tools: |use case components | Tools | Model | Service Type | From 52266899d162e54a0ff40c517b45b162f45c8621 Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 22:46:16 +0000 Subject: [PATCH 09/19] default modeldir empty Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index bf97eedc..95dad9a8 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -104,7 +104,7 @@ helm dependency update chatqna Set the necessary environment variables to setup the use case ```bash -export MODELDIR="/mnt/opea-models" #export MODELDIR="null" if you don't want to cache the model. +export MODELDIR="" #export MODELDIR="/mnt/opea-models" if you want to cache the model. export MODELNAME="Intel/neural-chat-7b-v3-3" export EMBEDDING_MODELNAME="BAAI/bge-base-en-v1.5" export RERANKER_MODELNAME="BAAI/bge-reranker-base" From 4bddc94acf53eba8e0198d13d3e8ed0617eccb27 Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:21:47 +0000 Subject: [PATCH 10/19] reranker add opea retrieved doc Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 95dad9a8..c031eb66 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -338,8 +338,7 @@ the number of documents to be returned. 
The output is retrieved text that relevant to the input data: ``` -{"id":"27210945c7c6c054fa7355bdd4cde818","retrieved_docs":[{"id":"0c1dd04b31ab87a5468d65f98e33a9f6","text":"Company: Nike. financial instruments are subject to master netting arrangements that allow for the offset of assets and liabilities in the event of default or early termination of the contract.\nAny amounts of cash collateral received related to these instruments associated with the Company's credit-related contingent features are recorded in Cash and\nequivalents and Accrued liabilities, the latter of which would further offset against the Company's derivative asset balance. Any amounts of cash collateral posted related\nto these instruments associated with the Company's credit-related contingent features are recorded in Prepaid expenses and other current assets, which would further\noffset against the Company's derivative liability balance. Cash collateral received or posted related to the Company's credit-related contingent features is presented in the\nCash provided by operations component of the Consolidated Statements of Cash Flows. The Company does not recognize amounts of non-cash collateral received, such\nas securities, on the Consolidated Balance Sheets. For further information related to credit risk, refer to Note 12 — Risk Management and Derivatives.\n2023 FORM 10-K 68Table of Contents\nThe following tables present information about the Company's derivative assets and liabilities measured at fair value on a recurring basis and indicate the level in the fair\nvalue hierarchy in which the Company classifies the fair value measurement:\nMAY 31, 2023\nDERIVATIVE ASSETS\nDERIVATIVE LIABILITIES"},{"id":"1d742199fb1a86aa8c3f7bcd580d94af","text": ... 
} - +{"id":"13617fc8ac716a9ca5df036fd297b9ad","retrieved_docs":[{"downstream_black_list":[],"id":"7e6f2e6584947f293d6d40cccb7ef58d","text":"applications.\nMicroservices: Flexible and Scalable Architecture\nThe GenAI Microservices documentation describes a suite of microservices. Each microservice is\ndesigned to perform a specific function or task within the application architecture. By breaking\ndown the system into these smaller, self-contained services, microservices promote modularity,\nflexibility, and scalability. This modular approach allows developers to independently develop,\ndeploy, and scale individual components of the application, making it easier to maintain and\nevolve over time. All of the microservices are containerized, allowing cloud native deployment.Megaservices: A Comprehensive Solution\nMegaservices are higher-level architectural constructs composed of one or more microservices.\nUnlike individual microservices, which focus on specific tasks or functions, a megaservice\norchestrates multiple microservices to deliver a comprehensive solution. Megaservices\nencapsulate complex business logic and workflow orchestration, coordinating the interactions\nbetween various microservices to fulfill specific application requirements. This approach enables\nthe creation of modular yet integrated applications. You can find a collection of use case-based\napplications in the GenAI Examples documentation\nGateways: Customized Access to Mega- and Microservices\nThe Gateway serves as the interface for users to access a megaservice, providing customized"},{"downstream_black_list":[],"id":"94197f8afc84ccabd1c95df2cfc91e6f","text":"The Gateway serves as the interface for users to access a megaservice, providing customized\naccess based on user requirements. 
It acts as the entry point for incoming requests, routing\nthem to the appropriate microservices within the megaservice architecture.\nGateways support API definition, API versioning, rate limiting, and request transformation,\nallowing for fine-grained control over how users interact with the underlying Microservices. By\nabstracting the complexity of the underlying infrastructure, Gateways provide a seamless and\nuser-friendly experience for interacting with the Megaservice.\nNext Step\nLinks to:\nGetting Started Guide\nGet Involved with the OPEA Open Source Community\nBrowse the OPEA wiki, mailing lists, and working groups:\nhttps://wiki.lfaidata.foundation/display/DL/OPEA+Home \nOpen Platform for Enterprise AI (OPEA) Framework Draft Proposal."},{"downstream_black_list":[],"id":"9636f9b479f2412bc8ce177db502c8c9","text":"Latest » OPEA Overview\nOPEA Overview\nOPEA (Open Platform for Enterprise AI) is a framework that enables the creation and evaluation\nof open, multi-provider, robust, and composable generative AI (GenAI) solutions. It harnesses\nthe best innovations across the ecosystem while keeping enterprise-level needs front and\ncenter.\nOPEA simplifies the implementation of enterprise-grade composite GenAI solutions, starting\nwith a focus on Retrieval Augmented Generative AI (RAG). 
The platform is designed to facilitate\nefficient integration of secure, performant, and cost-effective GenAI workflows into business\nsystems and manage its deployments, leading to quicker GenAI adoption and business value.\nThe OPEA platform includes:\nDetailed framework of composable microservices building blocks for state-of-the-art GenAI\nsystems including LLMs, data stores, and prompt engines\nArchitectural blueprints of retrieval-augmented GenAI component stack structure and end-\nto-end workflows\nMultiple micro- and megaservices to get your GenAI into production and deployed\nA four-step assessment for grading GenAI systems around performance, features,\ntrustworthiness and enterprise-grade readiness\nOPEA Project Architecture\nOPEA uses microservices to create high-quality GenAI applications for enterprises, simplifying\nthe scaling and deployment process for production. These microservices leverage a service\ncomposer that assembles them into a megaservice thereby creating real-world Enterprise AI\napplications."}],"initial_query":"test","top_n":1} ``` ### TEI Reranking Service From 40a3f035abfd54c1b7218525cb322e56da6eacaf Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:37:46 +0000 Subject: [PATCH 11/19] add bfloat16 for helm Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index c031eb66..bb8d01fe 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -102,6 +102,13 @@ Next, we will update the dependencies for all Helm charts in the specified direc helm dependency update chatqna ``` +To use the bfloat16 data type for the LLM in TGI, modify the `values.yaml` file located in `GenAIInfra/helm-charts/common/tgi/`. 
Uncomment or add the following line: + +```yaml +extraCmdArgs: ["--dtype","bfloat16"] +``` +This configuration ensures that TGI processes LLM operations in bfloat16 precision, enabling lower-precision computations for improved performance and reduced memory usage. + Set the necessary environment variables to setup the use case ```bash export MODELDIR="" #export MODELDIR="/mnt/opea-models" if you want to cache the model. From cb6b75d1e78fd823a62d8dc4895caa1ac6e18a93 Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:42:36 +0000 Subject: [PATCH 12/19] fix namespace link Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index bb8d01fe..b7876fd9 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -79,7 +79,7 @@ This use case employs a tailored combination of Helm charts and `values.yaml` co Tools and models mentioned in the table are configurable either through the environment variable or `values.yaml` -Set a new [namespace](#create-and-set-namespace) and switch to it if needed +Set a new [namespace](k8s_getting_started.md#create-and-set-namespace) and switch to it if needed To enable UI, uncomment the following lines in `GenAIInfra/helm-charts/chatqna/values.yaml`: ```bash From da2c2b1ae93d8c247a106220c672f039d38f95b2 Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:45:30 +0000 Subject: [PATCH 13/19] send users to deployment page Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_getting_started.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index 38c84c68..c33797b9 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -75,4 +75,6 @@ Some commonly used 
kubectl commands and their functions that will help deploy Ch - `helm install [RELEASE_NAME] [CHART_NAME]`: This command deploys a Helm chart into your Kubernetes cluster, creating a new release. It is used to set up all the Kubernetes resources specified in the chart and track the version of the deployment. -For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). \ No newline at end of file +For more detailed instructions and explanations, you can refer to the [official Helm documentation](https://helm.sh/docs/). + +Continue to [Helm Deployment](./k8s_helm.md) to deploy ChatQnA via Helm. \ No newline at end of file From f98637757b29377191d229c16a2e6fa9430440ab Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:46:39 +0000 Subject: [PATCH 14/19] remove manifest in index.rst Signed-off-by: devpramod --- examples/ChatQnA/deploy/index.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/ChatQnA/deploy/index.rst b/examples/ChatQnA/deploy/index.rst index 5eba8254..6fa2e3b1 100644 --- a/examples/ChatQnA/deploy/index.rst +++ b/examples/ChatQnA/deploy/index.rst @@ -24,7 +24,6 @@ Kubernetes K8s Getting Started TGI on Xeon with Helm Charts - TGI on Xeon with Kubernetes Manifest Cloud Native ************ From 883d804be4f9d275053591697f24bb65a96557a9 Mon Sep 17 00:00:00 2001 From: devpramod Date: Thu, 5 Dec 2024 23:55:43 +0000 Subject: [PATCH 15/19] link intro to helm Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index b7876fd9..cb3147a0 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -2,7 +2,7 @@ This deployment section covers multi-node on-prem deployment of the ChatQnA example with OPEA comps to deploy using the TGI service. 
There are several slice-n-dice ways to enable RAG with vectordb and LLM models, but here we will be covering one option of doing it for convenience: we will be showcasing how to build an e2e chatQnA with Redis VectorDB and neural-chat-7b-v3-3 model, deployed on a Kubernetes cluster using Helm. -For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, visit the helm section in [Getting Started with Kubernetes for ChatQnA](./k8s_getting_started.md). +For more information on how to setup a Xeon based Kubernetes cluster along with the development pre-requisites, follow the instructions here [Kubernetes Cluster and Development Environment](./k8s_getting_started.md#kubernetes-cluster-and-development-environment). For a quick introduction on Helm Charts, click [here](k8s_getting_started.md#using-helm-charts-to-deploy). ## Overview From a2c98aeea025d782069a5febf951a8f47f7faf42 Mon Sep 17 00:00:00 2001 From: devpramod Date: Wed, 11 Dec 2024 15:33:03 +0000 Subject: [PATCH 16/19] added note for PVC Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 50 +++++++++++++++++++++++++++-- 1 file changed, 48 insertions(+), 2 deletions(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index cb3147a0..63223867 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -20,6 +20,7 @@ GenAIComps to deploy a multi-node TGI megaservice solution. ## Prerequisites + ### Install Helm First, ensure that Helm (version >= 3.15) is installed on your system. Helm is an essential tool for managing Kubernetes applications. It simplifies the deployment and management of Kubernetes applications using Helm charts. 
For detailed installation instructions, refer to the [Helm Installation Guide](https://helm.sh/docs/intro/install/) @@ -117,6 +118,14 @@ export EMBEDDING_MODELNAME="BAAI/bge-base-en-v1.5" export RERANKER_MODELNAME="BAAI/bge-reranker-base" ``` +> **Note:** +> +> Setting `MODELDIR` to an empty string will download the models without sharing them among worker nodes. This configuration is intended as a quick setup for testing in a single-node environment. +> +> In a multi-node environment, go to every K8s worker node to make sure that a ${MODELDIR} directory exists and is writable. +> +> Another option is to to use K8s persistent volume to share the model data files. For more information click [here](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume). + ## Deploy the use case The `helm install` command will initiate all the aforementioned services such as Kubernetes pods. @@ -228,9 +237,28 @@ curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ }' ``` Here is the output for your reference: + ```bash -data: b' O', data: b'PE', data: b'A', data: b' stands', data: b' for', data: b' Organization', data: b' of', data: b' Public', data: b' Em', data: b'ploy', data: b'ees', data: b' of', data: b' Alabama', data: b'.', data: b' It', data: b' is', data: b' a', data: b' labor', data: b' union', data: b' representing', data: b' public', data: b' employees', data: b' in', data: b' the', data: b' state', data: b' of', data: b' Alabama', data: b',', data: b' working', data: b' to', data: b' protect', data: b' their', data: b' rights', data: b' and', data: b' interests', data: b'.', data: b'', data: b'', data: [DONE] +data: b' O' +data: b'PE' +data: b'A' +data: b' stands' +data: b' Organization' +data: b' of' +data: b' Public' +data: b' Em' +data: b'ploy' +data: b'ees' +data: b' of' +data: b' Alabama' +. +. +. 
+data: b'' +data: b'' +data: [DONE] ``` + which is essentially the following sentence: ``` OPEA stands for Organization of Public Employees of Alabama. It is a labor union representing public employees in the state of Alabama, working to protect their rights and interests. @@ -286,8 +314,26 @@ curl http://localhost:8888/v1/chatqna -H "Content-Type: application/json" -d '{ After uploading the pdf with information about OPEA, we can see that the pdf is being used as a context to answer the question correctly: ```bash -data: b' O', data: b'PE', data: b'A', data: b' (', data: b'Open', data: b' Platform', data: b' for', data: b' Enterprise', data: b' AI', data: b')', data: b' is', data: b' a', data: b' framework', data: b' that', data: b' focuses', data: b' on', data: b' creating', data: b' and', data: b' evalu', data: b'ating', data: b' open', data: b',', data: b' multi', data: b'-', data: b'provider', data: b',', data: b' robust', data: b',', data: b' and', data: b' compos', data: b'able', data: b' gener', data: b'ative', data: b' AI', data: b' (', data: b'Gen', data: b'AI', data: b')', data: b' solutions', data: b'.', data: b' It', data: b' aims', data: b' to', data: b' facilitate', data: b' the', data: b' implementation', data: b' of', data: b' enterprise', data: b'-', data: b'grade', data: b' composite', data: b' Gen', data: b'AI', data: b' solutions', data: b',', data: b' particularly', data: b' Ret', data: b'riev', data: b'al', data: b' Aug', data: b'ment', data: b'ed', data: b' Gener', data: b'ative', data: b' AI', data: b' (', data: b'R', data: b'AG', data: b'),', data: b' by', data: b' simpl', data: b'ifying', data: b' the', data: b' integration', data: b' of', data: b' secure', data: b',', data: b' perform', data: b'ant', data: b',', data: b' and', data: b' cost', data: b'-', data: b'effective', data: b' Gen', data: b'AI', data: b' work', data: b'fl', data: b'ows', data: b' into', data: b' business', data: b' systems', data: b'.', data: b'', data: b'', data: 
[DONE] +data: b' O' +data: b'PE' +data: b'A' +data: b' (' +data: b'Open' +data: b' Platform' +data: b' for' +data: b' Enterprise' +data: b' AI' +data: b')' +. +. +. +data: b' systems' +data: b'.' +data: b'' +data: b'' +data: [DONE] ``` + The above output has been parsed into the below sentence which shows how the LLM has picked up the right context to answer the question correctly after the document upload: ``` OPEN Platform for Enterprise AI (Open Platform for Enterprise AI) is a framework that focuses on creating and evaluating open, multi-provider, robust, and composable generative AI (GenAI) solutions. It aims to facilitate the implementation of enterprise-grade composite GenAI solutions, particularly Retrieval Augmented Generative AI (RAG), by simplifying the integration of secure, performant, and cost-effective GenAI workflows into business systems. From 5cd26ffe361dd208c777d22c444ec8cd1b626447 Mon Sep 17 00:00:00 2001 From: devpramod Date: Wed, 11 Dec 2024 15:36:42 +0000 Subject: [PATCH 17/19] update helm key components Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_getting_started.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_getting_started.md b/examples/ChatQnA/deploy/k8s_getting_started.md index c33797b9..e4813476 100644 --- a/examples/ChatQnA/deploy/k8s_getting_started.md +++ b/examples/ChatQnA/deploy/k8s_getting_started.md @@ -64,7 +64,7 @@ Some commonly used kubectl commands and their functions that will help deploy Ch | --- | --- | | `Chart.yaml` | This file contains metadata about the chart such as name, version, and description. | | `values.yaml` | Overridable configuration values for the Helm chart deployment, used in the chart k8s object templates. | -| `deployment.yaml` | Part of the templates directory, this file describes how the Kubernetes resources should be deployed, such as Pods and Services. 
| +| `templates/` Directory | Contains YAML templates for Kubernetes objects, typically one file per object type (e.g., deployment.yaml for Deployments, service.yaml for Services). For more details, refer to the Helm Templates Best Practices. **Update Dependencies:** From 790bcff9af7ee93cf02be98460ecf58a8aa8d813 Mon Sep 17 00:00:00 2001 From: devpramod Date: Wed, 11 Dec 2024 16:30:24 +0000 Subject: [PATCH 18/19] minor fix in note Signed-off-by: devpramod --- examples/ChatQnA/deploy/k8s_helm.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 63223867..3e76827e 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -122,9 +122,9 @@ export RERANKER_MODELNAME="BAAI/bge-reranker-base" > > Setting `MODELDIR` to an empty string will download the models without sharing them among worker nodes. This configuration is intended as a quick setup for testing in a single-node environment. > -> In a multi-node environment, go to every K8s worker node to make sure that a ${MODELDIR} directory exists and is writable. +> In a multi-node environment, go to every k8s worker node to make sure that a ${MODELDIR} directory exists and is writable. > -> Another option is to to use K8s persistent volume to share the model data files. For more information click [here](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume). +> Another option is to use a k8s persistent volume to share the model data files. For more information, see [Using Persistent Volume](https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/README.md#using-persistent-volume). ## Deploy the use case The `helm install` command will initiate all the aforementioned services such as Kubernetes pods. 
From 5d62c2d3a6b2efe8c5e2a1c4e0acee09afd09728 Mon Sep 17 00:00:00 2001 From: devpramod Date: Tue, 17 Dec 2024 12:10:58 -0500 Subject: [PATCH 19/19] Update k8s_helm.md hardware prereq --- examples/ChatQnA/deploy/k8s_helm.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/examples/ChatQnA/deploy/k8s_helm.md b/examples/ChatQnA/deploy/k8s_helm.md index 3e76827e..c4c650a9 100644 --- a/examples/ChatQnA/deploy/k8s_helm.md +++ b/examples/ChatQnA/deploy/k8s_helm.md @@ -20,6 +20,10 @@ GenAIComps to deploy a multi-node TGI megaservice solution. ## Prerequisites +### Hardware Prerequisites +For cloud deployments, the ChatQnA pipeline in this guide has been tested on an AWS `m7i.8xlarge` single-node instance, which provides `32 vCPUs` and `128 GiB` of memory, with disk space upgraded to `100 GB`. While the default deployment uses only `~24 GiB` of memory, similar instance types with at least 32 vCPUs and 32 GiB of memory are recommended to ensure smooth performance. + +Switching from the default fp32 to bf16 further reduces the memory requirement. Instructions to switch to bf16 are provided in the [Use Case Setup](#use-case-setup) section below. ### Install Helm First, ensure that Helm (version >= 3.15) is installed on your system. Helm is an essential tool for managing Kubernetes applications. It simplifies the deployment and management of Kubernetes applications using Helm charts. @@ -108,7 +112,7 @@ To use the bfloat16 data type for the LLM in TGI, modify the `values.yaml` file ```yaml extraCmdArgs: ["--dtype","bfloat16"] ``` -This configuration ensures that TGI processes LLM operations in bfloat16 precision, enabling lower-precision computations for improved performance and reduced memory usage. +This configuration ensures that TGI processes LLM operations in bfloat16 precision, enabling lower-precision computations for improved performance and reduced memory usage. 
Bfloat16 operations are accelerated using Intel® AMX, the built-in AI accelerator on 4th Gen Intel® Xeon® Scalable processors and later. Set the necessary environment variables to setup the use case ```bash