From e0b3b579a3bf5636aaa586816bfa79c34eac73e2 Mon Sep 17 00:00:00 2001
From: kevinintel
Date: Wed, 18 Sep 2024 15:21:28 +0800
Subject: [PATCH 1/9] [Doc] doc improvement (#811)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
 ChatQnA/README.md | 171 ++++++++++++++++++++++++----------------------
 1 file changed, 90 insertions(+), 81 deletions(-)

diff --git a/ChatQnA/README.md b/ChatQnA/README.md
index 0efd64725..fa7156ad0 100644
--- a/ChatQnA/README.md
+++ b/ChatQnA/README.md
@@ -4,8 +4,88 @@ Chatbots are the most widely adopted use case for leveraging the powerful chat a
 
 RAG bridges the knowledge gap by dynamically fetching relevant information from external sources, ensuring that responses generated remain factual and current. The core of this architecture are vector databases, which are instrumental in enabling efficient and semantic retrieval of information. These databases store data as vectors, allowing RAG to swiftly access the most pertinent documents or data points based on semantic similarity.
 
-ChatQnA architecture shows below:
+## Deploy ChatQnA Service
+
+The ChatQnA service can be effortlessly deployed on Intel Gaudi2, Intel Xeon Scalable Processors, or NVIDIA GPU platforms.
+
+Two types of ChatQnA pipeline are supported now: `ChatQnA with/without Rerank`. The `ChatQnA without Rerank` pipeline (including Embedding, Retrieval, and LLM) is offered for Xeon customers who cannot run the rerank service on HPU yet still require high performance and accuracy.
+
+Quick Start Deployment Steps:
+
+1. Set up the environment variables.
+2. Run Docker Compose.
+3. Consume the ChatQnA Service.
+
+### Quick Start: 1.Setup Environment Variable
+
+To set up environment variables for deploying ChatQnA services, follow these steps:
+
+1. Set the required environment variables:
+
+   ```bash
+   # Example: host_ip="192.168.1.1"
+   export host_ip="External_Public_IP"
+   # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1"
+   export no_proxy="Your_No_Proxy"
+   export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
+   ```
+
+2. If you are in a proxy environment, also set the proxy-related environment variables:
+
+   ```bash
+   export http_proxy="Your_HTTP_Proxy"
+   export https_proxy="Your_HTTPs_Proxy"
+   ```
+
+3. Set up other environment variables:
+
+   > Notice that you should run only **one** of the commands below, according to your hardware. Otherwise, the port numbers may be set incorrectly.
+
+   ```bash
+   # on Gaudi
+   source ./docker_compose/intel/hpu/gaudi/set_env.sh
+   # on Xeon
+   source ./docker_compose/intel/cpu/xeon/set_env.sh
+   # on Nvidia GPU
+   source ./docker_compose/nvidia/gpu/set_env.sh
+   ```
+
+### Quick Start: 2.Run Docker Compose
+
+Select the `compose.yaml` file that matches your hardware.
+CPU example:
+
+```bash
+cd GenAIExamples/ChatQnA/docker_compose/intel/cpu/xeon/
+# cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
+# cd GenAIExamples/ChatQnA/docker_compose/nvidia/gpu/
+docker compose up -d
+```
+
+Docker Compose automatically downloads the required Docker images from Docker Hub:
+
+```bash
+docker pull opea/chatqna:latest
+docker pull opea/chatqna-ui:latest
+```
+
+If you want to build the Docker images yourself, refer to the `built from source` [Guide](docker_compose/intel/cpu/xeon/README.md).
+
+> Note: The optional Docker image **opea/chatqna-without-rerank:latest** has not been published yet; users need to build this Docker image from source.
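+
+Before sending the first query, it can help to confirm that all containers started correctly and that the LLM serving container has finished downloading its model. A minimal check from the same compose directory (only generic Docker Compose commands are used here, since the exact service names depend on the compose file you chose):
+
+```bash
+# List the services defined in the chosen compose.yaml and their current status
+docker compose ps
+# Follow the logs of all services; wait until the LLM serving engine reports it is ready
+docker compose logs -f
+```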
+
+### Quick Start: 3.Consume the ChatQnA Service
+
+```bash
+curl http://${host_ip}:8888/v1/chatqna \
+    -H "Content-Type: application/json" \
+    -d '{
+        "messages": "What is the revenue of Nike in 2023?"
+    }'
+```
+
+## Architecture and Deploy Details
 
+The ChatQnA architecture is shown below:
 ![architecture](./assets/img/chatqna_architecture.png)
 
 The ChatQnA example is implemented using the component-level microservices defined in [GenAIComps](https://github.com/opea-project/GenAIComps). The flow chart below shows the information flow between different microservices for this example.
@@ -79,59 +159,22 @@ flowchart LR
     direction TB
 
     %% Vector DB interaction
-    R_RET <-.->VDB
-    DP <-.->VDB
-
-
-
+    R_RET <-.->|d|VDB
+    DP <-.->|d|VDB
 ```
 
 This ChatQnA use case performs RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) or [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html).
 
 In the below, we provide a table that describes for each microservice component in the ChatQnA architecture, the default configuration of the open source project, hardware, port, and endpoint.
 
-Gaudi default compose.yaml - -| MicroService | Open Source Project | HW | Port | Endpoint | +Gaudi default compose.yaml +| MicroService | Open Source Project | HW | Port | Endpoint | | ------------ | ------------------- | ----- | ---- | -------------------- | -| Embedding | Langchain | Xeon | 6000 | /v1/embaddings | -| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval | -| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking | -| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions | -| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep | - -
- -## Deploy ChatQnA Service - -The ChatQnA service can be effortlessly deployed on either Intel Gaudi2 or Intel Xeon Scalable Processors. - -Two types of ChatQnA pipeline are supported now: `ChatQnA with/without Rerank`. And the `ChatQnA without Rerank` pipeline (including Embedding, Retrieval, and LLM) is offered for Xeon customers who can not run rerank service on HPU yet require high performance and accuracy. - -### Prepare Docker Image - -Currently we support two ways of deploying ChatQnA services with docker compose: - -1. Using the docker image on `docker hub`: - - ```bash - docker pull opea/chatqna:latest - ``` - - Two type of UI are supported now, choose one you like and pull the referred docker image. - - If you choose conversational UI, follow the [instruction](https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker_compose/intel/hpu/gaudi#-launch-the-conversational-ui-optional) and modify the [compose.yaml](./docker_compose/intel/cpu/xeon/compose.yaml). - - ```bash - docker pull opea/chatqna-ui:latest - # or - docker pull opea/chatqna-conversation-ui:latest - ``` - -2. Using the docker images `built from source`: [Guide](docker_compose/intel/cpu/xeon/README.md) - - > Note: The **opea/chatqna-without-rerank:latest** docker image has not been published yet, users need to build this docker image from source. +| Embedding | Langchain | Xeon | 6000 | /v1/embaddings | +| Retriever | Langchain, Redis | Xeon | 7000 | /v1/retrieval | +| Reranking | Langchain, TEI | Gaudi | 8000 | /v1/reranking | +| LLM | Langchain, TGI | Gaudi | 9000 | /v1/chat/completions | +| Dataprep | Redis, Langchain | Xeon | 6007 | /v1/dataprep | ### Required Models @@ -147,40 +190,6 @@ Change the `xxx_MODEL_ID` in `docker_compose/xxx/set_env.sh` for your needs. For customers with proxy issues, the models from [ModelScope](https://www.modelscope.cn/models) are also supported in ChatQnA. Refer to [this readme](docker_compose/intel/cpu/xeon/README.md) for details. -### Setup Environment Variable - -To set up environment variables for deploying ChatQnA services, follow these steps: - -1. Set the required environment variables: - - ```bash - # Example: host_ip="192.168.1.1" - export host_ip="External_Public_IP" - # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" - export no_proxy="Your_No_Proxy" - export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" - ``` - -2. If you are in a proxy environment, also set the proxy-related environment variables: - - ```bash - export http_proxy="Your_HTTP_Proxy" - export https_proxy="Your_HTTPs_Proxy" - ``` - -3. Set up other environment variables: - - > Notice that you can only choose **one** command below to set up envs according to your hardware. Other that the port numbers may be set incorrectly. - - ```bash - # on Gaudi - source ./docker_compose/intel/hpu/gaudi/set_env.sh - # on Xeon - source ./docker_compose/intel/cpu/xeon/set_env.sh - # on Nvidia GPU - source ./docker_compose/nvidia/gpu/set_env.sh - ``` - ### Deploy ChatQnA on Gaudi Find the corresponding [compose.yaml](./docker_compose/intel/hpu/gaudi/compose.yaml). 
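+
+As a sketch, a Gaudi deployment that overrides the default LLM before starting the services could look like the following (the `LLM_MODEL_ID` variable name and the model shown here are assumptions; check the `set_env.sh` of your hardware folder for the exact `xxx_MODEL_ID` names it defines):
+
+```bash
+cd GenAIExamples/ChatQnA/docker_compose/intel/hpu/gaudi/
+source ./set_env.sh
+# Optionally override the default LLM after sourcing the defaults;
+# Docker Compose picks the value up from the shell environment.
+export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
+docker compose up -d
+```
+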
From 0bb0abb0d32f8b2d848910c37daaefc29d2834ee Mon Sep 17 00:00:00 2001 From: ZePan110 Date: Wed, 18 Sep 2024 16:17:58 +0800 Subject: [PATCH 2/9] Fix issue (#826) Signed-off-by: ZePan110 --- .github/workflows/pr-path-detection.yml | 24 ++++++++++++++++++++---- 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/.github/workflows/pr-path-detection.yml b/.github/workflows/pr-path-detection.yml index 6ad53c0aa..858d913fc 100644 --- a/.github/workflows/pr-path-detection.yml +++ b/.github/workflows/pr-path-detection.yml @@ -94,7 +94,8 @@ jobs: run: | cd ${{github.workspace}} fail="FALSE" - link_head="https://github.com/opea-project/GenAIExamples/blob/main/" + branch="https://github.com/opea-project/GenAIExamples/blob/${{ github.event.pull_request.head.ref }}" + link_head="https://github.com/opea-project/GenAIExamples/blob/main" png_lines=$(grep -Eo '\]\([^)]+\)' -r -I .|grep -Ev 'http') if [ -n "$png_lines" ]; then for png_line in $png_lines; do @@ -102,6 +103,8 @@ jobs: png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1) if [[ "${png_path:0:1}" == "/" ]]; then check_path=${{github.workspace}}$png_path + elif [[ "${png_path:0:1}" == "#" ]]; then + check_path=${{github.workspace}}/$refer_path$png_path else check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path fi @@ -110,7 +113,7 @@ jobs: echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist" fail="TRUE" else - url=$link_head$(echo "$real_path" | sed 's|.*/GenAIExamples/||') + url=$link_head$(echo "$real_path" | sed 's|.*/GenAIExamples||') response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url") if [ "$response" -ne 200 ]; then echo "**********Validation failed, try again**********" @@ -118,8 +121,21 @@ jobs: if [ "$response_retry" -eq 200 ]; then echo "*****Retry successfully*****" else - echo "Invalid link from $check_path: $url" - fail="TRUE" + echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}" + url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIExamples||') + response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev") + if [ "$response" -ne 200 ]; then + echo "**********Validation failed, try again**********" + response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev") + if [ "$response_retry" -eq 200 ]; then + echo "*****Retry successfully*****" + else + echo "Invalid link from $real_path: $url_dev" + fail="TRUE" + fi + else + echo "Check branch ${{ github.event.pull_request.head.ref }} successfully." + fi fi fi fi From 96d5cd912792fcb8d074a133c21dac583acc92c8 Mon Sep 17 00:00:00 2001 From: kevinintel Date: Wed, 18 Sep 2024 17:13:35 +0800 Subject: [PATCH 3/9] Update supported_examples (#825) Signed-off-by: Xinyao Wang Co-authored-by: Xinyao Wang Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- supported_examples.md | 97 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 89 insertions(+), 8 deletions(-) diff --git a/supported_examples.md b/supported_examples.md index fe2965bdf..42a0a60e2 100644 --- a/supported_examples.md +++ b/supported_examples.md @@ -6,13 +6,58 @@ This document introduces the supported examples of GenAIExamples. The supported [ChatQnA](./ChatQnA/README.md) is an example of chatbot for question and answering through retrieval augmented generation (RAG). 
-| Framework | LLM | Embedding | Vector Database | Serving | HW | Description | -| ------------------------------------------------------------------------------ | ----------------------------------------------------------------- | --------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------- | --------------- | ----------- | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [NeuralChat-7B](https://huggingface.co/Intel/neural-chat-7b-v3-3) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2/GPU | Chatbot | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [NeuralChat-7B](https://huggingface.co/Intel/neural-chat-7b-v3-3) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Chroma](https://www.trychroma.com/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Qdrant](https://qdrant.tech/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot | + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Framework | LLM | Embedding | Vector Database | Serving | HW | Description |
+| --------- | --- | --------- | --------------- | ------- | --- | ----------- |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [NeuralChat-7B](https://huggingface.co/Intel/neural-chat-7b-v3-3) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2/GPU | Chatbot |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [NeuralChat-7B](https://huggingface.co/Intel/neural-chat-7b-v3-3) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Chroma](https://www.trychroma.com/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Mistral-7B](https://huggingface.co/mistralai/Mistral-7B-v0.1) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Qdrant](https://qdrant.tech/) | [TGI](https://github.com/huggingface/text-generation-inference) [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Chatbot |
+| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Qwen2-7B](https://huggingface.co/Qwen/Qwen2-7B) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TGI](https://github.com/huggingface/text-generation-inference) | Xeon/Gaudi2 | Chatbot |
### CodeGen @@ -101,7 +146,7 @@ The DocRetriever example demonstrates how to match user queries with free-text r | Framework | Embedding | Vector Database | Serving | HW | Description | | ------------------------------------------------------------------------------ | --------------------------------------------------- | -------------------------- | --------------------------------------------------------------- | ----------- | -------------------------- | -| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Document Retrieval Service | +| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [BGE-Base](https://huggingface.co/BAAI/bge-base-en) | [Redis](https://redis.io/) | [TEI](https://github.com/huggingface/text-embeddings-inference) | Xeon/Gaudi2 | Document Retrieval service | ### AgentQnA @@ -110,3 +155,39 @@ The AgentQnA example demonstrates a hierarchical, multi-agent system designed fo Worker agent uses open-source websearch tool (duckduckgo), agents use OpenAI GPT-4o-mini as llm backend. > **_NOTE:_** This example is in active development. The code structure of these use cases are subject to change. + +### AudioQnA + +The AudioQnA example demonstrates the integration of Generative AI (GenAI) models for performing question-answering (QnA) on audio files, with the added functionality of Text-to-Speech (TTS) for generating spoken responses. The example showcases how to convert audio input to text using Automatic Speech Recognition (ASR), generate answers to user queries using a language model, and then convert those answers back to speech using Text-to-Speech (TTS). + + + + + + + + + + + + + + + + +
+| ASR | TTS | LLM | HW | Description |
+| --- | --- | --- | --- | ----------- |
+| openai/whisper-small | microsoft/SpeechT5 | TGI | Xeon/Gaudi2 | Talkingbot service |
+ +### FaqGen + +FAQ Generation Application leverages the power of large language models (LLMs) to revolutionize the way you interact with and comprehend complex textual data. By harnessing cutting-edge natural language processing techniques, our application can automatically generate comprehensive and natural-sounding frequently asked questions (FAQs) from your documents, legal texts, customer queries, and other sources. In this example use case, we utilize LangChain to implement FAQ Generation and facilitate LLM inference using Text Generation Inference on Intel Xeon and Gaudi2 processors. +| Framework | LLM | Serving | HW | Description | +| ------------------------------------------------------------------------------ | ----------------------------------------------------------------- | --------------------------------------------------------------- | ----------- | ----------- | +| [LangChain](https://www.langchain.com)/[LlamaIndex](https://www.llamaindex.ai) | [Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | [TGI](https://github.com/huggingface/text-generation-inference) | Xeon/Gaudi2 | Chatbot | + +### MultimodalQnA + +[MultimodalQnA](./MultimodalQnA/README.md) addresses your questions by dynamically fetching the most pertinent multimodal information (frames, transcripts, and/or captions) from your collection of videos. + +### ProductivitySuite + +[Productivity Suite](./ProductivitySuite/README.md) streamlines your workflow to boost productivity. It leverages the OPEA microservices to provide a comprehensive suite of features to cater to the diverse needs of modern enterprises. From 412a0b00c39db083752d8e54892f45053278726f Mon Sep 17 00:00:00 2001 From: ZePan110 Date: Wed, 18 Sep 2024 20:33:09 +0800 Subject: [PATCH 4/9] Fix other repo issue. 
(#829) Signed-off-by: ZePan110 --- .github/workflows/pr-path-detection.yml | 10 ++++++++-- GenAIExamples | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) create mode 160000 GenAIExamples diff --git a/.github/workflows/pr-path-detection.yml b/.github/workflows/pr-path-detection.yml index 858d913fc..e45aca0df 100644 --- a/.github/workflows/pr-path-detection.yml +++ b/.github/workflows/pr-path-detection.yml @@ -94,9 +94,15 @@ jobs: run: | cd ${{github.workspace}} fail="FALSE" - branch="https://github.com/opea-project/GenAIExamples/blob/${{ github.event.pull_request.head.ref }}" + repo_name=${{ github.event.pull_request.head.repo.full_name }} + if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then + owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1) + branch="https://github.com/$owner/GenAIExamples/tree/${{ github.event.pull_request.head.ref }}" + else + branch="https://github.com/opea-project/GenAIExamples/blob/${{ github.event.pull_request.head.ref }}" + fi link_head="https://github.com/opea-project/GenAIExamples/blob/main" - png_lines=$(grep -Eo '\]\([^)]+\)' -r -I .|grep -Ev 'http') + png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http') if [ -n "$png_lines" ]; then for png_line in $png_lines; do refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-) diff --git a/GenAIExamples b/GenAIExamples new file mode 160000 index 000000000..de397d104 --- /dev/null +++ b/GenAIExamples @@ -0,0 +1 @@ +Subproject commit de397d104153c2f5538a7d2eefa32b43b2918e43 From 3b70fb0d4216cc18aecfb01bbf902d35c7fd7c8b Mon Sep 17 00:00:00 2001 From: kevinintel Date: Wed, 18 Sep 2024 22:23:22 +0800 Subject: [PATCH 5/9] Refine the quick start of ChatQnA (#828) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- .../docker_compose/intel/cpu/xeon/README.md | 59 ++++++++++++++++++ .../docker_compose/intel/hpu/gaudi/README.md | 60 +++++++++++++++++++ ChatQnA/docker_compose/nvidia/gpu/README.md | 60 +++++++++++++++++++ 3 files changed, 179 insertions(+) diff --git a/ChatQnA/docker_compose/intel/cpu/xeon/README.md b/ChatQnA/docker_compose/intel/cpu/xeon/README.md index 4868a5ec0..7eb75431a 100644 --- a/ChatQnA/docker_compose/intel/cpu/xeon/README.md +++ b/ChatQnA/docker_compose/intel/cpu/xeon/README.md @@ -2,6 +2,65 @@ This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Xeon server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as `embedding`, `retriever`, `rerank`, and `llm`. We will publish the Docker images to Docker Hub soon, it will simplify the deployment process for this service. +Quick Start: + +1. Set up the environment variables. +2. Run Docker Compose. +3. Consume the ChatQnA Service. + +## Quick Start: 1.Setup Environment Variable + +To set up environment variables for deploying ChatQnA services, follow these steps: + +1. Set the required environment variables: + + ```bash + # Example: host_ip="192.168.1.1" + export host_ip="External_Public_IP" + # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" + export no_proxy="Your_No_Proxy" + export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + ``` + +2. 
If you are in a proxy environment, also set the proxy-related environment variables: + + ```bash + export http_proxy="Your_HTTP_Proxy" + export https_proxy="Your_HTTPs_Proxy" + ``` + +3. Set up other environment variables: + ```bash + source ./set_env.sh + ``` + +## Quick Start: 2.Run Docker Compose + +```bash +docker compose up -d +``` + +It will automatically download the docker image on `docker hub`: + +```bash +docker pull opea/chatqna:latest +docker pull opea/chatqna-ui:latest +``` + +If you want to build docker by yourself, please refer to 'Build Docker Images' in below. + +> Note: The optional docker image **opea/chatqna-without-rerank:latest** has not been published yet, users need to build this docker image from source. + +## QuickStart: 3.Consume the ChatQnA Service + +```bash +curl http://${host_ip}:8888/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' +``` + ## πŸš€ Apply Xeon Server on AWS To apply a Xeon server on AWS, start by creating an AWS account if you don't have one already. Then, head to the [EC2 Console](https://console.aws.amazon.com/ec2/v2/home) to begin the process. Within the EC2 service, select the Amazon EC2 M7i or M7i-flex instance type to leverage 4th Generation Intel Xeon Scalable processors that are optimized for demanding workloads. diff --git a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md index 03f5229d4..bc41c782a 100644 --- a/ChatQnA/docker_compose/intel/hpu/gaudi/README.md +++ b/ChatQnA/docker_compose/intel/hpu/gaudi/README.md @@ -2,6 +2,66 @@ This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on Intel Gaudi server. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service. +Quick Start: + +1. Set up the environment variables. +2. Run Docker Compose. +3. Consume the ChatQnA Service. + +## Quick Start: 1.Setup Environment Variable + +To set up environment variables for deploying ChatQnA services, follow these steps: + +1. Set the required environment variables: + + ```bash + # Example: host_ip="192.168.1.1" + export host_ip="External_Public_IP" + # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" + export no_proxy="Your_No_Proxy" + export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + ``` + +2. If you are in a proxy environment, also set the proxy-related environment variables: + + ```bash + export http_proxy="Your_HTTP_Proxy" + export https_proxy="Your_HTTPs_Proxy" + ``` + +3. Set up other environment variables: + + ```bash + source ./set_env.sh + ``` + +## Quick Start: 2.Run Docker Compose + +```bash +docker compose up -d +``` + +It will automatically download the docker image on `docker hub`: + +```bash +docker pull opea/chatqna:latest +docker pull opea/chatqna-ui:latest +``` + +If you want to build docker by yourself, please refer to 'Build Docker Images' in below. + +> Note: The optional docker image **opea/chatqna-without-rerank:latest** has not been published yet, users need to build this docker image from source. 
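+
+On Gaudi the LLM serving container can take several minutes to download and warm up the model, so the very first request may fail or time out. A small wait loop (a sketch that reuses the same endpoint and port shown in the next step) can be run before issuing real queries:
+
+```bash
+# Poll the ChatQnA mega-service until it answers successfully
+until curl -sf http://${host_ip}:8888/v1/chatqna \
+  -H "Content-Type: application/json" \
+  -d '{"messages": "Hello"}' > /dev/null; do
+  echo "Waiting for the ChatQnA service to become ready..."
+  sleep 10
+done
+```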
+ +## QuickStart: 3.Consume the ChatQnA Service + +```bash +curl http://${host_ip}:8888/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' +``` + ## πŸš€ Build Docker Images First of all, you need to build Docker Images locally. This step can be ignored after the Docker images published to Docker hub. diff --git a/ChatQnA/docker_compose/nvidia/gpu/README.md b/ChatQnA/docker_compose/nvidia/gpu/README.md index 17b7dfd5e..cfdda158f 100644 --- a/ChatQnA/docker_compose/nvidia/gpu/README.md +++ b/ChatQnA/docker_compose/nvidia/gpu/README.md @@ -2,6 +2,66 @@ This document outlines the deployment process for a ChatQnA application utilizing the [GenAIComps](https://github.com/opea-project/GenAIComps.git) microservice pipeline on NVIDIA GPU platform. The steps include Docker image creation, container deployment via Docker Compose, and service execution to integrate microservices such as embedding, retriever, rerank, and llm. We will publish the Docker images to Docker Hub, it will simplify the deployment process for this service. +Quick Start Deployment Steps: + +1. Set up the environment variables. +2. Run Docker Compose. +3. Consume the ChatQnA Service. + +## Quick Start: 1.Setup Environment Variable + +To set up environment variables for deploying ChatQnA services, follow these steps: + +1. Set the required environment variables: + + ```bash + # Example: host_ip="192.168.1.1" + export host_ip="External_Public_IP" + # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" + export no_proxy="Your_No_Proxy" + export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + ``` + +2. If you are in a proxy environment, also set the proxy-related environment variables: + + ```bash + export http_proxy="Your_HTTP_Proxy" + export https_proxy="Your_HTTPs_Proxy" + ``` + +3. Set up other environment variables: + + ```bash + source ./set_env.sh + ``` + +## Quick Start: 2.Run Docker Compose + +```bash +docker compose up -d +``` + +It will automatically download the docker image on `docker hub`: + +```bash +docker pull opea/chatqna:latest +docker pull opea/chatqna-ui:latest +``` + +If you want to build docker by yourself, please refer to 'Build Docker Images' in below. + +> Note: The optional docker image **opea/chatqna-without-rerank:latest** has not been published yet, users need to build this docker image from source. + +## QuickStart: 3.Consume the ChatQnA Service + +```bash +curl http://${host_ip}:8888/v1/chatqna \ + -H "Content-Type: application/json" \ + -d '{ + "messages": "What is the revenue of Nike in 2023?" + }' +``` + ## πŸš€ Build Docker Images First of all, you need to build Docker Images locally. This step can be ignored after the Docker images published to Docker hub. From b205dc7571f00bed1a5c8964f8b8a923587c0b9b Mon Sep 17 00:00:00 2001 From: Ying Hu Date: Wed, 18 Sep 2024 23:25:05 +0800 Subject: [PATCH 6/9] Update README.md for Multiplatforms (#834) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- MultimodalQnA/README.md | 4 +++- Translation/README.md | 2 +- VisualQnA/README.md | 4 +++- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/MultimodalQnA/README.md b/MultimodalQnA/README.md index fe6d1fd9a..022455042 100644 --- a/MultimodalQnA/README.md +++ b/MultimodalQnA/README.md @@ -91,7 +91,9 @@ flowchart LR ``` -This MultimodalQnA use case performs Multimodal-RAG using LangChain, Redis VectorDB and Text Generation Inference on Intel Gaudi2 or Intel Xeon Scalable Processors. 
The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details. +This MultimodalQnA use case performs Multimodal-RAG using LangChain, Redis VectorDB and Text Generation Inference on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand the example. + +The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details. In the below, we provide a table that describes for each microservice component in the MultimodalQnA architecture, the default configuration of the open source project, hardware, port, and endpoint. diff --git a/Translation/README.md b/Translation/README.md index 37bfdd902..2df513baa 100644 --- a/Translation/README.md +++ b/Translation/README.md @@ -6,7 +6,7 @@ Translation architecture shows below: ![architecture](./assets/img/translation_architecture.png) -This Translation use case performs Language Translation Inference on Intel Gaudi2 or Intel Xeon Scalable Processors. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products/) for more details. +This Translation use case performs Language Translation Inference across multiple platforms. Currently, we provide the example for [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html), and we invite contributions from other hardware vendors to expand OPEA ecosystem. ## Deploy Translation Service diff --git a/VisualQnA/README.md b/VisualQnA/README.md index 3fe738754..d5f5c646b 100644 --- a/VisualQnA/README.md +++ b/VisualQnA/README.md @@ -13,7 +13,9 @@ General architecture of VQA shows below: ![VQA](./assets/img/vqa.png) -This example guides you through how to deploy a [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (Open Large Multimodal Models) model on Intel Gaudi2 to do visual question and answering task. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Please visit [Habana AI products](https://habana.ai/products/) for more details. +This example guides you through how to deploy a [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (Open Large Multimodal Models) model [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html). We invite contributions from other hardware vendors to expand OPEA ecosystem. + +The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details. 
![llava screenshot](./assets/img/llava_screenshot1.png) ![llava-screenshot](./assets/img/llava_screenshot2.png) From 1e130314d93ab481deb5be58485aa5a2ec6ebcef Mon Sep 17 00:00:00 2001 From: Letong Han <106566639+letonghan@users.noreply.github.com> Date: Thu, 19 Sep 2024 07:08:13 +0800 Subject: [PATCH 7/9] [Translation] Support manifests and nginx (#812) Signed-off-by: letonghan Signed-off-by: root Co-authored-by: root Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> --- .github/CODEOWNERS | 4 +- README.md | 2 +- .../docker_compose/intel/cpu/xeon/README.md | 61 ++- .../intel/cpu/xeon/compose.yaml | 30 +- .../docker_compose/intel/hpu/gaudi/README.md | 63 ++- .../intel/hpu/gaudi/compose.yaml | 22 +- Translation/docker_compose/set_env.sh | 18 + Translation/docker_image_build/build.yaml | 6 + Translation/kubernetes/intel/README.md | 41 ++ .../intel/cpu/xeon/manifest/translation.yaml | 495 +++++++++++++++++ .../intel/hpu/gaudi/manifest/translation.yaml | 497 ++++++++++++++++++ Translation/tests/test_compose_on_gaudi.sh | 18 +- Translation/tests/test_compose_on_xeon.sh | 21 +- Translation/tests/test_manifest_on_gaudi.sh | 91 ++++ Translation/tests/test_manifest_on_xeon.sh | 90 ++++ 15 files changed, 1422 insertions(+), 37 deletions(-) mode change 100644 => 100755 .github/CODEOWNERS create mode 100644 Translation/docker_compose/set_env.sh create mode 100644 Translation/kubernetes/intel/README.md create mode 100644 Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml create mode 100644 Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml create mode 100755 Translation/tests/test_manifest_on_gaudi.sh create mode 100755 Translation/tests/test_manifest_on_xeon.sh diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS old mode 100644 new mode 100755 index 5853274a1..3a6070efd --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -3,10 +3,10 @@ /ChatQnA/ liang1.lv@intel.com /CodeGen/ liang1.lv@intel.com /CodeTrans/ sihan.chen@intel.com -/DocSum/ sihan.chen@intel.com +/DocSum/ letong.han@intel.com /DocIndexRetriever/ xuhui.ren@intel.com chendi.xue@intel.com /FaqGen/ xinyao.wang@intel.com -/SearchQnA/ letong.han@intel.com +/SearchQnA/ sihan.chen@intel.com /Translation/ liang1.lv@intel.com /VisualQnA/ liang1.lv@intel.com /ProductivitySuite/ hoong.tee.yeoh@intel.com diff --git a/README.md b/README.md index 5a168648b..cbcabbe2f 100644 --- a/README.md +++ b/README.md @@ -45,7 +45,7 @@ Deployment are based on released docker images by default, check [docker image l | DocSum | [Xeon Instructions](DocSum/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](DocSum/docker_compose/intel/hpu/gaudi/README.md) | [DocSum with Manifests](DocSum/kubernetes/intel/README.md) | [DocSum with Helm Charts](https://github.com/opea-project/GenAIInfra/tree/main/helm-charts/docsum/README.md) | [DocSum with GMC](DocSum/kubernetes/intel/README_gmc.md) | | SearchQnA | [Xeon Instructions](SearchQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](SearchQnA/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [SearchQnA with GMC](SearchQnA/kubernetes/intel/README_gmc.md) | | FaqGen | [Xeon Instructions](FaqGen/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](FaqGen/docker_compose/intel/hpu/gaudi/README.md) | [FaqGen with Manifests](FaqGen/kubernetes/intel/README.md) | Not Supported | [FaqGen with GMC](FaqGen/kubernetes/intel/README_gmc.md) | -| Translation | [Xeon 
Instructions](Translation/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](Translation/docker_compose/intel/hpu/gaudi/README.md) | Not Supported | Not Supported | [Translation with GMC](Translation/kubernetes/intel/README_gmc.md) | +| Translation | [Xeon Instructions](Translation/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](Translation/docker_compose/intel/hpu/gaudi/README.md) | [Translation with Manifests](Translation/kubernetes/intel/README.md) | Not Supported | [Translation with GMC](Translation/kubernetes/intel/README_gmc.md) | | AudioQnA | [Xeon Instructions](AudioQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](AudioQnA/docker_compose/intel/hpu/gaudi/README.md) | [AudioQnA with Manifests](AudioQnA/kubernetes/intel/README.md) | Not Supported | [AudioQnA with GMC](AudioQnA/kubernetes/intel/README_gmc.md) | | VisualQnA | [Xeon Instructions](VisualQnA/docker_compose/intel/cpu/xeon/README.md) | [Gaudi Instructions](VisualQnA/docker_compose/intel/hpu/gaudi/README.md) | [VisualQnA with Manifests](VisualQnA/kubernetes/intel/README.md) | Not Supported | [VisualQnA with GMC](VisualQnA/kubernetes/intel/README_gmc.md) | | ProductivitySuite | [Xeon Instructions](ProductivitySuite/docker_compose/intel/cpu/xeon/README.md) | Not Supported | [ProductivitySuite with Manifests](ProductivitySuite/kubernetes/intel/README.md) | Not Supported | Not Supported | diff --git a/Translation/docker_compose/intel/cpu/xeon/README.md b/Translation/docker_compose/intel/cpu/xeon/README.md index 31e6e9654..306f8e35d 100644 --- a/Translation/docker_compose/intel/cpu/xeon/README.md +++ b/Translation/docker_compose/intel/cpu/xeon/README.md @@ -41,30 +41,59 @@ cd GenAIExamples/Translation/ui docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f docker/Dockerfile . ``` +### 4. Build Nginx Docker Image + +```bash +cd GenAIComps +docker build -t opea/translation-nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile . +``` + Then run the command `docker images`, you will have the following Docker Images: 1. `opea/llm-tgi:latest` 2. `opea/translation:latest` 3. `opea/translation-ui:latest` +4. `opea/translation-nginx:latest` ## πŸš€ Start Microservices +### Required Models + +By default, the LLM model is set to a default value as listed below: + +| Service | Model | +| ------- | ----------------- | +| LLM | haoranxu/ALMA-13B | + +Change the `LLM_MODEL_ID` below for your needs. + ### Setup Environment Variables -Since the `compose.yaml` will consume some environment variables, you need to set up them in advance as below. +1. Set the required environment variables: -```bash -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -export LLM_MODEL_ID="haoranxu/ALMA-13B" -export TGI_LLM_ENDPOINT="http://${host_ip}:8008" -export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token} -export MEGA_SERVICE_HOST_IP=${host_ip} -export LLM_SERVICE_HOST_IP=${host_ip} -export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/translation" -``` + ```bash + # Example: host_ip="192.168.1.1" + export host_ip="External_Public_IP" + # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" + export no_proxy="Your_No_Proxy" + export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + # Example: NGINX_PORT=80 + export NGINX_PORT=${your_nginx_port} + ``` + +2. 
If you are in a proxy environment, also set the proxy-related environment variables: + + ```bash + export http_proxy="Your_HTTP_Proxy" + export https_proxy="Your_HTTPs_Proxy" + ``` + +3. Set up other environment variables: -Note: Please replace with `host_ip` with you external IP address, do not use localhost. + ```bash + cd ../../../ + source set_env.sh + ``` ### Start Microservice Docker Containers @@ -99,6 +128,14 @@ docker compose up -d "language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' ``` +4. Nginx Service + + ```bash + curl http://${host_ip}:${NGINX_PORT}/v1/translation \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' + ``` + Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service. ## πŸš€ Launch the UI diff --git a/Translation/docker_compose/intel/cpu/xeon/compose.yaml b/Translation/docker_compose/intel/cpu/xeon/compose.yaml index 4ba224bf3..e8eafca4f 100644 --- a/Translation/docker_compose/intel/cpu/xeon/compose.yaml +++ b/Translation/docker_compose/intel/cpu/xeon/compose.yaml @@ -8,10 +8,12 @@ services: ports: - "8008:80" environment: + no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} - TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} - HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 volumes: - "./data:/data" shm_size: 1g @@ -25,10 +27,13 @@ services: - "9000:9000" ipc: host environment: + no_proxy: ${no_proxy} http_proxy: ${http_proxy} https_proxy: ${https_proxy} TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 restart: unless-stopped translation-xeon-backend-server: image: ${REGISTRY:-opea}/translation:${TAG:-latest} @@ -39,6 +44,7 @@ services: ports: - "8888:8888" environment: + - no_proxy=${no_proxy} - https_proxy=${https_proxy} - http_proxy=${http_proxy} - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP} @@ -53,11 +59,31 @@ services: ports: - "5173:5173" environment: + - no_proxy=${no_proxy} - https_proxy=${https_proxy} - http_proxy=${http_proxy} - BASE_URL=${BACKEND_SERVICE_ENDPOINT} ipc: host restart: always + translation-xeon-nginx-server: + image: ${REGISTRY:-opea}/translation-nginx:${TAG:-latest} + container_name: translation-xeon-nginx-server + depends_on: + - translation-xeon-backend-server + - translation-xeon-ui-server + ports: + - "${NGINX_PORT:-80}:80" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP} + - FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT} + - BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME} + - BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP} + - BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT} + ipc: host + restart: always networks: default: driver: bridge diff --git a/Translation/docker_compose/intel/hpu/gaudi/README.md b/Translation/docker_compose/intel/hpu/gaudi/README.md index 1f8f82837..9f234496c 100644 --- a/Translation/docker_compose/intel/hpu/gaudi/README.md +++ b/Translation/docker_compose/intel/hpu/gaudi/README.md @@ -29,34 +29,63 @@ docker build -t opea/translation:latest --build-arg https_proxy=$https_proxy --b Construct the frontend Docker image using the command below: ```bash -cd GenAIExamples/Translation +cd 
GenAIExamples/Translation/ui/ docker build -t opea/translation-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile . ``` +### 4. Build Nginx Docker Image + +```bash +cd GenAIComps +docker build -t opea/translation-nginx:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/nginx/Dockerfile . +``` + Then run the command `docker images`, you will have the following four Docker Images: 1. `opea/llm-tgi:latest` 2. `opea/translation:latest` 3. `opea/translation-ui:latest` +4. `opea/translation-nginx:latest` ## πŸš€ Start Microservices +### Required Models + +By default, the LLM model is set to a default value as listed below: + +| Service | Model | +| ------- | ----------------- | +| LLM | haoranxu/ALMA-13B | + +Change the `LLM_MODEL_ID` below for your needs. + ### Setup Environment Variables -Since the `compose.yaml` will consume some environment variables, you need to setup them in advance as below. +1. Set the required environment variables: -```bash -export http_proxy=${your_http_proxy} -export https_proxy=${your_http_proxy} -export LLM_MODEL_ID="haoranxu/ALMA-13B" -export TGI_LLM_ENDPOINT="http://${host_ip}:8008" -export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token} -export MEGA_SERVICE_HOST_IP=${host_ip} -export LLM_SERVICE_HOST_IP=${host_ip} -export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/translation" -``` + ```bash + # Example: host_ip="192.168.1.1" + export host_ip="External_Public_IP" + # Example: no_proxy="localhost, 127.0.0.1, 192.168.1.1" + export no_proxy="Your_No_Proxy" + export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token" + # Example: NGINX_PORT=80 + export NGINX_PORT=${your_nginx_port} + ``` + +2. If you are in a proxy environment, also set the proxy-related environment variables: + + ```bash + export http_proxy="Your_HTTP_Proxy" + export https_proxy="Your_HTTPs_Proxy" + ``` + +3. Set up other environment variables: -Note: Please replace with `host_ip` with you external IP address, do not use localhost. + ```bash + cd ../../../ + source set_env.sh + ``` ### Start Microservice Docker Containers @@ -91,6 +120,14 @@ docker compose up -d "language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' ``` +4. Nginx Service + + ```bash + curl http://${host_ip}:${NGINX_PORT}/v1/translation \ + -H "Content-Type: application/json" \ + -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' + ``` + Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service. 
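+
+If you prefer to script this last check, a small sketch that sends the same request through Nginx and prints only the HTTP status code (useful for CI or a simple readiness probe) could look like this:
+
+```bash
+# Send the sample translation request via the Nginx entry point and capture the status code
+status=$(curl -s -o /dev/null -w "%{http_code}" \
+  http://${host_ip}:${NGINX_PORT}/v1/translation \
+  -H "Content-Type: application/json" \
+  -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}')
+echo "Translation service behind Nginx returned HTTP ${status}"
+```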
## πŸš€ Launch the UI diff --git a/Translation/docker_compose/intel/hpu/gaudi/compose.yaml b/Translation/docker_compose/intel/hpu/gaudi/compose.yaml index 32dbfdc3e..6eefd6492 100644 --- a/Translation/docker_compose/intel/hpu/gaudi/compose.yaml +++ b/Translation/docker_compose/intel/hpu/gaudi/compose.yaml @@ -10,7 +10,6 @@ services: environment: http_proxy: ${http_proxy} https_proxy: ${https_proxy} - TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} HF_HUB_DISABLE_PROGRESS_BARS: 1 HF_HUB_ENABLE_HF_TRANSFER: 0 @@ -36,6 +35,8 @@ services: https_proxy: ${https_proxy} TGI_LLM_ENDPOINT: ${TGI_LLM_ENDPOINT} HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN} + HF_HUB_DISABLE_PROGRESS_BARS: 1 + HF_HUB_ENABLE_HF_TRANSFER: 0 restart: unless-stopped translation-gaudi-backend-server: image: ${REGISTRY:-opea}/translation:${TAG:-latest} @@ -65,6 +66,25 @@ services: - BASE_URL=${BACKEND_SERVICE_ENDPOINT} ipc: host restart: always + translation-gaudi-nginx-server: + image: ${REGISTRY:-opea}/translation-nginx:${TAG:-latest} + container_name: translation-gaudi-nginx-server + depends_on: + - translation-gaudi-backend-server + - translation-gaudi-ui-server + ports: + - "${NGINX_PORT:-80}:80" + environment: + - no_proxy=${no_proxy} + - https_proxy=${https_proxy} + - http_proxy=${http_proxy} + - FRONTEND_SERVICE_IP=${FRONTEND_SERVICE_IP} + - FRONTEND_SERVICE_PORT=${FRONTEND_SERVICE_PORT} + - BACKEND_SERVICE_NAME=${BACKEND_SERVICE_NAME} + - BACKEND_SERVICE_IP=${BACKEND_SERVICE_IP} + - BACKEND_SERVICE_PORT=${BACKEND_SERVICE_PORT} + ipc: host + restart: always networks: default: diff --git a/Translation/docker_compose/set_env.sh b/Translation/docker_compose/set_env.sh new file mode 100644 index 000000000..c82c8d360 --- /dev/null +++ b/Translation/docker_compose/set_env.sh @@ -0,0 +1,18 @@ +#!/usr/bin/env bash + +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + + +export LLM_MODEL_ID="haoranxu/ALMA-13B" +export TGI_LLM_ENDPOINT="http://${host_ip}:8008" +export HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token} +export MEGA_SERVICE_HOST_IP=${host_ip} +export LLM_SERVICE_HOST_IP=${host_ip} +export BACKEND_SERVICE_ENDPOINT="http://${host_ip}:8888/v1/translation" +export NGINX_PORT=80 +export FRONTEND_SERVICE_IP=${host_ip} +export FRONTEND_SERVICE_PORT=5173 +export BACKEND_SERVICE_NAME=translation +export BACKEND_SERVICE_IP=${host_ip} +export BACKEND_SERVICE_PORT=8888 diff --git a/Translation/docker_image_build/build.yaml b/Translation/docker_image_build/build.yaml index b326b125b..a1562060b 100644 --- a/Translation/docker_image_build/build.yaml +++ b/Translation/docker_image_build/build.yaml @@ -23,3 +23,9 @@ services: dockerfile: comps/llms/text-generation/tgi/Dockerfile extends: translation image: ${REGISTRY:-opea}/llm-tgi:${TAG:-latest} + nginx: + build: + context: GenAIComps + dockerfile: comps/nginx/Dockerfile + extends: translation + image: ${REGISTRY:-opea}/translation-nginx:${TAG:-latest} diff --git a/Translation/kubernetes/intel/README.md b/Translation/kubernetes/intel/README.md new file mode 100644 index 000000000..7ca89d372 --- /dev/null +++ b/Translation/kubernetes/intel/README.md @@ -0,0 +1,41 @@ +# Deploy Translation in Kubernetes Cluster + +> [NOTE] +> The following values must be set before you can deploy: +> HUGGINGFACEHUB_API_TOKEN +> +> You can also customize the "MODEL_ID" if needed. +> +> You need to make sure you have created the directory `/mnt/opea-models` to save the cached model on the node where the Translation workload is running. 
Otherwise, you need to modify the `translation.yaml` file to change the `model-volume` to a directory that exists on the node. + +## Deploy On Xeon + +``` +cd GenAIExamples/Translation/kubernetes/intel/cpu/xeon/manifests +export HUGGINGFACEHUB_API_TOKEN="YourOwnToken" +sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" translation.yaml +kubectl apply -f translation.yaml +``` + +## Deploy On Gaudi + +``` +cd GenAIExamples/Translation/kubernetes/intel/hpu/gaudi/manifests +export HUGGINGFACEHUB_API_TOKEN="YourOwnToken" +sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" translation.yaml +kubectl apply -f translation.yaml +``` + +## Verify Services + +To verify the installation, run the command `kubectl get pod` to make sure all pods are running. + +Then run the command `kubectl port-forward svc/translation 8888:8888` to expose the Translation service for access. + +Open another terminal and run the following command to verify the service if working: + +```console +curl http://localhost:8888/v1/translation \ + -H 'Content-Type: application/json' \ + -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' +``` diff --git a/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml new file mode 100644 index 000000000..e30fee338 --- /dev/null +++ b/Translation/kubernetes/intel/cpu/xeon/manifest/translation.yaml @@ -0,0 +1,495 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-tgi-config + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +data: + LLM_MODEL_ID: "haoranxu/ALMA-13B" + PORT: "2080" + HF_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HABANA_LOGS: "/tmp/habana_logs" + NUMBA_CACHE_DIR: "/tmp" + HF_HOME: "/tmp/.cache/huggingface" + CUDA_GRAPHS: "0" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-llm-uservice-config + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +data: + TGI_LLM_ENDPOINT: "http://translation-tgi" + HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + LOGFLAG: "" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-ui-config + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +data: + BASE_URL: "/v1/translation" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +data: + default.conf: |+ + # Copyright (C) 2024 Intel Corporation + # SPDX-License-Identifier: Apache-2.0 + + + server { + listen 80; + listen [::]:80; + + location /home { + alias /usr/share/nginx/html/index.html; + } + + location / { + proxy_pass http://translation-ui:5173; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/translation { + proxy_pass http://translation:8888; + proxy_set_header Host $host; + proxy_set_header 
X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + } + +kind: ConfigMap +metadata: + name: translation-nginx-config +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-ui + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 5173 + targetPort: ui + protocol: TCP + name: ui + selector: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-llm-uservice + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 9000 + targetPort: 9000 + protocol: TCP + name: llm-uservice + selector: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-tgi + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2080 + protocol: TCP + name: tgi + selector: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation +--- +apiVersion: v1 +kind: Service +metadata: + name: translation-nginx +spec: + ports: + - port: 80 + protocol: TCP + targetPort: 80 + selector: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + type: NodePort +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 8888 + targetPort: 8888 + protocol: TCP + name: translation + selector: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-ui + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + spec: + securityContext: + {} + containers: + - name: translation-ui + envFrom: + - configMapRef: + name: translation-ui-config + securityContext: + {} + image: "opea/translation-ui:latest" + imagePullPolicy: IfNotPresent + ports: + - name: ui + containerPort: 80 + protocol: TCP + resources: + {} + volumeMounts: + - mountPath: /tmp + name: tmp + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-llm-uservice + labels: + app.kubernetes.io/name: llm-uservice + 
app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + spec: + securityContext: + {} + containers: + - name: translation + envFrom: + - configMapRef: + name: translation-llm-uservice-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: false + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/llm-tgi:latest" + imagePullPolicy: IfNotPresent + ports: + - name: llm-uservice + containerPort: 9000 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + {} + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-tgi + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +spec: + # use explicit replica counts only of HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + spec: + securityContext: + {} + containers: + - name: tgi + envFrom: + - configMapRef: + name: translation-tgi-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu" + imagePullPolicy: IfNotPresent + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2080 + protocol: TCP + livenessProbe: + failureThreshold: 24 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + readinessProbe: + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + startupProbe: + failureThreshold: 120 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + resources: + {} + volumes: + - name: model-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + app: translation +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation + template: + metadata: + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation + spec: + securityContext: + null + containers: + - name: translation + env: + - name: LLM_SERVICE_HOST_IP + value: 
translation-llm-uservice + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/translation:latest" + imagePullPolicy: IfNotPresent + volumeMounts: + - mountPath: /tmp + name: tmp + ports: + - name: translation + containerPort: 8888 + protocol: TCP + resources: + null + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-nginx + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + app: translation-nginx +spec: + selector: + matchLabels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + template: + metadata: + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + spec: + containers: + - image: nginx:1.27.1 + imagePullPolicy: IfNotPresent + name: nginx + volumeMounts: + - mountPath: /etc/nginx/conf.d + name: nginx-config-volume + securityContext: {} + volumes: + - configMap: + defaultMode: 420 + name: translation-nginx-config + name: nginx-config-volume diff --git a/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml new file mode 100644 index 000000000..52d6c9b10 --- /dev/null +++ b/Translation/kubernetes/intel/hpu/gaudi/manifest/translation.yaml @@ -0,0 +1,497 @@ +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-tgi-config + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +data: + LLM_MODEL_ID: "haoranxu/ALMA-13B" + PORT: "2080" + HF_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + HABANA_LOGS: "/tmp/habana_logs" + NUMBA_CACHE_DIR: "/tmp" + HF_HOME: "/tmp/.cache/huggingface" + MAX_INPUT_LENGTH: "1024" + MAX_TOTAL_TOKENS: "2048" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-llm-uservice-config + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +data: + TGI_LLM_ENDPOINT: "http://translation-tgi" + HUGGINGFACEHUB_API_TOKEN: "insert-your-huggingface-token-here" + http_proxy: "" + https_proxy: "" + no_proxy: "" + LOGFLAG: "" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: ConfigMap +metadata: + name: translation-ui-config + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +data: + BASE_URL: "/v1/translation" +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +data: + default.conf: |+ + # Copyright (C) 2024 Intel Corporation + # SPDX-License-Identifier: Apache-2.0 + + + server { + listen 80; + listen [::]:80; + + location /home { + alias /usr/share/nginx/html/index.html; + } + + location / { + proxy_pass http://translation-ui:5173; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + 
proxy_set_header X-Forwarded-Proto $scheme; + } + + location /v1/translation { + proxy_pass http://translation; + proxy_set_header Host $host; + proxy_set_header X-Real-IP $remote_addr; + proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; + proxy_set_header X-Forwarded-Proto $scheme; + } + } + +kind: ConfigMap +metadata: + name: translation-nginx-config +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-ui + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 5173 + targetPort: ui + protocol: TCP + name: ui + selector: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-llm-uservice + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 9000 + targetPort: 9000 + protocol: TCP + name: llm-uservice + selector: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation-tgi + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +spec: + type: ClusterIP + ports: + - port: 80 + targetPort: 2080 + protocol: TCP + name: tgi + selector: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation +--- +apiVersion: v1 +kind: Service +metadata: + name: translation-nginx +spec: + ports: + - port: 80 + protocol: TCP + targetPort: 80 + selector: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + type: NodePort +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: v1 +kind: Service +metadata: + name: translation + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + type: ClusterIP + ports: + - port: 8888 + targetPort: 8888 + protocol: TCP + name: translation + selector: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-ui + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: translation-ui + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + spec: + securityContext: + {} + containers: + - name: translation-ui + envFrom: + - configMapRef: + name: translation-ui-config + securityContext: + {} + image: "opea/translation-ui:latest" + imagePullPolicy: IfNotPresent + ports: + - name: ui + containerPort: 80 + protocol: TCP + resources: + {} + volumeMounts: + - mountPath: /tmp + name: tmp + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# 
SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-llm-uservice + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: llm-uservice + app.kubernetes.io/instance: translation + spec: + securityContext: + {} + containers: + - name: translation + envFrom: + - configMapRef: + name: translation-llm-uservice-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: false + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/llm-tgi:latest" + imagePullPolicy: IfNotPresent + ports: + - name: llm-uservice + containerPort: 9000 + protocol: TCP + volumeMounts: + - mountPath: /tmp + name: tmp + livenessProbe: + failureThreshold: 24 + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + readinessProbe: + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + startupProbe: + failureThreshold: 120 + httpGet: + path: v1/health_check + port: llm-uservice + initialDelaySeconds: 5 + periodSeconds: 5 + resources: + {} + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-tgi + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "2.1.0" +spec: + # use explicit replica counts only of HorizontalPodAutoscaler is disabled + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + template: + metadata: + labels: + app.kubernetes.io/name: tgi + app.kubernetes.io/instance: translation + spec: + securityContext: + {} + containers: + - name: tgi + envFrom: + - configMapRef: + name: translation-tgi-config + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "ghcr.io/huggingface/tgi-gaudi:2.0.1" + imagePullPolicy: IfNotPresent + volumeMounts: + - mountPath: /data + name: model-volume + - mountPath: /tmp + name: tmp + ports: + - name: http + containerPort: 2080 + protocol: TCP + livenessProbe: + failureThreshold: 24 + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + readinessProbe: + initialDelaySeconds: 5 + periodSeconds: 5 + tcpSocket: + port: http + startupProbe: + failureThreshold: 120 + initialDelaySeconds: 20 + periodSeconds: 5 + tcpSocket: + port: http + resources: + limits: + habana.ai/gaudi: 1 + volumes: + - name: model-volume + emptyDir: {} + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + app: translation +spec: + replicas: 1 + selector: + matchLabels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation + template: + metadata: + labels: + app.kubernetes.io/name: translation + 
app.kubernetes.io/instance: translation + app: translation + spec: + securityContext: + null + containers: + - name: translation + env: + - name: LLM_SERVICE_HOST_IP + value: translation-llm-uservice + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALL + readOnlyRootFilesystem: true + runAsNonRoot: true + runAsUser: 1000 + seccompProfile: + type: RuntimeDefault + image: "opea/translation:latest" + imagePullPolicy: IfNotPresent + volumeMounts: + - mountPath: /tmp + name: tmp + ports: + - name: translation + containerPort: 8888 + protocol: TCP + resources: + null + volumes: + - name: tmp + emptyDir: {} +--- +# Copyright (C) 2024 Intel Corporation +# SPDX-License-Identifier: Apache-2.0 + +apiVersion: apps/v1 +kind: Deployment +metadata: + name: translation-nginx + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app.kubernetes.io/version: "v1.0" + app: translation-nginx +spec: + selector: + matchLabels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + template: + metadata: + labels: + app.kubernetes.io/name: translation + app.kubernetes.io/instance: translation + app: translation-nginx + spec: + containers: + - image: nginx:1.27.1 + imagePullPolicy: IfNotPresent + name: nginx + volumeMounts: + - mountPath: /etc/nginx/conf.d + name: nginx-config-volume + securityContext: {} + volumes: + - configMap: + defaultMode: 420 + name: translation-nginx-config + name: nginx-config-volume diff --git a/Translation/tests/test_compose_on_gaudi.sh b/Translation/tests/test_compose_on_gaudi.sh index f66af96cb..558ec9e28 100644 --- a/Translation/tests/test_compose_on_gaudi.sh +++ b/Translation/tests/test_compose_on_gaudi.sh @@ -19,7 +19,7 @@ function build_docker_images() { git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../ echo "Build all the images with --no-cache, check docker_image_build.log for details..." - service_list="translation translation-ui llm-tgi" + service_list="translation translation-ui llm-tgi nginx" docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log docker pull ghcr.io/huggingface/tgi-gaudi:2.0.1 @@ -35,6 +35,12 @@ function start_services() { export MEGA_SERVICE_HOST_IP=${ip_address} export LLM_SERVICE_HOST_IP=${ip_address} export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/translation" + export NGINX_PORT=80 + export FRONTEND_SERVICE_IP=${ip_address} + export FRONTEND_SERVICE_PORT=5173 + export BACKEND_SERVICE_NAME=translation + export BACKEND_SERVICE_IP=${ip_address} + export BACKEND_SERVICE_PORT=8888 sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env @@ -80,8 +86,6 @@ function validate_services() { sleep 1s } - - function validate_microservices() { # Check if the microservices are running correctly. 
@@ -110,6 +114,14 @@ function validate_megaservice() {
         "mega-translation" \
         "translation-gaudi-backend-server" \
         '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}'
+
+    # test the megaservice via nginx
+    validate_services \
+        "${ip_address}:80/v1/translation" \
+        "translation" \
+        "mega-translation-nginx" \
+        "translation-gaudi-nginx-server" \
+        '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}'
 }
 
 function validate_frontend() {
diff --git a/Translation/tests/test_compose_on_xeon.sh b/Translation/tests/test_compose_on_xeon.sh
index a648ba832..2d0c5306d 100644
--- a/Translation/tests/test_compose_on_xeon.sh
+++ b/Translation/tests/test_compose_on_xeon.sh
@@ -19,10 +19,10 @@ function build_docker_images() {
     git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../
 
     echo "Build all the images with --no-cache, check docker_image_build.log for details..."
-    service_list="translation translation-ui llm-tgi"
+    service_list="translation translation-ui llm-tgi nginx"
     docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log
 
-    docker pull ghcr.io/huggingface/text-generation-inference:1.4
+    docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
     docker images && sleep 1s
 }
 
@@ -35,6 +35,12 @@ function start_services() {
     export MEGA_SERVICE_HOST_IP=${ip_address}
     export LLM_SERVICE_HOST_IP=${ip_address}
     export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/translation"
+    export NGINX_PORT=80
+    export FRONTEND_SERVICE_IP=${ip_address}
+    export FRONTEND_SERVICE_PORT=5173
+    export BACKEND_SERVICE_NAME=translation
+    export BACKEND_SERVICE_IP=${ip_address}
+    export BACKEND_SERVICE_PORT=8888
 
     sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env
 
@@ -42,7 +48,8 @@ function start_services() {
     docker compose up -d > ${LOG_PATH}/start_services_with_compose.log
 
     n=0
-    until [[ "$n" -ge 100 ]]; do
+    # wait longer for the LLM model download
+    until [[ "$n" -ge 500 ]]; do
         docker logs tgi-service > ${LOG_PATH}/tgi_service_start.log
         if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
             break
@@ -108,6 +115,14 @@ function validate_megaservice() {
         "mega-translation" \
         "translation-xeon-backend-server" \
         '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}'
+
+    # test the megaservice via nginx
+    validate_services \
+        "${ip_address}:80/v1/translation" \
+        "translation" \
+        "mega-translation-nginx" \
+        "translation-xeon-nginx-server" \
+        '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}'
 }
 
 function validate_frontend() {
diff --git a/Translation/tests/test_manifest_on_gaudi.sh b/Translation/tests/test_manifest_on_gaudi.sh
new file mode 100755
index 000000000..6e4edbeb4
--- /dev/null
+++ b/Translation/tests/test_manifest_on_gaudi.sh
@@ -0,0 +1,91 @@
+#!/bin/bash
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+set -xe
+USER_ID=$(whoami)
+LOG_PATH=/home/$(whoami)/logs
+MOUNT_DIR=/home/$USER_ID/.cache/huggingface/hub
+IMAGE_REPO=${IMAGE_REPO:-}
+IMAGE_TAG=${IMAGE_TAG:-latest}
+
+function init_translation() {
+    # executed under path manifest/translation/gaudi
+    # replace the mount dir "path: /mnt/model" with "path: $CHART_MOUNT"
+    find . -name '*.yaml' -type f -exec sed -i "s#path: /mnt/opea-models#path: $MOUNT_DIR#g" {} \;
+    if [ $CONTEXT == "CI" ]; then
+    # replace megaservice image tag
+        find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/translation:latest#image: \"opea/translation:${IMAGE_TAG}#g" {} \;
+    else
+    # replace microservice image tag
+        find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/\(.*\):latest#image: \"opea/\1:${IMAGE_TAG}#g" {} \;
+    fi
+    # replace the repository "image: opea/*" with "image: $IMAGE_REPO/opea/"
+    find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/*#image: \"${IMAGE_REPO}opea/#g" {} \;
+    # set huggingface token
+    find . -name '*.yaml' -type f -exec sed -i "s#insert-your-huggingface-token-here#$(cat /home/$USER_ID/.cache/huggingface/token)#g" {} \;
+}
+
+function install_translation {
+    echo "namespace is $NAMESPACE"
+    kubectl apply -f translation.yaml -n $NAMESPACE
+    sleep 50s
+}
+
+function validate_translation() {
+    ip_address=$(kubectl get svc $SERVICE_NAME -n $NAMESPACE -o jsonpath='{.spec.clusterIP}')
+    port=$(kubectl get svc $SERVICE_NAME -n $NAMESPACE -o jsonpath='{.spec.ports[0].port}')
+    echo "try to curl http://${ip_address}:${port}/v1/translation..."
+
+    # generate a random logfile name to avoid conflict among multiple runners
+    LOGFILE=$LOG_PATH/curlmega_$NAMESPACE.log
+    # Curl the Mega Service
+    curl http://${ip_address}:${port}/v1/translation \
+        -H 'Content-Type: application/json' \
+        -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' > $LOGFILE
+    exit_code=$?
+    if [ $exit_code -ne 0 ]; then
+        echo "Megaservice translation failed, please check the logs in $LOGFILE!"
+        exit 1
+    fi
+
+    echo "Checking response results, make sure the output is reasonable."
+    local status=false
+    if [[ -f $LOGFILE ]] && \
+    [[ $(grep -c "translation" $LOGFILE) != 0 ]]; then
+        status=true
+    fi
+
+    if [ $status == false ]; then
+        echo "Response check failed, please check the logs in artifacts!"
+    else
+        echo "Response check succeeded!"
+    fi
+}
+
+if [ $# -eq 0 ]; then
+    echo "Usage: $0 <function_name>"
+    exit 1
+fi
+
+case "$1" in
+    init_Translation)
+        pushd Translation/kubernetes/intel/hpu/gaudi/manifest
+        init_translation
+        popd
+        ;;
+    install_Translation)
+        pushd Translation/kubernetes/intel/hpu/gaudi/manifest
+        NAMESPACE=$2
+        install_translation
+        popd
+        ;;
+    validate_Translation)
+        NAMESPACE=$2
+        SERVICE_NAME=translation
+        validate_translation
+        ;;
+    *)
+        echo "Unknown function: $1"
+        ;;
+esac
diff --git a/Translation/tests/test_manifest_on_xeon.sh b/Translation/tests/test_manifest_on_xeon.sh
new file mode 100755
index 000000000..34f04f5ab
--- /dev/null
+++ b/Translation/tests/test_manifest_on_xeon.sh
@@ -0,0 +1,90 @@
+#!/bin/bash
+# Copyright (C) 2024 Intel Corporation
+# SPDX-License-Identifier: Apache-2.0
+
+set -xe
+USER_ID=$(whoami)
+LOG_PATH=/home/$(whoami)/logs
+MOUNT_DIR=/home/$USER_ID/.cache/huggingface/hub
+IMAGE_REPO=${IMAGE_REPO:-}
+IMAGE_TAG=${IMAGE_TAG:-latest}
+
+function init_translation() {
+    # executed under path manifest/translation/xeon
+    # replace the mount dir "path: /mnt/model" with "path: $CHART_MOUNT"
+    find . -name '*.yaml' -type f -exec sed -i "s#path: /mnt/opea-models#path: $MOUNT_DIR#g" {} \;
+    if [ $CONTEXT == "CI" ]; then
+    # replace megaservice image tag
+        find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/translation:latest#image: \"opea/translation:${IMAGE_TAG}#g" {} \;
+    else
+    # replace microservice image tag
+        find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/\(.*\):latest#image: \"opea/\1:${IMAGE_TAG}#g" {} \;
+    fi
+    # replace the repository "image: opea/*" with "image: $IMAGE_REPO/opea/"
+    find . -name '*.yaml' -type f -exec sed -i "s#image: \"opea/*#image: \"${IMAGE_REPO}opea/#g" {} \;
+    # set huggingface token
+    find . -name '*.yaml' -type f -exec sed -i "s#insert-your-huggingface-token-here#$(cat /home/$USER_ID/.cache/huggingface/token)#g" {} \;
+}
+
+function install_translation {
+    echo "namespace is $NAMESPACE"
+    kubectl apply -f translation.yaml -n $NAMESPACE
+}
+
+function validate_translation() {
+    ip_address=$(kubectl get svc $SERVICE_NAME -n $NAMESPACE -o jsonpath='{.spec.clusterIP}')
+    port=$(kubectl get svc $SERVICE_NAME -n $NAMESPACE -o jsonpath='{.spec.ports[0].port}')
+    echo "try to curl http://${ip_address}:${port}/v1/translation..."
+
+    # generate a random logfile name to avoid conflict among multiple runners
+    LOGFILE=$LOG_PATH/curlmega_$NAMESPACE.log
+    # Curl the Mega Service
+    curl http://${ip_address}:${port}/v1/translation \
+        -H 'Content-Type: application/json' \
+        -d '{"language_from": "Chinese","language_to": "English","source_language": "ζˆ‘ηˆ±ζœΊε™¨ηΏ»θ―‘γ€‚"}' > $LOGFILE
+    exit_code=$?
+    if [ $exit_code -ne 0 ]; then
+        echo "Megaservice translation failed, please check the logs in $LOGFILE!"
+        exit 1
+    fi
+
+    echo "Checking response results, make sure the output is reasonable."
+    local status=false
+    if [[ -f $LOGFILE ]] && \
+    [[ $(grep -c "translation" $LOGFILE) != 0 ]]; then
+        status=true
+    fi
+
+    if [ $status == false ]; then
+        echo "Response check failed, please check the logs in artifacts!"
+    else
+        echo "Response check succeeded!"
+    fi
+}
+
+if [ $# -eq 0 ]; then
+    echo "Usage: $0 <function_name>"
+    exit 1
+fi
+
+case "$1" in
+    init_Translation)
+        pushd Translation/kubernetes/intel/cpu/xeon/manifest
+        init_translation
+        popd
+        ;;
+    install_Translation)
+        pushd Translation/kubernetes/intel/cpu/xeon/manifest
+        NAMESPACE=$2
+        install_translation
+        popd
+        ;;
+    validate_Translation)
+        NAMESPACE=$2
+        SERVICE_NAME=translation
+        validate_translation
+        ;;
+    *)
+        echo "Unknown function: $1"
+        ;;
+esac

From dc94026d9827b15c1f48f34554aef4f3b237fc7c Mon Sep 17 00:00:00 2001
From: rbrugaro
Date: Wed, 18 Sep 2024 18:20:55 -0700
Subject: [PATCH 8/9] doc PR to main instead of v1.0r (#838)

Signed-off-by: rbrugaro
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
---
 README.md | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index cbcabbe2f..40aa39e1a 100644
--- a/README.md
+++ b/README.md
@@ -54,9 +54,18 @@ Deployment are based on released docker images by default, check [docker image l
 Check [here](./supported_examples.md) for detailed information of supported examples, models, hardwares, etc.
 
+## Contributing to OPEA
+
+Welcome to the OPEA open-source community! We are thrilled to have you here and excited about the potential contributions you can bring to the OPEA platform. Whether you are fixing bugs, adding new GenAI components, improving documentation, or sharing your unique use cases, your contributions are invaluable.
+
+Together, we can make OPEA the go-to platform for enterprise AI solutions. Let's work together to push the boundaries of what's possible and create a future where AI is accessible, efficient, and impactful for everyone.
+ +Please check the [Contributing guidelines](https://github.com/opea-project/docs/tree/main/community/CONTRIBUTING.md) for a detailed guide on how to contribute a GenAI component and all the ways you can contribute! + +Thank you for being a part of this journey. We can't wait to see what we can achieve together! + ## Additional Content - [Code of Conduct](https://github.com/opea-project/docs/tree/main/community/CODE_OF_CONDUCT.md) -- [Contribution](https://github.com/opea-project/docs/tree/main/community/CONTRIBUTING.md) - [Security Policy](https://github.com/opea-project/docs/tree/main/community/SECURITY.md) - [Legal Information](/LEGAL_INFORMATION.md) From d85ec0947c6ec282cd2b94bd839cc08423cb253a Mon Sep 17 00:00:00 2001 From: Malini Bhandaru Date: Wed, 18 Sep 2024 18:27:01 -0700 Subject: [PATCH 9/9] Remove marketing materials (#837) Signed-off-by: Malini Bhandaru --- VisualQnA/README.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/VisualQnA/README.md b/VisualQnA/README.md index d5f5c646b..99cdb26a2 100644 --- a/VisualQnA/README.md +++ b/VisualQnA/README.md @@ -13,9 +13,7 @@ General architecture of VQA shows below: ![VQA](./assets/img/vqa.png) -This example guides you through how to deploy a [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (Open Large Multimodal Models) model [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html). We invite contributions from other hardware vendors to expand OPEA ecosystem. - -The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Visit [Habana AI products](https://habana.ai/products) for more details. +This example guides you through how to deploy a [LLaVA-NeXT](https://github.com/LLaVA-VL/LLaVA-NeXT) (Open Large Multimodal Models) model on [Intel Gaudi2](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html) and [Intel Xeon Scalable Processors](https://www.intel.com/content/www/us/en/products/details/processors/xeon.html). We invite contributions from other hardware vendors to expand the OPEA ecosystem. ![llava screenshot](./assets/img/llava_screenshot1.png) ![llava-screenshot](./assets/img/llava_screenshot2.png)