From 4c644cc37cc16590996ae9a834f7460bf7630396 Mon Sep 17 00:00:00 2001
From: alexsin368
Date: Tue, 12 Nov 2024 16:26:08 -0800
Subject: [PATCH] adding codegen sample guide for gaudi deployment

Signed-off-by: alexsin368
---
 examples/CodeGen/deploy/gaudi.md | 372 +++++++++++++++++++++++++++++++
 1 file changed, 372 insertions(+)
 create mode 100644 examples/CodeGen/deploy/gaudi.md

diff --git a/examples/CodeGen/deploy/gaudi.md b/examples/CodeGen/deploy/gaudi.md
new file mode 100644
index 00000000..9ad3971d
--- /dev/null
+++ b/examples/CodeGen/deploy/gaudi.md
@@ -0,0 +1,372 @@
# Single node on-prem deployment with vLLM or TGI on Gaudi AI Accelerator

This section covers a single-node on-prem deployment of the CodeGen example
with OPEA comps, served by TGI. We will showcase how to build an end-to-end
CodeGen solution with the CodeLlama-7b-hf model, deployed on Intel® Tiber™ AI
Cloud (ITAC). To quickly learn about OPEA in just 5 minutes and set up the
required hardware and software, follow the instructions in the
[Getting Started](https://opea-project.github.io/latest/getting-started/README.html)
section. If you do not have an ITAC instance, or your hardware is not yet
supported on ITAC, you can still run this deployment on-prem.

## Overview

The CodeGen use case uses a single microservice called LLM. This tutorial walks
through how to enable it from OPEA GenAIComps and deploy it as a single-node
megaservice solution backed by TGI.

The solution aims to show how to use the CodeLlama-7b-hf model on the Intel®
Gaudi® AI Accelerator. We will go through how to set up the Docker containers
that start the microservice and megaservice. The solution then takes a text
prompt as input and generates code accordingly. It is deployed with a UI
offering two modes to choose from:

1. Svelte-Based UI
2. React-Based UI

The React-based UI is optional, but it is supported in this example if you are
interested in using it.

This tutorial covers the following topics:

1. Prerequisites
2. Prepare (Building / Pulling) Docker images
3. Use case setup
4. Deploy the use case
5. Interacting with CodeGen deployment

## Prerequisites

The first step is to clone the GenAIExamples and GenAIComps repositories.
GenAIComps provides the fundamental components used to build the examples you
find in GenAIExamples and deploy them as microservices.

```bash
git clone https://github.com/opea-project/GenAIComps.git
git clone https://github.com/opea-project/GenAIExamples.git
```

The examples utilize model weights from HuggingFace and tooling from LangChain.

Set up your [HuggingFace](https://huggingface.co/) account and generate a
[user access token](https://huggingface.co/docs/transformers.js/en/guides/private#step-1-generating-a-user-access-token).

Set the HuggingFace token:

```bash
export HUGGINGFACEHUB_API_TOKEN="Your_Huggingface_API_Token"
```

Additionally, if you plan to use the default model CodeLlama-7b-hf, you will
need to [request access](https://huggingface.co/meta-llama/CodeLlama-7b-hf) from HuggingFace.

The example requires you to set the `host_ip` environment variable so that the
microservice endpoints can be reached on the host's ports.
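
(Optional) Before moving on, you can verify that the token is valid and that access to
the gated model has been granted. This is only a quick sanity check against standard
HuggingFace Hub URLs; it is not part of the OPEA tooling:

```bash
# Should return your HuggingFace account details if the token is valid.
curl -s -H "Authorization: Bearer ${HUGGINGFACEHUB_API_TOKEN}" https://huggingface.co/api/whoami-v2

# Should typically print 200 once access to the gated CodeLlama-7b-hf repository is approved
# (gated repositories usually return 401/403 otherwise).
curl -sL -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer ${HUGGINGFACEHUB_API_TOKEN}" \
  https://huggingface.co/meta-llama/CodeLlama-7b-hf/resolve/main/config.json
```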

Set the `host_ip` env variable:

```bash
export host_ip=$(hostname -I | awk '{print $1}')
```

Make sure to set up proxies if you are behind a firewall:

```bash
export no_proxy=${your_no_proxy},$host_ip
export http_proxy=${your_http_proxy}
export https_proxy=${your_http_proxy}
```

## Prepare (Building / Pulling) Docker images

This step walks through building or pulling the relevant Docker images, with a
sanity check at the end. For CodeGen, the following Docker image is needed for
the microservice: LLM with TGI. Additionally, you will need to build Docker
images for the CodeGen megaservice and the UI (the React UI is optional). In
total, there are **3 required docker images** and one optional docker image.

### Build/Pull Microservice image

::::::{tab-set}

:::::{tab-item} Pull
:sync: Pull

If you decide to pull the Docker images instead of building them locally, you
can proceed to the next step, where all the necessary images will be pulled
from Docker Hub.

:::::
:::::{tab-item} Build
:sync: Build

From within the `GenAIComps` folder, check out the release tag:

```bash
cd GenAIComps
git checkout tags/v1.1
```

#### Build LLM Image

```bash
docker build --no-cache -t opea/llm-tgi:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f comps/llms/text-generation/tgi/Dockerfile .
```

### Build Mega Service images

The megaservice is a pipeline that channels data through different
microservices, each performing a different task. The LLM microservice and the
flow of data are defined in the `codegen.py` file. You can also add or remove
microservices and customize the megaservice to suit your needs.

Build the megaservice image for this use case:

```bash
cd ..
cd GenAIExamples
git checkout tags/v1.1
cd CodeGen
```

```bash
docker build --no-cache -t opea/codegen:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f Dockerfile .
cd ../..
```

### Build the UI Image

You can build either of the two UI modes.

*Svelte UI*

```bash
cd GenAIExamples/CodeGen/ui/
docker build --no-cache -t opea/codegen-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile .
cd ../../..
```

*React UI (Optional)*

Build this image if you want a React-based frontend:

```bash
cd GenAIExamples/CodeGen/ui/
docker build --no-cache -t opea/codegen-react-ui:latest --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy -f ./docker/Dockerfile.react .
cd ../../..
```

### Sanity Check

Before moving on to the next step, run `docker images` and check that you have
the following set of Docker images:

* `opea/llm-tgi:latest`
* `opea/codegen:latest`
* `opea/codegen-ui:latest`
* `opea/codegen-react-ui:latest` (optional)

:::::
::::::

## Use Case Setup

The use case uses the following combination of GenAIComps and tools:

|Use Case Components | Tools | Model | Service Type |
|---------------- |--------------|-----------------------------|-------|
|LLM | TGI | meta-llama/CodeLlama-7b-hf | OPEA Microservice |
|UI | | NA | Gateway Service |

The tools and models mentioned in the table are configurable through either
environment variables or the `compose.yaml` file.

Set the necessary environment variables to set up the use case by running the
`set_env.sh` script. This is where the environment variable `LLM_MODEL_ID` is
set, and you can change it to another model by specifying the HuggingFace model
card ID.
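
For example, to deploy a different code model, you can point `LLM_MODEL_ID` at
another HuggingFace model card ID. A minimal sketch (the model below is only an
illustration, and it assumes your token can access it; run the override *after*
sourcing the script in the next step so the script's default does not overwrite it):

```bash
# Illustrative override only: use any HuggingFace model card ID your token can access.
export LLM_MODEL_ID="Qwen/CodeQwen1.5-7B-Chat"
```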

```bash
cd GenAIExamples/CodeGen/docker_compose/
source ./set_env.sh
cd ../../..
```

## Deploy the Use Case

In this tutorial, we deploy via Docker Compose using the provided YAML file.
Docker Compose starts all of the above-mentioned services as containers.

```bash
cd GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi
docker compose up -d
```

### Checks to Ensure the Services are Running

#### Check Startup and Env Variables

Check the startup log by running `docker compose logs` to ensure there are no errors.
Warning messages are printed for any variables that are **NOT** set.

Here are some sample messages if the proxy environment variables are not set:

    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "no_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "http_proxy" variable is not set. Defaulting to a blank string.
    WARN[0000] The "https_proxy" variable is not set. Defaulting to a blank string.

#### Check the Container Status

Check that all the containers launched via Docker Compose have started.

The CodeGen example starts 4 Docker containers. Check that these containers are
all running, i.e., the `STATUS` of every container is `Up`. You can do this with
the `docker ps -a` command.

```
CONTAINER ID   IMAGE                                 COMMAND                  CREATED              STATUS              PORTS                                       NAMES
bbd235074c3d   opea/codegen-ui:latest                "docker-entrypoint.s…"   About a minute ago   Up About a minute   0.0.0.0:5173->5173/tcp, :::5173->5173/tcp   codegen-gaudi-ui-server
8d3872ca66fa   opea/codegen:latest                   "python codegen.py"      About a minute ago   Up About a minute   0.0.0.0:7778->7778/tcp, :::7778->7778/tcp   codegen-gaudi-backend-server
b9fc39f51cdb   opea/llm-tgi:latest                   "bash entrypoint.sh"     About a minute ago   Up About a minute   0.0.0.0:9000->9000/tcp, :::9000->9000/tcp   llm-tgi-gaudi-server
39994e007f15   ghcr.io/huggingface/tgi-gaudi:2.0.1   "text-generation-lau…"   About a minute ago   Up About a minute   0.0.0.0:8028->80/tcp, :::8028->80/tcp       tgi-gaudi-server
```

## Interacting with CodeGen Deployment

This section walks you through the different ways to interact with the deployed
microservices. After a couple of minutes, rerun `docker ps -a` to ensure all the
Docker containers are still up and running. Then proceed to validate each
microservice and the megaservice.

### TGI Service

```bash
curl http://${host_ip}:8028/generate \
  -X POST \
  -d '{"inputs":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. 
If the request is invalid, raise an exception.","parameters":{"max_new_tokens":256, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```

Here is the output:

```
{"generated_text":"\n\nIO iflow diagram:\n\n![IO flow diagram(s)](TodoList.iflow.svg)\n\n### TDD Kata walkthrough\n\n1. Start with a user story. We will add story tests later. In this case, we'll choose a story about adding a TODO:\n ```ruby\n as a user,\n i want to add a todo,\n so that i can get a todo list.\n\n conformance:\n - a new todo is added to the list\n - if the todo text is empty, raise an exception\n ```\n\n1. Write the first test:\n ```ruby\n feature Testing the addition of a todo to the list\n\n given a todo list empty list\n when a user adds a todo\n the todo should be added to the list\n\n inputs:\n when_values: [[\"A\"]]\n\n output validations:\n - todo_list contains { text:\"A\" }\n ```\n\n1. Write the first step implementation in any programming language you like. In this case, we will choose Ruby:\n ```ruby\n def add_"}
```

### LLM Microservice

```bash
curl http://${host_ip}:9000/v1/chat/completions \
  -X POST \
  -d '{"query":"Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception.","max_tokens":256,"top_k":10,"top_p":0.95,"typical_p":0.95,"temperature":0.01,"repetition_penalty":1.03,"streaming":true}' \
  -H 'Content-Type: application/json'
```

The output is streamed one token at a time. It is too long to show in full here,
but the last item will be:

```
data: [DONE]
```

### MegaService

```bash
curl http://${host_ip}:7778/v1/codegen -H "Content-Type: application/json" -d '{
  "messages": "Implement a high-level API for a TODO list application. The API takes as input an operation request and updates the TODO list in place. If the request is invalid, raise an exception."
  }'
```

The output is streamed one token at a time. It is too long to show in full here,
but the last item will be:

```
data: [DONE]
```

## Launch UI

### Svelte UI

To access the frontend, open the following URL in your browser: `http://{host_ip}:5173`. By default, the UI runs on port 5173 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:

```yaml
  codegen-gaudi-ui-server:
    image: ${REGISTRY:-opea}/codegen-ui:${TAG:-latest}
    ...
    ports:
      - "5173:5173"
```

### React-Based UI (Optional)

To access the React-based frontend, modify the UI service in the `compose.yaml` file. Replace the `codegen-gaudi-ui-server` service with the `codegen-gaudi-react-ui-server` service as per the config below:

```yaml
  codegen-gaudi-react-ui-server:
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
    container_name: codegen-gaudi-react-ui-server
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - APP_CODE_GEN_URL=${BACKEND_SERVICE_ENDPOINT}
    depends_on:
      - codegen-gaudi-backend-server
    ports:
      - "5174:80"
    ipc: host
    restart: always
```

Once the services are up, open the following URL in your browser: `http://{host_ip}:5174`. By default, the UI runs on port 80 internally. If you prefer to use a different host port to access the frontend, you can modify the port mapping in the `compose.yaml` file as shown below:

```yaml
  codegen-gaudi-react-ui-server:
    image: ${REGISTRY:-opea}/codegen-react-ui:${TAG:-latest}
    ...
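    # The value on the left of the colon is the host port; change it to the host
    # port you want to use (the container port on the right stays 80).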
    ports:
      - "80:80"
```

## Check Docker Container Logs

You can check the log of a specific container by running the command below,
replacing `<CONTAINER_NAME>` with the container's name or ID:

```bash
docker logs -t <CONTAINER_NAME>
```

You can also check the overall logs with the following command, where
`compose.yaml` is the megaservice docker-compose configuration file.

Assuming you are still in the `GenAIExamples/CodeGen/docker_compose/intel/hpu/gaudi`
directory, run the following command to check the logs:

```bash
docker compose -f compose.yaml logs
```

View the Docker input parameters in `./CodeGen/docker_compose/intel/hpu/gaudi/compose.yaml`:

```yaml
  tgi-service:
    image: ghcr.io/huggingface/tgi-gaudi:2.0.1
    container_name: tgi-gaudi-server
    ports:
      - "8028:80"
    volumes:
      - "./data:/data"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      HABANA_VISIBLE_DEVICES: all
      OMPI_MCA_btl_vader_single_copy_mechanism: none
      HF_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
    runtime: habana
    cap_add:
      - SYS_NICE
    ipc: host
    command: --model-id ${LLM_MODEL_ID} --max-input-length 1024 --max-total-tokens 2048
```

The `--model-id` input is `${LLM_MODEL_ID}`. Ensure the environment variable
`LLM_MODEL_ID` is set correctly (check the spelling). Whenever it is changed,
restart the containers to pick up the newly selected model.

## Stop the Services

Once you are done with the entire pipeline and wish to stop and remove all the
containers, use the command below:

```bash
docker compose down
```
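
To confirm that everything has been removed, run `docker ps -a` again. The TGI
service also bind-mounts a local `./data` directory (see the `compose.yaml`
snippet above); assuming the model files were cached there, you can optionally
remove it from the same gaudi compose directory to reclaim disk space:

```bash
# Optional cleanup: cached model files will be re-downloaded on the next deployment.
rm -rf ./data
```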