Add docker_compose example for AMD ROCm deployment #1053
Closed
astafevav wants to merge 6 commits into opea-project:main from astafevav:add-faqgen-docker-compose-example (+360 −0)
Commits
ec1f0ab  Add compose deploy example for FaqGen on AMD ROCm (astafevav)
c2b21ef  Update FaqGen/docker_compose/amd/gpu/rocm/README.md (astafevav)
06aca48  Update FaqGen/docker_compose/amd/gpu/rocm/README.md (astafevav)
03564bc  [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot])
a0040b4  Merge branch 'main' into add-faqgen-docker-compose-example (chensuyue)
3001aa5  Fix test script (astafevav)
FaqGen/docker_compose/amd/gpu/rocm/README.md
@@ -0,0 +1,114 @@
## 🚀 Start Microservices and MegaService

### Required Models

The default model is "meta-llama/Meta-Llama-3-8B-Instruct". Change "FAQGEN_LLM_MODEL_ID" in the environment variables below if you want to use another model.

For gated models, you also need to provide a [HuggingFace token](https://huggingface.co/docs/hub/security-tokens) in the "FAQGEN_HUGGINGFACEHUB_API_TOKEN" environment variable.
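If you are not sure whether your token can access a gated model, one quick check (a minimal sketch, assuming the `huggingface-cli` tool from the `huggingface_hub` package is installed) is to log in with the token and confirm the account it resolves to:

```bash
# Log in with the HuggingFace token and verify which account it belongs to.
huggingface-cli login --token "${your_hf_api_token}"
huggingface-cli whoami
```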
### Setup Environment Variables

Since the `compose.yaml` file consumes several environment variables, you need to set them up in advance, as shown below.

```bash
export FAQGEN_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HOST_IP=${your_no_proxy}
export FAQGEN_TGI_SERVICE_PORT=8008
export FAQGEN_LLM_SERVER_PORT=9000
export FAQGEN_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export FAQGEN_BACKEND_SERVER_PORT=8888
export FAGGEN_UI_PORT=5173
```

Note: Please replace `host_ip` with your external IP address; do not use localhost.
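One way to obtain the host's primary IP address (a small sketch, assuming a Linux host where `hostname -I` is available; this is the same approach the test script in this PR uses):

```bash
# Use the first address reported by the host as the external IP.
export HOST_IP=$(hostname -I | awk '{print $1}')
echo "${HOST_IP}"
```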
Note: In order to limit access to a subset of GPUs, please pass each device individually using one or more `--device /dev/dri/renderD<node>` options, where `<node>` is the card index, starting from 128 (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus).

Example of setting isolation for 1 GPU:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```

Example of setting isolation for 2 GPUs:

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card1:/dev/dri/card1
- /dev/dri/renderD129:/dev/dri/renderD129
```

More information about accessing and restricting AMD GPUs is available at https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus
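To see which `card`/`renderD` pairs exist on your system before editing the `devices:` section of `compose.yaml` (a quick sketch, assuming the ROCm stack and its `rocm-smi` utility are installed):

```bash
# List the DRM device nodes; renderD128 maps to the first GPU, renderD129 to the second, and so on.
ls -l /dev/dri/
# Show the GPUs that ROCm can see, to match them against the render nodes above.
rocm-smi --showproductname
```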
### Start Microservice Docker Containers

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose up -d
```
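Once the stack is up, it is worth confirming that all four containers started and that the TGI server finished loading the model before running the validation requests below (standard Docker Compose commands; the "Connected" log line is the same readiness signal the test script waits for):

```bash
# All services defined in compose.yaml should be in the "running" state.
docker compose ps
# Follow the TGI logs until the model is loaded and the server reports it is connected.
docker logs -f faqgen-tgi-service
```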
### Validate Microservices

1. TGI Service

```bash
curl http://${host_ip}:8008/generate \
  -X POST \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
  -H 'Content-Type: application/json'
```
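If the service is healthy, it returns a JSON object with a `generated_text` field (the exact text varies between runs; the test script in this PR only checks that the field is present), along the lines of:

```
{"generated_text":"..."}
```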
2. LLM Microservice

```bash
curl http://${host_ip}:9000/v1/faqgen \
  -X POST \
  -d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
  -H 'Content-Type: application/json'
```
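The FaqGen LLM microservice returns a streaming response whose chunks are prefixed with `data:` (this prefix is also what the test script greps for), so expect output of the form:

```
data: ...
data: ...
```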
3. MegaService

```bash
curl http://${host_ip}:8888/v1/faqgen -H "Content-Type: application/json" -d '{
     "messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
     }'
```
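The MegaService responds with FAQs generated from the supplied text. The test script in this PR simply checks that the response mentions the input topic (the phrase "Text Embeddings Inference"), so any FAQ-style output that echoes the source material indicates success.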
Once all of the microservices above respond as expected, the FaqGen MegaService is working and ready to use.
## 🚀 Launch the UI

Open this URL `http://{host_ip}:5173` in your browser to access the frontend.

![project-screenshot](../../../../assets/img/faqgen_ui_text.png)
## 🚀 Launch the React UI (Optional)

To access the React-based FaqGen frontend, modify the UI service in the `compose.yaml` file. Replace the `faqgen-rocm-ui-server` service with the `faqgen-rocm-react-ui-server` service as per the config below:

```yaml
faqgen-rocm-react-ui-server:
  image: opea/faqgen-react-ui:latest
  container_name: faqgen-rocm-react-ui-server
  environment:
    - no_proxy=${no_proxy}
    - https_proxy=${https_proxy}
    - http_proxy=${http_proxy}
  ports:
    - 5174:80
  depends_on:
    - faqgen-rocm-backend-server
  ipc: host
  restart: always
```
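After swapping the UI service definition in `compose.yaml`, re-run Compose from the same directory; `docker compose up -d` only recreates services whose definitions changed, so the rest of the stack keeps running:

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose up -d
```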
Open this URL `http://{host_ip}:5174` in your browser to access the React-based frontend.

- Create FAQs from Text Input
  ![project-screenshot](../../../../assets/img/faqgen_react_ui_text.png)

- Create FAQs from Text Files
  ![project-screenshot](../../../../assets/img/faqgen_react_ui_text_file.png)
FaqGen/docker_compose/amd/gpu/rocm/compose.yaml
@@ -0,0 +1,79 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
  faqgen-tgi-service:
    image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    container_name: faqgen-tgi-service
    ports:
      - "${FAQGEN_TGI_SERVICE_PORT}:80"
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
      HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
    volumes:
      - "./data:/data"
    shm_size: 1g
    # Pass through the ROCm KFD interface and all DRI render nodes; to restrict the
    # container to specific GPUs, list individual card/renderD devices here (see README).
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri/
    cap_add:
      - SYS_PTRACE
    group_add:
      - video
    security_opt:
      - seccomp:unconfined
    ipc: host
    command: --model-id ${FAQGEN_LLM_MODEL_ID}
  faqgen-llm-server:
    image: ${REGISTRY:-opea}/llm-faqgen-tgi:${TAG:-latest}
    container_name: faqgen-llm-server
    depends_on:
      - faqgen-tgi-service
    ports:
      - "${FAQGEN_LLM_SERVER_PORT}:9000"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
      HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
      HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
    restart: unless-stopped
  faqgen-backend-server:
    image: ${REGISTRY:-opea}/faqgen:${TAG:-latest}
    container_name: faqgen-backend-server
    depends_on:
      - faqgen-tgi-service
      - faqgen-llm-server
    ports:
      - "${FAQGEN_BACKEND_SERVER_PORT}:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${HOST_IP}
      - LLM_SERVICE_HOST_IP=${HOST_IP}
    ipc: host
    restart: always
  faqgen-ui-server:
    image: ${REGISTRY:-opea}/faqgen-ui:${TAG:-latest}
    container_name: faqgen-ui-server
    depends_on:
      - faqgen-backend-server
    ports:
      - "${FAGGEN_UI_PORT}:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - DOC_BASE_URL="http://${HOST_IP}:${FAQGEN_BACKEND_SERVER_PORT}/v1/faqgen"
    ipc: host
    restart: always

networks:
  default:
    driver: bridge
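Before bringing the stack up, it can be useful to confirm that every `${...}` reference in this file resolves against your exported variables (a quick check with the standard Compose CLI; unset variables appear as empty strings in the rendered output):

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose config
```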
FaqGen ROCm test script
@@ -0,0 +1,167 @@
#!/bin/bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

function build_docker_images() {
    cd $WORKPATH/docker_image_build
    git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

    echo "Build all the images with --no-cache, check docker_image_build.log for details..."
    service_list="faqgen faqgen-ui llm-faqgen-tgi"
    docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

    docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
    docker images && sleep 1s
}
function start_services() {
    cd $WORKPATH/docker_compose/amd/gpu/rocm

    export FAQGEN_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
    export HOST_IP=${ip_address}
    export FAQGEN_TGI_SERVICE_PORT=8008
    export FAQGEN_LLM_SERVER_PORT=9000
    export FAQGEN_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
    export FAQGEN_BACKEND_SERVER_PORT=8888
    export FAGGEN_UI_PORT=5173
    export TGI_LLM_ENDPOINT="http://${ip_address}:8008"
    export MEGA_SERVICE_HOST_IP=${ip_address}
    export LLM_SERVICE_HOST_IP=${ip_address}
    export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/faqgen"

    sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

    # Start Docker Containers
    docker compose up -d > ${LOG_PATH}/start_services_with_compose.log

    # Wait (up to 100 x 5s) for the TGI server to finish loading the model.
    n=0
    until [[ "$n" -ge 100 ]]; do
        docker logs faqgen-tgi-service > ${LOG_PATH}/tgi_service_start.log
        if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
            break
        fi
        sleep 5s
        n=$((n+1))
    done
}
function validate_services() {
    local URL="$1"
    local EXPECTED_RESULT="$2"
    local SERVICE_NAME="$3"
    local DOCKER_NAME="$4"
    local INPUT_DATA="$5"

    local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
    if [ "$HTTP_STATUS" -eq 200 ]; then
        echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

        local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

        if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
            echo "[ $SERVICE_NAME ] Content is as expected."
        else
            echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
            docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
            exit 1
        fi
    else
        echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
        docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
        exit 1
    fi
    sleep 1s
}

function validate_microservices() {
    # Check if the microservices are running correctly.

    # tgi for llm service
    validate_services \
        "${ip_address}:8008/generate" \
        "generated_text" \
        "tgi-service" \
        "faqgen-tgi-service" \
        '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'

    # llm microservice
    validate_services \
        "${ip_address}:9000/v1/faqgen" \
        "data: " \
        "llm" \
        "faqgen-llm-server" \
        '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}

function validate_megaservice() {
    # Curl the Mega Service
    validate_services \
        "${ip_address}:8888/v1/faqgen" \
        "Text Embeddings Inference" \
        "mega-faqgen" \
        "faqgen-backend-server" \
        '{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}
function validate_frontend() {
    cd $WORKPATH/ui/svelte
    local conda_env_name="OPEA_e2e"
    export PATH=${HOME}/miniforge3/bin/:$PATH
    if conda info --envs | grep -q "$conda_env_name"; then
        echo "$conda_env_name exists!"
    else
        conda create -n ${conda_env_name} python=3.12 -y
    fi
    source activate ${conda_env_name}

    sed -i "s/localhost/$ip_address/g" playwright.config.ts

    conda install -c conda-forge nodejs -y
    npm install && npm ci && npx playwright install --with-deps
    node -v && npm -v && pip list

    exit_status=0
    npx playwright test || exit_status=$?

    if [ $exit_status -ne 0 ]; then
        echo "[TEST INFO]: ---------frontend test failed---------"
        exit $exit_status
    else
        echo "[TEST INFO]: ---------frontend test passed---------"
    fi
}
function stop_docker() {
    cd $WORKPATH/docker_compose/amd/gpu/rocm
    docker compose stop && docker compose rm -f
}

function main() {

    stop_docker

    if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
    start_services

    validate_microservices
    validate_megaservice
    # validate_frontend

    stop_docker
    echo y | docker system prune

}

main
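A minimal way to run this script end to end (a sketch, assuming a ROCm-capable host with Docker installed, a valid HuggingFace token exported, and that the script lives under FaqGen/tests/ as its relative paths suggest; the filename placeholder is hypothetical since the path is not shown in this view):

```bash
export HUGGINGFACEHUB_API_TOKEN="your_hf_api_token"
cd GenAIExamples/FaqGen/tests
bash ./<this_test_script>.sh   # substitute the actual test script filename from this PR
```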
Reviewer: Intended for development, not production, as it disables security measures (the `seccomp:unconfined` setting) instead of adding them?
Author: The AMD knowledge base recommends this option for HPC environments, to enable memory mapping. It will probably change in the future, when the GPU pass-through method switches to the `--gpus` option for docker and docker compose.
Reviewer: What about the PTRACE capability? Surely that's needed only for debugging, and better done with a separate container spec that adds just that capability, along with any other extra tooling?
Author: The TGI app crashes when trying to run the ROCm image without PTRACE, so this capability is still needed. All docker options were taken from the PyTorch installation for ROCm manual.
Reviewer: OK. That's very odd, though. Any idea why it requires that, or do you have a backtrace from where it's crashing?
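If the capability does turn out to be needed only for debugging, one way to follow the reviewer's suggestion would be a separate Compose override file that adds `SYS_PTRACE` only when explicitly requested, assuming it were removed from the base `compose.yaml`. This is a hypothetical sketch (the `compose.debug.yaml` filename is illustrative), not part of this PR:

```yaml
# compose.debug.yaml (hypothetical): adds the ptrace capability only for debug runs.
# Usage: docker compose -f compose.yaml -f compose.debug.yaml up -d
services:
  faqgen-tgi-service:
    cap_add:
      - SYS_PTRACE
```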