Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add docker_compose example for AMD ROCm deployment #1053

Closed
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
114 changes: 114 additions & 0 deletions FaqGen/docker_compose/amd/gpu/rocm/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,114 @@
## 🚀 Start Microservices and MegaService

### Required Models

Default model is "meta-llama/Meta-Llama-3-8B-Instruct". Change "LLM_MODEL_ID" in environment variables below if you want to use another model.

For gated models, you also need to provide [HuggingFace token](https://huggingface.co/docs/hub/security-tokens) in "HUGGINGFACEHUB_API_TOKEN" environment variable.

### Setup Environment Variables

Since the `compose.yaml` will consume some environment variables, you need to setup them in advance as below.

```bash
export FAQGEN_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HOST_IP=${your_no_proxy}
export FAQGEN_TGI_SERVICE_PORT=8008
export FAQGEN_LLM_SERVER_PORT=9000
export FAQGEN_HUGGINGFACEHUB_API_TOKEN=${your_hf_api_token}
export FAQGEN_BACKEND_SERVER_PORT=8888
export FAGGEN_UI_PORT=5173
```

Note: Please replace with `host_ip` with your external IP address, do not use localhost.

Note: In order to limit access to a subset of GPUs, please pass each device individually using one or more -device /dev/dri/rendered<node>, where <node> is the card index, starting from 128. (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus)

Example for set isolation for 1 GPU

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
```

Example for set isolation for 2 GPUs

```
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD128:/dev/dri/renderD128
- /dev/dri/card0:/dev/dri/card0
- /dev/dri/renderD129:/dev/dri/renderD129
```

Please find more information about accessing and restricting AMD GPUs in the link (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/how-to/docker.html#docker-restrict-gpus)

### Start Microservice Docker Containers

```bash
cd GenAIExamples/FaqGen/docker_compose/amd/gpu/rocm/
docker compose up -d
```

### Validate Microservices

1. TGI Service

```bash
curl http://${host_ip}:8008/generate \
-X POST \
-d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}' \
-H 'Content-Type: application/json'
```

2. LLM Microservice

```bash
curl http://${host_ip}:9000/v1/faqgen \
-X POST \
-d '{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}' \
-H 'Content-Type: application/json'
```

3. MegaService

```bash
curl http://${host_ip}:8888/v1/faqgen -H "Content-Type: application/json" -d '{
"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."
}'
```

Following the validation of all aforementioned microservices, we are now prepared to construct a mega-service.

## 🚀 Launch the UI

Open this URL `http://{host_ip}:5173` in your browser to access the frontend.

![project-screenshot](../../../../assets/img/faqgen_ui_text.png)

## 🚀 Launch the React UI (Optional)

To access the FAQGen (react based) frontend, modify the UI service in the `compose.yaml` file. Replace `faqgen-rocm-ui-server` service with the `faqgen-rocm-react-ui-server` service as per the config below:

```bash
faqgen-rocm-react-ui-server:
image: opea/faqgen-react-ui:latest
container_name: faqgen-rocm-react-ui-server
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
ports:
- 5174:80
depends_on:
- faqgen-rocm-backend-server
ipc: host
restart: always
```

Open this URL `http://{host_ip}:5174` in your browser to access the react based frontend.

- Create FAQs from Text input
![project-screenshot](../../../../assets/img/faqgen_react_ui_text.png)

- Create FAQs from Text Files
![project-screenshot](../../../../assets/img/faqgen_react_ui_text_file.png)
79 changes: 79 additions & 0 deletions FaqGen/docker_compose/amd/gpu/rocm/compose.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:
faqgen-tgi-service:
image: ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
container_name: faggen-tgi-service
ports:
- "${FAQGEN_TGI_SERVICE_PORT}:80"
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
volumes:
- "./data:/data"
shm_size: 1g
devices:
- /dev/kfd:/dev/kfd
- /dev/dri/
cap_add:
- SYS_PTRACE
group_add:
- video
security_opt:
- seccomp:unconfined
Comment on lines +23 to +28
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intended for development, not production, as it disables security measures instead of adding them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AMD KB recommend to use this option for HPC environments to enable enables memory mapping,
Probably will change in the future when GPU path-through method will be changed to use "--gpus" option for docker and docker compose.

Copy link
Contributor

@eero-t eero-t Nov 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the PTRACE? Surely that's needed only for debugging, and better done with separate container spec adding just that capability, and other extra tooling?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TGI app is crashing when trying to run rocm image without PTRACE, so this capability is still needed,

All docker options were taken from PyTorch installation for ROCm manual

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TGI app is crashing when trying to run rocm image without PTRACE, so this capability is still needed,

Ok.

That's very odd though. Any idea why it requires that, or do you have a backtrace where it's crashing?

ipc: host
command: --model-id ${FAQGEN_LLM_MODEL_ID}
faqgen-llm-server:
image: ${REGISTRY:-opea}/llm-faqgen-tgi:${TAG:-latest}
container_name: faqgen-llm-server
depends_on:
- faqgen-tgi-service
ports:
- "${FAQGEN_LLM_SERVER_PORT}:9000"
ipc: host
environment:
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
TGI_LLM_ENDPOINT: "http://${HOST_IP}:${FAQGEN_TGI_SERVICE_PORT}"
HUGGINGFACEHUB_API_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
HUGGING_FACE_HUB_TOKEN: ${FAQGEN_HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
faqgen-backend-server:
image: ${REGISTRY:-opea}/faqgen:${TAG:-latest}
container_name: faqgen-backend-server
depends_on:
- faqgen-tgi-service
- faqgen-llm-server
ports:
- "${FAQGEN_BACKEND_SERVER_PORT}:8888"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- MEGA_SERVICE_HOST_IP=${HOST_IP}
- LLM_SERVICE_HOST_IP=${HOST_IP}
ipc: host
restart: always
faqgen-ui-server:
image: ${REGISTRY:-opea}/faqgen-ui:${TAG:-latest}
container_name: faqgen-ui-server
depends_on:
- faqgen-backend-server
ports:
- "${FAGGEN_UI_PORT}:5173"
environment:
- no_proxy=${no_proxy}
- https_proxy=${https_proxy}
- http_proxy=${http_proxy}
- DOC_BASE_URL="http://${HOST_IP}:${FAQGEN_BACKEND_SERVER_PORT}/v1/faqgen"
ipc: host
restart: always
networks:
default:
driver: bridge
167 changes: 167 additions & 0 deletions FaqGen/tests/test_compose_on_rocm.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,167 @@
#!/bin/bash
# Copyright (C) 2024 Advanced Micro Devices, Inc.
# SPDX-License-Identifier: Apache-2.0

set -xe
IMAGE_REPO=${IMAGE_REPO:-"opea"}
IMAGE_TAG=${IMAGE_TAG:-"latest"}
echo "REGISTRY=IMAGE_REPO=${IMAGE_REPO}"
echo "TAG=IMAGE_TAG=${IMAGE_TAG}"
export REGISTRY=${IMAGE_REPO}
export TAG=${IMAGE_TAG}

WORKPATH=$(dirname "$PWD")
LOG_PATH="$WORKPATH/tests"
ip_address=$(hostname -I | awk '{print $1}')

function build_docker_images() {
cd $WORKPATH/docker_image_build
git clone https://github.com/opea-project/GenAIComps.git && cd GenAIComps && git checkout "${opea_branch:-"main"}" && cd ../

echo "Build all the images with --no-cache, check docker_image_build.log for details..."
service_list="faqgen faqgen-ui llm-faqgen-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:2.3.1-rocm
docker images && sleep 1s
}

function start_services() {
cd $WORKPATH/docker_compose/amd/gpu/rocm

export FAQGEN_LLM_MODEL_ID="meta-llama/Meta-Llama-3-8B-Instruct"
export HOST_IP=${ip_address}
export FAQGEN_TGI_SERVICE_PORT=8008
export FAQGEN_LLM_SERVER_PORT=9000
export FAQGEN_HUGGINGFACEHUB_API_TOKEN=${HUGGINGFACEHUB_API_TOKEN}
export FAQGEN_BACKEND_SERVER_PORT=8888
export FAGGEN_UI_PORT=5173
export TGI_LLM_ENDPOINT="http://${ip_address}:8008"
export MEGA_SERVICE_HOST_IP=${ip_address}
export LLM_SERVICE_HOST_IP=${ip_address}
export BACKEND_SERVICE_ENDPOINT="http://${ip_address}:8888/v1/faqgen"

sed -i "s/backend_address/$ip_address/g" $WORKPATH/ui/svelte/.env

# Start Docker Containers
docker compose up -d > ${LOG_PATH}/start_services_with_compose.log

n=0
until [[ "$n" -ge 100 ]]; do
docker logs faggen-tgi-service > ${LOG_PATH}/tgi_service_start.log
if grep -q Connected ${LOG_PATH}/tgi_service_start.log; then
break
fi
sleep 5s
n=$((n+1))
done
}

function validate_services() {
local URL="$1"
local EXPECTED_RESULT="$2"
local SERVICE_NAME="$3"
local DOCKER_NAME="$4"
local INPUT_DATA="$5"

local HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL")
if [ "$HTTP_STATUS" -eq 200 ]; then
echo "[ $SERVICE_NAME ] HTTP status is 200. Checking content..."

local CONTENT=$(curl -s -X POST -d "$INPUT_DATA" -H 'Content-Type: application/json' "$URL" | tee ${LOG_PATH}/${SERVICE_NAME}.log)

if echo "$CONTENT" | grep -q "$EXPECTED_RESULT"; then
echo "[ $SERVICE_NAME ] Content is as expected."
else
echo "[ $SERVICE_NAME ] Content does not match the expected result: $CONTENT"
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
else
echo "[ $SERVICE_NAME ] HTTP status is not 200. Received status was $HTTP_STATUS"
docker logs ${DOCKER_NAME} >> ${LOG_PATH}/${SERVICE_NAME}.log
exit 1
fi
sleep 1s
}

function validate_microservices() {
# Check if the microservices are running correctly.

# tgi for llm service
validate_services \
"${ip_address}:8008/generate" \
"generated_text" \
"tgi-service" \
"faqgen-tgi-service" \
'{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":17, "do_sample": true}}'

# llm microservice
validate_services \
"${ip_address}:9000/v1/faqgen" \
"data: " \
"llm" \
"faqgen-llm-server" \
'{"query":"Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}

function validate_megaservice() {
# Curl the Mega Service
validate_services \
"${ip_address}:8888/v1/faqgen" \
"Text Embeddings Inference" \
"mega-faqgen" \
"faqgen-backend-server" \
'{"messages": "Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings and sequence classification models. TEI enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5."}'
}

function validate_frontend() {
cd $WORKPATH/ui/svelte
local conda_env_name="OPEA_e2e"
export PATH=${HOME}/miniforge3/bin/:$PATH
if conda info --envs | grep -q "$conda_env_name"; then
echo "$conda_env_name exist!"
else
conda create -n ${conda_env_name} python=3.12 -y
fi
source activate ${conda_env_name}

sed -i "s/localhost/$ip_address/g" playwright.config.ts

conda install -c conda-forge nodejs -y
npm install && npm ci && npx playwright install --with-deps
node -v && npm -v && pip list

exit_status=0
npx playwright test || exit_status=$?

if [ $exit_status -ne 0 ]; then
echo "[TEST INFO]: ---------frontend test failed---------"
exit $exit_status
else
echo "[TEST INFO]: ---------frontend test passed---------"
fi
}

function stop_docker() {
cd $WORKPATH/docker_compose/amd/gpu/rocm
docker compose stop && docker compose rm -f
}

function main() {

stop_docker

if [[ "$IMAGE_REPO" == "opea" ]]; then build_docker_images; fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to make all scripts shellcheck clean (apt install shellscheck; shellcheck *.sh).

start_services

validate_microservices
validate_megaservice
# validate_frontend

stop_docker
echo y | docker system prune

}

main