Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ChatQnA with Remote Inference Endpoints (Kubernetes) #1149

Merged
merged 10 commits into from
Nov 18, 2024
49 changes: 47 additions & 2 deletions ChatQnA/kubernetes/intel/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@
```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```

Expand All @@ -35,10 +35,55 @@ kubectl apply -f chatqna_bf16.yaml
```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
sed -i "s/insert-your-huggingface-token-here/${HUGGINGFACEHUB_API_TOKEN}/g" chatqna.yaml
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna.yaml
kubectl apply -f chatqna.yaml
```

## Deploy on Xeon with Remote LLM Model

```
cd GenAIExamples/ChatQnA/kubernetes/intel/cpu/xeon/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-remote-inference.yaml
sed -i "s|insert-your-remote-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-remote-inference.yaml
```

### Additional Steps for Remote Endpoints with Authentication (If No Authentication Skip This Step)

If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET and TOKEN_URL with the correct values in "chatqna-llm-uservice-config" ConfigMap



### Deploy
```
kubectl apply -f chatqna-remote-inference.yaml
```

## Deploy on Gaudi with TEI, Rerank, and vLLM Models Running Remotely

```
cd GenAIExamples/ChatQnA/kubernetes/intel/hpu/gaudi/manifest
export HUGGINGFACEHUB_API_TOKEN="YourOwnToken"
export vLLM_ENDPOINT="Your Remote Inference Endpoint"
export TEI_EMBEDDING_ENDPOINT="Your Remote TEI Embedding Endpoint"
export TEI_RERANKING_ENDPOINT="Your Remote Reranking Endpoint"

sed -i "s|insert-your-huggingface-token-here|${HUGGINGFACEHUB_API_TOKEN}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-vllm-inference-endpoint|${vLLM_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-embedding-endpoint|${TEI_EMBEDDING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
sed -i "s|insert-your-remote-reranking-endpoint|${TEI_RERANKING_ENDPOINT}|g" chatqna-vllm-remote-inference.yaml
```

### Additional Steps for Remote Endpoints with Authentication (If No Authentication Skip This Step)

If your remote inference endpoint is protected with OAuth Client Credentials authentication, update CLIENTID, CLIENT_SECRET and TOKEN_URL with the correct values in "chatqna-llm-uservice-config", "chatqna-data-prep-config", "chatqna-embedding-usvc-config", "chatqna-reranking-usvc-config", "chatqna-retriever-usvc-config" ConfigMaps

### Deploy
```
kubectl apply -f chatqna-vllm-remote-inference.yaml
```

## Verify Services

To verify the installation, run the command `kubectl get pod` to make sure all pods are running.
Expand Down
Loading
Loading