RFC: Add OPEA deployment design #10
`community/rfcs/24-05-17-OPEA-001-Deployment-Design.md` (200 additions, 0 deletions)
**Author**

[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**

**Status**

Under Review

**Objective**

Provide a clear design that enables users to deploy their own GenAI applications in Docker or Kubernetes environments.


**Motivation**

This RFC presents the OPEA deployment-related design for community discussion.

**Design Proposal**

Refer to this [OPEA overall architecture design document](24-05-16-OPEA-001-Overall-Design.md).

The proposed OPEA deployment workflow is:

<a target="_blank" href="opea_deploy_workflow.png">
<img src="opea_deploy_workflow.png" alt="Deployment" width=480 height=310>
</a>

We provide two interfaces for deploying GenAI applications:

1. Docker deployment via Python

Here is a python example for constructing a RAG (Retrieval-Augmented Generation) application:

```python
from comps import MicroService, ServiceOrchestrator

class ChatQnAService:
    def __init__(self, port=8080):
        # The mega-service listens on this port and exposes /v1/chatqna.
        self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")

    def add_remote_service(self):
        # Each RAG stage is declared as a remote microservice.
        embedding = MicroService(
            name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
        )
        retriever = MicroService(
            name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
        )
        rerank = MicroService(
            name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
        )
        llm = MicroService(
            name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
        )
        # Register the services and wire the pipeline:
        # embedding -> retriever -> rerank -> llm.
        self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
        self.service_builder.flow_to(embedding, retriever)
        self.service_builder.flow_to(retriever, rerank)
        self.service_builder.flow_to(rerank, llm)

```
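To illustrate what `add` and `flow_to` build conceptually, here is a minimal, self-contained sketch of a service-orchestration DAG. The `SimpleOrchestrator` class and its methods are illustrative assumptions for this RFC discussion, not the actual `comps` implementation:

```python
from collections import defaultdict

class SimpleOrchestrator:
    """Illustrative stand-in for ServiceOrchestrator: stores services
    and the directed flow between them, then yields an execution order."""

    def __init__(self):
        self.services = {}
        self.flows = defaultdict(list)  # service name -> downstream names

    def add(self, name):
        self.services[name] = {"name": name}
        return self  # allow chained .add(...).add(...)

    def flow_to(self, src, dst):
        self.flows[src].append(dst)

    def execution_order(self):
        # Simple topological sort over the flow graph.
        indegree = {s: 0 for s in self.services}
        for src, dsts in self.flows.items():
            for d in dsts:
                indegree[d] += 1
        order = []
        ready = [s for s, deg in indegree.items() if deg == 0]
        while ready:
            node = ready.pop(0)
            order.append(node)
            for d in self.flows[node]:
                indegree[d] -= 1
                if indegree[d] == 0:
                    ready.append(d)
        return order

builder = SimpleOrchestrator()
builder.add("embedding").add("retriever").add("rerank").add("llm")
builder.flow_to("embedding", "retriever")
builder.flow_to("retriever", "rerank")
builder.flow_to("rerank", "llm")
print(builder.execution_order())  # ['embedding', 'retriever', 'rerank', 'llm']
```

The chained `add` calls register the stages, while `flow_to` edges determine the order in which requests traverse the pipeline.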

2. Kubernetes deployment using YAML

Here is a YAML example for constructing a RAG (Retrieval-Augmented Generation) application:

```yaml
opea_micro_services:
embedding:
endpoint: /v1/embeddings
port: 6000
retrieval:
endpoint: /v1/retrieval
port: 7000
reranking:
endpoint: /v1/reranking
port: 8000
llm:
endpoint: /v1/chat/completions
port: 9000

opea_mega_service:
port: 8080
mega_flow:
- embedding >> retrieval >> reranking >> llm

```
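The `mega_flow` entry uses `>>` to express the pipeline order. A minimal sketch (illustrative only, not part of OPEA) of parsing and validating such a configuration, with the YAML mirrored as a plain dict to stay self-contained:

```python
# Illustrative config mirroring the YAML above; in practice this
# would come from a YAML loader such as yaml.safe_load().
config = {
    "opea_micro_services": {
        "embedding": {"endpoint": "/v1/embeddings", "port": 6000},
        "retrieval": {"endpoint": "/v1/retrieval", "port": 7000},
        "reranking": {"endpoint": "/v1/reranking", "port": 8000},
        "llm": {"endpoint": "/v1/chat/completions", "port": 9000},
    },
    "opea_mega_service": {
        "port": 8080,
        "mega_flow": ["embedding >> retrieval >> reranking >> llm"],
    },
}

def parse_flow(flow):
    """Split an 'a >> b >> c' pipeline string into ordered stage names."""
    return [stage.strip() for stage in flow.split(">>")]

def validate(cfg):
    """Check that every stage in mega_flow is a defined microservice."""
    services = cfg["opea_micro_services"]
    stages = parse_flow(cfg["opea_mega_service"]["mega_flow"][0])
    missing = [s for s in stages if s not in services]
    if missing:
        raise ValueError(f"undefined services in mega_flow: {missing}")
    return stages

print(validate(config))  # ['embedding', 'retrieval', 'reranking', 'llm']
```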

When deploying the GenAI application to a Kubernetes environment, the YAML configuration file should be converted into an appropriate `docker compose` file, Kubernetes manifest, Kubernetes Helm chart, or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) custom resource file.
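As a sketch of what such a conversion could produce, the following generates a minimal Kubernetes Service manifest per microservice from the configuration. The naming scheme (`<name>-service`) and selector labels are illustrative assumptions, not a prescribed OPEA convention:

```python
def to_k8s_service(name, spec):
    """Render a minimal Kubernetes Service manifest (as a dict) for one
    microservice entry from the opea_micro_services config."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": f"{name}-service"},
        "spec": {
            # Illustrative selector; a real converter would match the
            # labels on the corresponding Deployment.
            "selector": {"app": name},
            "ports": [{"port": spec["port"], "targetPort": spec["port"]}],
        },
    }

micro_services = {
    "embedding": {"endpoint": "/v1/embeddings", "port": 6000},
    "llm": {"endpoint": "/v1/chat/completions", "port": 9000},
}
manifests = [to_k8s_service(n, s) for n, s in micro_services.items()]
print(manifests[0]["metadata"]["name"])  # embedding-service
```

A real converter would also emit Deployments (image, resources, model configuration) and serialize the dicts to YAML.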

A sample GMC custom resource is shown below:
```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
labels:
app.kubernetes.io/name: gmconnector
name: chatqna
namespace: gmcsample
spec:
routerConfig:
name: router
serviceName: router-service
nodes:
root:
routerType: Sequence
steps:
- name: Embedding
internalService:
serviceName: embedding-service
config:
endpoint: /v1/embeddings
- name: TeiEmbedding
internalService:
serviceName: tei-embedding-service
config:
gmcTokenSecret: gmc-tokens
hostPath: /root/GMC/data/tei
modelId: BAAI/bge-base-en-v1.5
endpoint: /embed
isDownstreamService: true
- name: Retriever
data: $response
internalService:
serviceName: retriever-redis-server
config:
RedisUrl: redis-vector-db
IndexName: rag-redis
tei_endpoint: tei-embedding-service
endpoint: /v1/retrieval
- name: VectorDB
internalService:
serviceName: redis-vector-db
isDownstreamService: true
- name: Reranking
data: $response
internalService:
serviceName: reranking-service
config:
tei_reranking_endpoint: tei-reranking-service
gmcTokenSecret: gmc-tokens
endpoint: /v1/reranking
- name: TeiReranking
internalService:
serviceName: tei-reranking-service
config:
gmcTokenSecret: gmc-tokens
hostPath: /root/GMC/data/rerank
modelId: BAAI/bge-reranker-large
endpoint: /rerank
isDownstreamService: true
- name: Llm
data: $response
internalService:
serviceName: llm-service
config:
tgi_endpoint: tgi-service
gmcTokenSecret: gmc-tokens
endpoint: /v1/chat/completions
- name: Tgi
internalService:
serviceName: tgi-service
config:
gmcTokenSecret: gmc-tokens
hostPath: /root/GMC/data/tgi
modelId: Intel/neural-chat-7b-v3-3
endpoint: /generate
isDownstreamService: true
```
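In the CR above, `routerType: Sequence` chains the non-downstream steps in order, with `data: $response` feeding each step the previous step's output rather than the original request, while `isDownstreamService: true` marks services invoked by their upstream step instead of by the router. A minimal sketch of that routing semantics (purely illustrative, not GMC's implementation):

```python
def run_sequence(steps, handlers, request):
    """Run non-downstream steps in order; a step whose data is
    '$response' consumes the previous step's output."""
    response = request
    for step in steps:
        if step.get("isDownstreamService"):
            continue  # invoked by its upstream step, not by the router
        payload = response if step.get("data") == "$response" else request
        response = handlers[step["name"]](payload)
    return response

# Step list mirroring the structure of the sample CR.
steps = [
    {"name": "Embedding"},
    {"name": "TeiEmbedding", "isDownstreamService": True},
    {"name": "Retriever", "data": "$response"},
    {"name": "Reranking", "data": "$response"},
    {"name": "Llm", "data": "$response"},
]
# Toy handlers standing in for the real microservices.
handlers = {
    "Embedding": lambda q: {"query": q, "embedding": [0.1, 0.2]},
    "Retriever": lambda e: {"docs": ["doc1", "doc2"], **e},
    "Reranking": lambda r: {"docs": r["docs"][:1], "query": r["query"]},
    "Llm": lambda c: f"answer about {c['query']} using {c['docs']}",
}
print(run_sequence(steps, handlers, "what is OPEA?"))
```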
After deployment, a `gmconnectors.gmc.opea.io` CR named `chatqna` should be available in the `gmcsample` namespace, as shown below:

```bash
$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
NAME      URL                                                      READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080   Success   3m
```

Users can then access the application pipeline via the value of the `URL` field.

The whole deployment process is illustrated by the diagram below.

<a target="_blank" href="opea_deploy_process_v2.png">
<img src="opea_deploy_process_v2.png" alt="Deployment Process" width=480 height=310>
</a>


**Alternatives Considered**

n/a

**Compatibility**

n/a

**Miscellaneous**

- TODO List:

  - [ ] One-click deployment on AWS, GCP, and Azure clouds
  - [ ] Static vs. dynamic cloud resource allocation
  - [ ] Kubernetes GMC with Istio



Binary files added:

- community/rfcs/opea_deploy_process_v0.png
- community/rfcs/opea_deploy_process_v1.png
- community/rfcs/opea_deploy_process_v2.png
- community/rfcs/opea_deploy_workflow.png