diff --git a/community/rfcs/24-05-17-OPEA-001-Deployment-Design.md b/community/rfcs/24-05-17-OPEA-001-Deployment-Design.md
new file mode 100644
index 00000000..23c1e2b2
--- /dev/null
+++ b/community/rfcs/24-05-17-OPEA-001-Deployment-Design.md
@@ -0,0 +1,203 @@
**Author**

[ftian1](https://github.com/ftian1), [lvliang-intel](https://github.com/lvliang-intel), [hshen14](https://github.com/hshen14), [mkbhanda](https://github.com/mkbhanda), [irisdingbj](https://github.com/irisdingbj), [KfreeZ](https://github.com/kfreez), [zhlsunshine](https://github.com/zhlsunshine) **Edit Here to add your id**

**Status**

Under Review

**Objective**

Provide a clear design that lets users deploy their own GenAI applications in Docker or Kubernetes environments.

**Motivation**

This RFC presents the OPEA deployment design for community discussion.

**Design Proposal**

Refer to the [OPEA overall architecture design document](24-05-16-OPEA-001-Overall-Design.md).

The proposed OPEA deployment workflow is:

![Deployment](opea_deploy_workflow.png)

We provide two interfaces for deploying GenAI applications:

1. Docker deployment using Python

   Here is a Python example that constructs a RAG (Retrieval-Augmented Generation) application:

   ```python
   from comps import MicroService, ServiceOrchestrator

   class ChatQnAService:
       def __init__(self, port=8080):
           self.service_builder = ServiceOrchestrator(port=port, endpoint="/v1/chatqna")

       def add_remote_service(self):
           embedding = MicroService(
               name="embedding", port=6000, expose_endpoint="/v1/embeddings", use_remote_service=True
           )
           retriever = MicroService(
               name="retriever", port=7000, expose_endpoint="/v1/retrieval", use_remote_service=True
           )
           rerank = MicroService(
               name="rerank", port=8000, expose_endpoint="/v1/reranking", use_remote_service=True
           )
           llm = MicroService(
               name="llm", port=9000, expose_endpoint="/v1/chat/completions", use_remote_service=True
           )
           # Register the micro services and wire them into a sequential flow:
           # embedding -> retriever -> rerank -> llm
           self.service_builder.add(embedding).add(retriever).add(rerank).add(llm)
           self.service_builder.flow_to(embedding, retriever)
           self.service_builder.flow_to(retriever, rerank)
           self.service_builder.flow_to(rerank, llm)
   ```

2. Kubernetes deployment using YAML

   Here is a YAML example that constructs the same RAG (Retrieval-Augmented Generation) application:

   ```yaml
   opea_micro_services:
     embedding:
       endpoint: /v1/embeddings
       port: 6000
     retrieval:
       endpoint: /v1/retrieval
       port: 7000
     reranking:
       endpoint: /v1/reranking
       port: 8000
     llm:
       endpoint: /v1/chat/completions
       port: 9000

   opea_mega_service:
     port: 8080
     mega_flow:
       - embedding >> retrieval >> reranking >> llm
   ```

This YAML acts as a unified language interface for end users to define their GenAI applications.

When deploying a GenAI application to a Docker or Kubernetes environment, the YAML configuration file is converted to an appropriate [docker compose](https://docs.docker.com/compose/) file or [GenAI Microservice Connector (GMC)](https://github.com/opea-project/GenAIInfra/tree/main/microservices-connector) custom resource file.

Note: OPEA will provide a conversion tool that translates the unified language interface into docker compose files or GMC custom resources.
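Until that conversion tool is available, the Docker side of the flow can be sketched by hand. The snippet below is a minimal, hypothetical docker compose equivalent of the `mega_flow` above; the image names and the `chatqna-megaservice` entry are illustrative assumptions, not output of the actual tool:

```yaml
# Hypothetical docker compose sketch for the pipeline above.
# Image names are placeholders, not official OPEA image names.
services:
  embedding:
    image: opea/embedding:latest      # assumption
    ports:
      - "6000:6000"
  retrieval:
    image: opea/retriever:latest      # assumption
    ports:
      - "7000:7000"
  reranking:
    image: opea/reranking:latest      # assumption
    ports:
      - "8000:8000"
  llm:
    image: opea/llm:latest            # assumption
    ports:
      - "9000:9000"
  chatqna-megaservice:
    image: opea/chatqna:latest        # assumption; would expose /v1/chatqna on 8080
    ports:
      - "8080:8080"
    depends_on:
      - embedding
      - retrieval
      - reranking
      - llm
```

In this sketch the mega service container would implement the `embedding >> retrieval >> reranking >> llm` flow by calling each micro service at the ports listed above.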
A sample GMC [Custom Resource](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) looks like the following:

```yaml
apiVersion: gmc.opea.io/v1alpha3
kind: GMConnector
metadata:
  labels:
    app.kubernetes.io/name: gmconnector
  name: chatqna
  namespace: gmcsample
spec:
  routerConfig:
    name: router
    serviceName: router-service
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: Embedding
          internalService:
            serviceName: embedding-service
            config:
              endpoint: /v1/embeddings
        - name: TeiEmbedding
          internalService:
            serviceName: tei-embedding-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/tei
              modelId: BAAI/bge-base-en-v1.5
              endpoint: /embed
            isDownstreamService: true
        - name: Retriever
          data: $response
          internalService:
            serviceName: retriever-redis-server
            config:
              RedisUrl: redis-vector-db
              IndexName: rag-redis
              tei_endpoint: tei-embedding-service
              endpoint: /v1/retrieval
        - name: VectorDB
          internalService:
            serviceName: redis-vector-db
            isDownstreamService: true
        - name: Reranking
          data: $response
          internalService:
            serviceName: reranking-service
            config:
              tei_reranking_endpoint: tei-reranking-service
              gmcTokenSecret: gmc-tokens
              endpoint: /v1/reranking
        - name: TeiReranking
          internalService:
            serviceName: tei-reranking-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/rerank
              modelId: BAAI/bge-reranker-large
              endpoint: /rerank
            isDownstreamService: true
        - name: Llm
          data: $response
          internalService:
            serviceName: llm-service
            config:
              tgi_endpoint: tgi-service
              gmcTokenSecret: gmc-tokens
              endpoint: /v1/chat/completions
        - name: Tgi
          internalService:
            serviceName: tgi-service
            config:
              gmcTokenSecret: gmc-tokens
              hostPath: /root/GMC/data/tgi
              modelId: Intel/neural-chat-7b-v3-3
              endpoint: /generate
            isDownstreamService: true
```

After the CR is applied, there should be a `gmconnectors.gmc.opea.io` CR named `chatqna` in the `gmcsample` namespace, as shown below:

```bash
$ kubectl get gmconnectors.gmc.opea.io -n gmcsample
NAME      URL                                                       READY     AGE
chatqna   http://router-service.gmcsample.svc.cluster.local:8080   Success   3m
```

The user can then access the application pipeline via the value of the `URL` field above.
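For instance, a request from inside the cluster against that URL might look like the following; the request path and JSON payload schema depend on the deployed pipeline and are assumptions here, not part of this RFC:

```bash
# Hypothetical smoke test against the router URL reported by kubectl above.
# The payload field name ("text") and the example question are assumptions.
curl -X POST http://router-service.gmcsample.svc.cluster.local:8080 \
  -H "Content-Type: application/json" \
  -d '{"text": "What is OPEA?"}'
```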
The whole deployment process is illustrated by the diagram below.

*Deployment Process*

**Alternatives Considered**

[KServe](https://github.com/kserve/kserve) provides [InferenceGraph](https://kserve.github.io/website/0.9/modelserving/inference_graph/); however, it only supports inference services and lacks deployment support.

**Compatibility**

n/a

**Miscs**

- TODO List:

  - [ ] one-click deployment on AWS, GCP, and Azure clouds
  - [ ] static vs. dynamic cloud resource allocator
  - [ ] k8s GMC with Istio

diff --git a/community/rfcs/opea_deploy_process_v0.png b/community/rfcs/opea_deploy_process_v0.png
new file mode 100644
index 00000000..3178d2d1
Binary files /dev/null and b/community/rfcs/opea_deploy_process_v0.png differ
diff --git a/community/rfcs/opea_deploy_process_v1.png b/community/rfcs/opea_deploy_process_v1.png
new file mode 100644
index 00000000..6b5cf4af
Binary files /dev/null and b/community/rfcs/opea_deploy_process_v1.png differ
diff --git a/community/rfcs/opea_deploy_process_v2.png b/community/rfcs/opea_deploy_process_v2.png
new file mode 100644
index 00000000..9337451a
Binary files /dev/null and b/community/rfcs/opea_deploy_process_v2.png differ
diff --git a/community/rfcs/opea_deploy_workflow.png b/community/rfcs/opea_deploy_workflow.png
new file mode 100644
index 00000000..18462285
Binary files /dev/null and b/community/rfcs/opea_deploy_workflow.png differ