diff --git a/community/rfcs/24-07-11-OPEA-Agent.md b/community/rfcs/24-07-11-OPEA-Agent.md
index 9c8a4571..a006fe4b 100644
--- a/community/rfcs/24-07-11-OPEA-Agent.md
+++ b/community/rfcs/24-07-11-OPEA-Agent.md
@@ -1,3 +1,11 @@
+# 24-07-11-OPEA-Agent
+
+Agent
+
+## Author
+
+[xuechendi](https://github.com/xuechendi)
+
 ## Status

 v0.1 team sharing completed (07/10/24)

@@ -16,7 +24,7 @@

 Single Agent Example:
 ![image](https://github.com/xuechendi/docs/assets/4355494/02232f5b-8034-44f9-a10c-545a13ec5e40)
-
+
 * 'Multi Agent' system: Multi Agent refers to a design that leverages hierarchical agent teams to complete sub-tasks through individual agent working groups. Benefits of the multi-agent design: (1) Grouping tools/responsibilities can give better results. An agent is more likely to succeed on a focused task than if it must select from dozens of tools. (2) Each agent has its own assets, including prompt, LLM model, planning strategy and toolset. (3) Users can easily use yaml files or a few lines of python to build a 'Hierarchical Multi Agent' megaservice by cherry-picking ready-to-use individual agents. (4) For small tasks that can be handled well by a single agent, users can directly use the 'Agent' microservice with simple resource management.

 Multi Agent example:
@@ -33,17 +41,19 @@ This RFC aims to provide low-code / no-code agents as new microservice / megaser

 ## Persona

-We use the listed terms to define different persona mentioned in this document. 
+We use the listed terms to define the different personas mentioned in this document.

 * OPEA developer: OPEA developers are those who follow the current OPEA API SPEC, or expand the OPEA API SPEC to add new solutions. OPEA developers are expected to use this RFC to understand how this microservice communicates with other microservices and is chained into a megaflow. OPEA developers develop the OPEA agent code and add new agent implementations by extending the current agent library with advanced agent strategies. 
-
+
 * Enterprise User (Devops): Devops are those who follow the OPEA yaml configuration format to update settings according to their real needs, or tune some of the configuration to get better performance, and who use their updated configuration to launch all microservices and get functional endpoints and API calls. Devops are expected to use this RFC to understand the keywords, how these keywords work, and the rules of using this microservice. Devops are expected to follow the customer tool template to provide their own tools and register them to the Agent microservice.

-* End user: End user describe who writes application which will use OPEA exposed endpoints and API to fulfill task goals. End users are expected to use this RFC to understand API keywords and rules. 
+* End user: End users are those who write applications that use OPEA exposed endpoints and APIs to fulfill task goals. End users are expected to use this RFC to understand API keywords and rules.

 ## Design Proposal

-  ### Execution Plan
+
+### Execution Plan
+
 v0.8 (PR ready or merge to opea - agent branch)
 * Agent component v0.1
 * Support chat-completion API
@@ -65,14 +75,15 @@ V1.0
 * Scaling
 * Concurrency

-  ### Part 1. API SPEC
+### Part 1. API SPEC
+
 Provide two types of API for different client applications.
- 1. openAI chat completion API. 
+ 1. openAI chat completion API.
 > Reference: https://platform.openai.com/docs/api-reference/chat/create
- Advantage and limitation: 
+ Advantage and limitation:
 * The most common API; should work with any existing client that uses openAI.
- * will not be able to memorize user historical session, human_in_loop agent will not work using this API. 
+ * Will not be able to memorize the user's historical session; the human_in_loop agent will not work with this API.

 ```
 "/v1/chat/completions": {
@@ -85,7 +96,7 @@ V1.0
 2. 
openAI assistant API
 > Reference: https://platform.openai.com/docs/api-reference/assistants
- Advantage and limitation: 
+ Advantage and limitation:
 * Users can create a session thread that memorizes previous conversations as long-term memory, and the Human-In-Loop agent only works with this API.
 * User client applications may need code changes to work with this new API.
 * The openAI assistant API is tagged 'beta' and is not yet stable.
@@ -97,32 +108,32 @@ V1.0
 "/v1/assistants": {
 "instructions": str,
 "name": str,
 "tools": list
 }
-
+
 # The threads API is used to maintain a conversation session with one user. A session can be resumed from a previous one and can track long-term memories.
- "/v1/threads/ ": { # empty is allowed }
-
-
+
+
 # The threads messages API adds task content to thread_1 (the thread created by the threads API)
- "/v1/threads/thread_1/messages": {
 "role": str,
 "content": str
 }
-
+
 # The threads run API starts executing an agent thread using the run API
-
+
- "/v1/threads/thread_1/runs": {
 'assistant_id': str,
 'instructions': str,
 }
 ```

-  ### Part 2. 'Agent' genAI Component definition
+### Part 2. 'Agent' genAI Component definition

 The 'Agent' genAI Component is regarded as the resource management unit in the 'Agent' design. It will be launched as one microservice and can be instantiated as 'Agent', 'Planner' or 'Executor' according to configuration. Tools will be registered to the 'Agent' microservice at launch or at runtime.

 ![image](https://github.com/user-attachments/assets/38e83fa4-57d8-4146-9061-e5153472b5f4)

-  #### SPEC for any agent Role - agent, planner, executor
+#### SPEC for any agent Role - agent, planner, executor
 ```
 "/v1/chat/completions": {
 "model": str,
@@ -145,8 +156,8 @@ V1.0
 }
 ```

-  #### Agent Role microservice definition - 'Agent':
-  A complete implementation of Agent, which contains LLM endpoint as planner, strategy algorithm for plan execution, Tools, and database handler to keep track of historical state and conversation. 
+#### Agent Role microservice definition - 'Agent':
+ A complete implementation of an agent, which contains an LLM endpoint as planner, a strategy algorithm for plan execution, tools, and a database handler to keep track of historical state and conversation.

 configuration:
 ```
@@ -157,7 +168,7 @@ V1.0
 llm_model_id: str
 recursion_limit: int
 tools: file_path or dict
-
+
 # Tools definition
 [tool_name]:
 description: str
@@ -171,8 +182,8 @@ V1.0
 return_output: str
 ```

-  #### Agent Role microservice definition - 'Planner':
-  Agent without tools. Planner only contains LLM endpoints as planner, certain strategies to complete an optimized plan.
+#### Agent Role microservice definition - 'Planner':
+ An agent without tools. The Planner only contains LLM endpoints as planner and certain strategies to produce an optimized plan.

 configuration:
 ```
@@ -185,7 +196,7 @@ V1.0
 require_human_feedback: bool
 ```

-  #### Agent Role microservice definition - 'Executor':
+#### Agent Role microservice definition - 'Executor':
 A tools executor. The Executor processes input with registered tools.

 Configuration:
 ```
@@ -203,7 +214,7 @@ V1.0
 ```
 > Any microservice following this spec can be registered as a role in the Part3 graph-based design
-
+
 ### Part3. 'Multi Agent' system overview

 We plan to provide the multi-agent system in two phases.
@@ -217,15 +228,16 @@ We planned to provide multi-agent system in two phases.

 * Phase II: Graph-Based Multi Agent
-  1. In this design, we provide user a new SDK to compose a graph-based multi agents system with conditional edge to define all strategic rules. 
-  2. Enterprise user will be able to use python code to wrap either ‘agent’, ‘planner’ or tools as ‘Role’ and add conditional edges between them for complex task agent design.
+ 1. In this design, we provide users a new SDK to compose a graph-based multi-agent system with conditional edges to define all strategic rules.
+ 2. 
Enterprise users will be able to use python code to wrap an ‘agent’, ‘planner’ or tools as a ‘Role’ and add conditional edges between them for complex task agent design.
 3. This design gives users enough flexibility to handle very complex tasks, and also provides flexibility in resource management when certain tools run much slower than others.
 > For detailed configuration, please refer to Part3.2

 ![image](https://github.com/user-attachments/assets/35b36f64-eaa1-4f05-b25e-b8bea013680d)

 #### Part3.1 Hierarchical Multi Agents
+
 __Example 1__: ‘Single Agent megaservice’
-Only 1 agent is presented in this configuration. 
+Only 1 agent is presented in this configuration.

 ![image](https://github.com/user-attachments/assets/2e716dd4-2923-4ebd-97bf-fe7a44161280)

 3 tools are registered to this agent through custom_tools.yaml
@@ -235,7 +247,7 @@ Only 1 agent is presented in this configuration.

 ![image](https://github.com/user-attachments/assets/ec89e35b-8ccc-474b-9fb7-3ed7210acc10)

 __Example 2__: ‘Hierarchical Multi Agents’
-3 agents are presented in this configuration, 1st layer supervisor agent is the gateway to interact with user, and 1st layer agent will manage 2nd layer worker agents. 
+3 agents are presented in this configuration: the 1st-layer supervisor agent is the gateway that interacts with the user, and it manages the 2nd-layer worker agents.

 ![image](https://github.com/user-attachments/assets/a83b51e6-ee08-473f-b389-51df48f1054f)

@@ -254,7 +266,7 @@ User can also chain agent into a multi-step mega service.
 audioAgent_megaservice
 ![image](https://github.com/user-attachments/assets/5fb18d75-9c08-4d7b-97f7-25d7227147dd)

 #### Part3.2 Graph-Based Multi Agent
-In Phase II, we propose to provide a graph-based multi agents system, which enterprise user will be able to define edges and conditional edges between agent nodes, planner nodes and tools for complex task agent design. 
+In Phase II, we propose to provide a graph-based multi-agent system, in which enterprise users will be able to define edges and conditional edges between agent nodes, planner nodes and tools for complex task agent design.

 ![image](https://github.com/user-attachments/assets/7c07e651-43ed-4056-b20a-cd39f3f883ee)

@@ -264,7 +276,7 @@ The user can build and launch the graph-based message group by the combination o

 The yaml file contains the basic config information for each single “Role” in the agent architecture. The user can build a MessageGroup to define the link connection information and the data flow via “edges” and “conditional_edges”. The “edges” mean that the output of the head_node is the input of the tail_node. The “conditional_edges” mean that there is decision-making among the candidate tail_nodes based on the output of the head_node. The logic of this selection is defined by the state component “Should_Continue”.

 ![image](https://github.com/user-attachments/assets/55ecb718-b134-4546-9496-40ac3a427a7b)

-Appending agents/roles in MessageGroup. 
+Appending agents/roles in a MessageGroup:
 Define the role class → define the action of the role → add edges → recompile the MessageGroup

 ![image](https://github.com/user-attachments/assets/65a3fc1d-89f3-4bb3-a078-75db91400c58)

diff --git a/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md b/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md
index afc36425..f6bea8d3 100644
--- a/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md
+++ b/community/rfcs/24-08-20-OPEA-001-AI_Gateway_API.md
@@ -1,27 +1,25 @@
-## RFC Title
+# 24-08-20-OPEA-001-AI Gateway API

 AI Gateway API

-## RFC Content
-
-### Author
+## Author

 [daixiang0](https://github.com/daixiang0), [zhixie](https://github.com/zhxie), [gyohuangxin](https://github.com/gyohuangxin), [Forrest-zhao](https://github.com/Forrest-zhao), [ruijin-intel](https://github.com/ruijin-intel)

-### Status
+## Status

 Under Review

-### Objective
+## Objective

 Design the API for AI Gateway. 
-### Motivation
+## Motivation

 - Introduce a gateway to handle mTLS, traffic control, observability and so on
 - Introduce the AI Gateway API to use existing gateway solutions rather than implementing our own

-### Design Proposal
+## Design Proposal

 The AI gateway sits at the front of all microservices:

@@ -34,7 +32,7 @@ graph TD;
 A-->B(Any microservice);
 ```

-#### API overall
+### API overall

 To make the most of current resources, we choose to follow the [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/), since it is the gateway API standard that all gateways support.

@@ -43,7 +41,7 @@ Since AI specific features of Kubernetes Gateway API are still [under discussion

 - **Kubernetes Gateway API** for features it already supports
 - **Extension API** for all other features

-#### API workflow
+### API workflow

 ```mermaid
 graph LR;
@@ -52,7 +50,7 @@ graph LR;

 AI Gateway is not a brand-new gateway implementation; it does only one thing: convert.

-#### Extension API
+### Extension API

 ```yaml
 apiVersion: extension.gateway.opea.dev/v1
@@ -74,7 +72,7 @@ spec:

 - name: the name of the extension feature; multiple extensions are supported
 - config: the content of the extension config, following the specified gateway API

-#### Extension API example
+### Extension API example

 ```yaml

diff --git a/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md b/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md
index 76e097f3..22f8d09f 100644
--- a/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md
+++ b/community/rfcs/24-08-21-GenAIExample-002-Edge_Craft_RAG.md
@@ -1,25 +1,23 @@
-# Edge Craft RAG
+# 24-08-21-GenAIExample-002-Edge Craft RAG

 This RFC describes a solution of a tunable RAG for edge scenarios. 
-## RFC Content
-
-### Author
+## Author

 [myqi](https://github.com/myqi)

-### Status
+## Status

 Under Review

-### Objective
+## Objective

-Edge industry users are facing obstacles to build an "out-of-the-box" RAG
+Edge industry users face obstacles in building an "out-of-the-box" RAG
 application that meets both quality and performance requirements. Total Cost of
 Ownership (TCO) and pipeline optimization techniques are the two main blockers
 in this process.

-#### Total Cost of Ownership
+### Total Cost of Ownership

 The HW requirement of a typical edge use case is a single host with one of the
 following combinations:

@@ -28,12 +26,12 @@ following combinations:

 - Intel(R) Core(TM) Processor + Intel(R) Arc(TM) A-Series Graphics
 - Intel(R) Xeon(R) Processor + Intel(R) Arc(TM) A-Series Graphics

-The scenarios with these hardware options block the edge users from using large
+The scenarios with these hardware options block edge users from using large
 parameter size LLMs on-prem, as well as sophisticated RAG pipelines for their
 data. Thus, the RAG pipeline at the edge needs to be highly curated for the
 underlying hardware, with suitable models chosen accordingly.

-#### RAG Pipeline Optimization Techniques
+### RAG Pipeline Optimization Techniques

 Tuning a RAG pipeline is a systematic problem. First, the quality depends on the
 result of each stage in the pipeline as well as the end-to-end outcome. Second,
@@ -43,7 +41,7 @@ relevant to the query. Third, the optimization techniques may not intuitively
 reflect in metrics improvements. E.g., recursive retrieval may or may not
 contribute to improving recall and context relevancy.

-### Motivation
+## Motivation

 Edge Craft RAG (EC-RAG) is a customizable, tunable and production-ready
 Retrieval-Augmented Generation system for edge solutions. It is designed to
@@ -57,7 +55,7 @@ overhead of inter-service communication on a single host. 
Meanwhile, the inference stages like embedding, reranking and generation are
optimized for Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM) A-Series
Graphics.

-### Design Proposal
+## Design Proposal

 EC-RAG is composed of the following components:
 - UI for doc loading and interactive chatbot.
@@ -80,7 +78,7 @@ The EC-RAG pipeline will expose 3 types of REST API endpoint:
 - **/v1/settings** for configuration
 - **/v1/chatqna** for inferencing

-#### /v1/data
+### /v1/data

 | Description   | Action | Endpoint      | Data Schema        |
 | ------------- | ------ | ------------- | ------------------ |
 | List files    | GET    | /v1/data      |                    |
 | Remove        | DELETE | /v1/data/{id} |                    |

@@ -88,7 +86,7 @@ The EC-RAG pipeline will expose 3 types of REST API endpoint:

-#### /v1/settings/pipelines
+### /v1/settings/pipelines

 | Description        | Action | Endpoint                      | Data Schema        |
 | ------------------ | ------ | ----------------------------- | ------------------ |
@@ -106,7 +104,7 @@ The EC-RAG pipeline will expose 3 types of REST API endpoint:
 | Update models      | PATCH  | /v1/settings/models/{id}      | Model object       |
 | Remove a model     | DELETE | /v1/settings/models/{id}      |                    |

-### Pipeline configuration example
+## Pipeline configuration example

 ```json
 {
@@ -151,7 +149,7 @@ The EC-RAG pipeline will expose 3 types of REST API endpoint:
 }
 ```

-#### UI
+### UI

 The EC-RAG UI is built with Gradio. The user is able to select the models as well
 as input parameters for the different stages of the pipeline. The chatbox is also integrated
@@ -163,17 +161,17 @@ EC-RAG UI - Model Configuration

 EC-RAG UI - Chatbot with settings
 ![EC-RAG UI Chatbot](Edge_Craft_RAG_screenshot_2.png)

-### Compatibility
+## Compatibility

 The EC-RAG megaservice and microservices are compatible with the existing OPEA
 GenAIExamples and GenAIComps repos. EC-RAG leverages the LLM microservice
 and the VectorDB microservice from GenAIComps.

-### Miscellaneous
+## Miscellaneous

 The EC-RAG will be developed in 2 phases. 
-#### Phase 1
+### Phase 1

 The UI, gateway, and EC-RAG pipeline will be completed without a Vector DB as the
 persistent DB. Instead, FAISS will be used for vector search and keep vector
@@ -183,7 +181,7 @@ In this phase, the LLM inferencing will happen in the pipeline until the LLM
 serving microservice supports Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM)
 A-Series Graphics.

-#### Phase 2
+### Phase 2

 The vector DB will be enabled in this phase, as well as LLM inferencing on
 Intel(R) Iris(R) Xe Graphics and Intel(R) Arc(TM) A-Series Graphics.
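
---

The EC-RAG REST endpoints described above (`/v1/data`, `/v1/settings/pipelines`, `/v1/chatqna`) can be sketched from a client's point of view. The sketch below only assembles request payloads; the payload field names (`indexer`, `retrieve_topk`, `generator`, `messages`) and the local port are illustrative assumptions for this RFC discussion, not the authoritative EC-RAG schema:

```python
import json

# Assumed local EC-RAG deployment; the RFC does not fix a port, so this is a placeholder.
EC_RAG_URL = "http://localhost:16010"


def pipeline_settings(name: str, embedding_model: str, llm_model: str, top_k: int = 5) -> dict:
    """Build a payload for POST /v1/settings/pipelines (illustrative field names)."""
    return {
        "name": name,
        "indexer": {"embedding_model": embedding_model},
        "retriever": {"retrieve_topk": top_k},
        "generator": {"model": llm_model},
        "active": True,
    }


def chat_request(question: str, stream: bool = False) -> dict:
    """Build a payload for POST /v1/chatqna (illustrative field names)."""
    return {"messages": question, "stream": stream}


# Assemble a tuned pipeline config and a query, then print them as JSON.
settings = pipeline_settings("edge_demo", "BAAI/bge-small-en-v1.5",
                             "Qwen/Qwen2-7B-Instruct", top_k=3)
print(json.dumps(settings, indent=2))
print(json.dumps(chat_request("What documents are indexed?")))
```

A real client would then POST these payloads (for example with `urllib.request` or `requests`) to `EC_RAG_URL + "/v1/settings/pipelines"` and `EC_RAG_URL + "/v1/chatqna"` respectively; keeping payload construction separate from transport makes the tunable settings easy to diff and version.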