diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
new file mode 100644
index 00000000..f9a2f517
--- /dev/null
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,10 @@
+ci:
+  autofix_prs: true
+  autoupdate_schedule: quarterly
+
+repos:
+  - repo: https://github.com/codespell-project/codespell
+    rev: v2.2.6
+    hooks:
+      - id: codespell
+        args: [-w]
diff --git a/community/rfc_template.md b/community/rfc_template.md
index 115a8d29..a3bbb714 100644
--- a/community/rfc_template.md
+++ b/community/rfc_template.md
@@ -33,11 +33,11 @@ List other alternatives if have, and corresponding pros/cons to each proposal.
 
 ### Compatibility
 
-list possbile incompatible interface or workflow changes if exists.
+list possible incompatible interface or workflow changes if exists.
 
 ### Miscs
 
-List other informations user and developer may care about, such as:
+List other information user and developer may care about, such as:
 
 - Performance Impact, such as speed, memory, accuracy.
 - Engineering Impact, such as binary size, startup time, build time, test times.
diff --git a/community/rfcs/24-05-16-001-OPEA-Overall-Design.md b/community/rfcs/24-05-16-001-OPEA-Overall-Design.md
index 5b31627a..4ae5cee0 100644
--- a/community/rfcs/24-05-16-001-OPEA-Overall-Design.md
+++ b/community/rfcs/24-05-16-001-OPEA-Overall-Design.md
@@ -16,7 +16,7 @@ The requirements include but not limited to:
 
    have the ability of offer config-based definition or low-code for constructing complex LLM applications.
 
-2. component registery
+2. component registry
 
    allow user to register new service for building complex GenAI applications
 
@@ -30,7 +30,7 @@ The requirements include but not limited to:
 
 **Motivation**
 
-This RFC is used to present the OPEA overall design philosophy, including overall architecture, working flow, componenet design, for community discussion.
+This RFC is used to present the OPEA overall design philosophy, including overall architecture, working flow, component design, for community discussion.
 
 **Design Proposal**
 
diff --git a/community/rfcs/GenAIExamples-24-05-16-001-Using_MicroService_to_implement_ChatQnA.md b/community/rfcs/GenAIExamples-24-05-16-001-Using_MicroService_to_implement_ChatQnA.md
index 9c53c206..498e3ed4 100644
--- a/community/rfcs/GenAIExamples-24-05-16-001-Using_MicroService_to_implement_ChatQnA.md
+++ b/community/rfcs/GenAIExamples-24-05-16-001-Using_MicroService_to_implement_ChatQnA.md
@@ -5,7 +5,7 @@ Under Review
 
 # Objective
 
-This RFC aims to introduce the OPEA microservice design and demonstrate its application to Retrieval-Augmented Generation (RAG). The objective is to address the challenge of designing a flexible architecture for Enterprise AI applicaitons by adopting a microservice approach. This approach facilitates easier deployment, enabling one or multiple microservices to form a megaservice. Each megaservice interfaces with a gateway, allowing users to access services through endpoints exposed by the gateway. The architecture is general and RAG is the first example that we want to apply.
+This RFC aims to introduce the OPEA microservice design and demonstrate its application to Retrieval-Augmented Generation (RAG). The objective is to address the challenge of designing a flexible architecture for Enterprise AI applications by adopting a microservice approach. This approach facilitates easier deployment, enabling one or multiple microservices to form a megaservice. Each megaservice interfaces with a gateway, allowing users to access services through endpoints exposed by the gateway. The architecture is general and RAG is the first example that we want to apply.
 
 # Motivation
 
diff --git a/framework.md b/framework.md
index a3ddc015..a5f1583d 100644
--- a/framework.md
+++ b/framework.md
@@ -396,7 +396,7 @@ and applying the Linux Foundation licensing considerations._
 | ---------- | ----------- | ------------ | -------------------- |
 | Agent framework | Orchestration software for building and deploying workflows combining information retrieval components with LLMs for building AI agents with contextualized information | Langchain, LlamaIndex, Haystack, Semantic Kernel
 | Ingest/Data Processing | Software components that can be used to enhance the data that is indexed for retrieval. For example: process, clean, normalization, information extraction, chunking, tokenization, meta data enhancement. | NLTK, spaCY, HF Tokenizers, tiktoken, SparkNLP
-| Embedding models/service | Models or services that covert text chunks into embedding vectors to be stored in a vector database | HF Transformers, S-BERT | HF TEI, OpenAI, Cohere, GCP, Azure embedding APIs, JinaAI
+| Embedding models/service | Models or services that convert text chunks into embedding vectors to be stored in a vector database | HF Transformers, S-BERT | HF TEI, OpenAI, Cohere, GCP, Azure embedding APIs, JinaAI
 | Indexing/Vector store | A software for indexing information (sparse/vector) and for retrieving given a query | Elasticsearch, Qdrant, Milvus, ChromaDB, Weaviate, FAISS, Vespa, HNSWLib, SVS, PLAID | Pinecone, Redis
 | Retrieval/Ranking | A SW component that can re-evaluate existing contexts relevancy order | S-BERT, HF Transformers, Bi/Cross-encoders, ColBERT | Cohere
 | Prompt engine | A component that creates task specific prompts given queries and contexts, tracks user sessions (maintain history/memory) | Langchain hub
@@ -688,7 +688,7 @@ more capabilities than necessary. OWASP container best practices.
 * High availability
   * Replication & Data/Instance Protection
   * Resiliency – time to relaunch an instance when burned down to zero.
-  * Privides support and instrumentation for enterprise 24/7 support
+  * Provides support and instrumentation for enterprise 24/7 support
 * Licensing model and SW Distribution
   * Scalable from small to large customers
   * Ability to customize for specific enterprise needs