From 2ff288bf3c47c63e3a65d39dd2686e9c4e20e0b8 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Tue, 24 Sep 2024 15:19:27 +0200 Subject: [PATCH 01/27] feat(inference): add tutorial how to implement RAG with managed inference --- tutorials/how-to-implement-rag/index.mdx | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 tutorials/how-to-implement-rag/index.mdx diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx new file mode 100644 index 0000000000..a5c48231e2 --- /dev/null +++ b/tutorials/how-to-implement-rag/index.mdx @@ -0,0 +1,17 @@ +--- +meta: + title: How to implement RAG with managed inference + description: +--- + +RAG (Retrieval-Augmented Generation) is a powerful approach for enhancing a model's knowledge by leveraging your own dataset. +Scaleway's robust infrastructure makes it easier than ever to implement RAG, as our products are fully compatible with LangChain, especially the OpenAI integration. +By utilizing our managed inference services, managed databases, and object storage, you can effortlessly build and deploy a customized model tailored to your specific needs. + + +- A Scaleway account logged into the [console](https://console.scaleway.com) +- [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization +- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): Set up an inference deployment using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings. +- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the model of your choice. +- [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model. +- [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. \ No newline at end of file From 73f71c2438782595c4677ca4a2cac1caebc1b70a Mon Sep 17 00:00:00 2001 From: Laure-di Date: Tue, 24 Sep 2024 16:10:24 +0200 Subject: [PATCH 02/27] configure dev env part --- tutorials/how-to-implement-rag/index.mdx | 45 +++++++++++++++++++++++- 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index a5c48231e2..7413da1470 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -2,6 +2,11 @@ meta: title: How to implement RAG with managed inference description: +content: + h1: How to implement RAG with managed inference +tags: inference managed postgresql pgvector object storage +categories: + - inference --- RAG (Retrieval-Augmented Generation) is a powerful approach for enhancing a model's knowledge by leveraging your own dataset. @@ -14,4 +19,42 @@ By utilizing our managed inference services, managed databases, and object stora - [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): Set up an inference deployment using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings. - [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the model of your choice. 
- [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model. -- [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. \ No newline at end of file +- [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. + +## Configure your developement environnement +1. Install necessary packages: run the following command to install the required packages: + ```sh + pip install langchain psycopg2 python-dotenv scaleway + ``` +2. Configure your environnement variables: create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. + ```sh + # .env file + + # Scaleway API credentials + SCW_ACCESS_KEY=your_scaleway_access_key + SCW_SECRET_KEY=your_scaleway_secret_key + SCW_API_KEY=your_scaleway_api_key + + # Scaleway project and region + SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id + SCW_DEFAULT_REGION=your_scaleway_region + + # Scaleway managed database (PostgreSQL) credentials + SCW_DB_NAME=your_scaleway_managed_db_name + SCW_DB_USER=your_scaleway_managed_db_username + SCW_DB_PASSWORD=your_scaleway_managed_db_password + SCW_DB_HOST=your_scaleway_managed_db_host # The IP address of your database instance + SCW_DB_PORT=your_scaleway_managed_db_port # The port number for your database instance + + # Scaleway S3 bucket configuration + SCW_BUCKET_NAME=your_scaleway_bucket_name + SCW_BUCKET_ENDPOINT=your_scaleway_bucket_endpoint # S3 endpoint, e.g., https://s3.fr-par.scw.cloud + + # Scaleway Inference API configuration (Embeddings) + SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment + SCW_INFERENCE_API_KEY_EMBEDDINGS=your_scaleway_api_key_for_embeddings + + # Scaleway Inference API configuration (LLM deployment) + SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment + SCW_INFERENCE_API_KEY=your_scaleway_api_key_for_inference_deployment + ``` \ No newline at end of file From 83490a3031bcff3bddf9a79c43eb1af4b6b89a14 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Tue, 24 Sep 2024 16:33:20 +0200 Subject: [PATCH 03/27] Managed database setup --- tutorials/how-to-implement-rag/index.mdx | 99 ++++++++++++++++-------- 1 file changed, 65 insertions(+), 34 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 7413da1470..23679f7645 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -1,10 +1,10 @@ --- meta: title: How to implement RAG with managed inference - description: + description: Learn how to implement Retrieval-Augmented Generation (RAG) using Scaleway's managed inference, PostgreSQL, pgvector, and object storage. content: h1: How to implement RAG with managed inference -tags: inference managed postgresql pgvector object storage +tags: inference, managed, postgresql, pgvector, object storage categories: - inference --- @@ -14,6 +14,7 @@ Scaleway's robust infrastructure makes it easier than ever to implement RAG, as By utilizing our managed inference services, managed databases, and object storage, you can effortlessly build and deploy a customized model tailored to your specific needs. 
+ - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization - [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): Set up an inference deployment using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings. @@ -21,40 +22,70 @@ By utilizing our managed inference services, managed databases, and object stora - [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model. - [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. -## Configure your developement environnement +## Configure your development environment + 1. Install necessary packages: run the following command to install the required packages: + ```sh pip install langchain psycopg2 python-dotenv scaleway ``` -2. Configure your environnement variables: create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. +2. Configure your environment variables: create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. + ```sh - # .env file - - # Scaleway API credentials - SCW_ACCESS_KEY=your_scaleway_access_key - SCW_SECRET_KEY=your_scaleway_secret_key - SCW_API_KEY=your_scaleway_api_key - - # Scaleway project and region - SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id - SCW_DEFAULT_REGION=your_scaleway_region - - # Scaleway managed database (PostgreSQL) credentials - SCW_DB_NAME=your_scaleway_managed_db_name - SCW_DB_USER=your_scaleway_managed_db_username - SCW_DB_PASSWORD=your_scaleway_managed_db_password - SCW_DB_HOST=your_scaleway_managed_db_host # The IP address of your database instance - SCW_DB_PORT=your_scaleway_managed_db_port # The port number for your database instance - - # Scaleway S3 bucket configuration - SCW_BUCKET_NAME=your_scaleway_bucket_name - SCW_BUCKET_ENDPOINT=your_scaleway_bucket_endpoint # S3 endpoint, e.g., https://s3.fr-par.scw.cloud - - # Scaleway Inference API configuration (Embeddings) - SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment - SCW_INFERENCE_API_KEY_EMBEDDINGS=your_scaleway_api_key_for_embeddings - - # Scaleway Inference API configuration (LLM deployment) - SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment - SCW_INFERENCE_API_KEY=your_scaleway_api_key_for_inference_deployment - ``` \ No newline at end of file + # .env file + + # Scaleway API credentials + SCW_ACCESS_KEY=your_scaleway_access_key + SCW_SECRET_KEY=your_scaleway_secret_key + SCW_API_KEY=your_scaleway_api_key + + # Scaleway project and region + SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id + SCW_DEFAULT_REGION=your_scaleway_region + + # Scaleway managed database (PostgreSQL) credentials + SCW_DB_NAME=your_scaleway_managed_db_name + SCW_DB_USER=your_scaleway_managed_db_username + SCW_DB_PASSWORD=your_scaleway_managed_db_password + SCW_DB_HOST=your_scaleway_managed_db_host # The IP address of your database instance + 
SCW_DB_PORT=your_scaleway_managed_db_port # The port number for your database instance + + # Scaleway S3 bucket configuration + SCW_BUCKET_NAME=your_scaleway_bucket_name + SCW_BUCKET_ENDPOINT=your_scaleway_bucket_endpoint # S3 endpoint, e.g., https://s3.fr-par.scw.cloud + + # Scaleway Inference API configuration (Embeddings) + SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment + SCW_INFERENCE_API_KEY_EMBEDDINGS=your_scaleway_api_key_for_embeddings + + # Scaleway Inference API configuration (LLM deployment) + SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment + SCW_INFERENCE_API_KEY=your_scaleway_api_key_for_inference_deployment + ``` + +### Set Up Managed Database + +1. Connect to your PostgreSQL instance and install the pg_vector extension. + + ```python + conn = psycopg2.connect( + database="your_database_name", + user="your_db_user", + password=os.getenv("SCW_DB_PASSWORD"), + host="your_db_host", + port="your_db_port" + ) + + cur = conn.cursor() + + # Install pg_vector extension + cur.execute("CREATE EXTENSION IF NOT EXISTS vector;") + conn.commit() + ``` +2. To avoid reprocessing documents that have already been loaded and vectorized, create a table in your PostgreSQL database to track them. This ensures that new documents added to your object storage bucket are processed only once, preventing duplicate downloads and redundant vectorization. + + ```python + cur.execute("CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT)") + conn.commit() + ``` + From ca9beb9c83447fa8c5ebcd51465fb7acf793f17e Mon Sep 17 00:00:00 2001 From: Laure-di Date: Tue, 24 Sep 2024 17:30:07 +0200 Subject: [PATCH 04/27] last part of the first version --- tutorials/how-to-implement-rag/index.mdx | 84 ++++++++++++++++++++++-- 1 file changed, 77 insertions(+), 7 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 23679f7645..a97d8e24c8 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -4,14 +4,14 @@ meta: description: Learn how to implement Retrieval-Augmented Generation (RAG) using Scaleway's managed inference, PostgreSQL, pgvector, and object storage. content: h1: How to implement RAG with managed inference -tags: inference, managed, postgresql, pgvector, object storage +tags: inference, managed, postgresql, pgvector, object storage, RAG categories: - inference --- -RAG (Retrieval-Augmented Generation) is a powerful approach for enhancing a model's knowledge by leveraging your own dataset. -Scaleway's robust infrastructure makes it easier than ever to implement RAG, as our products are fully compatible with LangChain, especially the OpenAI integration. -By utilizing our managed inference services, managed databases, and object storage, you can effortlessly build and deploy a customized model tailored to your specific needs. +Retrieval-Augmented Generation (RAG) enhances the power of language models by enabling them to retrieve relevant information from external datasets. In this tutorial, we’ll implement RAG using Scaleway’s Managed Inference, PostgreSQL, pgvector, and Scaleway’s Object Storage. + +With Scaleway's fully managed services, integrating RAG becomes a streamlined process. 
You'll use a sentence transformer for embedding text, store embeddings in a PostgreSQL database with pgvector, and leverage object storage for scalable data management. @@ -56,16 +56,14 @@ By utilizing our managed inference services, managed databases, and object stora # Scaleway Inference API configuration (Embeddings) SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment - SCW_INFERENCE_API_KEY_EMBEDDINGS=your_scaleway_api_key_for_embeddings # Scaleway Inference API configuration (LLM deployment) SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment - SCW_INFERENCE_API_KEY=your_scaleway_api_key_for_inference_deployment ``` ### Set Up Managed Database -1. Connect to your PostgreSQL instance and install the pg_vector extension. +1. Connect to your PostgreSQL instance and install the pgvector extension, which is used for storing high-dimensional embeddings. ```python conn = psycopg2.connect( @@ -89,3 +87,75 @@ By utilizing our managed inference services, managed databases, and object stora conn.commit() ``` +### Set Up Document Loaders for Object Storage + +```python + document_loader = S3DirectoryLoader( + bucket=os.getenv('SCW_BUCKET_NAME'), + endpoint_url=os.getenv('SCW_BUCKET_ENDPOINT'), + aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), + aws_secret_access_key=os.getenv("SCW_SECRET_KEY") + ) + +``` + +### Embeddings and Vector Store Setup + +We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. + +```python + embeddings = OpenAIEmbeddings( + openai_api_key=os.getenv("SCW_API_KEY"), + openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"), + model="sentence-transformers/sentence-t5-xxl", + ) + + connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}" + vector_store = PGVector(connection=connection_string, embeddings=embeddings) +``` + +### Load and Process Documents + +Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. + +```python + files = document_loader.lazy_load() + text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20) + + for file in files: + cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (file.metadata["source"],)) + if cur.fetchone() is None: + fileLoader = S3FileLoader( + bucket=os.getenv(), + key=file.metadata["source"].split("/")[-1], + endpoint_url=endpoint_s3, + aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), + aws_secret_access_key=os.getenv("SCW_SECRET_KEY") + ) + file_to_load = fileLoader.load() + chunks = text_splitter.split_text(file.page_content) + + embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks] + for chunk, embedding in zip(chunks, embeddings_list): + vector_store.add_embeddings(embedding, chunk) +``` + +### Query the RAG System + +Now, set up the RAG system to handle queries using RetrievalQA and the LLM. + +```python + retriever = vector_store.as_retriever(search_kwargs={"k": 3}) + llm = ChatOpenAI( + base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"), + api_key=os.getenv("SCW_API_KEY"), + model=deployment.model_name, + ) + + qa_stuff = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever) + + query = "What are the commands to set up a database with the CLI of Scaleway?" 
+ response = qa_stuff.invoke(query) + + print(response['result']) +``` From d9f4ef254e44e6203a08a4954163d0389f302b6d Mon Sep 17 00:00:00 2001 From: Laure-di Date: Thu, 26 Sep 2024 10:57:37 +0200 Subject: [PATCH 05/27] add more informations on the tuto --- tutorials/how-to-implement-rag/index.mdx | 60 +++++++++++++++++++++++- 1 file changed, 59 insertions(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index a97d8e24c8..f0519f4bd6 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -80,6 +80,8 @@ With Scaleway's fully managed services, integrating RAG becomes a streamlined pr cur.execute("CREATE EXTENSION IF NOT EXISTS vector;") conn.commit() ``` +The command above ensures that pgvector is installed on your database if it hasn't been already. + 2. To avoid reprocessing documents that have already been loaded and vectorized, create a table in your PostgreSQL database to track them. This ensures that new documents added to your object storage bucket are processed only once, preventing duplicate downloads and redundant vectorization. ```python @@ -89,6 +91,8 @@ With Scaleway's fully managed services, integrating RAG becomes a streamlined pr ### Set Up Document Loaders for Object Storage +The document loader pulls documents from your Scaleway Object Storage bucket. This loader will retrieve the contents of each document for further processing.s + ```python document_loader = S3DirectoryLoader( bucket=os.getenv('SCW_BUCKET_NAME'), @@ -101,25 +105,55 @@ With Scaleway's fully managed services, integrating RAG becomes a streamlined pr ### Embeddings and Vector Store Setup -We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. +1. We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. ```python embeddings = OpenAIEmbeddings( openai_api_key=os.getenv("SCW_API_KEY"), openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"), model="sentence-transformers/sentence-t5-xxl", + tiktoken_enabled=False, ) +``` + +Key Parameters: +- openai_api_key: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference. +- openai_api_base: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings. +- model="sentence-transformers/sentence-t5-xxl": This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems. +- tiktoken_enabled=False: This is an important parameter, which disables the use of TikToken for tokenization within the embeddings process. + +What is tiktoken_enabled? + +tiktoken is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process. + +Why set tiktoken_enabled=False? 
+ +In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally. +Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. +By setting tiktoken_enabled=False, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. + +2. Next, configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings. + +```python connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}" vector_store = PGVector(connection=connection_string, embeddings=embeddings) ``` +PGVector: This creates the vector store in your PostgreSQL database to store the embeddings. + ### Load and Process Documents Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. +1. Lazy loadings documents: This method is designed to efficiently load and process documents one by one from Scaleway Object Storage. Instead of loading all documents at once, it loads them lazily, allowing us to inspect each file before deciding whether to embed it. ```python files = document_loader.lazy_load() +``` +Why lazy loading? +The key reason for using lazy loading here is to avoid reprocessing documents that have already been embedded. In the context of Retrieval-Augmented Generation (RAG), reprocessing the same document multiple times is redundant and inefficient. Lazy loading enables us to check if a document has already been embedded (by querying the database) before actually loading and embedding it. + +```python text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20) for file in files: @@ -140,6 +174,23 @@ Use the S3FileLoader to load documents and split them into chunks. Then, embed a vector_store.add_embeddings(embedding, chunk) ``` +- S3FileLoader: Loads each file individually from the object storage bucket. +- RecursiveCharacterTextSplitter: Splits the document into smaller text chunks. This is important for embedding, as models typically work better with smaller chunks of text. +- embeddings_list: Stores the embeddings for each chunk. +- vector_store.add_embeddings(): Stores each chunk and its corresponding embedding in the PostgreSQL vector store. + +The code iterates over each file retrieved from object storage using lazy loading. +For each file, a query is made to check if its corresponding object_key (a unique identifier from the file metadata) exists in the object_loaded table in PostgreSQL. +If the document has already been processed and embedded (i.e., the object_key is found in the database), the system skips loading the file and moves on to the next one. +If the document is new (not yet embedded), the file is fully loaded and processed. + +This approach ensures that only new or modified documents are loaded into memory and embedded, saving significant computational resources and reducing redundant work. 
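Note that the loop above only *checks* the `object_loaded` table; recording each file once it has been embedded is left implicit. The helper below is a minimal sketch of that missing step, assuming the `cur` and `conn` objects created earlier — the function name and its placement at the end of the processing loop are hypothetical.

```python
def mark_object_as_loaded(object_key: str) -> None:
    """Record a processed file so later runs skip it (assumes `cur`/`conn` from above)."""
    cur.execute(
        "INSERT INTO object_loaded (object_key) VALUES (%s)",
        (object_key,),
    )
    conn.commit()

# Hypothetical usage, after a file's chunks have been embedded and stored:
# mark_object_as_loaded(file.metadata["source"])
```

Committing once per file keeps the tracking table consistent with the vector store even if processing stops partway through the bucket.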
+ +Why store both chunk and embedding? + +Storing both the chunk and its corresponding embedding allows for efficient document retrieval later. +When a query is made, the RAG system will retrieve the most relevant embeddings, and the corresponding text chunks will be used to generate the final response. + ### Query the RAG System Now, set up the RAG system to handle queries using RetrievalQA and the LLM. @@ -159,3 +210,10 @@ Now, set up the RAG system to handle queries using RetrievalQA and the LLM. print(response['result']) ``` + + +### Conclusion + +This step is essential for efficiently processing and storing large document datasets for RAG. By using lazy loading, the system handles large datasets without overwhelming memory, while chunking ensures that each document is processed in a way that maximizes the performance of the LLM. The embeddings are stored in PostgreSQL via pgvector, allowing for fast and scalable retrieval when responding to user queries. + +By combining Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you can implement a powerful RAG system that scales with your data and offers robust information retrieval capabilities. \ No newline at end of file From d1c41bc155be2bbca14e51656c23b774a4b7d94e Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 27 Sep 2024 14:32:55 +0200 Subject: [PATCH 06/27] fix title --- tutorials/how-to-implement-rag/index.mdx | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index f0519f4bd6..130ecd5738 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -1,24 +1,33 @@ --- meta: - title: How to implement RAG with managed inference - description: Learn how to implement Retrieval-Augmented Generation (RAG) using Scaleway's managed inference, PostgreSQL, pgvector, and object storage. + title: Step-by-Step Guide Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference + description: Master Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference content: - h1: How to implement RAG with managed inference -tags: inference, managed, postgresql, pgvector, object storage, RAG + h1: Step-by-Step Guide Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference +tags: inference managed postgresql pgvector object storage RAG langchain machine learning AI language models categories: - inference --- -Retrieval-Augmented Generation (RAG) enhances the power of language models by enabling them to retrieve relevant information from external datasets. In this tutorial, we’ll implement RAG using Scaleway’s Managed Inference, PostgreSQL, pgvector, and Scaleway’s Object Storage. +Retrieval-Augmented Generation (RAG) supercharges language models by enabling real-time retrieval of relevant information from external datasets. This hybrid approach boosts both the accuracy and contextual relevance of model outputs, making it essential for advanced AI applications. -With Scaleway's fully managed services, integrating RAG becomes a streamlined process. You'll use a sentence transformer for embedding text, store embeddings in a PostgreSQL database with pgvector, and leverage object storage for scalable data management. 
+In this comprehensive guide, you'll learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We'll combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management. + +***Why LangChain?*** +LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. + +***What You’ll Learn:*** +How to embed text using a sentence transformer using ***Scaleway Manage Inference*** +How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector +How to manage large datasets efficiently with ***Scaleway Object Storage*** - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization +- A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) - [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): Set up an inference deployment using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings. -- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the model of your choice. +- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the large language model of your choice. - [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model. - [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. 
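Optionally, before going further, you can confirm that the pgvector extension is available on your Managed Database instance. The snippet below is a minimal sketch, assuming `psycopg2` is installed locally and the placeholder connection values are replaced with your own credentials:

```python
import psycopg2

# Placeholder values; replace with your Managed Database credentials.
conn = psycopg2.connect(
    database="your_db_name",
    user="your_db_user",
    password="your_db_password",
    host="your_db_host",
    port="your_db_port",
)
cur = conn.cursor()
cur.execute("SELECT name FROM pg_available_extensions WHERE name = 'vector';")
print(cur.fetchone())  # Prints ('vector',) if pgvector can be enabled
cur.close()
conn.close()
```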
From 969b299f533118f42765d4895f2f3b0ba61ee450 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 27 Sep 2024 16:03:22 +0200 Subject: [PATCH 07/27] fix: remove unused env var and add missing one in code --- tutorials/how-to-implement-rag/index.mdx | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 130ecd5738..6942ac4285 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -45,12 +45,7 @@ How to manage large datasets efficiently with ***Scaleway Object Storage*** # Scaleway API credentials SCW_ACCESS_KEY=your_scaleway_access_key - SCW_SECRET_KEY=your_scaleway_secret_key - SCW_API_KEY=your_scaleway_api_key - - # Scaleway project and region - SCW_DEFAULT_PROJECT_ID=your_scaleway_project_id - SCW_DEFAULT_REGION=your_scaleway_region + SCW_API_KEY=your_scaleway_secret_ke # Scaleway managed database (PostgreSQL) credentials SCW_DB_NAME=your_scaleway_managed_db_name @@ -76,11 +71,11 @@ How to manage large datasets efficiently with ***Scaleway Object Storage*** ```python conn = psycopg2.connect( - database="your_database_name", - user="your_db_user", + database=os.getenv("SCW_DB_NAME"), + user=os.getenv("SCW_DB_USER"), password=os.getenv("SCW_DB_PASSWORD"), - host="your_db_host", - port="your_db_port" + host=os.getenv("SCW_DB_HOST"), + port=os.getenv("SCW_DB_PORT") ) cur = conn.cursor() @@ -107,7 +102,7 @@ The document loader pulls documents from your Scaleway Object Storage bucket. Th bucket=os.getenv('SCW_BUCKET_NAME'), endpoint_url=os.getenv('SCW_BUCKET_ENDPOINT'), aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), - aws_secret_access_key=os.getenv("SCW_SECRET_KEY") + aws_secret_access_key=os.getenv("SCW_API_KEY") ) ``` @@ -173,7 +168,7 @@ The key reason for using lazy loading here is to avoid reprocessing documents th key=file.metadata["source"].split("/")[-1], endpoint_url=endpoint_s3, aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), - aws_secret_access_key=os.getenv("SCW_SECRET_KEY") + aws_secret_access_key=os.getenv("SCW_API_KEY") ) file_to_load = fileLoader.load() chunks = text_splitter.split_text(file.page_content) From 08218baf0e3bbb62d98427158551d347cf889b40 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 27 Sep 2024 16:21:24 +0200 Subject: [PATCH 08/27] database management improve --- tutorials/how-to-implement-rag/index.mdx | 53 ++++++++++++++++-------- 1 file changed, 35 insertions(+), 18 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 6942ac4285..ebda01eb1b 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -36,7 +36,7 @@ How to manage large datasets efficiently with ***Scaleway Object Storage*** 1. Install necessary packages: run the following command to install the required packages: ```sh - pip install langchain psycopg2 python-dotenv scaleway + pip install langchain psycopg2 python-dotenv ``` 2. Configure your environment variables: create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. @@ -67,31 +67,48 @@ How to manage large datasets efficiently with ***Scaleway Object Storage*** ### Set Up Managed Database -1. Connect to your PostgreSQL instance and install the pgvector extension, which is used for storing high-dimensional embeddings. 
+To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking. - ```python - conn = psycopg2.connect( +1. Install the pgvector extension +pgvector is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: + +```sql + CREATE EXTENSION IF NOT EXISTS vector; +``` +2. Create a table to track processed documents +To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization: + +```sql + CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT; +``` + +3. Connect to PostgreSQL programmatically using Python +You can also connect to your PostgreSQL instance and perform the same tasks programmatically. + + ```python + # rag.py file + +from dotenv import load_dotenv +import psycopg2 +import os + +# Load environment variables +load_dotenv() + +# Establish connection to PostgreSQL database using environment variables +conn = psycopg2.connect( database=os.getenv("SCW_DB_NAME"), user=os.getenv("SCW_DB_USER"), password=os.getenv("SCW_DB_PASSWORD"), host=os.getenv("SCW_DB_HOST"), port=os.getenv("SCW_DB_PORT") - ) +) - cur = conn.cursor() - - # Install pg_vector extension - cur.execute("CREATE EXTENSION IF NOT EXISTS vector;") - conn.commit() - ``` -The command above ensures that pgvector is installed on your database if it hasn't been already. +# Create a cursor to execute SQL commands +cur = conn.cursor() + ``` -2. To avoid reprocessing documents that have already been loaded and vectorized, create a table in your PostgreSQL database to track them. This ensures that new documents added to your object storage bucket are processed only once, preventing duplicate downloads and redundant vectorization. - ```python - cur.execute("CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT)") - conn.commit() - ``` ### Set Up Document Loaders for Object Storage @@ -164,7 +181,7 @@ The key reason for using lazy loading here is to avoid reprocessing documents th cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (file.metadata["source"],)) if cur.fetchone() is None: fileLoader = S3FileLoader( - bucket=os.getenv(), + bucket=os.getenv("SCW_BUCKET_NAME"), key=file.metadata["source"].split("/")[-1], endpoint_url=endpoint_s3, aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), From 6f66eda7e3cd562ebea16840aa390508e715ac68 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 27 Sep 2024 16:31:49 +0200 Subject: [PATCH 09/27] develop Document loader --- tutorials/how-to-implement-rag/index.mdx | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index ebda01eb1b..3939174dea 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -112,9 +112,28 @@ cur = conn.cursor() ### Set Up Document Loaders for Object Storage -The document loader pulls documents from your Scaleway Object Storage bucket. 
This loader will retrieve the contents of each document for further processing.s +In this section, we will use LangChain to load documents stored in your Scaleway Object Storage bucket. The document loader retrieves the contents of each document for further processing, such as vectorization or embedding generation. + +1. Storing Data for RAG +Ensure that all the documents and data you want to inject into your Retrieval-Augmented Generation (RAG) system are stored in this Scaleway Object Storage bucket. These could include text files, PDFs, or any other format that will be processed and vectorized in the following steps. + +2. Import Required Modules +Before setting up the document loader, you need to import the necessary modules from LangChain and other libraries. Here's how to do that: + +```python +# rag.py + +from langchain.document_loaders import S3DirectoryLoader +import os +``` + +3. Set Up the Document Loader +The S3DirectoryLoader class, part of LangChain, is specifically designed to load documents from S3-compatible storage (in this case, Scaleway Object Storage). +Now, let’s configure the document loader to pull files from your Scaleway Object Storage bucket using the appropriate credentials and environment variables: ```python +#rag.py + document_loader = S3DirectoryLoader( bucket=os.getenv('SCW_BUCKET_NAME'), endpoint_url=os.getenv('SCW_BUCKET_ENDPOINT'), @@ -124,6 +143,8 @@ The document loader pulls documents from your Scaleway Object Storage bucket. Th ``` +This section highlights that you're leveraging LangChain’s document loader capabilities to connect directly to your Scaleway Object Storage. LangChain simplifies the process of integrating external data sources, allowing you to focus on building a RAG system without handling low-level integration details. + ### Embeddings and Vector Store Setup 1. We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. From a04d4c3244bc469886a2e2750d9bab8ded670f31 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 27 Sep 2024 16:35:43 +0200 Subject: [PATCH 10/27] format --- tutorials/how-to-implement-rag/index.mdx | 52 +++++++++++++++--------- 1 file changed, 33 insertions(+), 19 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 3939174dea..8fa3e5003a 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -13,16 +13,19 @@ Retrieval-Augmented Generation (RAG) supercharges language models by enabling re In this comprehensive guide, you'll learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We'll combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management. -***Why LangChain?*** +#### Why LangChain? LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. 
-***What You’ll Learn:*** +#### What You’ll Learn: How to embed text using a sentence transformer using ***Scaleway Manage Inference*** How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector How to manage large datasets efficiently with ***Scaleway Object Storage*** +## Before you start + +To complete the actions presented below, you must have: - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) @@ -79,7 +82,7 @@ pgvector is essential for storing and indexing high-dimensional vectors, which a To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization: ```sql - CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT; + CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT); ``` 3. Connect to PostgreSQL programmatically using Python @@ -132,7 +135,7 @@ The S3DirectoryLoader class, part of LangChain, is specifically designed to load Now, let’s configure the document loader to pull files from your Scaleway Object Storage bucket using the appropriate credentials and environment variables: ```python -#rag.py +# rag.py document_loader = S3DirectoryLoader( bucket=os.getenv('SCW_BUCKET_NAME'), @@ -146,10 +149,19 @@ Now, let’s configure the document loader to pull files from your Scaleway Obje This section highlights that you're leveraging LangChain’s document loader capabilities to connect directly to your Scaleway Object Storage. LangChain simplifies the process of integrating external data sources, allowing you to focus on building a RAG system without handling low-level integration details. ### Embeddings and Vector Store Setup +1. Import the required module +```python +# rag.py -1. We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. +from langchain_openai import OpenAIEmbeddings +from langchain_postgres import PGVector +``` + +2. We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. ```python +# rag.py + embeddings = OpenAIEmbeddings( openai_api_key=os.getenv("SCW_API_KEY"), openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"), @@ -158,17 +170,17 @@ This section highlights that you're leveraging LangChain’s document loader cap ) ``` -Key Parameters: +#### Key Parameters: - openai_api_key: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference. - openai_api_base: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings. - model="sentence-transformers/sentence-t5-xxl": This defines the specific model being used for text embeddings. 
sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems. - tiktoken_enabled=False: This is an important parameter, which disables the use of TikToken for tokenization within the embeddings process. -What is tiktoken_enabled? +#### What is tiktoken_enabled? tiktoken is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process. -Why set tiktoken_enabled=False? +#### Why set tiktoken_enabled=False? In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally. Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. @@ -192,11 +204,11 @@ Use the S3FileLoader to load documents and split them into chunks. Then, embed a ```python files = document_loader.lazy_load() ``` -Why lazy loading? +#### Why lazy loading? The key reason for using lazy loading here is to avoid reprocessing documents that have already been embedded. In the context of Retrieval-Augmented Generation (RAG), reprocessing the same document multiple times is redundant and inefficient. Lazy loading enables us to check if a document has already been embedded (by querying the database) before actually loading and embedding it. ```python - text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20) + text_splitter = RecursiveCharacterTextSplitter(chunk_size=480, chunk_overlap=20) for file in files: cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (file.metadata["source"],)) @@ -216,19 +228,21 @@ The key reason for using lazy loading here is to avoid reprocessing documents th vector_store.add_embeddings(embedding, chunk) ``` -- S3FileLoader: Loads each file individually from the object storage bucket. -- RecursiveCharacterTextSplitter: Splits the document into smaller text chunks. This is important for embedding, as models typically work better with smaller chunks of text. -- embeddings_list: Stores the embeddings for each chunk. -- vector_store.add_embeddings(): Stores each chunk and its corresponding embedding in the PostgreSQL vector store. +1. S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. +2. RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once). + - Chunk Size: Here, the chunk size is set to 480 characters, with an overlap of 20 characters. The choice of 480 characters is based on the context size supported by the embeddings model. 
Models have a maximum number of tokens they can process in a single pass, often around 512 tokens or fewer, depending on the specific model you are using. To ensure that each chunk fits within this limit, 380 characters provide a buffer, as different models tokenize characters into variable-length tokens. + - Chunk Overlap: The 20-character overlap ensures continuity between chunks, which helps prevent loss of meaning or context between segments. +3. Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. +4. Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. +5. Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. + +#### Why 480 characters? -The code iterates over each file retrieved from object storage using lazy loading. -For each file, a query is made to check if its corresponding object_key (a unique identifier from the file metadata) exists in the object_loaded table in PostgreSQL. -If the document has already been processed and embedded (i.e., the object_key is found in the database), the system skips loading the file and moves on to the next one. -If the document is new (not yet embedded), the file is fully loaded and processed. +The chunk size of 480 characters is chosen to fit comfortably within the context size limits of typical embeddings models, which often range between 512 and 1024 tokens. Since most models tokenize text into smaller units (tokens) based on words, punctuation, and subwords, the exact number of tokens for 480 characters will vary depending on the language and the content. By keeping chunks small, we avoid exceeding the model’s context window, which could lead to truncated embeddings or poor performance during inference. This approach ensures that only new or modified documents are loaded into memory and embedded, saving significant computational resources and reducing redundant work. -Why store both chunk and embedding? +#### Why store both chunk and embedding? Storing both the chunk and its corresponding embedding allows for efficient document retrieval later. When a query is made, the RAG system will retrieve the most relevant embeddings, and the corresponding text chunks will be used to generate the final response. 
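To make this concrete, the sketch below shows the retrieval step in isolation, assuming the `vector_store` PGVector instance configured earlier; the question string is purely a hypothetical example.

```python
query = "How do I create a Scaleway Object Storage bucket?"  # hypothetical question

# Retrieve the three chunks whose embeddings are closest to the query embedding.
relevant_chunks = vector_store.similarity_search(query, k=3)

for chunk in relevant_chunks:
    # Each result carries the original text stored alongside its embedding.
    print(chunk.page_content)
```

These retrieved chunks are exactly what gets passed to the LLM as context in the query step.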
From 4a72e84b02328c464514446cb7d95a2bf680c909 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:01 -0700 Subject: [PATCH 11/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 8fa3e5003a..22001040ae 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -17,9 +17,9 @@ In this comprehensive guide, you'll learn how to implement RAG using LangChain, LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. #### What You’ll Learn: -How to embed text using a sentence transformer using ***Scaleway Manage Inference*** -How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector -How to manage large datasets efficiently with ***Scaleway Object Storage*** +- How to embed text using a sentence transformer using ***Scaleway Manage Inference*** +- How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector +- How to manage large datasets efficiently with ***Scaleway Object Storage*** From 004699b3e97220f13e3b1d17aa6bb0d9ee6df7af Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:10 -0700 Subject: [PATCH 12/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 22001040ae..11ca140107 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -68,7 +68,7 @@ To complete the actions presented below, you must have: SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment ``` -### Set Up Managed Database +## Setting Up Managed Databases To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking. 
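Later, once the vector store has been populated, you can optionally index the embedding column so that similarity search stays fast as the number of embeddings grows. The sketch below reuses the same psycopg2 connection and assumes the default `langchain_pg_embedding` table and `embedding` column created by the PGVector integration, which may differ in your setup; it also requires pgvector 0.5.0 or newer for HNSW (an IVFFlat index is an alternative on older versions).

```python
# Optional: index the embedding column for faster similarity search.
# Table/column names are the langchain_postgres defaults and may differ in your setup.
cur.execute(
    "CREATE INDEX IF NOT EXISTS langchain_embedding_hnsw_idx "
    "ON langchain_pg_embedding USING hnsw (embedding vector_cosine_ops);"
)
conn.commit()
```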
From b298435a60a2a4c1e6fe93c6578715f5b857516e Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:17 -0700 Subject: [PATCH 13/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 11ca140107..5b69b5b50d 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -41,7 +41,7 @@ To complete the actions presented below, you must have: ```sh pip install langchain psycopg2 python-dotenv ``` -2. Configure your environment variables: create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. +2. Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. ```sh # .env file From 52f5265b43596025f189faebc0103dac4f8a66c0 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:22 -0700 Subject: [PATCH 14/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 5b69b5b50d..7481299319 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -36,7 +36,7 @@ To complete the actions presented below, you must have: ## Configure your development environment -1. Install necessary packages: run the following command to install the required packages: +1. 
Run the following command to install the required packages: ```sh pip install langchain psycopg2 python-dotenv From 93b07a443cd5716832ad44a473a6bf6d8b6700a5 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:34 -0700 Subject: [PATCH 15/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 3 --- 1 file changed, 3 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 7481299319..b69d674091 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -23,9 +23,6 @@ LangChain simplifies the process of enhancing language models with retrieval cap -## Before you start - -To complete the actions presented below, you must have: - A Scaleway account logged into the [console](https://console.scaleway.com) - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/) From bf833798bce86b386146d9a4faa7894ed9784e7a Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:44:55 -0700 Subject: [PATCH 16/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index b69d674091..439039090a 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -16,7 +16,7 @@ In this comprehensive guide, you'll learn how to implement RAG using LangChain, #### Why LangChain? LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. 
-#### What You’ll Learn: +## What You’ll Learn - How to embed text using a sentence transformer using ***Scaleway Manage Inference*** - How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector - How to manage large datasets efficiently with ***Scaleway Object Storage*** From a3f42df252bc16dca64fcdc5702ce96b3d84639a Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:45:01 -0700 Subject: [PATCH 17/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 439039090a..244d01b701 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -13,7 +13,7 @@ Retrieval-Augmented Generation (RAG) supercharges language models by enabling re In this comprehensive guide, you'll learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We'll combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management. -#### Why LangChain? +## Why LangChain? LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. 
## What You’ll Learn From 29aa648664d90934c5a0466ed3b549a1014ab521 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:45:06 -0700 Subject: [PATCH 18/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 244d01b701..951a9dd3ae 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -3,7 +3,7 @@ meta: title: Step-by-Step Guide Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference description: Master Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference content: - h1: Step-by-Step Guide Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference + h1: Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference tags: inference managed postgresql pgvector object storage RAG langchain machine learning AI language models categories: - inference From ea655a87f86dc8d1b241e0fbf12a4453a47149c4 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Thu, 3 Oct 2024 07:45:11 -0700 Subject: [PATCH 19/27] Update tutorials/how-to-implement-rag/index.mdx Co-authored-by: ldecarvalho-doc <82805470+ldecarvalho-doc@users.noreply.github.com> --- tutorials/how-to-implement-rag/index.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 951a9dd3ae..d20927274f 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -1,6 +1,6 @@ --- meta: - title: Step-by-Step Guide Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference + title: Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference description: Master Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference content: h1: Implementing Retrieval-Augmented Generation (RAG) with LangChain and Scaleway Managed Inference From f5b4d4d47fa171eedae0f92692256bc86d3d09d5 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Thu, 3 Oct 2024 10:07:14 -0700 Subject: [PATCH 20/27] switch metodo --- tutorials/how-to-implement-rag/index.mdx | 90 +++++++++++++++--------- 1 file changed, 55 insertions(+), 35 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index d20927274f..6b14cce33a 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -97,19 +97,18 @@ load_dotenv() # Establish connection to PostgreSQL database using environment variables conn = psycopg2.connect( - database=os.getenv("SCW_DB_NAME"), - user=os.getenv("SCW_DB_USER"), - password=os.getenv("SCW_DB_PASSWORD"), - host=os.getenv("SCW_DB_HOST"), - port=os.getenv("SCW_DB_PORT") -) + database=os.getenv("SCW_DB_NAME"), + user=os.getenv("SCW_DB_USER"), + password=os.getenv("SCW_DB_PASSWORD"), + host=os.getenv("SCW_DB_HOST"), + port=os.getenv("SCW_DB_PORT") + ) # Create a cursor to execute SQL commands cur = conn.cursor() ``` - ### Set Up Document Loaders for Object Storage In this section, we will use LangChain 
to load documents stored in your Scaleway Object Storage bucket. The document loader retrieves the contents of each document for further processing, such as vectorization or embedding generation. @@ -197,45 +196,66 @@ PGVector: This creates the vector store in your PostgreSQL database to store the Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. -1. Lazy loadings documents: This method is designed to efficiently load and process documents one by one from Scaleway Object Storage. Instead of loading all documents at once, it loads them lazily, allowing us to inspect each file before deciding whether to embed it. +1. Load Metadata for Improved Efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. + ```python - files = document_loader.lazy_load() + endpoint_s3 = f"https://s3.{os.getenv('SCW_DEFAULT_REGION', '')}.scw.cloud" + session = boto3.session.Session() + client_s3 = session.client(service_name='s3', endpoint_url=endpoint_s3, + aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""), + aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "")) + paginator = client_s3.get_paginator('list_objects_v2') + page_iterator = paginator.paginate(Bucket=BUCKET_NAME) + ``` -#### Why lazy loading? -The key reason for using lazy loading here is to avoid reprocessing documents that have already been embedded. In the context of Retrieval-Augmented Generation (RAG), reprocessing the same document multiple times is redundant and inefficient. Lazy loading enables us to check if a document has already been embedded (by querying the database) before actually loading and embedding it. + +In this code sample we: +- Set Up a Boto3 Session: We initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests. +- Create an S3 Client: We establish an S3 client to interact with the Scaleway Object storage service. +- Set Up Pagination for Listing Objects: We prepare pagination to handle potentially large lists of objects efficiently. +- Iterate Through the Bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly. + +2. Iterate Through Metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. 
```python - text_splitter = RecursiveCharacterTextSplitter(chunk_size=480, chunk_overlap=20) - - for file in files: - cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (file.metadata["source"],)) - if cur.fetchone() is None: - fileLoader = S3FileLoader( - bucket=os.getenv("SCW_BUCKET_NAME"), - key=file.metadata["source"].split("/")[-1], - endpoint_url=endpoint_s3, - aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), - aws_secret_access_key=os.getenv("SCW_API_KEY") + text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True, length_function=len, is_separator_regex=False) + for page in page_iterator: + for obj in page.get('Contents', []): + cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (obj['Key'],)) + response = cur.fetchone() + if response is None: + file_loader = S3FileLoader( + bucket=BUCKET_NAME, + key=obj['Key'], + endpoint_url=endpoint_s3, + aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""), + aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "") ) - file_to_load = fileLoader.load() - chunks = text_splitter.split_text(file.page_content) - - embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks] - for chunk, embedding in zip(chunks, embeddings_list): - vector_store.add_embeddings(embedding, chunk) + file_to_load = file_loader.load() + cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (obj['Key'],)) + chunks = text_splitter.split_text(file_to_load[0].page_content) + try: + embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks] + vector_store.add_embeddings(chunks, embeddings_list) + cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", + (obj['Key'],)) + except Exception as e: + logger.error(f"An error occurred: {e}") + + conn.commit() ``` -1. S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. -2. RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once). +- S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. +- RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once). - Chunk Size: Here, the chunk size is set to 480 characters, with an overlap of 20 characters. The choice of 480 characters is based on the context size supported by the embeddings model. Models have a maximum number of tokens they can process in a single pass, often around 512 tokens or fewer, depending on the specific model you are using. To ensure that each chunk fits within this limit, 380 characters provide a buffer, as different models tokenize characters into variable-length tokens. 
- Chunk Overlap: The 20-character overlap ensures continuity between chunks, which helps prevent loss of meaning or context between segments. -3. Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. -4. Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. -5. Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. +- Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. +- Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. +- Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. -#### Why 480 characters? +#### Why 500 characters? -The chunk size of 480 characters is chosen to fit comfortably within the context size limits of typical embeddings models, which often range between 512 and 1024 tokens. Since most models tokenize text into smaller units (tokens) based on words, punctuation, and subwords, the exact number of tokens for 480 characters will vary depending on the language and the content. By keeping chunks small, we avoid exceeding the model’s context window, which could lead to truncated embeddings or poor performance during inference. +The chunk size of 500 characters is chosen to fit comfortably within the context size limits of typical embeddings models, which often range between 512 and 1024 tokens. Since most models tokenize text into smaller units (tokens) based on words, punctuation, and subwords, the exact number of tokens for 480 characters will vary depending on the language and the content. By keeping chunks small, we avoid exceeding the model’s context window, which could lead to truncated embeddings or poor performance during inference. This approach ensures that only new or modified documents are loaded into memory and embedded, saving significant computational resources and reducing redundant work. 
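
To see how these chunking settings behave before wiring them into the full pipeline, a minimal sketch such as the one below (using the same `RecursiveCharacterTextSplitter` parameters as above, applied to a stand-in text) prints the resulting chunk sizes so you can confirm that each chunk stays within the 500-character budget.

```python
# chunking_check.py - illustrative only; mirrors the splitter settings used above
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Same parameters as in the embedding step of this tutorial
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0,
    add_start_index=True,
    length_function=len,
    is_separator_regex=False,
)

# Stand-in for the page content loaded from Object Storage
sample_text = (
    "Scaleway Object Storage holds the documents that feed the RAG pipeline. " * 40
)

chunks = text_splitter.split_text(sample_text)
print(f"Produced {len(chunks)} chunks")
for i, chunk in enumerate(chunks[:3]):
    # Every chunk stays at or below the 500-character budget
    print(f"chunk {i}: {len(chunk)} characters")
```
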
From 69bc345ff1528d26d5a803d7d94237574633f9f2 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Thu, 3 Oct 2024 10:11:01 -0700 Subject: [PATCH 21/27] pre-defined prompt --- tutorials/how-to-implement-rag/index.mdx | 43 +++++++++++++++++------- 1 file changed, 30 insertions(+), 13 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 6b14cce33a..ec6de410e1 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -264,29 +264,46 @@ This approach ensures that only new or modified documents are loaded into memory Storing both the chunk and its corresponding embedding allows for efficient document retrieval later. When a query is made, the RAG system will retrieve the most relevant embeddings, and the corresponding text chunks will be used to generate the final response. -### Query the RAG System +### Query the RAG System with a pre-defined prompt template -Now, set up the RAG system to handle queries using RetrievalQA and the LLM. +Now, set up the RAG system to handle queries ```python - retriever = vector_store.as_retriever(search_kwargs={"k": 3}) - llm = ChatOpenAI( - base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"), - api_key=os.getenv("SCW_API_KEY"), - model=deployment.model_name, +llm = ChatOpenAI( + base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"), + api_key=os.getenv("SCW_SECRET_KEY"), + model=deployment.model_name, ) - qa_stuff = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever) + prompt = hub.pull("rlm/rag-prompt") + retriever = vector_store.as_retriever() - query = "What are the commands to set up a database with the CLI of Scaleway?" - response = qa_stuff.invoke(query) - print(response['result']) + rag_chain = ( + {"context": retriever, "question": RunnablePassthrough()} + | prompt + | llm + | StrOutputParser() + ) + + for r in rag_chain.stream("Your question"): + print(r, end="", flush=True) + time.sleep(0.15) ``` +- LLM Initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name. + +- Prompt Setup: The prompt is pulled from the hub using a pre-defined template, ensuring consistent query formatting. + +- Retriever Configuration: We set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query. + +- RAG Chain Construction: We create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow. + +- Query Execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability. +### Query the RAG system with you own prompt template ### Conclusion -This step is essential for efficiently processing and storing large document datasets for RAG. By using lazy loading, the system handles large datasets without overwhelming memory, while chunking ensures that each document is processed in a way that maximizes the performance of the LLM. The embeddings are stored in PostgreSQL via pgvector, allowing for fast and scalable retrieval when responding to user queries. +In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets for a Retrieval-Augmented Generation (RAG) system. 
By leveraging metadata, we can quickly check which documents have already been processed, ensuring that our system operates smoothly without redundant data handling. Chunking optimizes the processing of each document, maximizing the performance of the LLM. Storing embeddings in PostgreSQL via pgvector enables fast and scalable retrieval, ensuring quick responses to user queries. -By combining Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you can implement a powerful RAG system that scales with your data and offers robust information retrieval capabilities. \ No newline at end of file +By integrating Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you can build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently. \ No newline at end of file From 403fd5e90b6aa7a815fb6baa16d3e8bf767f0e54 Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 4 Oct 2024 01:44:34 -0700 Subject: [PATCH 22/27] add custom prompt --- tutorials/how-to-implement-rag/index.mdx | 228 ++++++++++++++--------- 1 file changed, 144 insertions(+), 84 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index ec6de410e1..79daf6db73 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -33,12 +33,16 @@ LangChain simplifies the process of enhancing language models with retrieval cap ## Configure your development environment -1. Run the following command to install the required packages: +### Step 1: Install Required Packages + +Run the following command to install the required packages: ```sh pip install langchain psycopg2 python-dotenv ``` -2. Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. +### Step 2: Create a .env File + +Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. ```sh # .env file @@ -67,23 +71,28 @@ LangChain simplifies the process of enhancing language models with retrieval cap ## Setting Up Managed Databases +### Step 1: Connect to Your PostgreSQL Database + To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking. -1. Install the pgvector extension +### Step 2: Install the pgvector Extension + pgvector is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: ```sql CREATE EXTENSION IF NOT EXISTS vector; ``` -2. Create a table to track processed documents +### Step 3: Create a Table to Track Processed Documents + To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization: ```sql CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT); ``` -3. 
Connect to PostgreSQL programmatically using Python -You can also connect to your PostgreSQL instance and perform the same tasks programmatically. +### Step 4: Connect to PostgreSQL Programmatically + +Connect to your PostgreSQL instance and perform tasks programmatically. ```python # rag.py file @@ -108,62 +117,30 @@ conn = psycopg2.connect( cur = conn.cursor() ``` +## Embeddings and Vector Store Setup -### Set Up Document Loaders for Object Storage - -In this section, we will use LangChain to load documents stored in your Scaleway Object Storage bucket. The document loader retrieves the contents of each document for further processing, such as vectorization or embedding generation. - -1. Storing Data for RAG -Ensure that all the documents and data you want to inject into your Retrieval-Augmented Generation (RAG) system are stored in this Scaleway Object Storage bucket. These could include text files, PDFs, or any other format that will be processed and vectorized in the following steps. - -2. Import Required Modules -Before setting up the document loader, you need to import the necessary modules from LangChain and other libraries. Here's how to do that: +### Step 1: Import Required Modules ```python # rag.py -from langchain.document_loaders import S3DirectoryLoader -import os -``` - -3. Set Up the Document Loader -The S3DirectoryLoader class, part of LangChain, is specifically designed to load documents from S3-compatible storage (in this case, Scaleway Object Storage). -Now, let’s configure the document loader to pull files from your Scaleway Object Storage bucket using the appropriate credentials and environment variables: - -```python -# rag.py - - document_loader = S3DirectoryLoader( - bucket=os.getenv('SCW_BUCKET_NAME'), - endpoint_url=os.getenv('SCW_BUCKET_ENDPOINT'), - aws_access_key_id=os.getenv("SCW_ACCESS_KEY"), - aws_secret_access_key=os.getenv("SCW_API_KEY") - ) - -``` - -This section highlights that you're leveraging LangChain’s document loader capabilities to connect directly to your Scaleway Object Storage. LangChain simplifies the process of integrating external data sources, allowing you to focus on building a RAG system without handling low-level integration details. - -### Embeddings and Vector Store Setup -1. Import the required module -```python -# rag.py - from langchain_openai import OpenAIEmbeddings from langchain_postgres import PGVector ``` -2. We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. +### Step 2: Configure OpenAI Embeddings + +We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. ```python # rag.py - embeddings = OpenAIEmbeddings( - openai_api_key=os.getenv("SCW_API_KEY"), - openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"), - model="sentence-transformers/sentence-t5-xxl", - tiktoken_enabled=False, - ) +embeddings = OpenAIEmbeddings( + openai_api_key=os.getenv("SCW_API_KEY"), + openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"), + model="sentence-transformers/sentence-t5-xxl", + tiktoken_enabled=False, + ) ``` #### Key Parameters: @@ -182,31 +159,49 @@ In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl m Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. 
Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. By setting tiktoken_enabled=False, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. -2. Next, configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings. +### Step 3: Create a PGVector Store + +Configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings. ```python +# rag.py - connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}" - vector_store = PGVector(connection=connection_string, embeddings=embeddings) +connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}" +vector_store = PGVector(connection=connection_string, embeddings=embeddings) ``` PGVector: This creates the vector store in your PostgreSQL database to store the embeddings. -### Load and Process Documents +## Load and Process Documents Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. -1. Load Metadata for Improved Efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. +### Step 1: Import Required Modules + +```python +#rag.py + +import boto3 +from langchain_community.document_loaders import S3FileLoader +from langchain.text_splitter import RecursiveCharacterTextSplitter +from langchain_openai import OpenAIEmbeddings + +``` + +### Step 2: Load Metadata for Improved Efficiency + +Load Metadata for Improved Efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. ```python - endpoint_s3 = f"https://s3.{os.getenv('SCW_DEFAULT_REGION', '')}.scw.cloud" - session = boto3.session.Session() - client_s3 = session.client(service_name='s3', endpoint_url=endpoint_s3, +# rag.py + +endpoint_s3 = f"https://s3.{os.getenv('SCW_DEFAULT_REGION', '')}.scw.cloud" +session = boto3.session.Session() +client_s3 = session.client(service_name='s3', endpoint_url=endpoint_s3, aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""), aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "")) - paginator = client_s3.get_paginator('list_objects_v2') - page_iterator = paginator.paginate(Bucket=BUCKET_NAME) - +paginator = client_s3.get_paginator('list_objects_v2') +page_iterator = paginator.paginate(Bucket=BUCKET_NAME) ``` In this code sample we: @@ -215,34 +210,37 @@ In this code sample we: - Set Up Pagination for Listing Objects: We prepare pagination to handle potentially large lists of objects efficiently. - Iterate Through the Bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly. -2. Iterate Through Metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. 
+### Step 3: Iterate Through Metadata + +Iterate Through Metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. ```python - text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True, length_function=len, is_separator_regex=False) - for page in page_iterator: - for obj in page.get('Contents', []): - cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (obj['Key'],)) - response = cur.fetchone() +# rag.py + +text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True, length_function=len, is_separator_regex=False) +for page in page_iterator: + for obj in page.get('Contents', []): + cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (obj['Key'],)) + response = cur.fetchone() if response is None: file_loader = S3FileLoader( - bucket=BUCKET_NAME, - key=obj['Key'], - endpoint_url=endpoint_s3, - aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""), - aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "") - ) + bucket=BUCKET_NAME, + key=obj['Key'], + endpoint_url=endpoint_s3, + aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""), + aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "") + ) file_to_load = file_loader.load() cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (obj['Key'],)) chunks = text_splitter.split_text(file_to_load[0].page_content) try: embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks] vector_store.add_embeddings(chunks, embeddings_list) - cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", - (obj['Key'],)) + cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (obj['Key'],)) except Exception as e: logger.error(f"An error occurred: {e}") - conn.commit() +conn.commit() ``` - S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. @@ -266,29 +264,44 @@ When a query is made, the RAG system will retrieve the most relevant embeddings, ### Query the RAG System with a pre-defined prompt template +### Step 1: Import Required Modules + +```python +#rag.py + +from langchain import hub +from langchain_core.output_parsers import StrOutputParser +from langchain_core.runnables import RunnablePassthrough + +``` + +### Step 2: Setup LLM for Querying + Now, set up the RAG system to handle queries ```python +#rag.py + llm = ChatOpenAI( base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"), api_key=os.getenv("SCW_SECRET_KEY"), model=deployment.model_name, - ) + ) - prompt = hub.pull("rlm/rag-prompt") - retriever = vector_store.as_retriever() +prompt = hub.pull("rlm/rag-prompt") +retriever = vector_store.as_retriever() - rag_chain = ( +rag_chain = ( {"context": retriever, "question": RunnablePassthrough()} | prompt | llm | StrOutputParser() ) - for r in rag_chain.stream("Your question"): - print(r, end="", flush=True) - time.sleep(0.15) +for r in rag_chain.stream("Your question"): + print(r, end="", flush=True) + time.sleep(0.1) ``` - LLM Initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name. 
@@ -302,8 +315,55 @@ llm = ChatOpenAI( ### Query the RAG system with you own prompt template +Personalizing your prompt template allows you to tailor the responses from your RAG (Retrieval-Augmented Generation) system to better fit your specific needs. This can significantly improve the relevance and tone of the answers you receive. Below is a detailed guide on how to create a custom prompt for querying the system. + +```python +#rag.py + +from langchain.chains.combine_documents import create_stuff_documents_chain +from langchain_core.prompts import PromptTemplate +from langchain_openai import ChatOpenAI + +llm = ChatOpenAI( + base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"), + api_key=os.getenv("SCW_SECRET_KEY"), + model=deployment.model_name, + ) +prompt = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Always finish your answer by "Thank you for asking". {context} Question: {question} Helpful Answer:""" +custom_rag_prompt = PromptTemplate.from_template(prompt) +retriever = vector_store.as_retriever() +custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt) + + +context = retriever.invoke("your question") +for r in custom_rag_chain.stream({"question":"your question", "context": context}): + print(r, end="", flush=True) + time.sleep(0.1) +``` + +- Prompt Template: The prompt template is meticulously crafted to direct the model's responses. It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information. +To make the responses more engaging, consider adding a light-hearted conclusion or a personalized touch. For example, you might modify the closing line to say, "Thank you for asking! I'm here to help with anything else you need!" +Retrieving Context: +- The retriever.invoke(new_message) method fetches relevant information from your vector store based on the user’s query. It's essential that this step retrieves high-quality context to ensure that the model's responses are accurate and helpful. +You can enhance the quality of the context by fine-tuning your embeddings and ensuring that the documents in your vector store are relevant and well-structured. +Creating the RAG Chain: +- The create_stuff_documents_chain function connects the language model with your custom prompt. This integration allows the model to process the retrieved context effectively and formulate a coherent and context-aware response. +Consider experimenting with different chain configurations to see how they affect the output. For instance, using a different chain type may yield varied responses. +Streaming Responses: +- The loop that streams responses from the custom_rag_chain provides a dynamic user experience. Instead of waiting for the entire output, users can see responses as they are generated, enhancing interactivity. +You can customize the streaming behavior further, such as implementing progress indicators or more sophisticated UI elements for applications. + +#### Example Use Cases +- Customer Support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging. +- Research Assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities. 
+- Content Generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones. + ### Conclusion -In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets for a Retrieval-Augmented Generation (RAG) system. By leveraging metadata, we can quickly check which documents have already been processed, ensuring that our system operates smoothly without redundant data handling. Chunking optimizes the processing of each document, maximizing the performance of the LLM. Storing embeddings in PostgreSQL via pgvector enables fast and scalable retrieval, ensuring quick responses to user queries. +In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets within a Retrieval-Augmented Generation (RAG) system. By leveraging metadata, we ensured that our system avoids redundant data handling, allowing for smooth and efficient operations. The use of chunking optimizes document processing, maximizing the performance of the language model. Storing embeddings in PostgreSQL via pgvector enables rapid and scalable retrieval, ensuring quick responses to user queries. + +Furthermore, you can continually enhance your RAG system by implementing mechanisms to retain chat history. Keeping track of past interactions allows for more contextually aware responses, fostering a more engaging user experience. This historical data can be used to refine your prompts, adapt to user preferences, and improve the overall accuracy of responses. + +By integrating Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently. -By integrating Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you can build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently. \ No newline at end of file +With ongoing refinement and adaptation, your RAG system can evolve to meet the changing needs of your users, ensuring that it remains a valuable asset in your AI toolkit. \ No newline at end of file From 69c9974dfa8eb593f3c70b93589b590eed9ef82b Mon Sep 17 00:00:00 2001 From: Laure-di Date: Fri, 4 Oct 2024 02:47:58 -0700 Subject: [PATCH 23/27] exemple of endpoint --- tutorials/how-to-implement-rag/index.mdx | 32 +++++++++++------------- 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index 79daf6db73..b6f93f0462 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -60,24 +60,24 @@ Create a .env file and add the following variables. 
These will store your API ke # Scaleway S3 bucket configuration SCW_BUCKET_NAME=your_scaleway_bucket_name - SCW_BUCKET_ENDPOINT=your_scaleway_bucket_endpoint # S3 endpoint, e.g., https://s3.fr-par.scw.cloud + SCW_BUCKET_ENDPOINT="https://{{SCW_BUCKET_NAME}}.s3.{{SCW_REGION}}.scw.cloud" # S3 endpoint, e.g., https://s3.fr-par.scw.cloud # Scaleway Inference API configuration (Embeddings) - SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment + SCW_INFERENCE_EMBEDDINGS_ENDPOINT="https://{{SCW_INFERENCE_DEPLOYMENT_ID}}.ifr.fr-par.scw.cloud/v1" # Endpoint for sentence-transformers/sentence-t5-xxl deployment # Scaleway Inference API configuration (LLM deployment) - SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment + SCW_INFERENCE_DEPLOYMENT_ENDPOINT="https://{{SCW_INFERENCE_DEPLOYMENT_ID}}.ifr.fr-par.scw.cloud/v1" # Endpoint for your LLM deployment ``` ## Setting Up Managed Databases ### Step 1: Connect to Your PostgreSQL Database -To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking. +To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking. ### Step 2: Install the pgvector Extension -pgvector is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: +[pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: ```sql CREATE EXTENSION IF NOT EXISTS vector; @@ -130,7 +130,7 @@ from langchain_postgres import PGVector ### Step 2: Configure OpenAI Embeddings -We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. +We will utilize the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. ```python # rag.py @@ -144,20 +144,20 @@ embeddings = OpenAIEmbeddings( ``` #### Key Parameters: -- openai_api_key: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference. -- openai_api_base: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings. -- model="sentence-transformers/sentence-t5-xxl": This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems. 
-- tiktoken_enabled=False: This is an important parameter, which disables the use of TikToken for tokenization within the embeddings process. +- `openai_api_key`: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference. +- `openai_api_base`: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings. +- `model="sentence-transformers/sentence-t5-xxl"`: This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems. +- `tiktoken_enabled=False`: This is parameter disables the use of TikToken for tokenization within the embeddings process. #### What is tiktoken_enabled? -tiktoken is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process. +[`tiktoken`](https://github.com/openai/tiktoken) is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process. #### Why set tiktoken_enabled=False? -In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally. -Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. -By setting tiktoken_enabled=False, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. +In the context of using Scaleway’s Managed Inference and the `sentence-t5-xxl` model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally. +Moreover, leaving `tiktoken_enabled` as `True` causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. +By setting `tiktoken_enabled=False`, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. ### Step 3: Create a PGVector Store @@ -174,7 +174,7 @@ PGVector: This creates the vector store in your PostgreSQL database to store the ## Load and Process Documents -Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. 
+Use the [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. ### Step 1: Import Required Modules @@ -245,8 +245,6 @@ conn.commit() - S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. - RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once). - - Chunk Size: Here, the chunk size is set to 480 characters, with an overlap of 20 characters. The choice of 480 characters is based on the context size supported by the embeddings model. Models have a maximum number of tokens they can process in a single pass, often around 512 tokens or fewer, depending on the specific model you are using. To ensure that each chunk fits within this limit, 380 characters provide a buffer, as different models tokenize characters into variable-length tokens. - - Chunk Overlap: The 20-character overlap ensures continuity between chunks, which helps prevent loss of meaning or context between segments. - Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. - Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. - Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. From 8419eb7d3caea9f8493375e7c091bd2d91f06830 Mon Sep 17 00:00:00 2001 From: Laure-di <62625835+Laure-di@users.noreply.github.com> Date: Fri, 4 Oct 2024 03:50:59 -0700 Subject: [PATCH 24/27] Apply suggestions from code review Co-authored-by: Benedikt Rollik --- tutorials/how-to-implement-rag/index.mdx | 76 ++++++++++++------------ 1 file changed, 38 insertions(+), 38 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index b6f93f0462..65c7304055 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -11,12 +11,12 @@ categories: Retrieval-Augmented Generation (RAG) supercharges language models by enabling real-time retrieval of relevant information from external datasets. This hybrid approach boosts both the accuracy and contextual relevance of model outputs, making it essential for advanced AI applications. 
-In this comprehensive guide, you'll learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We'll combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management. +In this comprehensive guide, you will learn how to implement RAG using LangChain, one of the leading frameworks for developing robust language model applications. We will combine LangChain with ***Scaleway’s Managed Inference***, ***Scaleway’s PostgreSQL Managed Database*** (featuring pgvector for vector storage), and ***Scaleway’s Object Storage*** for seamless integration and efficient data management. ## Why LangChain? LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation. -## What You’ll Learn +## What you will learn - How to embed text using a sentence transformer using ***Scaleway Manage Inference*** - How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector - How to manage large datasets efficiently with ***Scaleway Object Storage*** @@ -40,7 +40,7 @@ Run the following command to install the required packages: ```sh pip install langchain psycopg2 python-dotenv ``` -### Step 2: Create a .env File +### Step 2: Create a .env file Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values. @@ -71,18 +71,18 @@ Create a .env file and add the following variables. These will store your API ke ## Setting Up Managed Databases -### Step 1: Connect to Your PostgreSQL Database +### Step 1: Connect to your PostgreSQL database -To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking. +To perform these actions, you will need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking. -### Step 2: Install the pgvector Extension +### Step 2: Install the pgvector extension [pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: ```sql CREATE EXTENSION IF NOT EXISTS vector; ``` -### Step 3: Create a Table to Track Processed Documents +### Step 3: Create a table to track processed documents To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. 
This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization: @@ -90,7 +90,7 @@ To prevent reprocessing documents that have already been loaded and vectorized, CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT); ``` -### Step 4: Connect to PostgreSQL Programmatically +### Step 4: Connect to PostgreSQL programmatically Connect to your PostgreSQL instance and perform tasks programmatically. @@ -143,7 +143,7 @@ embeddings = OpenAIEmbeddings( ) ``` -#### Key Parameters: +#### Key parameters: - `openai_api_key`: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference. - `openai_api_base`: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings. - `model="sentence-transformers/sentence-t5-xxl"`: This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems. @@ -159,7 +159,7 @@ In the context of using Scaleway’s Managed Inference and the `sentence-t5-xxl` Moreover, leaving `tiktoken_enabled` as `True` causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. By setting `tiktoken_enabled=False`, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. -### Step 3: Create a PGVector Store +### Step 3: Create a PGVector store Configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings. @@ -172,11 +172,11 @@ vector_store = PGVector(connection=connection_string, embeddings=embeddings) PGVector: This creates the vector store in your PostgreSQL database to store the embeddings. -## Load and Process Documents +## Load and process documents Use the [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. -### Step 1: Import Required Modules +### Step 1: Import required modules ```python #rag.py @@ -188,9 +188,9 @@ from langchain_openai import OpenAIEmbeddings ``` -### Step 2: Load Metadata for Improved Efficiency +### Step 2: Load metadata for improved efficiency -Load Metadata for Improved Efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. +Load metadata for improved efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. 
```python # rag.py @@ -205,14 +205,14 @@ page_iterator = paginator.paginate(Bucket=BUCKET_NAME) ``` In this code sample we: -- Set Up a Boto3 Session: We initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests. -- Create an S3 Client: We establish an S3 client to interact with the Scaleway Object storage service. -- Set Up Pagination for Listing Objects: We prepare pagination to handle potentially large lists of objects efficiently. -- Iterate Through the Bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly. +- Set up a Boto3 session: We initialize a Boto3 session, which is the AWS SDK for Python, fully compatible with Scaleway Object Storage. This session manages configuration, including credentials and settings, that Boto3 uses for API requests. +- Create an S3 client: We establish an S3 client to interact with the Scaleway Object Storage service. +- Set up pagination for listing objects: We prepare pagination to handle potentially large lists of objects efficiently. +- Iterate through the bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly. -### Step 3: Iterate Through Metadata +### Step 3: Iterate through metadata -Iterate Through Metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. +Iterate through metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. ```python # rag.py @@ -245,9 +245,9 @@ conn.commit() - S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time. - RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embeddings models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once). -- Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. -- Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. -- Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. 
This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. +- Embedding the chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search. +- Embedding storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query. +- Avoiding redundant processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. #### Why 500 characters? @@ -262,7 +262,7 @@ When a query is made, the RAG system will retrieve the most relevant embeddings, ### Query the RAG System with a pre-defined prompt template -### Step 1: Import Required Modules +### Step 1: Import required modules ```python #rag.py @@ -273,7 +273,7 @@ from langchain_core.runnables import RunnablePassthrough ``` -### Step 2: Setup LLM for Querying +### Step 2: Setup LLM for querying Now, set up the RAG system to handle queries @@ -301,15 +301,15 @@ for r in rag_chain.stream("Your question"): print(r, end="", flush=True) time.sleep(0.1) ``` -- LLM Initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name. +- LLM initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name. -- Prompt Setup: The prompt is pulled from the hub using a pre-defined template, ensuring consistent query formatting. +- Prompt setup: The prompt is pulled from the hub using a pre-defined template, ensuring consistent query formatting. -- Retriever Configuration: We set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query. +- Retriever configuration: We set up the retriever to access the vector store, allowing the RAG system to retrieve relevant information based on the query. -- RAG Chain Construction: We create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow. +- RAG chain construction: We create the RAG chain, which connects the retriever, prompt, LLM, and output parser in a streamlined workflow. -- Query Execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability. +- Query execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability. ### Query the RAG system with you own prompt template @@ -339,22 +339,22 @@ for r in custom_rag_chain.stream({"question":"your question", "context": context time.sleep(0.1) ``` -- Prompt Template: The prompt template is meticulously crafted to direct the model's responses. 
It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information. +- Prompt template: The prompt template is meticulously crafted to direct the model's responses. It clearly instructs the model on how to leverage the provided context and emphasizes the importance of honesty in cases where it lacks information. To make the responses more engaging, consider adding a light-hearted conclusion or a personalized touch. For example, you might modify the closing line to say, "Thank you for asking! I'm here to help with anything else you need!" -Retrieving Context: +Retrieving context: - The retriever.invoke(new_message) method fetches relevant information from your vector store based on the user’s query. It's essential that this step retrieves high-quality context to ensure that the model's responses are accurate and helpful. You can enhance the quality of the context by fine-tuning your embeddings and ensuring that the documents in your vector store are relevant and well-structured. -Creating the RAG Chain: +Creating the RAG chain: - The create_stuff_documents_chain function connects the language model with your custom prompt. This integration allows the model to process the retrieved context effectively and formulate a coherent and context-aware response. Consider experimenting with different chain configurations to see how they affect the output. For instance, using a different chain type may yield varied responses. -Streaming Responses: +Streaming responses: - The loop that streams responses from the custom_rag_chain provides a dynamic user experience. Instead of waiting for the entire output, users can see responses as they are generated, enhancing interactivity. You can customize the streaming behavior further, such as implementing progress indicators or more sophisticated UI elements for applications. -#### Example Use Cases -- Customer Support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging. -- Research Assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities. -- Content Generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones. +#### Example use cases +- Customer support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging. +- Research assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities. +- Content generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones. 
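As a purely illustrative aside (not part of the original tutorial or of any patch in this series), the sketch below shows how the "customer support" use case above could be expressed with the same building blocks the tutorial already uses: a personalized `PromptTemplate` fed into `create_stuff_documents_chain`. It assumes the `llm` and `vector_store` objects created earlier in the tutorial; the prompt wording and the example question are hypothetical.

```python
# Illustrative sketch only. Assumes `llm` (ChatOpenAI) and `vector_store`
# (PGVector) are already configured as shown earlier in the tutorial.
from langchain_core.prompts import PromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain

# A support-flavored variation of the custom prompt described above.
support_prompt = PromptTemplate.from_template(
    """You are a friendly support agent. Use the following pieces of context
to answer the question at the end. If you don't know the answer, say so
honestly instead of guessing, and thank the customer for reaching out.
{context}
Question: {question}
Helpful Answer:"""
)

retriever = vector_store.as_retriever()
support_chain = create_stuff_documents_chain(llm, support_prompt)

question = "How do I rotate my API keys?"   # hypothetical customer question
context = retriever.invoke(question)        # fetch the most relevant chunks

for chunk in support_chain.stream({"question": question, "context": context}):
    print(chunk, end="", flush=True)
```

Only the prompt text changes between use cases, so the same retriever and chain wiring can be reused for research summaries or content generation by swapping in a different template.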
### Conclusion

From bb72b38cf4a36d77a8e1f74989b25beaf0e73eb4 Mon Sep 17 00:00:00 2001
From: Laure-di <62625835+Laure-di@users.noreply.github.com>
Date: Fri, 4 Oct 2024 03:51:43 -0700
Subject: [PATCH 25/27] Apply suggestions from code review

Co-authored-by: Benedikt Rollik
---
 tutorials/how-to-implement-rag/index.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx
index 65c7304055..2f2aec8d23 100644
--- a/tutorials/how-to-implement-rag/index.mdx
+++ b/tutorials/how-to-implement-rag/index.mdx
@@ -33,7 +33,7 @@ LangChain simplifies the process of enhancing language models with retrieval cap
 
 ## Configure your development environment
 
-### Step 1: Install Required Packages
+### Step 1: Install required packages
 
 Run the following command to install the required packages:
 

From d35ff83e337eb726eed89c6486bfb4a2e1ec9c0f Mon Sep 17 00:00:00 2001
From: Benedikt Rollik
Date: Fri, 4 Oct 2024 13:31:21 +0200
Subject: [PATCH 26/27] Apply suggestions from code review

Co-authored-by: Rowena Jones <36301604+RoRoJ@users.noreply.github.com>
---
 tutorials/how-to-implement-rag/index.mdx | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx
index 2f2aec8d23..ecda0830fe 100644
--- a/tutorials/how-to-implement-rag/index.mdx
+++ b/tutorials/how-to-implement-rag/index.mdx
@@ -17,7 +17,7 @@ In this comprehensive guide, you will learn how to implement RAG using LangChain
 LangChain simplifies the process of enhancing language models with retrieval capabilities, allowing developers to build scalable, intelligent applications that access external datasets effortlessly. By leveraging LangChain’s modular design and Scaleway’s cloud services, you can unlock the full potential of Retrieval-Augmented Generation.
 
 ## What you will learn
-- How to embed text using a sentence transformer using ***Scaleway Manage Inference***
+- How to embed text using a sentence transformer with ***Scaleway Managed Inference***
 - How to store and query embeddings using ***Scaleway’s Managed PostgreSQL Database*** with pgvector
 - How to manage large datasets efficiently with ***Scaleway Object Storage***
 
@@ -26,10 +26,10 @@ LangChain simplifies the process of enhancing language models with retrieval cap
 - A Scaleway account logged into the [console](https://console.scaleway.com)
 - [Owner](/identity-and-access-management/iam/concepts/#owner) status or [IAM permissions](/identity-and-access-management/iam/concepts/#permission) allowing you to perform actions in the intended Organization
 - A valid [API key](/identity-and-access-management/iam/how-to/create-api-keys/)
-- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): Set up an inference deployment using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings.
-- [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the large language model of your choice.
-- [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model.
-- [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings.
+- An [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): set it up using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings.
+- An [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the large language model of your choice.
+- An [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model.
+- A [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings.
+- An [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/): set it up using [sentence-transformers/sentence-t5-xxl](/ai-data/managed-inference/reference-content/sentence-t5-xxl/) on an L4 instance to efficiently process embeddings. +- An [Inference Deployment](/ai-data/managed-inference/how-to/create-deployment/) with the large language model of your choice. +- An [Object Storage Bucket](/storage/object/how-to/create-a-bucket/) to store all the data you want to inject into your LLM model. +- A [Managed Database](/managed-databases/postgresql-and-mysql/how-to/create-a-database/) to securely store all your embeddings. ## Configure your development environment @@ -311,7 +311,7 @@ for r in rag_chain.stream("Your question"): - Query execution: Finally, we stream the output of the RAG chain for a specified question, printing each response with a slight delay for better readability. -### Query the RAG system with you own prompt template +### Query the RAG system with your own prompt template Personalizing your prompt template allows you to tailor the responses from your RAG (Retrieval-Augmented Generation) system to better fit your specific needs. This can significantly improve the relevance and tone of the answers you receive. Below is a detailed guide on how to create a custom prompt for querying the system. @@ -327,7 +327,7 @@ llm = ChatOpenAI( api_key=os.getenv("SCW_SECRET_KEY"), model=deployment.model_name, ) -prompt = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Always finish your answer by "Thank you for asking". {context} Question: {question} Helpful Answer:""" +prompt = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Always finish your answer with "Thank you for asking". {context} Question: {question} Helpful Answer:""" custom_rag_prompt = PromptTemplate.from_template(prompt) retriever = vector_store.as_retriever() custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt) @@ -362,6 +362,6 @@ In this tutorial, we explored essential techniques for efficiently processing an Furthermore, you can continually enhance your RAG system by implementing mechanisms to retain chat history. Keeping track of past interactions allows for more contextually aware responses, fostering a more engaging user experience. This historical data can be used to refine your prompts, adapt to user preferences, and improve the overall accuracy of responses. -By integrating Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently. +By integrating Scaleway Object Storage, Managed Database for PostgreSQL with pgvector, and LangChain’s embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently. 
With ongoing refinement and adaptation, your RAG system can evolve to meet the changing needs of your users, ensuring that it remains a valuable asset in your AI toolkit. \ No newline at end of file From a77d1975b59f59d6d4bfdadb33d87ac02c322827 Mon Sep 17 00:00:00 2001 From: Benedikt Rollik Date: Fri, 4 Oct 2024 13:40:35 +0200 Subject: [PATCH 27/27] Apply suggestions from code review --- tutorials/how-to-implement-rag/index.mdx | 28 ++++++++++++------------ 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/tutorials/how-to-implement-rag/index.mdx b/tutorials/how-to-implement-rag/index.mdx index ecda0830fe..38ca4ba4a1 100644 --- a/tutorials/how-to-implement-rag/index.mdx +++ b/tutorials/how-to-implement-rag/index.mdx @@ -33,7 +33,7 @@ LangChain simplifies the process of enhancing language models with retrieval cap ## Configure your development environment -### Step 1: Install required packages +### Install required packages Run the following command to install the required packages: @@ -71,18 +71,18 @@ Create a .env file and add the following variables. These will store your API ke ## Setting Up Managed Databases -### Step 1: Connect to your PostgreSQL database +### Connect to your PostgreSQL database To perform these actions, you will need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking. -### Step 2: Install the pgvector extension +### Install the pgvector extension [pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command: ```sql CREATE EXTENSION IF NOT EXISTS vector; ``` -### Step 3: Create a table to track processed documents +### Create a table to track processed documents To prevent reprocessing documents that have already been loaded and vectorized, you should create a table to keep track of them. This will ensure that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization: @@ -90,7 +90,7 @@ To prevent reprocessing documents that have already been loaded and vectorized, CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT); ``` -### Step 4: Connect to PostgreSQL programmatically +### Connect to PostgreSQL programmatically Connect to your PostgreSQL instance and perform tasks programmatically. @@ -119,7 +119,7 @@ cur = conn.cursor() ## Embeddings and Vector Store Setup -### Step 1: Import Required Modules +### Import Required Modules ```python # rag.py @@ -128,7 +128,7 @@ from langchain_openai import OpenAIEmbeddings from langchain_postgres import PGVector ``` -### Step 2: Configure OpenAI Embeddings +### Configure OpenAI Embeddings We will utilize the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain and store the embeddings in PostgreSQL using the PGVector integration. @@ -159,7 +159,7 @@ In the context of using Scaleway’s Managed Inference and the `sentence-t5-xxl` Moreover, leaving `tiktoken_enabled` as `True` causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. 
Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior. By setting `tiktoken_enabled=False`, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure. -### Step 3: Create a PGVector store +### Create a PGVector store Configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings. @@ -176,7 +176,7 @@ PGVector: This creates the vector store in your PostgreSQL database to store the Use the [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database. -### Step 1: Import required modules +### Import required modules ```python #rag.py @@ -188,7 +188,7 @@ from langchain_openai import OpenAIEmbeddings ``` -### Step 2: Load metadata for improved efficiency +### Load metadata for improved efficiency Load metadata for improved efficiency: By loading the metadata for all objects in your bucket, you can speed up the process significantly. This allows you to quickly check if a document has already been embedded without the need to load the entire document. @@ -210,7 +210,7 @@ In this code sample we: - Set up pagination for listing objects: We prepare pagination to handle potentially large lists of objects efficiently. - Iterate through the bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object bucket seamlessly. -### Step 3: Iterate through metadata +### Iterate through metadata Iterate through metadata: Next, we will iterate through the metadata to determine if each object has already been embedded. If an object hasn’t been processed yet, we will embed it and load it into the database. @@ -262,7 +262,7 @@ When a query is made, the RAG system will retrieve the most relevant embeddings, ### Query the RAG System with a pre-defined prompt template -### Step 1: Import required modules +### Import required modules ```python #rag.py @@ -273,7 +273,7 @@ from langchain_core.runnables import RunnablePassthrough ``` -### Step 2: Setup LLM for querying +### Setup LLM for querying Now, set up the RAG system to handle queries @@ -356,7 +356,7 @@ You can customize the streaming behavior further, such as implementing progress - Research assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities. - Content generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones. -### Conclusion +## Conclusion In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets within a Retrieval-Augmented Generation (RAG) system. By leveraging metadata, we ensured that our system avoids redundant data handling, allowing for smooth and efficient operations. The use of chunking optimizes document processing, maximizing the performance of the language model. Storing embeddings in PostgreSQL via pgvector enables rapid and scalable retrieval, ensuring quick responses to user queries.
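The conclusion above recommends retaining chat history, but none of the patches in this series show an implementation. As a rough, illustrative sketch only (not part of the tutorial or of any patch above), one minimal approach is to keep recent question/answer pairs in memory and fold them into each new query before retrieval; it assumes the `retriever` and `custom_rag_chain` objects built in the custom prompt section, and every name below is hypothetical.

```python
# Illustrative sketch only: a minimal in-memory chat history layered on top of
# the tutorial's chain. Assumes `retriever` and `custom_rag_chain` exist as
# built in the custom prompt section; names such as `ask` are hypothetical.
history = []  # (question, answer) pairs for the current session

def ask(question: str) -> str:
    # Fold the last few exchanges into the new question so the model sees them.
    past = "\n".join(f"Q: {q}\nA: {a}" for q, a in history[-5:])
    enriched = f"{past}\nQ: {question}" if past else question

    context = retriever.invoke(enriched)  # retrieve chunks for the enriched query
    answer = "".join(custom_rag_chain.stream({"question": enriched, "context": context}))

    history.append((question, answer))
    return answer

print(ask("What is pgvector used for?"))
print(ask("And how do I enable it?"))  # answered with the previous turn in mind
```

A production setup would more likely rely on LangChain's message-history utilities or persist the history in the same PostgreSQL instance, but the idea, carrying prior turns into both retrieval and generation, is the same.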