From cacd16b4a58b90729b090e1a70a9da5e3ae3b948 Mon Sep 17 00:00:00 2001
From: Ines Bouissou <106828511+lineUCB@users.noreply.github.com>
Date: Sun, 17 Nov 2024 22:46:09 -0800
Subject: [PATCH] Update README.md

---
 rag/README.md | 26 +++++++++++++-------------
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/rag/README.md b/rag/README.md
index 7c0811e..0482098 100644
--- a/rag/README.md
+++ b/rag/README.md
@@ -18,19 +18,19 @@ Run this to install the required packages
 ### Pipeline from website or local to knowledge base
 Run [pipline_kb.py](Scraper_master/pipeline_kb.py) as the pipeline to scrape, chunk and embed websites into a knowledge base. The pipeline takes a task, which is a collection of content that will be saved into a single knowledge base, and will save all information at the root_folder designated in the task. The pipeline first scrapes, and then converts the content into markdown. Finally, it embeds and saves the everything as a knowledge base. This is all saved according to the path defined by root_folder. The knowledge base is automatically saved in the scraped data folder in a sub-folder labeled "pickle".
- A .yaml file is used to specify the tasks to be performed. It should be should be structured as follows:
- ```
- root_folder : "path/to/root/folder"
- tasks :
-   - name : "Website Name"
-     local : False // True if is a Local file, False if it is a site that needs to be scraped
-     url : "https://website/site.url"
-     root : "https://website.url"
-   - name : "Folder Name"
-     local : True // Scraping Locally
-     url : "path/to/folder"
-     root : "path/to/folder
- ```
+A .yaml file is used to specify the tasks to be performed. It should be should be structured as follows:
+```
+root_folder : "path/to/root/folder"
+tasks :
+  - name : "Website Name"
+    local : False // True if is a Local file, False if it is a site that needs to be scraped
+    url : "https://website/site.url"
+    root : "https://website.url"
+  - name : "Folder Name"
+    local : True // Scraping Locally
+    url : "path/to/folder"
+    root : "path/to/folder
+```
 ### Pre-requisites