Update README.md
lineUCB authored Nov 18, 2024
1 parent 9fb93fa commit cacd16b
Showing 1 changed file with 13 additions and 13 deletions.
26 changes: 13 additions & 13 deletions rag/README.md
@@ -18,19 +18,19 @@ Run this to install the required packages

### Pipeline from website or local to knowledge base
Run [pipeline_kb.py](Scraper_master/pipeline_kb.py) to scrape, chunk, and embed websites or local files into a knowledge base. The pipeline takes a task, which is a collection of content that will be saved into a single knowledge base, and saves all output under the root_folder designated in the task. It first scrapes the content, then converts it into markdown, and finally embeds it and saves everything as a knowledge base. The knowledge base is saved automatically in the scraped data folder, in a sub-folder labeled "pickle".
A .yaml file specifies the tasks to be performed. It should be structured as follows:
```
root_folder : "path/to/root/folder"
tasks :
  - name : "Website Name"
    local : False # True if it is a local file, False if it is a site that needs to be scraped
    url : "https://website/site.url"
    root : "https://website.url"
  - name : "Folder Name"
    local : True # Read from a local folder instead of scraping a site
    url : "path/to/folder"
    root : "path/to/folder"
```
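
Before running the pipeline, it can help to check that a task file has the expected shape. The sketch below is not part of the repository: `Task`, `parse_config`, and `load_config` are hypothetical helper names, and it assumes PyYAML is installed and that every task carries exactly the four keys shown above (`name`, `local`, `url`, `root`).

```python
from dataclasses import dataclass
from pathlib import Path

REQUIRED_KEYS = {"name", "local", "url", "root"}


@dataclass
class Task:
    """One entry of the `tasks` list (hypothetical helper, not from the repo)."""
    name: str
    local: bool
    url: str
    root: str


def parse_config(config: dict) -> tuple[Path, list[Task]]:
    """Validate an already-parsed config mapping; return (root_folder, tasks)."""
    if "root_folder" not in config:
        raise ValueError("missing top-level key: root_folder")
    tasks = []
    for i, entry in enumerate(config.get("tasks") or []):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"task {i} is missing keys: {sorted(missing)}")
        if not isinstance(entry["local"], bool):
            raise ValueError(f"task {i}: 'local' must be True or False")
        tasks.append(Task(entry["name"], entry["local"], entry["url"], entry["root"]))
    return Path(config["root_folder"]), tasks


def load_config(path: str) -> tuple[Path, list[Task]]:
    """Read the .yaml file with PyYAML (assumed installed) and validate it."""
    import yaml  # third-party dependency: pip install pyyaml

    with open(path, encoding="utf-8") as f:
        return parse_config(yaml.safe_load(f))


if __name__ == "__main__":
    # Same placeholder values as the example task file above.
    example = {
        "root_folder": "path/to/root/folder",
        "tasks": [
            {"name": "Website Name", "local": False,
             "url": "https://website/site.url", "root": "https://website.url"},
            {"name": "Folder Name", "local": True,
             "url": "path/to/folder", "root": "path/to/folder"},
        ],
    }
    root_folder, tasks = parse_config(example)
    for task in tasks:
        kind = "local folder" if task.local else "website"
        print(f"{task.name}: {kind} at {task.url}")
```

Calling `load_config("tasks.yaml")` (the file name is a placeholder) would raise a `ValueError` for a missing key or a non-boolean `local` value, which surfaces mistakes before any scraping starts.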


### Pre-requisites