The repository contains Part 1 of an LLM Pipeline for Design Exploration.
This project covers parsing raw data, creating an embedding database, and performing knowledge retrieval with a retrieval-augmented generation (RAG) system.
-- Start by creating a virtual environment (Python 3.10 is recommended):
git clone https://github.com/jomi13/LLM-Knowledge-Pool-RAG
cd LLM-Knowledge-Pool-RAG
python3.10 -m venv myenv
source myenv/bin/activate
pip install -r requirements.txt
-- Create a keys.py file inside the directory, containing any necessary keys you may need, like so:
LLAMAPARSE_API_KEY = "your key"
OPENAI_API_KEY = "your key"
....
Note: Get a LlamaParse key from LlamaIndex (cloud.llamaindex.ai) and an API key from OpenAI (platform.openai.com).
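The scripts can then import these keys directly; a minimal sketch, assuming the variable names above:

# keys.py sits next to the scripts, so a plain import works:
from keys import LLAMAPARSE_API_KEY, OPENAI_API_KEY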
-- Install LM Studio and download:
- An LLM of your preference in GGUF format, such as "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"
- An embedding model, such as "nomic-ai/nomic-embed-text-v1.5-GGUF"
-- In config.py, set up the configuration for the model you just downloaded. Example for Llama 3:
import random

llama3 = [
    {
        "model": "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF",
        "api_key": "any string here is fine",  # LM Studio does not check the key
        "api_type": "openai",
        "base_url": "http://localhost:1234/v1",  # LM Studio's local server
        "cache_seed": random.randint(0, 100000),  # randomize to avoid cached replies
    }
]
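As a hedged illustration (not part of the repo's scripts), a config entry like this can drive any OpenAI-compatible client pointed at LM Studio's local server; the openai-package usage below is an assumption about how you might consume it:

from openai import OpenAI

cfg = llama3[0]  # the config entry defined above

# LM Studio exposes an OpenAI-compatible endpoint, so the official client works.
client = OpenAI(base_url=cfg["base_url"], api_key=cfg["api_key"])
reply = client.chat.completions.create(
    model=cfg["model"],
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)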
-- In LM Studio:
- Go to Local Server and load both models
- Click Start Server
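Optionally, verify the server is reachable before running the scripts; this check is a suggestion, not something the repo requires:

import requests

# LM Studio's server speaks the OpenAI API, so /v1/models lists the loaded models.
print(requests.get("http://localhost:1234/v1/models").json())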
-- Run the Python scripts in order:
- 01_parse_pdf.py will take any PDFs inside the knowledge_pool folder and turn them into structured .txt files.
- 02_create_vector_db.py will create an embeddings database as a JSON file.
- 02.1_merge_embeddings.py is optional; use it if you want to join multiple embedding sources into a single one.
- 03_ask_rag.py will let you ask questions about your corpus of text with a RAG system (a sketch of the retrieval step follows this list).
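For orientation, here is a minimal sketch of the retrieval step behind 03_ask_rag.py; the JSON schema (records with "text" and "embedding" fields), the embeddings.json file name, and the helper names are assumptions, not the script's actual code:

import json

import numpy as np
from openai import OpenAI

# LM Studio also serves embeddings through its OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(
        model="nomic-ai/nomic-embed-text-v1.5-GGUF",  # the embedding model loaded earlier
        input=text,
    )
    return np.array(resp.data[0].embedding)

def top_k(query: str, db_path: str = "embeddings.json", k: int = 3):
    # Rank stored chunks by cosine similarity to the query embedding.
    with open(db_path) as f:
        db = json.load(f)
    q = embed(query)
    scored = []
    for record in db:
        v = np.array(record["embedding"])
        score = float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v)))
        scored.append((score, record["text"]))
    return sorted(scored, reverse=True)[:k]

In a RAG system, the retrieved chunks are then passed to the chat model as context for answering the question.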
Note: To run the RAG with your own corpus of text, place any PDF files inside the knowledge_pool folder. The script 03_ask_rag.py has two modes: local inference (with LM Studio) or the OpenAI API (GPT-4) - check inside for more details.
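As a rough picture of the two modes, the switch could look like the sketch below; the USE_OPENAI flag and the model names are hypothetical, and the actual logic lives inside 03_ask_rag.py:

from openai import OpenAI

USE_OPENAI = False  # hypothetical flag: True = OpenAI API, False = LM Studio

if USE_OPENAI:
    from keys import OPENAI_API_KEY
    client = OpenAI(api_key=OPENAI_API_KEY)
    model = "gpt-4"
else:
    # LM Studio's local server accepts any api_key string.
    client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
    model = "lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF"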