Advanced RAG Pipeline

The Advanced RAG Pipeline is a retrieval-augmented generation (RAG) system that combines semantic search and retrieval, reranking, and response generation. It works with pre-chunked documents from Hugging Face, generates embeddings with Sentence-BERT, searches them with a FAISS HNSW index, reranks the results with DistilBERT, and generates responses with OpenAI GPT-3.5.

Features

Chunked Documents: The pipeline loads pre-chunked documents from Hugging Face. The documents are a collection of research papers from arXiv.
Embedding Generation: Sentence-BERT generates dense embeddings that capture the semantic meaning of each document chunk.
Semantic Search and Retrieval: A FAISS HNSW index provides fast approximate nearest-neighbor search over the embeddings to retrieve relevant documents.
Reranking: DistilBERT reranks the retrieved documents so that the most relevant ones come first.
Response Generation: OpenAI GPT-3.5 generates a response grounded in the retrieved and reranked documents.
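
The flow of the stages listed above can be sketched in plain Python. This is a minimal illustration of the retrieve-then-rerank pattern, not the repository's implementation: the exhaustive cosine search stands in for the FAISS HNSW index, the word-overlap scorer stands in for DistilBERT, the hand-written vectors stand in for Sentence-BERT embeddings, and every function name and the prompt wording are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, doc_vecs: List[Vector], k: int) -> List[int]:
    """Stage 1: return the indices of the k most similar documents.

    Exhaustive search, used here as a stand-in for an approximate HNSW index.
    """
    order = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )
    return order[:k]


def rerank(
    query: str,
    docs: List[str],
    candidates: List[int],
    score: Callable[[str, str], float],
    top_n: int,
) -> List[int]:
    """Stage 2: rescore only the retrieved candidates and keep the best top_n."""
    order = sorted(candidates, key=lambda i: score(query, docs[i]), reverse=True)
    return order[:top_n]


def overlap_score(query: str, doc: str) -> float:
    """Toy relevance score (shared words); a stand-in for a DistilBERT reranker."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def build_prompt(query: str, contexts: List[str]) -> str:
    """Stage 3: assemble the text that would be sent to the language model."""
    joined = "\n\n".join(f"[{n}] {c}" for n, c in enumerate(contexts, 1))
    return f"Answer using only the context below.\n\n{joined}\n\nQuestion: {query}"


# Toy data: three "chunks" with made-up 3-dimensional embeddings.
docs = [
    "graph neural networks for molecules",
    "approximate nearest neighbour search with graphs",
    "transformers for image classification",
]
doc_vecs = [[1.0, 0.0, 0.1], [0.1, 1.0, 0.0], [0.0, 0.2, 1.0]]
query = "nearest neighbour search"
query_vec = [0.0, 0.9, 0.3]

candidates = retrieve(query_vec, doc_vecs, k=2)
best = rerank(query, docs, candidates, overlap_score, top_n=1)
prompt = build_prompt(query, [docs[i] for i in best])
```

The design point is that the first stage is cheap and runs over every document, while the more expensive scorer only sees the short candidate list.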

Installation

To install and set up the Advanced RAG Pipeline, follow these steps:

Clone the repository:

git clone https://github.com/Kdotseth7/advanced-rag.git

Install the required dependencies:
```
pip install -r requirements.txt
```

Configure the pipeline:

Update the environment file .env with the following settings:

OPENAI_API_KEY="your-openai-key"
TOKENIZERS_PARALLELISM=true
BATCH_SIZE=32
MODEL_NAME="model-name"
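
As a rough sketch of how these settings might be read at startup, the snippet below parses them into a typed object. It assumes the variables are already present in the process environment (for example, exported by the shell or loaded by a tool such as python-dotenv); `PipelineConfig`, `load_config`, and the fallback values are hypothetical and may not match what main.py actually does.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PipelineConfig:
    openai_api_key: str
    tokenizers_parallelism: bool
    batch_size: int
    model_name: str


def load_config(env: Optional[Mapping[str, str]] = None) -> PipelineConfig:
    """Build a PipelineConfig from environment variables (or any mapping)."""
    env = os.environ if env is None else env

    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early rather than at the first API call.
        raise ValueError("OPENAI_API_KEY is not set")

    return PipelineConfig(
        openai_api_key=api_key,
        # Assumed fallbacks mirror the sample values shown above.
        tokenizers_parallelism=env.get("TOKENIZERS_PARALLELISM", "true").lower() == "true",
        batch_size=int(env.get("BATCH_SIZE", "32")),
        model_name=env.get("MODEL_NAME", ""),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the function easy to exercise in isolation, for example `load_config({"OPENAI_API_KEY": "your-openai-key", "BATCH_SIZE": "16"})`.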

Run the pipeline:
```
python main.py
```

Contributing

Contributions to the Advanced RAG Pipeline are welcome! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.

