ChatPDF Clone

Chatbot app for interactively conversing with documents

Introduction

With services like ChatGPT showcasing the power of LLMs and RAG in generating contextually relevant responses, I wanted to understand their underlying mechanics. That led me to build this chatbot, which not only converses intelligently but also answers questions about PDF documents.

Technical Overview

We use LlamaIndex with the Llama 2 model served through Gradient's LLM API, DataStax's Apache Cassandra as the vector database, and a Streamlit frontend UI, all written in Python.

Architecture Diagram

LlamaIndex

Serves as the framework for handling embeddings, indexing documents, and retrieving the passages relevant to a query.
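
To make the indexing step more concrete, here is a minimal sketch of splitting a document into overlapping chunks before embedding. It is not LlamaIndex's own code: the function name `chunk_text`, the character-based windows, and the default sizes are assumptions chosen for illustration, and LlamaIndex's splitters are more sophisticated.

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping character windows.

    Hypothetical helper for illustration; sizes are arbitrary defaults.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.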

Retriever Engine

GradientAI's LLM

Through Gradient's LLM solution, we use state-of-the-art open-source language models such as Meta's Llama 2, specifically llama-2b-chat, which lets the chatbot generate coherent and informed responses.
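
As a rough illustration of how retrieved passages reach the model, the sketch below assembles a retrieval-augmented prompt. The function name `build_prompt`, the template wording, and the `max_chars` budget are all hypothetical; LlamaIndex ships its own prompt templates, and this is not the one the project uses.

```python
def build_prompt(question, context_chunks, max_chars=2000):
    """Assemble a prompt from retrieved chunks (illustrative template only)."""
    picked, used = [], 0
    for chunk in context_chunks:
        # Stop adding context once the (assumed) character budget is exhausted.
        if used + len(chunk) > max_chars:
            break
        picked.append(chunk)
        used += len(chunk)
    context = "\n---\n".join(picked)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )
```

The resulting string would be sent to the hosted model as the completion prompt, so the answer is grounded in the document text rather than only in what the model memorized.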

Cassandra Vector Store

Uses Apache Cassandra as a vector database to store and manage the vector embeddings of the provided documents, so that passages relevant to a query can be retrieved efficiently.
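
The idea behind a vector store can be sketched in a few lines: keep each chunk next to its embedding, then rank chunks by similarity to the query embedding. The class below is an in-memory stand-in written for illustration, not Cassandra's or AstraDB's API; `InMemoryVectorStore` and its methods are hypothetical names, and cosine similarity is an assumed metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy stand-in for a vector database (illustration only)."""

    def __init__(self):
        self._rows = []  # list of (text, embedding) pairs

    def add(self, text, embedding):
        self._rows.append((text, list(embedding)))

    def query(self, embedding, top_k=2):
        # Score every stored vector, then keep the top_k most similar texts.
        scored = [(cosine_similarity(embedding, vec), text) for text, vec in self._rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:top_k]]
```

This brute-force scan compares the query against every stored vector, which is fine for a toy but is exactly the cost a real vector database avoids with approximate nearest-neighbour indexes.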

Streamlit

Simplifies building and deploying the web application, providing a user-friendly interface for starting conversations with the chatbot and exploring the contents of the provided PDFs.
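
A chat page in Streamlit can be sketched roughly as follows. This is not the project's `main.py`: `record_turn`, `render_chat`, and `answer_fn` are hypothetical names, and the Streamlit module is passed in as the `st` argument so the helpers can be exercised without a running Streamlit server. It assumes a Streamlit version that provides `st.chat_input` and `st.chat_message`.

```python
def record_turn(history, role, content):
    """Append one chat message to the history list and return the list."""
    history.append({"role": role, "content": content})
    return history


def render_chat(st, answer_fn):
    """Draw the chat page; `st` is the imported streamlit module."""
    st.title("ChatPDF Clone")
    if "history" not in st.session_state:
        st.session_state["history"] = []
    history = st.session_state["history"]

    # Streamlit reruns the script on every interaction, so replay past turns.
    for msg in history:
        st.chat_message(msg["role"]).write(msg["content"])

    question = st.chat_input("Ask something about your PDFs")
    if question:
        record_turn(history, "user", question)
        st.chat_message("user").write(question)
        answer = answer_fn(question)  # assumed to wrap the retrieval + LLM call
        record_turn(history, "assistant", answer)
        st.chat_message("assistant").write(answer)
```

In a script launched with `streamlit run`, this would be called as `import streamlit as st` followed by `render_chat(st, my_query_function)`, where `my_query_function` stands for whatever function answers a question from the indexed documents.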

Prerequisites

Python 3.9 or above
GradientAI Account:
- Create an account on GradientAI to access the hosted LLMs.
- Create a new workspace, then generate your Access Token and Workspace ID and store them as secrets or environment variables.
AstraDB Account:
- Set up an account on AstraDB, a cloud-native database service built on Apache Cassandra, and create a Vector Database.
- Under Connect, generate an App Token with the Database Administrator role and save the app-token.json and the Secure-Connect-Bundle.zip.

Setup

To set up ChatPDF Clone, follow these steps:

Clone the Repository: Clone the ChatPDF Clone repository to your local machine.
```
git clone https://github.com/your-username/ChatPDF-Clone.git
```
Install Dependencies: Navigate to the project directory and install the necessary dependencies.
```
cd ChatPDF-Clone/project
pip install -r requirements.txt
```
Configure Credentials: Add the GradientAI credentials as environment variables to your project environment. Copy the Secure-Connect-Bundle.zip and app-token.json into the project root directory.
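
Before launching the app, it can help to check that all credentials are in place. The sketch below is a hypothetical helper, not part of the repository. It assumes the Gradient credentials are exported as `GRADIENT_ACCESS_TOKEN` and `GRADIENT_WORKSPACE_ID`, and that app-token.json contains `clientId` and `secret` keys; adjust the names if your setup differs.

```python
import json
import os
from pathlib import Path

# Assumed environment variable names for the Gradient credentials.
GRADIENT_VARS = ("GRADIENT_ACCESS_TOKEN", "GRADIENT_WORKSPACE_ID")


def load_credentials(project_root=".", env=None):
    """Collect Gradient and AstraDB credentials, failing early if any is missing."""
    env = os.environ if env is None else env
    root = Path(project_root)

    missing = [name for name in GRADIENT_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))

    bundle = root / "Secure-Connect-Bundle.zip"
    token_file = root / "app-token.json"
    for path in (bundle, token_file):
        if not path.is_file():
            raise FileNotFoundError(f"expected {path.name} in {root}")

    token = json.loads(token_file.read_text())
    return {
        "gradient_access_token": env["GRADIENT_ACCESS_TOKEN"],
        "gradient_workspace_id": env["GRADIENT_WORKSPACE_ID"],
        "astra_client_id": token["clientId"],  # assumed key name
        "astra_secret": token["secret"],       # assumed key name
        "secure_connect_bundle": str(bundle),
    }
```

Calling `load_credentials()` from the project root returns a plain dictionary, which keeps the secrets out of the source code and makes a missing file or variable show up as a clear error at startup.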

Open in Google Colab

The notebook ChatPDF_Clone.ipynb can be opened directly in Google Colab. Configure the access tokens under the Secrets section and upload PDFs into the Documents folder.

Usage

Once the setup is complete, you can use ChatPDF Clone for interactive conversations. Run the script as follows and open the localhost URL it prints to access the web app:

streamlit run main.py

In the following examples, I gave the service a PDF containing a summary of The Merchant of Venice.

Screenshots

Challenges

During the development of ChatPDF Clone, we encountered several challenges, including:

Integration Complexity: Integrating GradientAI and AstraDB posed challenges in terms of authentication and data synchronization.
Retrieval Performance: Retrieval accuracy and speed degraded severely as the number of documents increased.
Handling Dynamic Conversations: Adapting the chatbot to handle dynamic and evolving conversations while maintaining coherence presented a challenge.
