
PaperHelper: Knowledge-Based LLM QA Paper Reading Assistant with Reliable References

[1] Introduction

Thanks to Ms. Freax for designing the Athenas-Oracle project.

Based on it, we made improvements and designed PaperHelper for machine learning scientists. By combining RAG Fusion and RAFT (RAG fine-tuning, with the backend fine-tuned using the GPT-4-1106-Preview API on the 52,000-paper MLArxivPapers corpus and the ArxivQA dataset), it effectively reduces hallucinations and improves retrieval relevance. We implemented an end-to-end application with parallel generation that provides paper readers with useful information based on references ranked by relevance. We also incorporated structural relationships to represent the extracted information.

In short, everything is designed to help machine learning researchers read papers more efficiently and to provide the most reliable references based on each paper's citations.

[2] Implementation Details

Overview

The assistant utilizes three tools: search, gather evidence, and answer questions. These tools enable it to find and parse relevant full-text research papers, identify the sections of a paper that help answer the question, summarize those sections in the context of the question (called evidence), and then generate an answer based on that evidence. It is implemented as an agent, so the LLM orchestrating the tools can adjust the input to paper searches, gather evidence with different phrases, and assess whether an answer is complete.
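To make the control flow concrete, here is a minimal sketch of the three-tool agent loop with all tool implementations stubbed out; the names (search_papers, gather_evidence, answer_question, AgentState) are illustrative placeholders, not PaperHelper's actual API.

```python
# Minimal sketch of the three-tool agent loop described above. All tools are stubs.
from dataclasses import dataclass, field

@dataclass
class Evidence:
    paper_id: str
    section: str
    summary: str

@dataclass
class AgentState:
    question: str
    papers: list = field(default_factory=list)
    evidence: list = field(default_factory=list)

def search_papers(query: str) -> list[str]:
    """Tool 1: find candidate full-text papers for a query (stubbed)."""
    return [f"paper-about-{query.replace(' ', '-')}"]

def gather_evidence(state: AgentState, phrase: str) -> None:
    """Tool 2: pull question-relevant sections and summarize them as evidence (stubbed)."""
    for paper_id in state.papers:
        state.evidence.append(Evidence(paper_id, "related work", f"Summary of '{phrase}' in {paper_id}"))

def answer_question(state: AgentState) -> str:
    """Tool 3: compose an answer grounded in the gathered evidence (stubbed)."""
    cited = ", ".join(e.paper_id for e in state.evidence) or "no sources"
    return f"Answer to '{state.question}' based on: {cited}"

def run_agent(question: str, max_rounds: int = 3) -> str:
    """The orchestrating LLM would pick tools and rephrase queries each round;
    here the loop is hard-coded just to show the control flow."""
    state = AgentState(question)
    for round_idx in range(max_rounds):
        state.papers += search_papers(f"{question} (rephrasing {round_idx})")
        gather_evidence(state, question)
        if len(state.evidence) >= 3:  # stand-in for the LLM judging the answer complete
            break
    return answer_question(state)

print(run_agent("How does RAFT reduce hallucination?"))
```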

Basic RAG

The basic RAG pipeline simply splits the search prompt into individual keywords in a crude manner, and may produce hallucinations without truly understanding the user's intent.

[Figure: Basic RAG]
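As a toy illustration of this limitation (not PaperHelper code), the snippet below retrieves chunks purely by word overlap with the crudely split query, which can surface the wrong sense of an ambiguous term:

```python
# Toy illustration of the "crude keyword split" behavior of basic RAG:
# the query is tokenized into words and chunks are scored by word overlap,
# so intent and phrasing are lost.
def naive_retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(c.lower().split())), c) for c in chunks]
    return [c for score, c in sorted(scored, reverse=True)[:k] if score > 0]

chunks = [
    "RAFT fine-tunes the model on domain documents before retrieval.",
    "A raft is a flat floating structure used on water.",
]
# Word overlap ranks the irrelevant "floating raft" chunk above the RAFT chunk.
print(naive_retrieve("what is RAFT fine-tuning", chunks))
```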

RAG Fusion with RAFT

Our system also integrates the RAFT method. This approach enhances the capability of LLMs on domain-specific RAG tasks, building on the core idea that letting an LLM "learn" the documents in advance improves RAG performance.
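For intuition, the following sketch shows the RAG Fusion side of this combination under simplified assumptions: several rephrasings of the user query each retrieve a ranked list, and the lists are merged with reciprocal rank fusion (RRF). The retrieve function and the query variants are placeholders; in PaperHelper the rephrasings would come from the RAFT fine-tuned model and retrieval from the vector store.

```python
# Sketch of the RAG Fusion idea: retrieve with several query variants,
# then merge the ranked lists with reciprocal rank fusion (RRF).
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Standard RRF: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(query: str) -> list[str]:
    """Placeholder retriever returning a ranked list of document ids."""
    fake_index = {
        "raft fine-tuning": ["raft_paper", "rag_survey", "lora_paper"],
        "retrieval augmented fine tuning": ["raft_paper", "dsp_paper"],
        "reduce hallucination with retrieval": ["rag_survey", "raft_paper"],
    }
    return fake_index.get(query, [])

query_variants = [
    "raft fine-tuning",
    "retrieval augmented fine tuning",
    "reduce hallucination with retrieval",
]
fused = reciprocal_rank_fusion([retrieve(q) for q in query_variants])
print(fused)  # documents retrieved by several variants bubble to the top
```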

We fine-tuned a model through the OpenAI API on 52,000 domain-specific papers from the field of machine learning to augment PaperHelper's knowledge of the machine learning domain, thereby helping machine learning scientists read papers more efficiently and accurately.
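A minimal sketch of what such a fine-tuning job looks like with the OpenAI Python client (openai >= 1.0) is shown below; the training-file name and the base-model name are placeholders, and which base models can actually be fine-tuned depends on your account.

```python
# Sketch of launching a fine-tuning job with the OpenAI Python client.
# "ml_papers_qa.jsonl" is a placeholder for chat-formatted QA pairs built
# from the 52k papers / ArxivQA data; the model name is also a placeholder.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# 1. Upload the chat-formatted JSONL training file.
training_file = client.files.create(
    file=open("ml_papers_qa.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch the fine-tuning job on the chosen base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4-1106-preview",  # placeholder; substitute a model your account can fine-tune
)
print(job.id, job.status)
```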

Extracting Relevant References

With RAFT in place, we can extract the reference section at the end of an article more efficiently. First, we use RAG to traverse all the references in the article. Then, drawing on the knowledge of the LLM, we refine this information with top-k selection to identify the literature most relevant to the article.
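The top-k step can be pictured as follows. The embed() function here is a random placeholder standing in for a real embedding model and the reference strings are made up, so only the shape of the computation matches what is described above.

```python
# Sketch of the top-k step: embed the paper's abstract and each extracted
# reference entry, then keep the k references closest to the paper.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a real system would call an embedding model,
    so similarities here are arbitrary."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

def top_k_references(abstract: str, references: list[str], k: int = 3) -> list[str]:
    query = embed(abstract)
    sims = [float(embed(ref) @ query) for ref in references]  # cosine similarity of unit vectors
    order = np.argsort(sims)[::-1][:k]
    return [references[i] for i in order]

refs = [
    "Reference 1: a survey of retrieval-augmented generation",
    "Reference 2: retrieval-aware fine-tuning for domain-specific QA",
    "Reference 3: an unrelated citation",
]
print(top_k_references("We study retrieval-augmented fine-tuning of LLMs ...", refs, k=2))
```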

Through the RAFT method, the model integrates up-to-date knowledge, enabling readers to explore academic papers further based on current information rather than outdated or misleading content.

[3] Usage

Use the following commands step by step:

  1. Clone the repository
git clone https://github.com/JerryYin777/PaperHelper.git
  2. Install dependencies
cd PaperHelper
pip install -r requirements.txt
  3. Set the OpenAI API key
cd .streamlit
touch secrets.toml  # put OPENAI_API_KEY = "sk-yourapikeyhere" in this file
  4. Start PaperHelper
streamlit run app.py

Note:

  1. Set allow_dangerous_deserialization: bool = True first; this option can be found in faiss.py (see the sketch after this list).
  2. Embed your PDF in the application first (click the button); otherwise you may get the error Exception: Directory index does not exist.
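For context, note 1 corresponds to the allow_dangerous_deserialization flag that LangChain's FAISS wrapper requires when loading a locally saved (pickle-backed) index. The sketch below shows the flag being passed at load time; the index path and embedding model are assumptions, and PaperHelper's actual loading code may differ.

```python
# Loading a locally saved FAISS index through LangChain requires opting in
# to pickle deserialization via allow_dangerous_deserialization=True.
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()  # uses OPENAI_API_KEY from secrets/env

vectorstore = FAISS.load_local(
    "index",                               # placeholder directory holding the saved index
    embeddings,
    allow_dangerous_deserialization=True,  # the flag from note 1
)
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})
```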
