This project introduces an advanced evaluation framework for RAG (retrieval-augmented generation) systems that improves significantly on existing open-source methodologies. Our framework achieves:
- Superior Accuracy: Consistently outperforms standard open-source benchmarks
- Reduced Variance: Significantly lower variance in evaluation metrics compared to baseline methods
- Enhanced Reproducibility: Structured approach ensuring consistent results across evaluations
The image below shows an overview of our optimization method.
- **Registration:** Sign up for Vertex AI and ensure that your account has the necessary permissions to execute tasks. This may include enabling APIs and setting IAM roles.
- **GCP Authentication JSON File:** Download your Google Cloud authentication JSON file from the GCP Console and save it to a secure location on your local machine. This file is required for accessing Vertex AI programmatically.
- **Python Version:** The scripts are tested with Python 3.12.7. Ensure your environment matches this version to avoid compatibility issues.
- **Required Libraries:** The project uses several Python libraries, including:
  - `langchain`
  - `llama-index`
  - `ragas`

  A complete list of dependencies is provided in the `requirements.txt` file. Install all dependencies with the following command:

  ```bash
  pip install -r requirements.txt
  ```
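After installing, a quick check like the one below can confirm the environment is ready. This snippet is not part of the repository; it simply verifies that the core packages named above are importable and prints their installed versions.

```python
# Optional sanity check (not part of this repo): confirm that the core
# dependencies listed above are installed and print their versions.
from importlib.metadata import PackageNotFoundError, version

for pkg in ("langchain", "llama-index", "ragas"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed -- run `pip install -r requirements.txt`")
```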
- `retrieval_evalulation.ipynb`: The implementation of the retrieval system, including vector-based retrieval, BM25 retrieval, and hybrid retrieval (a sketch of hybrid score fusion follows this list).
- `./llm_evaluation/llm_result_evaluation.ipynb`: The main evaluation script that integrates all components of the framework. This notebook orchestrates the evaluation workflow.
- `./llm_evaluation/factualcorrectness_revise.py`: The core script that defines custom metrics and evaluation logic. Significant modifications have been made to existing open-source frameworks to improve accuracy and flexibility.
- `./llm_evaluation/evaluation_plots`: Plots of the evaluation metrics (factual correctness, faithfulness, semantic similarity) for the three retrieval strategies.
- **Other supporting files:** Additional scripts in the directory provide utility functions and support for various evaluation tasks. All files are referenced within the notebook and should remain in the same directory.
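For readers unfamiliar with hybrid retrieval, the sketch below illustrates the general idea of combining a sparse (BM25-style) ranking with a dense (vector) ranking via reciprocal rank fusion. It is a self-contained example with made-up document IDs, not the code used in `retrieval_evalulation.ipynb`.

```python
# Illustrative reciprocal rank fusion (RRF): merge the rankings produced by a
# sparse (BM25-style) retriever and a dense (vector) retriever into one hybrid
# ranking. Generic sketch only, not the notebook's implementation.
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """rankings: list of ranked doc-id lists (best first). Returns the fused ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)  # better rank -> larger contribution
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical top-5 results from each retriever for the same query.
bm25_ranking = ["doc3", "doc1", "doc7", "doc2", "doc9"]
vector_ranking = ["doc1", "doc4", "doc3", "doc8", "doc2"]

print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# Documents favored by both retrievers (e.g. doc1, doc3) rise to the top.
```

Other fusion strategies (for example, weighted score interpolation) follow the same pattern; the notebook documents the exact approach it uses.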
- Install the required Python version (3.12.7).
- Set up your Python environment using the following commands:

  ```bash
  python -m venv env
  source env/bin/activate  # On Windows, use `env\Scripts\activate`
  pip install -r requirements.txt
  ```
- Place the downloaded GCP authentication JSON file in a known location.
- Export the path to your environment, replacing `path/to/your/auth.json` with the actual path to your JSON file (see the verification sketch after this list):

  ```bash
  export GOOGLE_APPLICATION_CREDENTIALS="path/to/your/auth.json"
  ```
- Open `./llm_evaluation/llm_result_evaluation.ipynb` in a Jupyter Notebook environment.
- Execute the notebook cells sequentially. The notebook will automatically load and use the supporting files in the directory.
- The evaluation results, including metrics and comparisons, will be generated within the notebook.
- Open and execute `retrieval_evalulation.ipynb` in a Jupyter Notebook environment to run and test the three retrieval strategies and generate the corresponding results.
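Before launching the notebooks, it can help to confirm that the exported credentials are visible to Python and that Vertex AI initializes. This is a minimal sketch, assuming the `google-cloud-aiplatform` package is installed (it is not explicitly listed above); the project ID and region are placeholders.

```python
# Quick check that the exported service-account credentials are picked up
# before running the notebooks. "your-project-id" and "us-central1" are placeholders.
import os

import vertexai  # provided by the google-cloud-aiplatform package (assumed installed)

cred_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
if not cred_path or not os.path.exists(cred_path):
    raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set or the file does not exist.")

# vertexai.init() picks up the credentials from the environment variable above.
vertexai.init(project="your-project-id", location="us-central1")
print("Vertex AI initialized with credentials from", cred_path)
```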
The evaluation framework introduces enhancements to key metrics, including:
- **Improved in-context learning:** Optimized prompts and examples for more accurate evaluations.
- **Few-shot learning:**
  - Adjustments to sampling and scoring methods for enhanced benchmarking
  - More consistent performance across diverse test cases
  - Better handling of edge cases and complex scenarios
These modifications are implemented in `factualcorrectness_revise.py` and automatically integrated into the main notebook.
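As a rough illustration of the few-shot, in-context idea (not the actual prompts, scale, or scoring logic in `factualcorrectness_revise.py`), the sketch below assembles a judge prompt that places worked grading examples before the case to be evaluated. Every example and name in it is invented for illustration.

```python
# Illustrative few-shot prompt construction for a factual-correctness judge.
# The examples, scale, and wording are invented and are NOT the prompts
# shipped in factualcorrectness_revise.py.

FEW_SHOT_EXAMPLES = [
    {
        "answer": "The Eiffel Tower is in Berlin.",
        "reference": "The Eiffel Tower is in Paris, France.",
        "verdict": "0 (contradicts the reference)",
    },
    {
        "answer": "Water boils at 100 degrees Celsius at sea level.",
        "reference": "At standard atmospheric pressure, water boils at 100 degrees Celsius.",
        "verdict": "1 (fully supported by the reference)",
    },
]

def build_factual_correctness_prompt(answer: str, reference: str) -> str:
    """Assemble a judge prompt: instructions, worked examples, then the new case."""
    parts = ["Rate the factual correctness of the answer against the reference (0 or 1)."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Answer: {ex['answer']}\nReference: {ex['reference']}\nVerdict: {ex['verdict']}"
        )
    parts.append(f"Answer: {answer}\nReference: {reference}\nVerdict:")
    return "\n\n".join(parts)

print(build_factual_correctness_prompt(
    "RAG combines retrieval with generation.",
    "Retrieval-augmented generation augments an LLM with retrieved context.",
))
```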
- Ensure your environment is correctly configured before running the scripts.
- For detailed implementation of the metrics and logic, refer to the comments within `factualcorrectness_revise.py`.
- The evaluation results demonstrate consistently lower variance compared to existing frameworks, making our approach more reliable for production use.
- Our framework's improved stability makes it particularly suitable for continuous evaluation pipelines where consistent results are crucial.
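One way to check the stability claim on your own data (this is not code from the repository) is to repeat the evaluation several times and compare the spread of each metric. The per-run scores below are placeholders to be replaced with values exported from the notebook.

```python
# Illustrative stability check: given metric scores collected from repeated
# evaluation runs, report the mean and standard deviation per metric.
# The numbers are placeholders, not results from this project.
from statistics import mean, stdev

runs = {
    "factual_correctness": [0.81, 0.80, 0.82, 0.81],
    "faithfulness":        [0.88, 0.87, 0.89, 0.88],
    "semantic_similarity": [0.91, 0.90, 0.91, 0.92],
}

for metric, scores in runs.items():
    print(f"{metric}: mean={mean(scores):.3f}, std={stdev(scores):.3f}")
```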