add readme for autorag #39

Merged 3 commits on Jun 28, 2024
evals/evaluation/autorag/evaluation/README.md: 23 additions, 0 deletions
# AutoRAG for evaluating RAG system performance

AutoRAG helps evaluate the end-to-end performance of the whole RAG system. Currently, we support evaluation from four perspectives: answer_relevancy, faithfulness, context_recall, and context_precision. Before using this service, the user should first prepare a ground-truth dataset in the [standard format](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/ground_truth.jsonl). We also provide a [script](https://github.com/opea-project/GenAIEval/blob/main/evals/evaluation/autorag/data_generation/gen_eval_dataset.py) to automatically generate the ground-truth queries and answers.
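
As a reference, the snippet below writes a minimal ground-truth file. It is only a sketch: the authoritative schema is the linked standard-format example, and the field names used here (`question`, `context`, `ground_truth`) are illustrative assumptions.

```bash
# Sketch only: field names are assumptions; follow the linked standard format for the real schema.
cat > ground_truth.jsonl << 'EOF'
{"question": "What is OPEA?", "context": ["OPEA is an open platform for enterprise AI."], "ground_truth": "OPEA is an open platform for enterprise AI."}
EOF
```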

## Service preparation
Evaluation of the RAG system relies on running RAG services. Please follow [these steps](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/README.md) to set up your RAG services.

## RAG evaluation
At the moment, we provide solutions for testing both a single group of parameters and multiple groups of parameters. To evaluate a single group of parameters, directly use [this script](https://github.com/opea-project/GenAIEval/blob/main/evals/evaluation/autorag/evaluation/ragas_evaluation_benchmark.py).

```bash
python -u ragas_evaluation_benchmark.py --ground_truth_file ground_truth.jsonl --search_type mmr --k 1 --fetch_k 5 --score_threshold 0.3 --top_n 1 --temperature 0.01 --top_k 5 --top_p 0.95 --repetition_penalty 1.1 --use_openai_key True
```

For evaluating multiple groups of parameters, please use [this script](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/run_rag_benchmark.py).
```bash
python -u run_rag_benchmark.py --config config.yaml
```

The parameter groups should be predefined in a `config.yaml`, which passes the available parameters to the RAG system.
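
The snippet below sketches what such a `config.yaml` might look like; the keys mirror the command-line options of the single-group script above, but the exact schema accepted by `run_rag_benchmark.py` should be taken from the repository, so treat these key names and value lists as assumptions.

```bash
# Hypothetical config.yaml sketch; check run_rag_benchmark.py for the schema it actually expects.
cat > config.yaml << 'EOF'
ground_truth_file: ground_truth.jsonl
search_type: [mmr, similarity]   # multiple values form the parameter groups to sweep
k: [1, 3]
fetch_k: [5, 10]
temperature: [0.01]
top_k: [5]
top_p: [0.95]
repetition_penalty: [1.1]
use_openai_key: true
EOF
```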

## Notes
Due to some dependency issues, OpenAI-series models can be used to evaluate the RAG system from all four perspectives: answer_relevancy, faithfulness, context_recall, and context_precision. If you use a local model to evaluate the RAG system, only answer_relevancy and faithfulness are supported.
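
When evaluating with OpenAI-series models, the backend typically reads the API key from the environment; the exact variable consumed by the scripts is an assumption here, so adjust it if your setup differs.

```bash
# Assumption: the evaluation backend reads the key from the standard OPENAI_API_KEY variable.
export OPENAI_API_KEY="sk-..."   # then run the ragas_evaluation_benchmark.py command shown above
```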