Add CRAG benchmark #88

minmin-intel · 2024-08-27T18:34:13Z

Description

Add the Meta CRAG benchmark for benchmarking the Agent QnA system.

Issues

NA

Type of change

List the type of change like below. Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds new functionality)

Dependencies

CRAG dataset: for now, the README gives instructions for users to download the dataset themselves.

Tests

All commands in the README have been tested and work on a local system.

I also ran test_ragas.py in the tests folder. The results below use llama3-70B-instruct as the judge.
{'answer_relevancy': 0.4586, 'faithfulness': 1.0000, 'answer_correctness': 0.2400, 'answer_similarity': 0.9598, 'context_precision': 1.0000, 'context_recall': 1.0000}
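
For anyone comparing these numbers across runs, here is a minimal sketch (not part of this PR) that ranks the reported scores from weakest to strongest. The dictionary is copied from the results above; `rank_metrics` is a hypothetical helper name used only for illustration.

```python
# Scores copied verbatim from the test_ragas.py run reported above.
ragas_scores = {
    "answer_relevancy": 0.4586,
    "faithfulness": 1.0000,
    "answer_correctness": 0.2400,
    "answer_similarity": 0.9598,
    "context_precision": 1.0000,
    "context_recall": 1.0000,
}


def rank_metrics(scores):
    """Return (metric, score) pairs sorted from lowest to highest score.

    Hypothetical helper for illustration; it is not a function in this PR.
    """
    return sorted(scores.items(), key=lambda item: item[1])


# Print one metric per line, weakest first.
for name, score in rank_metrics(ragas_scores):
    print(f"{name:<20} {score:.4f}")
```

Sorting this way puts answer_correctness and answer_relevancy at the top of the list, which are the two metrics that stand out as low in this run.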

Signed-off-by: minmin-intel <[email protected]>

for more information, see https://pre-commit.ci


lkk12014402 · 2024-09-10T02:31:29Z

LGTM

Signed-off-by: lvliang-intel <[email protected]>

minmin-intel added 11 commits August 22, 2024 16:29

add crag eval first pass code

be54803

add first pass llm eval code

6da78fe

fix answer correctness code

d57f9d5

Signed-off-by: minmin-intel <[email protected]>

docker container for crag eval

a61efa8

sample data for testing

914cdc2

docker compose for tgi gaudi

1b49b1b

Signed-off-by: minmin-intel <[email protected]>

fix tgi gaudi docker compose for llama3 70b

7867461

update llm eval code

8e999dc

Signed-off-by: minmin-intel <[email protected]>

allow per sample grading

7b9b9b2

Signed-off-by: minmin-intel <[email protected]>

save graded scores

5c58b72

Signed-off-by: minmin-intel <[email protected]>

update readme

d043dde

Signed-off-by: minmin-intel <[email protected]>

minmin-intel requested review from lkk12014402 and chensuyue August 27, 2024 18:34

minmin-intel and others added 4 commits August 27, 2024 14:30

Merge branch 'main' into crag-eval

ba55652

[pre-commit.ci] auto fixes from pre-commit.com hooks

9556f9a

for more information, see https://pre-commit.ci

update readme and test all commands

90e855b

[pre-commit.ci] auto fixes from pre-commit.com hooks

153e30f

for more information, see https://pre-commit.ci

lvliang-intel approved these changes Aug 31, 2024

evals/evaluation/crag_eval/README.md (outdated, resolved)

lkk12014402 reviewed Sep 5, 2024

evals/evaluation/crag_eval/README.md (outdated, resolved)

lkk12014402 approved these changes Sep 5, 2024


mv crag_eval to agent_eval

d51c083

Signed-off-by: minmin-intel <[email protected]>

minmin-intel requested a review from XinyuYe-Intel as a code owner September 9, 2024 16:47

minmin-intel and others added 4 commits September 9, 2024 10:22

Merge branch 'main' into crag-eval

e77e58c

[pre-commit.ci] auto fixes from pre-commit.com hooks

e4cd9c6

for more information, see https://pre-commit.ci

update test case col names in grade_answer.py

69f87c2

Signed-off-by: minmin-intel <[email protected]>

[pre-commit.ci] auto fixes from pre-commit.com hooks

82894aa

for more information, see https://pre-commit.ci

lkk12014402 added this to the v1.0 milestone Sep 10, 2024

minmin-intel merged commit a9b087f into opea-project:main Sep 10, 2024
9 checks passed

lkk12014402 pushed a commit that referenced this pull request Sep 19, 2024

Fix ChatQnA streaming response issue (#88)

9aa89ec

Signed-off-by: lvliang-intel <[email protected]>
