Add hyperlinks and paths validation. #132

Merged 9 commits on Sep 20, 2024
123 changes: 123 additions & 0 deletions .github/workflows/pr-path-detection.yml
@@ -0,0 +1,123 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

name: Check Paths and Hyperlinks

on:
  pull_request:
    branches: [main]
    types: [opened, reopened, ready_for_review, synchronize]

jobs:
  check-the-validity-of-hyperlinks-in-README:
    runs-on: ubuntu-latest
    steps:
      - name: Clean Up Working Directory
        run: sudo rm -rf ${{github.workspace}}/*

      - name: Checkout Repo GenAIEval
        uses: actions/checkout@v4

      - name: Check the Validity of Hyperlinks
        run: |
          cd ${{github.workspace}}
          fail="FALSE"
          url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
          if [ -n "$url_lines" ]; then
            for url_line in $url_lines; do
              url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
              path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
              response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
              if [ "$response" -ne 200 ]; then
                echo "**********Validation failed, try again**********"
                response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
                if [ "$response_retry" -eq 200 ]; then
                  echo "*****Retry successfully*****"
                else
                  echo "Invalid link from ${{github.workspace}}/$path: $url"
                  fail="TRUE"
                fi
              fi
            done
          fi

          if [[ "$fail" == "TRUE" ]]; then
            exit 1
          else
            echo "All hyperlinks are valid."
          fi
        shell: bash

  check-the-validity-of-relative-path:
    runs-on: ubuntu-latest
    steps:
      - name: Clean up Working Directory
        run: sudo rm -rf ${{github.workspace}}/*

      - name: Checkout Repo GenAIEval
        uses: actions/checkout@v4

      - name: Checking Relative Path Validity
        run: |
          cd ${{github.workspace}}
          fail="FALSE"
          repo_name=${{ github.event.pull_request.head.repo.full_name }}
          if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
            owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
            branch="https://github.com/$owner/GenAIEval/tree/${{ github.event.pull_request.head.ref }}"
          else
            branch="https://github.com/opea-project/GenAIEval/blob/${{ github.event.pull_request.head.ref }}"
          fi
          link_head="https://github.com/opea-project/GenAIEval/blob/main"
          png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
          if [ -n "$png_lines" ]; then
            for png_line in $png_lines; do
              refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
              png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
              if [[ "${png_path:0:1}" == "/" ]]; then
                check_path=${{github.workspace}}$png_path
              elif [[ "${png_path:0:1}" == "#" ]]; then
                check_path=${{github.workspace}}/$refer_path$png_path
              else
                check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
              fi
              real_path=$(realpath $check_path)
              if [ $? -ne 0 ]; then
                echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
                fail="TRUE"
              else
                url=$link_head$(echo "$real_path" | sed 's|.*/GenAIEval||')
                response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
                if [ "$response" -ne 200 ]; then
                  echo "**********Validation failed, try again**********"
                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
                  if [ "$response_retry" -eq 200 ]; then
                    echo "*****Retry successfully*****"
                  else
                    echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
                    url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIEval||')
                    response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
                    if [ "$response" -ne 200 ]; then
                      echo "**********Validation failed, try again**********"
                      response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
                      if [ "$response_retry" -eq 200 ]; then
                        echo "*****Retry successfully*****"
                      else
                        echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
                        fail="TRUE"
                      fi
                    else
                      echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
                    fi
                  fi
                fi
              fi
            done
          fi

          if [[ "$fail" == "TRUE" ]]; then
            exit 1
          else
            echo "All hyperlinks are valid."
          fi
        shell: bash
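
The hyperlink job above boils down to a grep/cut/curl pipeline, which you can approximate locally before pushing. The following is a minimal sketch, not part of the PR: it assumes GNU grep and curl are available and mirrors the extraction logic of the job, without the workflow's retry pass.

```bash
# Local sketch of the hyperlink job: pull every markdown link target
# and probe it with curl. Run from the repository root.
fail="FALSE"
for url_line in $(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .); do
  url=$(echo "$url_line" | cut -d '(' -f2 | cut -d ')' -f1 | sed 's/\.git$//')
  code=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
  if [ "$code" -ne 200 ]; then
    echo "Broken link: $url (HTTP $code)"
    fail="TRUE"
  fi
done
[ "$fail" = "TRUE" ] && exit 1 || echo "All hyperlinks are valid."
```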
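The relative-path job resolves each non-http link against the file that references it before checking the rendered GitHub URL. A simplified local equivalent, assuming GNU realpath, skips the curl round-trip and just verifies the target exists on disk; in-file #anchors are reduced to checking the referencing file itself.

```bash
# Local sketch of the relative-path job: resolve each non-http markdown
# target against its referencing file and test that it exists on disk.
grep -Eo '\]\([^)]+\)' --include='*.md' -r . | grep -Ev 'http' | while read -r line; do
  refer_path=${line%%:*}                                 # file containing the link
  target=$(echo "$line" | cut -d '(' -f2 | cut -d ')' -f1)
  case "$target" in
    /*)  check=".$target" ;;                             # repo-root-relative
    \#*) check="$refer_path" ;;                          # in-file anchor: check the file itself
    *)   check="$(dirname "$refer_path")/$target" ;;     # relative to the referencing file
  esac
  realpath -e "$check" >/dev/null 2>&1 || echo "Missing: $target (from $refer_path)"
done
```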
2 changes: 1 addition & 1 deletion README.md
@@ -67,7 +67,7 @@ results = evaluate(args)

#### remote service usage

-1. setup a separate server with [GenAIComps](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/lm-eval)
+1. setup a separate server with [GenAIComps](https://github.com/opea-project/GenAIComps/tree/main/comps/llms/utils/lm-eval)

```
# build cpu docker
2 changes: 1 addition & 1 deletion doc/platform-optimization/README.md
@@ -98,7 +98,7 @@ Let us consider isolating AI inference and reranking containers in
application's Gaudi accelerated pipeline.

In the
-[manifest](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/manifests/gaudi/chatqna.yaml)
+[manifest](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/kubernetes/intel/hpu/gaudi/manifest/chatqna.yaml)
there are "tgi", "tei" and "teirerank" containers in "chatqna-tgi" and
"chatqna-tei" and "chatqna-teirerank" deployments that will need a lot
of CPUs. They implement text-generation-interface and
4 changes: 2 additions & 2 deletions evals/evaluation/autorag/evaluation/README.md
@@ -1,6 +1,6 @@
# AutoRAG to evaluate the RAG system performance

-AutoRAG is help to end-to-end evaluate the performance of the whole system. Currently, we support to evaluate the performance from 4 perspectives, answer_relevancy, faithfulness, context_recall, context_precision. Before using this service, the use should firstly prepare the groundtruth dataset in the [standard format](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/ground_truth.jsonl). We also provide a [script](https://github.com/opea-project/GenAIEval/blob/main/evals/evaluation/autorag/data_generation/gen_eval_dataset.py) to automatically generate the groundtruth query and answer.
+AutoRAG is help to end-to-end evaluate the performance of the whole system. Currently, we support to evaluate the performance from 4 perspectives, answer_relevancy, faithfulness, context_recall, context_precision. Before using this service, the use should firstly prepare the groundtruth dataset in the [standard format](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/ragas/ground_truth.jsonl). We also provide a [script](https://github.com/opea-project/GenAIEval/blob/main/evals/evaluation/autorag/data_generation/gen_eval_dataset.py) to automatically generate the groundtruth query and answer.

## Service preparation
The evaluation for the RAG system is based on the set up of the RAG services. Please follow [the steps](https://github.com/opea-project/GenAIExamples/blob/main/ChatQnA/README.md) to set up your RAG services.
@@ -12,7 +12,7 @@ At this moment, we provide a solution that test the single group of parameters a
python -u ragas_evaluation_benchmark.py --ground_truth_file ground_truth.jsonl --search_type mmr --k 1 --fetch_k 5 --score_threshold 0.3 --top_n 1 --temperature 0.01 --top_k 5 --top_p 0.95 --repetition_penalty 1.1 --use_openai_key True
```

-For evaluating multiple groups of parameters, please use [this script](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/run_rag_benchmark.py).
+For evaluating multiple groups of parameters, please use [this script](https://github.com/opea-project/GenAIEval/blob/main/evals/benchmark/ragas/run_rag_benchmark.py).
```bash
python -u run_rag_benchmark.py --config config.yaml
```
2 changes: 1 addition & 1 deletion evals/evaluation/rag_eval/README.md
@@ -7,7 +7,7 @@
- [Prerequisites](#prerequisites)
- [MultiHop (English dataset)](#multihop)
- [Launch Service of RAG System](#launch-service-of-rag-system)
-- [Launch Service of LLM-as-a-Judge](launch-service-of-llm)
+- [Launch Service of LLM-as-a-Judge](#launch-service-of-llm-as-a-judge)
- [Prepare Dataset](#prepare-dataset)
- [Evaluation](#evaluation)
- [CRUD (Chinese dataset)](#crud)
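
The TOC fix above works because GitHub derives a heading's anchor by lowercasing it, dropping punctuation other than hyphens, and turning spaces into hyphens; the old link was missing both the `#` prefix and the full slug. A rough sketch of that slug rule (an approximation, not GitHub's exact algorithm):

```bash
# Approximate GitHub's heading-to-anchor rule: lowercase, strip characters
# outside [a-z0-9 -], then replace spaces with hyphens.
heading="Launch Service of LLM-as-a-Judge"
echo "#$(echo "$heading" | tr '[:upper:]' '[:lower:]' | sed 's/[^a-z0-9 -]//g; s/ /-/g')"
# -> #launch-service-of-llm-as-a-judge
```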
2 changes: 1 addition & 1 deletion examples/AudioQnA/README.md
@@ -3,7 +3,7 @@
## Dataset


-We evaluate the ASR accuracy on the test set of librispeech [dataset](andreagasparini/librispeech_test_only), which contains 2620 records of audio and texts.
+We evaluate the ASR accuracy on the test set of librispeech [dataset](https://huggingface.co/datasets/andreagasparini/librispeech_test_only), which contains 2620 records of audio and texts.

## Metrics
