merge main

modelscope · Dec 26, 2024 · 08240b1
2 parents 1ea478c + d8a16b5
commit 08240b1

Showing 36 changed files with 602 additions and 753 deletions.
diff --git a/.gitignore b/.gitignore
@@ -152,3 +152,4 @@ output/
 _build/
 swift.test*
 /cache
+evalscope/backend/rag_eval/ragas/prompts/chinese
diff --git a/README.md b/README.md
@@ -68,7 +68,7 @@ Please scan the QR code below to join our community groups:
 
 [Discord Group](https://discord.com/invite/D27yfEFVz5)              |  WeChat Group | DingTalk Group
 :-------------------------:|:-------------------------:|:-------------------------:
-<img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/discord_qr.jpg" width="160" height="160">  |  <img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/wechat.png" width="160" height="160"> | <img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/dingding.png" width="160" height="160">
+<img src="docs/asset/discord_qr.jpg" width="160" height="160">  |  <img src="docs/asset/wechat.png" width="160" height="160"> | <img src="docs/asset/dingding.png" width="160" height="160">
 
 
 ## 🎉 News

diff --git a/README_zh.md b/README_zh.md
@@ -74,7 +74,7 @@ EvalScope is also suitable for a variety of evaluation scenarios, such as end-to-end RAG evaluation, arena mode
 
 [Discord Group](https://discord.com/invite/D27yfEFVz5)              |  WeChat Group | DingTalk Group
 :-------------------------:|:-------------------------:|:-------------------------:
-<img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/discord_qr.jpg" width="160" height="160">  |  <img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/wechat.png" width="160" height="160"> | <img src="https://sail-moe.oss-cn-hangzhou.aliyuncs.com/modelscope/user_group/dingding.png" width="160" height="160">
+<img src="docs/asset/discord_qr.jpg" width="160" height="160">  |  <img src="docs/asset/wechat.png" width="160" height="160"> | <img src="docs/asset/dingding.png" width="160" height="160">
 
 
 ## 🎉 News

diff --git a/docs/asset/dingding.png b/docs/asset/dingding.png
diff --git a/docs/asset/discord_qr.jpg b/docs/asset/discord_qr.jpg
diff --git a/docs/asset/wechat.png b/docs/asset/wechat.png
diff --git a/docs/en/user_guides/backend/rageval_backend/ragas.md b/docs/en/user_guides/backend/rageval_backend/ragas.md
@@ -67,7 +67,6 @@ generate_testset_task_cfg = {
             "test_size": 10,
             "output_file": "outputs/testset.json",
             "knowledge_graph": "outputs/knowledge_graph.json",
-            "distribution": {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1},
             "generator_llm": {
                 "model_name_or_path": "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
             },
@@ -89,10 +88,6 @@ Configuration file description:
     - `test_size`: `int`: Size of the generated test set, e.g., 5.
     - `output_file`: `str`: Path of the generated dataset output file, e.g., "outputs/testset.json".
     - `knowledge_graph`: `str`: Path of the knowledge graph file, e.g., "outputs/knowledge_graph.json". The knowledge graph generated during document processing is saved to this path; if a knowledge graph already exists at this path, it is loaded directly and the generation step is skipped.
-    - `distribution`: `dict`: Configuration of the content distribution in the test set.
-      - `simple`: `float`: Proportion of simple content, e.g., 0.5.
-      - `multi_context`: `float`: Proportion of multi-context content, e.g., 0.4.
-      - `reasoning`: `float`: Proportion of reasoning content, e.g., 0.1.
     - `generator_llm`: `dict`: Configuration of the generator LLM:
       - If using a local model, supports the following parameters:
         - `model_name_or_path`: `str`: Name or path of the generator model, e.g., "qwen/Qwen2-7B-Instruct" can be automatically downloaded from ModelScope; providing a path will load the model locally.
@@ -113,6 +108,13 @@ ragas.testset.transforms.engine - ERROR - unable to apply transformation: 'Gener
 This is because the model output format is incorrect, leading to parsing errors. In this case, please try using a larger model, such as `Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4`, or proprietary models like `GPT-4o`.
 ````
 
+````{tip}
+If you encounter the following error, or if the generated dataset is of poor quality, the `unstructured` library may have failed to process the document correctly. In that case, manually preprocess the input document into plain-text (.txt) format.
+```
+ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
+```
+````
+
 **Execute Task**
 ```python
 from evalscope.run import run_task
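
The tip added in this commit suggests converting input documents to plain text when `unstructured` struggles with them. A minimal sketch of that preprocessing step follows, assuming the sources are already UTF-8 text files (Markdown, exported HTML, and similar); `to_plain_text`, `find_short_documents`, and the whitespace-based token count are hypothetical helpers, not part of evalscope or ragas. Whitespace splitting is only a rough proxy for a real tokenizer and badly undercounts Chinese text, so treat the 100-token check as a quick screen, not a reproduction of the ragas check.

```python
from pathlib import Path

# Rough threshold borrowed from the ragas error message ("100 tokens or less").
# Assumption: whitespace splitting approximates tokens; ragas uses its own tokenizer.
MIN_TOKENS = 100


def approx_token_count(text: str) -> int:
    """Approximate the token count by splitting on whitespace."""
    return len(text.split())


def to_plain_text(src: Path, out_dir: Path) -> Path:
    """Write the text content of `src` to `out_dir/<stem>.txt` and return that path.

    Hypothetical helper: assumes `src` is readable as UTF-8 text. Binary formats
    such as PDF need a dedicated extractor before this step.
    """
    text = src.read_text(encoding="utf-8", errors="ignore")
    # Strip trailing whitespace and drop empty lines.
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = "\n".join(line for line in lines if line)
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / f"{src.stem}.txt"
    dst.write_text(cleaned, encoding="utf-8")
    return dst


def find_short_documents(paths, min_tokens: int = MIN_TOKENS):
    """Return the paths whose approximate token count is at most `min_tokens`."""
    return [
        p for p in paths
        if approx_token_count(Path(p).read_text(encoding="utf-8")) <= min_tokens
    ]
```

The resulting `.txt` paths can then be listed in the task config's documents field in place of the original files; flagging short files first makes it easier to see which inputs are likely to trigger the error.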

diff --git a/docs/zh/user_guides/backend/rageval_backend/ragas.md b/docs/zh/user_guides/backend/rageval_backend/ragas.md
@@ -67,7 +67,6 @@ generate_testset_task_cfg = {
             "test_size": 10,
             "output_file": "outputs/testset.json",
             "knowledge_graph": "outputs/knowledge_graph.json",
-            "distribution": {"simple": 0.5, "multi_context": 0.4, "reasoning": 0.1},
             "generator_llm": {
                 "model_name_or_path": "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4",
             },
@@ -89,10 +88,6 @@ generate_testset_task_cfg = {
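     - `test_size`: `int`: Size of the generated test set, e.g., 5.
     - `output_file`: `str`: Output file path of the generated dataset, e.g., "outputs/testset.json".
     - `knowledge_graph`: `str`: Path of the knowledge graph file, e.g., "outputs/knowledge_graph.json". The knowledge graph generated during document processing is saved to this path; if a knowledge graph already exists at this path, it is loaded directly and the generation step is skipped.
-    - `distribution`: `dict`: Distribution configuration of the test set content.
-      - `simple`: `float`: Proportion of simple content, e.g., 0.5.
-      - `multi_context`: `float`: Proportion of multi-context content, e.g., 0.4.
-      - `reasoning`: `float`: Proportion of reasoning content, e.g., 0.1.
     - `generator_llm`: `dict`: Configuration of the generator LLM:
       - If using a local model, the following parameters are supported:
         - `model_name_or_path`: `str`: Name or path of the generator model, e.g., "Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4" is downloaded automatically from ModelScope; providing a path loads the model locally.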
@@ -114,6 +109,13 @@ ragas.testset.transforms.engine - ERROR - unable to apply transformation: 'Gener
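 This is because the model output is not in the expected format, which causes a parsing error. In this case, try a larger model such as `Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4`, or a proprietary model such as `GPT-4o`.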
 ````
 
+````{tip}
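+If you encounter the following error, or if the generated dataset is of poor quality, the `unstructured` library may have had trouble processing the document. In that case, preprocess the input document into plain-text (.txt) format yourself.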
+```
+ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
+```
+````
+
 **Execute Task**
 ```python
 from evalscope.run import run_task
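
Both language versions of the docs describe `knowledge_graph` as a cache: if the file already exists it is loaded and the generation step is skipped. That load-or-build pattern can be sketched as follows. `load_or_build_knowledge_graph` and the `build` callback are hypothetical names, and the plain-JSON layout is illustrative only; it is not the schema evalscope or ragas actually writes.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build_knowledge_graph(path: str, build: Callable[[], dict]) -> dict:
    """Reuse a saved knowledge graph if `path` exists; otherwise build and save it.

    `build` is a hypothetical stand-in for the expensive document-processing step.
    """
    kg_path = Path(path)
    if kg_path.exists():
        # Cached graph found: skip generation entirely.
        return json.loads(kg_path.read_text(encoding="utf-8"))
    graph = build()
    # Create e.g. "outputs/" on first run, then persist the graph for later runs.
    kg_path.parent.mkdir(parents=True, exist_ok=True)
    kg_path.write_text(json.dumps(graph, ensure_ascii=False), encoding="utf-8")
    return graph
```

A practical consequence of this design: after changing the input documents, delete the saved file (or point `knowledge_graph` at a new path), otherwise the stale graph is silently reused.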

diff --git a/.../backend/rag_eval/ragas/prompts/chinese/AnswerCorrectness/correctness_prompt_chinese.json b/.../backend/rag_eval/ragas/prompts/chinese/AnswerCorrectness/correctness_prompt_chinese.json
diff --git a/...end/rag_eval/ragas/prompts/chinese/AnswerCorrectness/long_form_answer_prompt_chinese.json b/...end/rag_eval/ragas/prompts/chinese/AnswerCorrectness/long_form_answer_prompt_chinese.json
diff --git a/...e/backend/rag_eval/ragas/prompts/chinese/AnswerRelevancy/question_generation_chinese.json b/...e/backend/rag_eval/ragas/prompts/chinese/AnswerRelevancy/question_generation_chinese.json
diff --git a/...end/rag_eval/ragas/prompts/chinese/ContextPrecision/context_precision_prompt_chinese.json b/...end/rag_eval/ragas/prompts/chinese/ContextPrecision/context_precision_prompt_chinese.json
diff --git a/...scope/backend/rag_eval/ragas/prompts/chinese/CustomNodeFilter/scoring_prompt_chinese.json b/...scope/backend/rag_eval/ragas/prompts/chinese/CustomNodeFilter/scoring_prompt_chinese.json
diff --git a/...e/backend/rag_eval/ragas/prompts/chinese/Faithfulness/nli_statements_message_chinese.json b/...e/backend/rag_eval/ragas/prompts/chinese/Faithfulness/nli_statements_message_chinese.json