Releases: modelscope/evalscope

v0.8.2 release

26 Dec 12:08

Full Changelog: v0.8.1...v0.8.2

v0.8.1 release

17 Dec 12:06

What's Changed

  • Unify opencompass and vlmeval output directories, by @Yunnglin in #242
  • Model stress testing (perf): add more metrics, by @Yunnglin in #245
  • Model stress testing (perf): add a trust remote parameter, by @Yunnglin in #246
  • Compatibility with ms-swift<3.0, by @Yunnglin in #249
  • Fix humaneval issue in local evaluation, by @Yunnglin in #248

Full Changelog: v0.8.0...v0.8.1

v0.8.0 release

14 Dec 17:30

Release Notes

  1. Optimize Native eval and remove template_type #231
  2. The evalscope perf command supports the --outputs-dir configuration (see the example after this list). #232
  3. Support ragas 0.2.7 #234
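
A minimal sketch of how the new option might be used (only --outputs-dir comes from item 2 above; the remaining flags, --url, --model and --parallel, are illustrative assumptions, so consult evalscope perf --help for the actual interface):

  evalscope perf --url http://127.0.0.1:8000/v1/chat/completions --model qwen2.5 --parallel 4 --outputs-dir ./perf_outputs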

Bug Fixes

  1. Fix longwriter docs and example code #239
  2. Fix lint for longwriter #240
  3. Fix lint #237
  4. Unify perf output #238

Documentation Updates

  1. Fix longwriter docs #239
  2. Optimize Native eval and remove template_type #231

v0.7.2 release

04 Dec 04:24

Release Notes

  1. Remove pyarrow version requirement #225
  2. Optimize warning info #223

v0.7.1 release

28 Nov 18:30

Release Notes

  1. Add PMMEval benchmark #222

v0.7.0 release

28 Nov 07:14

Release Notes

  1. Refactor the perf module to make it more robust and easier to use. #178
  2. Add speed benchmarking in the perf module. #178
  3. Add multi-modal benchmark flickr8k in the perf module for speed benchmark. #211

Bug Fixes

  1. Add a timeout for downloading punkt.zip #206
  2. Fix parallel execution for speed benchmarking in the perf module. #215

Documentation Updates

  1. Update VLM-Eval doc #209
  2. Update perf module doc #178 #211

v0.6.1 release

22 Nov 06:34

Release Notes

  1. Add CMMLU benchmark #198
  2. Add publish workflow #186
  3. Adapt RAGAS v0.2.5 and update readme #205
  4. Adapt MTEB v1.19 #196

Bug Fixes

  1. Pin the datasets version to fix a compatibility issue: datasets>=3.0.0,<=3.0.1 #184
  2. Set pyarrow version to <=17.0.0 to avoid an installation issue on macOS. #187
  3. Add a timeout for downloading punkt.zip #206

Documentation Updates

  1. Update the docs listing all datasets supported by the OpenCompass backend #199
  2. Update RAGAS v0.2.5 docs #205

Release v0.6.0

08 Nov 05:51

Release Notes

  1. Support multi-modal RAG evaluation #149
    • Add CLIP_Benchmark
    • Add end-to-end multi-modal RAG evaluation in Ragas
  2. Compatibility with Ragas v0.2.3 #165 #171
  3. Support truncating input for CLIP models #163 #164
  4. Support saving knowledge graphs when generating datasets in Ragas #175

Bug Fixes

  1. Fix issue of abnormal metrics during CMTEB evaluation #157
  2. Fix issue of GenerationConfig being None #173
  3. Update datasets version constraints #184
  4. Add publish workflow #186

Documentation Updates

  1. Update VLMEvalKit documentation #166
  2. Update multi-modal RAG blog #172

Release v0.5.5

15 Oct 02:57

Release Notes

  1. Added Dataset Support:

    • Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations #146
    • Added cmb dataset #117
  2. Support for LongBench-write quality evaluation of long text generation #136

  3. Automatic downloading of punkt_tab.zip from nltk #140

  4. Support for RAG evaluation #127:

    • Support for embeddings/reranker evaluation: Integration of MTEB (Massive Text Embedding Benchmark) and CMTEB (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking
    • Support for end-to-end RAG evaluation: Integration of the ragas framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
  5. Documentation Updates:

  6. Updated dependencies: nltk>=3.9 and rouge-score>=0.1.0 #145, #143

Release v0.5.2

09 Aug 13:20

Highlight features

  • Support Multi-modal models evaluation (VLM Eval)
  • Transform the synchronous API to asynchronous for the OpenAI API format, speeding up the evaluation process by up to 10x.
  • Support installation with extras: pip install evalscope[opencompass] or pip install evalscope[vlmeval] (see the example below)
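
The two extras use standard pip extras syntax, so (assuming both extras can coexist in one environment) they can also be requested together; quoting keeps shells such as zsh from globbing the square brackets:

  pip install "evalscope[opencompass,vlmeval]"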

Breaking Changes

None

What's Changed

  1. Support Multi-modal models evaluation (VLM Eval)
  2. Transform the synchronous API to asynchronous for the OpenAI API format, speeding up the evaluation process by up to 10x.
  3. Support installation with extras: pip install evalscope[opencompass] or pip install evalscope[vlmeval]
  4. Update README
  5. Add UT cases for VLM eval
  6. Update examples for OpenCompass and VLMEval eval backends
  7. Update version restrictions for ms-opencompass and ms-vlmeval dependencies.