From f7398c92abb0e361d1829b7e70916c1e3ef93dbc Mon Sep 17 00:00:00 2001
From: KairuiHu
Date: Wed, 27 Nov 2024 15:44:21 +0800
Subject: [PATCH] final proofread

---
 docs/lmms-eval-0.3.md | 171 ------------------------------------------
 1 file changed, 171 deletions(-)

diff --git a/docs/lmms-eval-0.3.md b/docs/lmms-eval-0.3.md
index aee2fbc9..368f39d5 100644
--- a/docs/lmms-eval-0.3.md
+++ b/docs/lmms-eval-0.3.md
@@ -155,177 +155,6 @@ This upgrade includes multiple benchmarks for audio understanding and instructio
 | **VocalSound** | test | Acc | 0.936 | 0.81 |
 |  | val |  | 0.9288 | 0.8 |
 | **WavCaps** | test | GPT-Eval | 1.73 |  |
-#### Table 2: Alignment check for audio datasets
-
-<table>
-    <tr>
-        <th>Dataset</th>
-        <th>Split</th>
-        <th>Metric</th>
-        <th>Qwen2-Audio-Instruct (lmms-eval)</th>
-        <th>Qwen2-Audio (lmms-eval)</th>
-    </tr>
-    <tr>
-        <td rowspan="4">AIRBench-Chat</td>
-        <td>Speech</td>
-        <td rowspan="4">GPT-Eval</td>
-        <td>7.16</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Sound</td>
-        <td>6.14</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Music</td>
-        <td>6.66</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Mixed</td>
-        <td>5.75</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td rowspan="3">AIRBench-Foundation</td>
-        <td>Speech</td>
-        <td rowspan="3">Acc</td>
-        <td>62.89</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Sound</td>
-        <td>55.42</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Music</td>
-        <td>56.77</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Alpaca</td>
-        <td>test</td>
-        <td>GPT-Eval</td>
-        <td>51.8</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>Clotho_aqa</td>
-        <td>test</td>
-        <td>GPT-Eval</td>
-        <td>0.7587</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td rowspan="3">Common_voice</td>
-        <td>zh</td>
-        <td rowspan="3">WER</td>
-        <td>15.78</td>
-        <td>6.7</td>
-    </tr>
-    <tr>
-        <td>en</td>
-        <td>36.01</td>
-        <td>27.9</td>
-    </tr>
-    <tr>
-        <td>fr</td>
-        <td>39.88</td>
-        <td>34.8</td>
-    </tr>
-    <tr>
-        <td rowspan="2">GigaSpeech</td>
-        <td>dev</td>
-        <td rowspan="2">WER</td>
-        <td>19.45</td>
-        <td>14</td>
-    </tr>
-    <tr>
-        <td>test</td>
-        <td>22.6</td>
-        <td>15.01</td>
-    </tr>
-    <tr>
-        <td rowspan="4">LibriSpeech</td>
-        <td>dev-clean</td>
-        <td rowspan="4">WER</td>
-        <td>4.24</td>
-        <td>1.66</td>
-    </tr>
-    <tr>
-        <td>dev-others</td>
-        <td>6.54</td>
-        <td>3.66</td>
-    </tr>
-    <tr>
-        <td>test-clean</td>
-        <td>3.59</td>
-        <td>1.74</td>
-    </tr>
-    <tr>
-        <td>test-others</td>
-        <td>7.46</td>
-        <td>3.87</td>
-    </tr>
-    <tr>
-        <td>MuchoMusic</td>
-        <td>test</td>
-        <td>Acc</td>
-        <td>68.32</td>
-        <td>45.07</td>
-    </tr>
-    <tr>
-        <td>OpenHermes</td>
-        <td>test</td>
-        <td>GPT-Eval</td>
-        <td>46.8</td>
-        <td></td>
-    </tr>
-    <tr>
-        <td>People_speech</td>
-        <td>val</td>
-        <td>WER</td>
-        <td>25.86</td>
-        <td>17.1</td>
-    </tr>
-    <tr>
-        <td>Tedium</td>
-        <td>val</td>
-        <td>WER</td>
-        <td>10.92</td>
-        <td>8.29</td>
-    </tr>
-    <tr>
-        <td rowspan="2">VocalSound</td>
-        <td>test</td>
-        <td rowspan="2">Acc</td>
-        <td>0.936</td>
-        <td>0.81</td>
-    </tr>
-    <tr>
-        <td>val</td>
-        <td>0.9288</td>
-        <td>0.8</td>
-    </tr>
-    <tr>
-        <td>WavCaps</td>
-        <td>test</td>
-        <td>GPT-Eval</td>
-        <td>1.73</td>
-        <td></td>
-    </tr>
-</table>
-
 
 The results may be inconsistent with the reported results, as we do not have the original prompts and we must maintain a fair environment for all models. For the base model, we do not test on the Chat Benchmarks.