VERSA

VERSA (Versatile Evaluation of Speech and Audio) is a toolkit that collects evaluation metrics for speech and audio quality. Our goal is to provide comprehensive access to cutting-edge evaluation techniques. The toolkit is also tightly integrated into ESPnet.

Colab Demonstration

Colab Demonstration at Interspeech2024 Tutorial

Install

The base installation is as follows:

git clone https://github.com/shinjiwlab/versa.git
cd versa
pip install .

Because VERSA is a collection of metrics, we do not re-distribute models; instead, we try to stay as close as possible to the original API provided by each algorithm's developers. As a result, VERSA has many dependencies. We include as many as possible by default, but some metrics need specific installation steps. Please refer to the List of Metrics section to see whether a metric is installed automatically. If it is not, we provide an installation guide or installers in tools.
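Since some metrics depend on packages that are not installed by default, it can help to check which optional packages are importable before enabling those metrics in a config. The following is a minimal sketch using only the Python standard library; the module names `pysepm` and `fadtk` are assumed import names taken from the Code Source column of the metric list, not names confirmed by VERSA.

```python
import importlib.util


def find_missing_modules(module_names):
    """Return the top-level module names that cannot be imported here."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical import names for metrics that are not auto-installed.
    optional_modules = ["pysepm", "fadtk"]
    for name in find_missing_modules(optional_modules):
        print(f"{name} is not installed; see the installers in tools")
```

This only reports whether a package can be found; it does not verify that the package version matches what a given metric expects.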

Quick test

python versa/test/test_general.py

# test metrics that require additional installation
python versa/test/test_{metric}.py

Usage

Simple usage case for a few samples.

# direct usage
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1 \
    --pred test/test_samples/test2 \
    --output_file test_result

# with scp-style input
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result

# with kaldi-ark style
python versa/bin/scorer.py \
    --score_config egs/speech.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --io kaldi
  
# For text information
python versa/bin/scorer.py \
    --score_config egs/separate_metrics/wer.yaml \
    --gt test/test_samples/test1.scp \
    --pred test/test_samples/test2.scp \
    --output_file test_result \
    --text test/test_samples/text
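The scp-style inputs above are list files rather than audio directories. As a rough sketch, assuming the common Kaldi convention of one `<utt_id> <path>` pair per line and using the file name without its extension as the utterance ID, such a file could be generated from a directory of wav files as follows. The helper names `build_scp_lines` and `write_scp` are hypothetical and not part of VERSA.

```python
from pathlib import Path


def build_scp_lines(audio_dir, pattern="*.wav"):
    """Return sorted "<utt_id> <path>" lines for audio files in audio_dir."""
    lines = []
    for path in sorted(Path(audio_dir).glob(pattern)):
        # Assumption: the file name without its extension is the utterance ID.
        lines.append(f"{path.stem} {path}")
    return lines


def write_scp(audio_dir, scp_path, pattern="*.wav"):
    """Write the scp file and return the number of entries written."""
    lines = build_scp_lines(audio_dir, pattern)
    Path(scp_path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, `write_scp("test/test_samples/test1", "test1.scp")` would list every wav file in that directory. Reference and prediction lists presumably need matching utterance IDs so that pairs can be aligned.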

Use the launcher with Slurm job submissions

# use the launcher
# Option1: with gt speech
./launch.sh \
  <pred_speech_scp> \
  <gt_speech_scp> \
  <score_dir> \
  <split_job_num> 

# Option2: without gt speech
./launch.sh \
  <pred_speech_scp> \
  None \
  <score_dir> \
  <split_job_num>

# aggregate the results
cat <score_dir>/result/*.result.cpu.txt > <score_dir>/utt_result.cpu.txt
cat <score_dir>/result/*.result.gpu.txt > <score_dir>/utt_result.gpu.txt

# show result
python scripts/show_result.py <score_dir>/utt_result.cpu.txt
python scripts/show_result.py <score_dir>/utt_result.gpu.txt
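For a quick look at the aggregated file without the provided script, per-metric averages can be computed directly. This is only a sketch under a stated assumption: each line of the result file is taken to be one per-utterance record, written either as JSON or as a Python dict literal. The actual file format is not described here, so check a result file before relying on this. The function names are hypothetical.

```python
import ast
import json
from collections import defaultdict


def parse_record(line):
    """Parse one result line as JSON, falling back to a Python dict literal."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return ast.literal_eval(line)


def average_scores(lines):
    """Average every numeric field across per-utterance records."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between concatenated files
        for key, value in parse_record(line).items():
            # Skip non-numeric fields such as the utterance key.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                totals[key] += value
                counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

Each metric is averaged over the utterances that report it, so a metric missing from some records does not drag the mean toward zero.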

See egs/*.yaml for example configs covering different setups.
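To score the same data with several of these configs, the scorer command can be assembled once per config file. The sketch below only builds the command lines and does not run them; it uses the `--score_config`, `--gt`, `--pred`, and `--output_file` flags from the usage examples, while the helper names are hypothetical.

```python
from pathlib import Path


def list_configs(egs_dir):
    """Return all YAML configs under egs_dir, including subdirectories."""
    return sorted(Path(egs_dir).rglob("*.yaml"))


def build_scorer_command(config, pred, gt, output_file):
    """Build the scorer command line using the flags from the usage examples."""
    return [
        "python", "versa/bin/scorer.py",
        "--score_config", str(config),
        "--gt", str(gt),
        "--pred", str(pred),
        "--output_file", str(output_file),
    ]
```

A caller could loop over `list_configs("egs")`, pick an output name per config, and pass each resulting list to `subprocess.run`. Building the command as a list avoids shell quoting problems with unusual paths.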

List of Metrics

An x in the Auto-Install column marks a metric that is installed automatically with VERSA.

Number	Auto-Install	Metric Name	Key in config	Key in report	Code Source	References
1	x	Mel Cepstral Distortion (MCD)	mcd_f0	mcd	espnet and s3prl-vc	paper
2	x	F0 Correlation	mcd_f0	f0_corr	espnet and s3prl-vc	paper
3	x	F0 Root Mean Square Error	mcd_f0	f0_rmse	espnet and s3prl-vc	paper
4	x	Signal-to-interference Ratio (SIR)	signal_metric	sir	espnet	-
5	x	Signal-to-artifact Ratio (SAR)	signal_metric	sar	espnet	-
6	x	Signal-to-distortion Ratio (SDR)	signal_metric	sdr	espnet	-
7	x	Convolutional scale-invariant signal-to-distortion ratio (CI-SDR)	signal_metric	ci-sdr	ci_sdr	paper
8	x	Scale-invariant signal-to-noise ratio (SI-SNR)	signal_metric	si-snr	espnet	paper
9	x	Perceptual Evaluation of Speech Quality (PESQ)	pesq	pesq	pesq	paper
10	x	Short-Time Objective Intelligibility (STOI)	stoi	stoi	pystoi	paper
11	x	Speech BERT Score	discrete_speech	speech_bert	discrete speech metric	paper
12	x	Discrete Speech BLEU Score	discrete_speech	speech_bleu	discrete speech metric	paper
13	x	Discrete Speech Token Edit Distance	discrete_speech	speech_token_distance	discrete speech metric	paper
14	x	UTokyo-SaruLab System for VoiceMOS Challenge 2022 (UTMOS)	pseudo_mos	utmos	speechmos	paper
15	x	Deep Noise Suppression MOS Score of P.835 (DNSMOS)	pseudo_mos	dnsmos_overall	speechmos (MS)	paper
16	x	Deep Noise Suppression MOS Score of P.808 (DNSMOS)	pseudo_mos	dnsmos_p808	speechmos (MS)	paper
17	x	Packet Loss Concealment-related MOS Score (PLCMOS)	pseudo_mos	plcmos	speechmos (MS)	paper
18		Virtual Speech Quality Objective Listener (VISQOL)	visqol	visqol	google-visqol	paper
19	x	Speaker Embedding Similarity	speaker	spk_similarity	espnet	paper
20	x	PESQ in TorchAudio-Squim	squim_no_ref	torch_squim_pesq	torch_squim	paper
21	x	STOI in TorchAudio-Squim	squim_no_ref	torch_squim_stoi	torch_squim	paper
22	x	SI-SDR in TorchAudio-Squim	squim_no_ref	torch_squim_si_sdr	torch_squim	paper
23	x	MOS in TorchAudio-Squim	squim_ref	torch_squim_mos	torch_squim	paper
24	x	Singing voice MOS	singmos	singmos	singmos	paper
25	x	Log-Weighted Mean Square Error	log_wmse	log_wmse	log_wmse	-
26		Dynamic Time Warping Cost Metric	warpq	warpq	WARP-Q	paper
27	x	Sheet SSQA MOS Models	sheet_ssqa	sheet_ssqa	Sheet	paper
28	x	ESPnet Speech Recognition-based Error Rate	espnet_wer	espnet_wer	ESPnet	paper
29	x	ESPnet-OWSM Speech Recognition-based Error Rate	owsm_wer	owsm_wer	ESPnet	paper
30	x	OpenAI-Whisper Speech Recognition-based Error Rate	whisper_wer	whisper_wer	Whisper	paper
31		UTMOSv2: UTokyo-SaruLab MOS Prediction System	utmosv2	utmosv2	UTMOSv2	paper
32		Speech Contrastive Regression for Quality Assessment with reference (ScoreQ)	scoreq_ref	scoreq_ref	ScoreQ	paper
33		Speech Contrastive Regression for Quality Assessment without reference (ScoreQ)	scoreq_nr	scoreq_nr	ScoreQ	paper
34		Emotion2vec similarity (emo2vec)	emo2vec_similarity	emotion_similarity	emo2vec	paper
35	x	Speech enhancement-based SI-SNR	se_snr	se_si_snr	ESPnet	-
36	x	Speech enhancement-based CI-SDR	se_snr	se_ci_sdr	ESPnet	-
37	x	Speech enhancement-based SAR	se_snr	se_sar	ESPnet	-
38	x	Speech enhancement-based SDR	se_snr	se_sdr	ESPnet	-
39		NOMAD: Unsupervised Learning of Perceptual Embeddings For Speech Enhancement and Non-Matching Reference Audio Quality Assessment	nomad	nomad	Nomad	paper
40		Frechet Audio Distance (FAD)	fad	fad	fadtk	paper
41		Contrastive Language-Audio Pretraining Score (CLAP Score)	clap_score	clap_score	fadtk	paper
42		Audio Density and Coverage Score	audio_density_coverage	audio_density_coverage	Sony-audio-metrics	paper
43		Accompaniment Prompt Adherence (APA)	apa	apa	Sony-audio-metrics	paper
44		Kullback-Leibler Divergence on Embedding Distribution	kl_embedding	kl_embedding	Stability-AI	-
45	x	PAM: Prompting Audio-Language Models for Audio Quality Assessment	pam	pam	PAM	Paper
46		Frequency-Weighted SEGmental SNR (FWSEGSNR)	pysepm	pysepm_fwsegsnr	pysepm	Paper
47		Log Likelihood Ratio (LLR)	pysepm	pysepm_llr	pysepm	Paper
48		Weighted Spectral Slope (WSS)	pysepm	pysepm_wss	pysepm	Paper
49		Cepstrum Distance Objective Speech Quality Measure (CD)	pysepm	pysepm_cd	pysepm	Paper
50		Composite Objective Speech Quality (composite)	pysepm	pysepm_Csig, pysepm_Cbak, pysepm_Covl	pysepm	Paper
51		Coherence and speech intelligibility index (CSII)	pysepm	pysepm_csii_high, pysepm_csii_mid, pysepm_csii_low	pysepm	Paper
52		Normalized-covariance measure (NCM)	pysepm	pysepm_ncm	pysepm	Paper
53		Speech-to-Reverberation Modulation energy Ratio (SRMR)	srmr	srmr	SRMRpy	Paper
54		Voice Activity Detection (VAD)	vad	vad_info	SileroVAD	-
55	x	AASIST: Audio Anti-Spoofing using Integrated Spectro-Temporal Graph Attention Networks	asvspoof_score	asvspoof_score	AASIST	Paper
56		NORESQA: A Framework for Speech Quality Assessment using Non-Matching References	noresqa	noresqa	Noresqa	Paper
57		KID: Kernel Distance Metric for Audio/Music Quality	kid	kid	KID	Paper
A few more metrics are being verified or are in progress.

Acknowledgement

We sincerely thank the authors of all the open-source implementations listed at https://github.com/shinjiwlab/versa/tree/main#list-of-metrics
