From 6e24e2fc9d59f7e845bea07c2626c9ecec6626b3 Mon Sep 17 00:00:00 2001
From: Yifan Peng
Date: Mon, 2 Sep 2024 00:08:38 -0400
Subject: [PATCH] update owsm

---
 _posts/2024-01-01-owsm.md | 38 ++++++++++++++++++++++++++++----------
 1 file changed, 28 insertions(+), 10 deletions(-)

diff --git a/_posts/2024-01-01-owsm.md b/_posts/2024-01-01-owsm.md
index 5170ce9d..ccbce19a 100644
--- a/_posts/2024-01-01-owsm.md
+++ b/_posts/2024-01-01-owsm.md
@@ -17,7 +17,7 @@ comments: false
 
 ## Pre-trained models
 
-We publicly release a series of pre-trained models. The training logs are also available for major models. We recommend using OWSM v3.1 or later versions for better performance and efficiency.
+We publicly release a series of [pre-trained models](https://huggingface.co/collections/pyf98/open-whisper-style-speech-models-owsm-66d5312c1c9a1508189192cd). The training logs are also available for major models. We recommend using OWSM v3.1 or later versions for better performance and efficiency.
 
@@ -76,7 +76,7 @@ We publicly release a series of pre-trained models. The training logs are also a
       <td>180k</td>
       <td>E-Branchformer</td>
       <td>367M</td>
-      <td>Coming soon</td>
+      <td>espnet/owsm_v3.1_ebf_small</td>
       <td>egs2/owsm_v3.1/s2t1</td>
     </tr>
     <tr>
@@ -88,11 +88,27 @@ We publicly release a series of pre-trained models. The training logs are also a
       <td>egs2/owsm_v3.1/s2t1</td>
     </tr>
     <tr>
-      <td>OWSM v3.1 medium license-free</td>
+      <td>OWSM v3.1 small low-restriction</td>
       <td>70k</td>
       <td>E-Branchformer</td>
-      <td>1.02B</td>
-      <td>Coming soon</td>
+      <td>367M</td>
+      <td>espnet/owsm_v3.1_ebf_small_lowrestriction</td>
+      <td>egs2/owsm_v3.1/s2t1</td>
+    </tr>
+    <tr>
+      <td>OWSM-CTC v3.1 medium</td>
+      <td>180k</td>
+      <td>E-Branchformer</td>
+      <td>1.01B</td>
+      <td>pyf98/owsm_ctc_v3.1_1B</td>
+      <td>Check model page</td>
+    </tr>
+    <tr>
+      <td>OWSM v3.2 small</td>
+      <td>180k</td>
+      <td>E-Branchformer</td>
+      <td>367M</td>
+      <td>espnet/owsm_v3.2</td>
+      <td>Coming soon</td>
+    </tr>
@@ -133,9 +149,9 @@ The latest OWSM v3.1 models are trained on a diverse combination of public datas
 
-The license-free model is trained on a subset of the above data with "free licenses".
+The low-restriction model is trained on a subset of the above data with "more flexible licenses".
 
-OWSM v3.1 license-free data
+OWSM v3.1 low-restriction data
   • AMI: CC-BY-4.0
   • Common Voice: CC0-1.0
@@ -240,14 +256,16 @@ result = s2t.decode_long(speech)
 
 ## Fine-tuning on custom data
 
-Coming soon!
+Our latest work (accepted to SLT 2024), "ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration", will provide an easier way for fine-tuning pre-trained models. We are preparing demos and notebooks. Please stay tuned!
 
 ## Papers
 
-Please cite our papers if you use OWSM in your project.
+Please cite our papers if you find OWSM helpful.
 
-- Preprint: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
+- ACL 2024: [OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification](https://aclanthology.org/2024.acl-long.549/)
+- INTERSPEECH 2024: [On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models](https://arxiv.org/abs/2406.09282)
+- INTERSPEECH 2024: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
 - ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)