From 8bf494ac4326968e20993b2e86002ac50c9f5532 Mon Sep 17 00:00:00 2001
From: Yifan Peng
Date: Sat, 6 Jan 2024 17:25:59 -0500
Subject: [PATCH] add owsm to open source page

---
 _pages/open-source.md     | 28 ++++++++++++++++++++++++++--
 _posts/2024-01-01-owsm.md | 28 +++++++++++++++++-----------
 2 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/_pages/open-source.md b/_pages/open-source.md
index 6d560cbe..ebe1d585 100644
--- a/_pages/open-source.md
+++ b/_pages/open-source.md
@@ -7,10 +7,12 @@ nav: true
 order: 5
 ---
 
-Our lab has been led and participated in the development of several open-source toolkits and datasets. The followings are some selected ones.
+Our lab has led and participated in the development of several open-source toolkits, projects, and datasets. Some selected ones are listed below.
 
 ### Softwares
 
+
+
@@ -76,8 +78,29 @@ Our lab has been led and participated in the development of several open-source
     </td>
   </tr>
 </table>
 
+
+### Projects
+
+<table>
+  <tr>
+    <td>
+      OWSM
+    </td>
+    <td>
+      Open Whisper-style Speech Models (OWSM, pronounced as "awesome") are a series of speech foundation models developed by WAVLab at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit ESPnet. By publicly releasing data preparation scripts, training and inference code, pre-trained model weights, and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
+    </td>
+  </tr>
+</table>
+
+
 ### Datasets
 
 <table>
@@ -182,4 +205,5 @@ Our lab has been led and participated in the development of several open-source
       The substantive material of Totonac from the northern sierras of Puebla and adjacent areas of Veracruz were compiled starting in 2016 by Jonathan D. Amith and continue to the present as part of a joint effort by Amith and Osbel López Francisco, a native speaker biologist from Zongozotla.
     </td>
   </tr>
-</table>
\ No newline at end of file
+</table>
+

diff --git a/_posts/2024-01-01-owsm.md b/_posts/2024-01-01-owsm.md
index 688e6563..4d66d145 100644
--- a/_posts/2024-01-01-owsm.md
+++ b/_posts/2024-01-01-owsm.md
@@ -8,10 +8,11 @@ comments: false
 
 ## Overview
 
-The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. It reproduces Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
+The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights, and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
 
 ## News
 
+- TBD
 
 ## Demo pages
@@ -19,6 +20,12 @@ The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "aweso
 
 - Colab notebook: [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing)
 
+## Papers
+
+Please cite our papers if you use OWSM.
+
+- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)
+
 ## Pre-trained models
 
 We have released various pre-trained models. The training logs are also available for major models.
@@ -27,9 +34,9 @@ We have released various pre-trained models. The training logs are also availabl
 <table>
   <tr>
     <th>Name</th>
+    <th>Data (hours)</th>
     <th>Encoder</th>
     <th>Parameters</th>
-    <th>Data (hours)</th>
     <th>Model Link</th>
     <th>ESPnet Recipe</th>
   </tr>
@@ -37,47 +44,50 @@ We have released various pre-trained models. The training logs are also availabl
   <tr>
     <td>OWSM v1</td>
+    <td>38k</td>
     <td>Transformer</td>
     <td>272M</td>
-    <td>38k</td>
     <td>espnet/owsm_v1</td>
     <td>egs2/owsm_v1/s2t1</td>
   </tr>
   <tr>
     <td>OWSM v2</td>
+    <td>129k</td>
     <td>Transformer</td>
     <td>712M</td>
-    <td>129k</td>
     <td>espnet/owsm_v2</td>
     <td>egs2/owsm_v2/s2t1</td>
   </tr>
   <tr>
     <td>OWSM v2</td>
+    <td>129k</td>
     <td>E-Branchformer</td>
     <td>739M</td>
-    <td>129k</td>
     <td>espnet/owsm_v2_ebranchformer</td>
     <td>egs2/owsm_v2/s2t1</td>
   </tr>
   <tr>
     <td>OWSM v3</td>
+    <td>180k</td>
     <td>Transformer</td>
     <td>889M</td>
-    <td>180k</td>
     <td>espnet/owsm_v3</td>
     <td>egs2/owsm_v3/s2t1</td>
   </tr>
   <tr>
     <td>OWSM v3.1</td>
+    <td>180k</td>
     <td>E-Branchformer</td>
     <td>1.02B</td>
-    <td>180k</td>
     <td>espnet/owsm_v3.1_ebf</td>
     <td>TBD</td>
   </tr>
 </table>
 
+## Data details
+
+
 ## Inference
 
@@ -91,7 +101,3 @@ We have released various pre-trained models. The training logs are also availabl
 
 ## Fine-tuning
 
-
-
-## Citations
-
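For the still-empty `## Inference` section of the post, a minimal short-form ASR sketch with one of the checkpoints above could look like the following. This is illustrative only: it assumes `pip install espnet espnet_model_zoo soundfile`, a 16 kHz mono `audio.wav` (a placeholder path), and the `espnet2.bin.s2t_inference.Speech2Text` interface, whose option names may differ across ESPnet versions.

```python
# Minimal OWSM inference sketch. Assumptions: espnet, espnet_model_zoo, and
# soundfile are installed; exact option names may vary across ESPnet versions.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

# Download a checkpoint from Hugging Face and build the decoder.
# Any model tag from the table above should work here.
s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf",
    device="cpu",      # set to "cuda" if a GPU is available
    beam_size=5,
    lang_sym="<eng>",  # language token for the utterance
    task_sym="<asr>",  # <asr> selects the transcription task
)

# OWSM models expect 16 kHz mono audio; "audio.wav" is a placeholder path.
speech, rate = sf.read("audio.wav")
assert rate == 16000, "OWSM models expect 16 kHz input"

# The model returns an n-best list; each hypothesis starts with the text.
text, *_ = s2t(speech)[0]
print(text)
```

For long-form audio, recent ESPnet versions also expose a `decode_long` method on the same class that chunks the input and stitches the segment hypotheses back together; whether it is available should be verified against the installed ESPnet version.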