Commit: add owsm to open source page

pyf98 committed Jan 25, 2024
1 parent ed6b238 commit 2e28165
Showing 2 changed files with 43 additions and 13 deletions.
28 changes: 26 additions & 2 deletions _pages/open-source.md
@@ -7,10 +7,12 @@ nav: true
order: 5
---

Our lab has been led and participated in the development of several open-source toolkits and datasets. The followings are some selected ones.
Our lab has led and participated in the development of several open-source toolkits, projects, and datasets. Some selected ones are listed below.

### Software

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
@@ -76,8 +78,29 @@
</table>
<hr />


### Projects

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
<a href="{% post_url 2024-01-01-owsm %}">
OWSM
</a>
</td>
<td>
<strong>Open Whisper-style Speech Models</strong> (<strong>OWSM</strong>, pronounced as "awesome") are a series of speech foundation models developed by WAVLab at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit ESPnet. By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
</td></tr>
</table>
<hr />


### Datasets

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
@@ -182,4 +205,5 @@
The substantive material of <strong>Totonac</strong> from the northern sierras of Puebla and adjacent areas of Veracruz was compiled starting in 2016 by Jonathan D. Amith, and the work continues to the present as a joint effort by Amith and Osbel López Francisco, a native-speaker biologist from Zongozotla.
</td></tr>
</table>
<hr />
<hr />

28 changes: 17 additions & 11 deletions _posts/2024-01-01-owsm.md
@@ -8,17 +8,24 @@ comments: false

## Overview

The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. It reproduces Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.

## News

- TBD

## Demo pages

- Gradio demo: [![Static Badge](https://img.shields.io/badge/OWSM-Demo-orange)](https://pyf98-owsm-v3-demo.hf.space)
- Colab notebook: [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing)


## Papers

Please cite our papers if you use OWSM.

- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)

## Pre-trained models

We have released various pre-trained models. The training logs are also available for major models.
@@ -27,57 +34,60 @@
<thead>
<tr>
<th>Name</th>
<th>Data (hours)</th>
<th>Encoder</th>
<th>Parameters</th>
<th>Data (hours)</th>
<th>Model Link</th>
<th>ESPnet Recipe</th>
</tr>
</thead>
<tbody>
<tr>
<td>OWSM v1</td>
<td>38k</td>
<td>Transformer</td>
<td>272M</td>
<td>38k</td>
<td><a href="https://huggingface.co/espnet/owsm_v1">espnet/owsm_v1</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v1/s2t1">egs2/owsm_v1/s2t1</a></td>
</tr>
<tr>
<td>OWSM v2</td>
<td>129k</td>
<td>Transformer</td>
<td>712M</td>
<td>129k</td>
<td><a href="https://huggingface.co/espnet/owsm_v2">espnet/owsm_v2</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v2/s2t1">egs2/owsm_v2/s2t1</a></td>
</tr>
<tr>
<td>OWSM v2</td>
<td>129k</td>
<td>E-Branchformer</td>
<td>739M</td>
<td>129k</td>
<td><a href="https://huggingface.co/espnet/owsm_v2_ebranchformer">espnet/owsm_v2_ebranchformer</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v2/s2t1">egs2/owsm_v2/s2t1</a></td>
</tr>
<tr>
<td>OWSM v3</td>
<td>180k</td>
<td>Transformer</td>
<td>889M</td>
<td>180k</td>
<td><a href="https://huggingface.co/espnet/owsm_v3">espnet/owsm_v3</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3/s2t1">egs2/owsm_v3/s2t1</a></td>
</tr>
<tr>
<td>OWSM v3.1</td>
<td>180k</td>
<td>E-Branchformer</td>
<td>1.02B</td>
<td>180k</td>
<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf">espnet/owsm_v3.1_ebf</a></td>
<td>TBD</td>
</tr>
</tbody>
</table>
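
Any of the checkpoints above can be loaded directly from Hugging Face through ESPnet. Below is a minimal sketch using the `espnet2` speech-to-text inference wrapper; the constructor options, special-token names, and output tuple layout are assumptions based on ESPnet's standard inference API and may vary across versions, so please verify against the linked recipes and the demo notebook.

```python
# Hedged sketch: assumes espnet2's S2T inference wrapper and the model IDs
# listed in the table above; check option names against your ESPnet version.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

# Download a checkpoint from Hugging Face and build the inference wrapper.
s2t = Speech2Text.from_pretrained(
    model_tag="espnet/owsm_v3",  # any model ID from the table above
    device="cpu",                # "cuda" if a GPU is available
    beam_size=5,
    lang_sym="<eng>",            # language token (assumed format)
    task_sym="<asr>",            # task token; translation uses a different token
)

speech, rate = sf.read("speech.wav")  # OWSM models expect 16 kHz mono audio
results = s2t(speech)
print(results[0][0])  # decoded text of the best hypothesis
```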

## Data details



## Inference

@@ -91,7 +101,3 @@


## Fine-tuning


## Citations
