
Commit

Merge pull request #227 from pyf98/owsm
update owsm
pyf98 authored Sep 3, 2024
2 parents db64409 + 6e24e2f commit 719eb4a
Showing 1 changed file with 28 additions and 10 deletions.
38 changes: 28 additions & 10 deletions _posts/2024-01-01-owsm.md
@@ -17,7 +17,7 @@ comments: false

## Pre-trained models

-We publicly release a series of pre-trained models. The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>
+We publicly release a series of [pre-trained models](https://huggingface.co/collections/pyf98/open-whisper-style-speech-models-owsm-66d5312c1c9a1508189192cd). The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>

<table class="table">
<thead>
@@ -76,7 +76,7 @@
<td>180k</td>
<td>E-Branchformer</td>
<td>367M</td>
<td><a href="">Coming soon</a></td>
<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small">espnet/owsm_v3.1_ebf_small</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
@@ -88,11 +88,27 @@
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
-<td><b>OWSM v3.1 medium license-free</b></td>
+<td><b>OWSM v3.1 small low-restriction</b></td>
<td>70k</td>
<td>E-Branchformer</td>
-<td>1.02B</td>
-<td><a href="">Coming soon</a></td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small_lowrestriction">espnet/owsm_v3.1_ebf_small_lowrestriction</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
+<tr>
+<td><b><a href="https://aclanthology.org/2024.acl-long.549/">OWSM-CTC v3.1 medium</a></b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>1.01B</td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">pyf98/owsm_ctc_v3.1_1B</a></td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">Check model page</a></td>
+</tr>
+<tr>
+<td><b>OWSM v3.2 small</b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.2">espnet/owsm_v3.2</a></td>
+<td><a href="">Coming soon</a></td>
+</tr>
</tbody>
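
Any checkpoint in the table can be loaded through the espnet2 speech-to-text inference API, following the same pattern as the usage example later in the post. Below is a minimal sketch, assuming `espnet` and `soundfile` are installed; the checkpoint name, audio path, and decoding options are illustrative, not the only valid settings.

```python
# Minimal sketch: load an OWSM checkpoint from Hugging Face and
# transcribe one short utterance. The checkpoint name, audio path,
# and decoding options here are illustrative.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf",  # any OWSM checkpoint from the table above
    device="cuda",
    beam_size=5,
    ctc_weight=0.0,
    lang_sym="<eng>",  # language token of the input speech
    task_sym="<asr>",  # <asr> for transcription; translation uses a different task token
)

speech, rate = sf.read("speech.wav")  # 16 kHz mono audio
text = s2t(speech)[0][0]  # first field of the top hypothesis is the decoded text
print(text)

# Recordings longer than 30 seconds go through the long-form API
# shown later in the post: result = s2t.decode_long(speech)
```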
@@ -133,9 +149,9 @@
</ul>
</details>

-The license-free model is trained on a subset of the above data with "free licenses".
+The low-restriction model is trained on a subset of the above data with "more flexible licenses".

-<details style="margin-bottom:1em;"><summary>OWSM v3.1 license-free data</summary>
+<details style="margin-bottom:1em;"><summary>OWSM v3.1 low-restriction data</summary>
<ul>
<li>AMI: CC-BY-4.0</li>
<li>Common Voice: CC0-1.0</li>
@@ -240,14 +256,16 @@ result = s2t.decode_long(speech)

## Fine-tuning on custom data

-Coming soon!
+Our latest work (accepted to SLT 2024), "ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration", provides an easier way to fine-tune pre-trained models. We are preparing demos and notebooks. Please stay tuned!
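
Until those demos land, the sketch below illustrates the Python-only fine-tuning workflow the ESPnet-EZ paper describes. It is hypothetical: the `espnetez` import, the `Trainer` arguments, and all file paths are placeholders, not a confirmed API.

```python
# Hypothetical sketch of ESPnet-EZ-style fine-tuning; the actual
# espnetez API may differ from what is shown here.
import espnetez as ez

# Merge the pre-trained OWSM config with fine-tuning overrides
# (both YAML file names are placeholders).
finetune_config = ez.config.update_finetune_config(
    "s2t", "owsm_pretrain.yaml", "finetune.yaml"
)

trainer = ez.Trainer(
    task="s2t",
    train_config=finetune_config,
    train_dump_dir="dump/train",  # ESPnet-style dump directories
    valid_dump_dir="dump/valid",
    data_info={"speech": ["wav.scp", "sound"], "text": ["text", "text"]},
    output_dir="exp/owsm_finetune",
    stats_dir="exp/stats",
    ngpu=1,
)
trainer.collect_stats()  # compute feature statistics
trainer.train()          # run fine-tuning
```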


## Papers

-Please cite our papers if you use OWSM in your project.
+Please cite our papers if you find OWSM helpful.

-- Preprint: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
+- ACL 2024: [OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification](https://aclanthology.org/2024.acl-long.549/)
+- INTERSPEECH 2024: [On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models](https://arxiv.org/abs/2406.09282)
+- INTERSPEECH 2024: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)


