
Commit

Merge pull request #227 from pyf98/owsm
update owsm
pyf98 authored Sep 3, 2024
2 parents db64409 + 6e24e2f commit 719eb4a
Showing 1 changed file with 28 additions and 10 deletions.
38 changes: 28 additions & 10 deletions _posts/2024-01-01-owsm.md
@@ -17,7 +17,7 @@ comments: false

## Pre-trained models

-We publicly release a series of pre-trained models. The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>
+We publicly release a series of [pre-trained models](https://huggingface.co/collections/pyf98/open-whisper-style-speech-models-owsm-66d5312c1c9a1508189192cd). The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>

<table class="table">
<thead>
@@ -76,7 +76,7 @@
<td>180k</td>
<td>E-Branchformer</td>
<td>367M</td>
<td><a href="">Coming soon</a></td>
<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small">espnet/owsm_v3.1_ebf_small</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
@@ -88,11 +88,27 @@
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
-<td><b>OWSM v3.1 medium license-free</b></td>
+<td><b>OWSM v3.1 small low-restriction</b></td>
<td>70k</td>
<td>E-Branchformer</td>
-<td>1.02B</td>
-<td><a href="">Coming soon</a></td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small_lowrestriction">espnet/owsm_v3.1_ebf_small_lowrestriction</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
+<tr>
+<td><b><a href="https://aclanthology.org/2024.acl-long.549/">OWSM-CTC v3.1 medium</a></b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>1.01B</td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">pyf98/owsm_ctc_v3.1_1B</a></td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">Check model page</a></td>
+</tr>
+<tr>
+<td><b>OWSM v3.2 small</b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.2">espnet/owsm_v3.2</a></td>
+<td><a href="">Coming soon</a></td>
+</tr>
</tbody>
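
Any checkpoint in the table can be loaded through the espnet2 speech-to-text inference API, following the same pattern as the usage example later in the post. Below is a minimal sketch, assuming `espnet` and `soundfile` are installed; the checkpoint name, audio path, and decoding options are illustrative, not the only valid settings.

```python
# Minimal sketch: load an OWSM checkpoint from Hugging Face and
# transcribe one short utterance. The checkpoint name, audio path,
# and decoding options here are illustrative.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

s2t = Speech2Text.from_pretrained(
    "espnet/owsm_v3.1_ebf",  # any OWSM checkpoint from the table above
    device="cuda",
    beam_size=5,
    ctc_weight=0.0,
    lang_sym="<eng>",  # language token of the input speech
    task_sym="<asr>",  # <asr> for transcription; translation uses a different task token
)

speech, rate = sf.read("speech.wav")  # 16 kHz mono audio
text = s2t(speech)[0][0]  # first field of the top hypothesis is the decoded text
print(text)

# Recordings longer than 30 seconds go through the long-form API
# shown later in the post: result = s2t.decode_long(speech)
```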
@@ -133,9 +149,9 @@
</ul>
</details>

-The license-free model is trained on a subset of the above data with "free licenses".
+The low-restriction model is trained on a subset of the above data with "more flexible licenses".

-<details style="margin-bottom:1em;"><summary>OWSM v3.1 license-free data</summary>
+<details style="margin-bottom:1em;"><summary>OWSM v3.1 low-restriction data</summary>
<ul>
<li>AMI: CC-BY-4.0</li>
<li>Common Voice: CC0-1.0</li>
@@ -240,14 +256,16 @@ result = s2t.decode_long(speech)

## Fine-tuning on custom data

-Coming soon!
+Our latest work (accepted to SLT 2024), "ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration", provides an easier way to fine-tune pre-trained models. We are preparing demos and notebooks. Please stay tuned!
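
Until those demos land, the sketch below illustrates the Python-only fine-tuning workflow the ESPnet-EZ paper describes. It is hypothetical: the `espnetez` import, the `Trainer` arguments, and all file paths are placeholders, not a confirmed API.

```python
# Hypothetical sketch of ESPnet-EZ-style fine-tuning; the actual
# espnetez API may differ from what is shown here.
import espnetez as ez

# Merge the pre-trained OWSM config with fine-tuning overrides
# (both YAML file names are placeholders).
finetune_config = ez.config.update_finetune_config(
    "s2t", "owsm_pretrain.yaml", "finetune.yaml"
)

trainer = ez.Trainer(
    task="s2t",
    train_config=finetune_config,
    train_dump_dir="dump/train",  # ESPnet-style dump directories
    valid_dump_dir="dump/valid",
    data_info={"speech": ["wav.scp", "sound"], "text": ["text", "text"]},
    output_dir="exp/owsm_finetune",
    stats_dir="exp/stats",
    ngpu=1,
)
trainer.collect_stats()  # compute feature statistics
trainer.train()          # run fine-tuning
```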


## Papers

-Please cite our papers if you use OWSM in your project.
+Please cite our papers if you find OWSM helpful.

-- Preprint: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
+- ACL 2024: [OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification](https://aclanthology.org/2024.acl-long.549/)
+- INTERSPEECH 2024: [On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models](https://arxiv.org/abs/2406.09282)
+- INTERSPEECH 2024: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)


