update owsm #227

Merged 1 commit on Sep 3, 2024
38 changes: 28 additions & 10 deletions _posts/2024-01-01-owsm.md
@@ -17,7 +17,7 @@ comments: false

## Pre-trained models

-We publicly release a series of pre-trained models. The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>
+We publicly release a series of [pre-trained models](https://huggingface.co/collections/pyf98/open-whisper-style-speech-models-owsm-66d5312c1c9a1508189192cd). The training logs are also available for major models. <strong>We recommend using OWSM v3.1 or later versions for better performance and efficiency.</strong>

<table class="table">
<thead>
@@ -76,7 +76,7 @@ We publicly release a series of pre-trained models. The training logs are also a
<td>180k</td>
<td>E-Branchformer</td>
<td>367M</td>
-<td><a href="">Coming soon</a></td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small">espnet/owsm_v3.1_ebf_small</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
@@ -88,11 +88,27 @@ We publicly release a series of pre-trained models. The training logs are also a
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
<tr>
-<td><b>OWSM v3.1 medium license-free</b></td>
+<td><b>OWSM v3.1 small low-restriction</b></td>
<td>70k</td>
<td>E-Branchformer</td>
-<td>1.02B</td>
-<td><a href="">Coming soon</a></td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf_small_lowrestriction">espnet/owsm_v3.1_ebf_small_lowrestriction</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3.1/s2t1">egs2/owsm_v3.1/s2t1</a></td>
</tr>
+<tr>
+<td><b><a href="https://aclanthology.org/2024.acl-long.549/">OWSM-CTC v3.1 medium</a></b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>1.01B</td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">pyf98/owsm_ctc_v3.1_1B</a></td>
+<td><a href="https://huggingface.co/pyf98/owsm_ctc_v3.1_1B">Check model page</a></td>
+</tr>
+<tr>
+<td><b>OWSM v3.2 small</b></td>
+<td>180k</td>
+<td>E-Branchformer</td>
+<td>367M</td>
+<td><a href="https://huggingface.co/espnet/owsm_v3.2">espnet/owsm_v3.2</a></td>
+<td><a href="">Coming soon</a></td>
+</tr>
</tbody>
@@ -133,9 +149,9 @@ The latest OWSM v3.1 models are trained on a diverse combination of public datas
</ul>
</details>

-The license-free model is trained on a subset of the above data with "free licenses".
+The low-restriction model is trained on a subset of the above data with "more flexible licenses".

-<details style="margin-bottom:1em;"><summary>OWSM v3.1 license-free data</summary>
+<details style="margin-bottom:1em;"><summary>OWSM v3.1 low-restriction data</summary>
<ul>
<li>AMI: CC-BY-4.0</li>
<li>Common Voice: CC0-1.0</li>
@@ -240,14 +256,16 @@ result = s2t.decode_long(speech)

## Fine-tuning on custom data

-Coming soon!
+Our latest work (accepted to SLT 2024), "ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration", will provide an easier way to fine-tune pre-trained models. We are preparing demos and notebooks. Please stay tuned!


## Papers

-Please cite our papers if you use OWSM in your project.
+Please cite our papers if you find OWSM helpful.

-- Preprint: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
+- ACL 2024: [OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification](https://aclanthology.org/2024.acl-long.549/)
+- INTERSPEECH 2024: [On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models](https://arxiv.org/abs/2406.09282)
+- INTERSPEECH 2024: [OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer](https://arxiv.org/abs/2401.16658)
- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)

