Commit: add owsm to open source page

pyf98 committed Jan 25, 2024
1 parent ed6b238 commit 2e28165
Showing 2 changed files with 43 additions and 13 deletions.
28 changes: 26 additions & 2 deletions _pages/open-source.md
@@ -7,10 +7,12 @@ nav: true
order: 5
---

Our lab has been led and participated in the development of several open-source toolkits and datasets. The followings are some selected ones.
Our lab has led and participated in the development of several open-source toolkits, projects, and datasets. Some selected ones are listed below.

### Software

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
@@ -76,8 +78,29 @@
</table>
<hr />


### Projects

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
<a href="{% post_url 2024-01-01-owsm %}">
OWSM
</a>
</td>
<td>
<strong>Open Whisper-style Speech Models</strong> (<strong>OWSM</strong>, pronounced as "awesome") are a series of speech foundation models developed by WAVLab at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit ESPnet. By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
</td></tr>
</table>
<hr />


### Datasets

<hr />

<table cellspacing="0" cellpadding="0">
<tr>
<td class="col-sm w-25">
@@ -182,4 +205,5 @@
The substantive material of <strong>Totonac</strong> from the northern sierras of Puebla and adjacent areas of Veracruz was compiled starting in 2016 by Jonathan D. Amith, and the work continues to the present as a joint effort by Amith and Osbel López Francisco, a native-speaker biologist from Zongozotla.
</td></tr>
</table>
<hr />
<hr />

28 changes: 17 additions & 11 deletions _posts/2024-01-01-owsm.md
@@ -8,17 +8,24 @@ comments: false

## Overview

The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. It reproduces Whisper-style training using publicly available data and an open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced as "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. We reproduce Whisper-style training using publicly available data and our open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights and training logs, we aim to promote transparency and open science in large-scale speech pre-training.

## News

- TBD

## Demo pages

- Gradio demo: [![Static Badge](https://img.shields.io/badge/OWSM-Demo-orange)](https://pyf98-owsm-v3-demo.hf.space)
- Colab notebook: [![Open All Collab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing)


## Papers

Please cite our papers if you use OWSM.

- ASRU 2023: [Reproducing Whisper-Style Training Using an Open-Source Toolkit and Publicly Available Data](https://arxiv.org/abs/2309.13876)

## Pre-trained models

We have released various pre-trained models. The training logs are also available for major models.
@@ -27,57 +34,60 @@
<thead>
<tr>
<th>Name</th>
<th>Data (hours)</th>
<th>Encoder</th>
<th>Parameters</th>
<th>Data (hours)</th>
<th>Model Link</th>
<th>ESPnet Recipe</th>
</tr>
</thead>
<tbody>
<tr>
<td>OWSM v1</td>
<td>38k</td>
<td>Transformer</td>
<td>272M</td>
<td>38k</td>
<td><a href="https://huggingface.co/espnet/owsm_v1">espnet/owsm_v1</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v1/s2t1">egs2/owsm_v1/s2t1</a></td>
</tr>
<tr>
<td>OWSM v2</td>
<td>129k</td>
<td>Transformer</td>
<td>712M</td>
<td>129k</td>
<td><a href="https://huggingface.co/espnet/owsm_v2">espnet/owsm_v2</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v2/s2t1">egs2/owsm_v2/s2t1</a></td>
</tr>
<tr>
<td>OWSM v2</td>
<td>129k</td>
<td>E-Branchformer</td>
<td>739M</td>
<td>129k</td>
<td><a href="https://huggingface.co/espnet/owsm_v2_ebranchformer">espnet/owsm_v2_ebranchformer</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v2/s2t1">egs2/owsm_v2/s2t1</a></td>
</tr>
<tr>
<td>OWSM v3</td>
<td>180k</td>
<td>Transformer</td>
<td>889M</td>
<td>180k</td>
<td><a href="https://huggingface.co/espnet/owsm_v3">espnet/owsm_v3</a></td>
<td><a href="https://github.com/espnet/espnet/tree/master/egs2/owsm_v3/s2t1">egs2/owsm_v3/s2t1</a></td>
</tr>
<tr>
<td>OWSM v3.1</td>
<td>180k</td>
<td>E-Branchformer</td>
<td>1.02B</td>
<td>180k</td>
<td><a href="https://huggingface.co/espnet/owsm_v3.1_ebf">espnet/owsm_v3.1_ebf</a></td>
<td>TBD</td>
</tr>
</tbody>
</table>
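
Any of the checkpoints above can be loaded directly from Hugging Face through ESPnet. Below is a minimal sketch using the `espnet2` speech-to-text inference wrapper; the constructor options, special-token names, and output tuple layout are assumptions based on ESPnet's standard inference API and may vary across versions, so please verify against the linked recipes and the demo notebook.

```python
# Hedged sketch: assumes espnet2's S2T inference wrapper and the model IDs
# listed in the table above; check option names against your ESPnet version.
import soundfile as sf
from espnet2.bin.s2t_inference import Speech2Text

# Download a checkpoint from Hugging Face and build the inference wrapper.
s2t = Speech2Text.from_pretrained(
    model_tag="espnet/owsm_v3",  # any model ID from the table above
    device="cpu",                # "cuda" if a GPU is available
    beam_size=5,
    lang_sym="<eng>",            # language token (assumed format)
    task_sym="<asr>",            # task token; translation uses a different token
)

speech, rate = sf.read("speech.wav")  # OWSM models expect 16 kHz mono audio
results = s2t(speech)
print(results[0][0])  # decoded text of the best hypothesis
```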

## Data details



## Inference

@@ -91,7 +101,3 @@


## Fine-tuning


## Citations
