diff --git a/_posts/2024-01-01-owsm.md b/_posts/2024-01-01-owsm.md
new file mode 100644
index 00000000..688e6563
--- /dev/null
+++ b/_posts/2024-01-01-owsm.md
@@ -0,0 +1,97 @@
+---
+layout: post
+title: Open Whisper-style Speech Models (OWSM)
+description: This is the project page for OWSM models.
+date: 2024-01-01 00:00:00-0800
+comments: false
+---
+
+## Overview
+
+The **O**pen **W**hisper-style **S**peech **M**odels (OWSM, pronounced "awesome") are a series of speech foundation models developed by [WAVLab](https://www.wavlab.org/) at Carnegie Mellon University. They reproduce Whisper-style training using publicly available data and the open-source toolkit [ESPnet](https://github.com/espnet/espnet). By publicly releasing data preparation scripts, training and inference code, pre-trained model weights, and training logs, we aim to promote transparency and open science in large-scale speech pre-training.
+
+## News
+
+
+## Demo pages
+
+- Gradio demo: [![Static Badge](https://img.shields.io/badge/OWSM-Demo-orange)](https://pyf98-owsm-v3-demo.hf.space)
+- Colab notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1zKI3ZY_OtZd6YmVeED6Cxy1QwT1mqv9O?usp=sharing)
+
+
+## Pre-trained models
+
+We have released the pre-trained models listed below. Training logs are also available for the major models.
+
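+| Name | Encoder | Parameters | Data (hours) | Model Link | ESPnet Recipe |
+|------|---------|------------|--------------|------------|---------------|
+| OWSM v1 | Transformer | 272M | 38k | espnet/owsm_v1 | egs2/owsm_v1/s2t1 |
+| OWSM v2 | Transformer | 712M | 129k | espnet/owsm_v2 | egs2/owsm_v2/s2t1 |
+| OWSM v2 | E-Branchformer | 739M | 129k | espnet/owsm_v2_ebranchformer | egs2/owsm_v2/s2t1 |
+| OWSM v3 | Transformer | 889M | 180k | espnet/owsm_v3 | egs2/owsm_v3/s2t1 |
+| OWSM v3.1 | E-Branchformer | 1.02B | 180k | espnet/owsm_v3.1_ebf | TBD |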
+
+
+## Inference
+
+### Language Identification
+
+### Speech Recognition
+
+### Speech Translation
+
+### Long-form Speech Recognition or Translation
+
+
+## Fine-tuning
+
+
+## Citations
+