Summary: title
Reviewed By: jacobkahn, tlikhomanenko
Differential Revision: D25699082
fbshipit-source-id: 1e743005db4ec8f1238dc9e72cb7abac5fbb4677
1 parent 7839a42, commit 2f86068
Showing 1 changed file with 85 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
# RASR release | ||
|
||
This is a repository sharing pre-trained acoustic models and language models for our new paper [Rethinking Evaluation in ASR: Are Our Models Robust Enough?](https://arxiv.org/abs/2010.11745). | ||
|
||
|
||
## Dependencies | ||
|
||
* [flashlight](https://github.com/facebookresearch/flashlight) | ||
|
||
## Models | ||
|
||
### Acoustic Model | ||
|
||
All the acoustic models are retrained using flashlight, into which [wav2letter++](https://github.com/facebookresearch/wav2letter) has been consolidated. `Tedlium` is not used as training data here due to licensing issues. All the training data uses the more standard 16kHz sample rate rather than the 8kHz used in the paper. | ||
|
||
Here we release models with different architectures and sizes. Note that the models may not fully reproduce the results in the paper because of discrepancies in both the data and the toolkit implementation. | ||
|
||
|Architecture |# Param |Arch File |Path | | ||
| :---: | :---: | :---: | :---: | | ||
|Transformer |300 mil |[am_transformer_ctc_stride3_letters_300Mparams.arch](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_transformer_ctc_stride3_letters_300Mparams.arch) |[am_transformer_ctc_stride3_letters_300Mparams.bin](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_transformer_ctc_stride3_letters_300Mparams.bin) | | ||
|Transformer |70 mil |[am_transformer_ctc_stride3_letters_70Mparams.arch](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_transformer_ctc_stride3_letters_70Mparams.arch) |[am_transformer_ctc_stride3_letters_70Mparams.bin](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_transformer_ctc_stride3_letters_70Mparams.bin) | | ||
|Conformer |300 mil |[am_conformer_ctc_stride3_letters_300Mparams.arch](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_300Mparams.arch) |[am_conformer_ctc_stride3_letters_300Mparams.bin](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_300Mparams.bin) | | ||
|Conformer |87 mil |[am_conformer_ctc_stride3_letters_87Mparams.arch](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_87Mparams.arch) |[am_conformer_ctc_stride3_letters_87Mparams.bin](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_87Mparams.bin) | | ||
|Conformer |28 mil |[am_conformer_ctc_stride3_letters_25Mparams.arch](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_25Mparams.arch) |[am_conformer_ctc_stride3_letters_25Mparams.bin](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/am_conformer_ctc_stride3_letters_25Mparams.bin) | | ||
|
||
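
The arch and weight files in the table follow one naming pattern, so fetching them can be scripted. The following is a minimal sketch, assuming only the URLs listed above; the helper names `model_urls` and `download_model` and the `models` output directory are hypothetical and not part of this release. Note that the files for the 28 mil Conformer carry a `25M` tag, as shown in the table.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the links in the table above.
BASE_URL = "https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial"

# (architecture, size tag as it appears in the released file names)
MODELS = [
    ("transformer", "300M"),
    ("transformer", "70M"),
    ("conformer", "300M"),
    ("conformer", "87M"),
    ("conformer", "25M"),  # listed as 28 mil parameters in the table
]


def model_urls(arch, size):
    """Build the .arch and .bin URLs for one model (hypothetical helper)."""
    stem = f"am_{arch}_ctc_stride3_letters_{size}params"
    return {ext: f"{BASE_URL}/{stem}.{ext}" for ext in ("arch", "bin")}


def download_model(arch, size, dest_dir="models"):
    """Download both files for one model, skipping files already present."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = {}
    for ext, url in model_urls(arch, size).items():
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():
            urlretrieve(url, str(target))
        paths[ext] = target
    return paths
```

For example, `download_model("conformer", "87M")` would place the 87 mil Conformer's `.arch` and `.bin` files under `models/`.
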
|
||
|
||
### Language Model | ||
|
||
Language models are trained on the Common Crawl corpus, as mentioned in the paper. We provide 4-gram LMs with different pruning settings, restricted to the [top 200k words](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lm_common_crawl_200kvocab.txt). All the LMs are trained with the [KenLM toolkit](https://kheafield.com/code/kenlm/). | ||
|
||
| Pruning Param |Size (GB) |Path | | ||
| :---: | :---: | :---: | | ||
|0 0 5 5 |8.4 |[large](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lm_common_crawl_large_4gram_prun0-0-5_200kvocab.bin) | | ||
|0 6 15 15 |2.5 |[small](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lm_common_crawl_small_4gram_prun0-6-15_200kvocab.bin) | | ||
|
||
The perplexities of the LMs on different development sets are listed below. | ||
|
||
| LM |nov93dev |TL-dev |CV-dev |LS-dev-clean |LS-dev-other |RT03 | | ||
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | | ||
| Large |313 |158 |243 |303 |304 |227 | | ||
| Small |331 |178 |262 |330 |325 |226 | | ||
|
||
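
For readers who want to compare their own numbers against the table, the sketch below shows how perplexity is typically derived from n-gram LM scores. It assumes base-10 log probabilities (the convention KenLM reports) and a token count that includes the end-of-sentence token; the function names are hypothetical, and this is not necessarily the exact script used to produce the numbers above.

```python
def perplexity(log10_prob, num_tokens):
    """Perplexity from a total log10 probability over num_tokens tokens."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return 10.0 ** (-log10_prob / num_tokens)


def corpus_perplexity(sentence_scores):
    """Corpus-level perplexity from (log10_prob, num_tokens) pairs.

    Log probabilities and token counts are summed over all sentences
    before normalizing, rather than averaging per-sentence perplexities.
    """
    total_log10 = sum(score for score, _ in sentence_scores)
    total_tokens = sum(count for _, count in sentence_scores)
    return perplexity(total_log10, total_tokens)
```
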
|
||
### WER | ||
|
||
Here we summarize the decoding WER for all the released models. Numbers in the table are in the format `viterbi WER → beam search WER`. | ||
|
||
|Architecture |# Param |nov92 |TL-test |CV-test |LS-test-clean |LS-test-other |Hub05-SWB |Hub05-CH | | ||
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | ||
|Transformer |300 mil |3.4 → 2.9 |7.6 → 5.5 |15.5 → 11.6 |3.0 → 3.2 |7.2 → 6.4 |6.8 |11.6 | | ||
|Transformer |70 mil |4.5 |9.4 |19.8 |4 |9.7 |7.5 |13 | | ||
|Conformer |300 mil |3.5 |8.4 |17 |3.2 |8 |7 |11.9 | | ||
|Conformer |87 mil |4.3 |8.7 |18.2 |3.7 |8.6 |7.3 |12.2 | | ||
|Conformer |28 mil |5 |10.5 |22.2 |4.7 |11.1 |8.8 |13.7 | | ||
|
||
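
As a reminder of what the table measures, here is a minimal sketch of the standard word error rate definition: word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words, in percent. It assumes whitespace tokenization and no text normalization; the function names are hypothetical, and this is not the scoring code used to produce the numbers above.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(pairs):
    """Corpus-level WER in percent over (reference, hypothesis) strings."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    words = sum(len(r.split()) for r, _ in pairs)
    return 100.0 * errors / words
```
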
Decoding is done with a lexicon-based beam-search decoder using the 200k Common Crawl lexicon and the small Common Crawl LM. | ||
* [tokens](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/tokens.txt) | ||
* [inference lexicon](https://dl.fbaipublicfiles.com/wav2letter/rasr/tutorial/lexicon.txt) | ||
* Decoding parameters: | ||
|
||
|Architecture |# Param |LM Weight |Word Score |Beam Size | | ||
| :---: | :---: | :---: | :---: | :---: | | ||
|Transformer |300 mil |1.5 |0 |50 | | ||
|Transformer |70 mil | | | | | ||
|Conformer |300 mil | | | | | ||
|Conformer |87 mil | | | | | ||
|Conformer |28 mil |2 |0 |50 | | ||
|
||
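
To illustrate how these three parameters usually interact, the sketch below uses a common formulation of lexicon-based beam-search scoring: the acoustic score plus the LM log probability scaled by the LM weight, plus a per-word bonus, with only the top `beam_size` hypotheses kept at each step. This is an assumption for illustration, not flashlight's exact implementation, and the function names are hypothetical.

```python
def hypothesis_score(am_score, lm_log_prob, num_words, lm_weight, word_score):
    """Combined score of one hypothesis (higher is better)."""
    return am_score + lm_weight * lm_log_prob + word_score * num_words


def prune_beam(hypotheses, beam_size):
    """Keep the beam_size best (text, score) hypotheses, best first."""
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    return ranked[:beam_size]


# Settings listed above for the 300 mil Transformer.
LM_WEIGHT, WORD_SCORE, BEAM_SIZE = 1.5, 0.0, 50

# Toy candidates: (text, acoustic score, LM log probability).
candidates = [("the cat sat", -10.0, -4.0), ("the cat sad", -9.5, -7.0)]
scored = [
    (text, hypothesis_score(am, lm, len(text.split()), LM_WEIGHT, WORD_SCORE))
    for text, am, lm in candidates
]
best = prune_beam(scored, BEAM_SIZE)[0]
```

With a word score of 0, the ranking depends only on the acoustic score and the weighted LM score, so a larger LM weight favors hypotheses the language model finds more likely.
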
## Tutorial | ||
|
||
To serialize the models and interact with them, please refer to the [Flashlight tutorials](https://github.com/facebookresearch/flashlight/tree/master/flashlight/app/asr/tutorial). | ||
|
||
|
||
|
||
## Citation | ||
|
||
```
@article{likhomanenko2020rethinking,
title={Rethinking Evaluation in ASR: Are Our Models Robust Enough?},
author={Likhomanenko, Tatiana and Xu, Qiantong and Pratap, Vineel and Tomasello, Paden and Kahn, Jacob and Avidov, Gilad and Collobert, Ronan and Synnaeve, Gabriel},
journal={arXiv preprint arXiv:2010.11745},
year={2020}
}
```