
Attempting to adapt to English #6

Open
aolney opened this issue Dec 18, 2014 · 14 comments

@aolney

aolney commented Dec 18, 2014

I'm a Kaldi noob, but I'm interested in using your setup for English. I looked at your other project and the Kaldi discussion boards, and this model seems like a good fit:

http://kaldi-asr.org/downloads/build/8/trunk/

However, I'm not sure how to adapt your Makefile to use the new model. It seems I would need to at least swap out these lines:

# Main language model (should be slightly pruned), used for rescoring
LM ?=language_model/pruned.vestlused-dev.splitw2.arpa.gz

# More aggressively pruned LM, used in decoding
PRUNED_LM ?=language_model/pruned6.vestlused-dev.splitw2.arpa.gz

COMPOUNDER_LM ?=language_model/compounder-pruned.vestlused-dev.splitw.arpa.gz

# Vocabulary in dict format (no pronunciation probs for now)
VOCAB?=language_model/vestlused-dev.splitw2.dict

but I'm not finding comparable files in Fisher.
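
A minimal sketch of what overriding those variables might look like, assuming an English ARPA language model and pronunciation dictionary have already been prepared somehow; all file names below are placeholders, and the output target has to be replaced with whatever target you normally build:

# hypothetical English replacements; COMPOUNDER_LM is tied to Estonian
# compound-word reconstruction and probably has no English counterpart
make LM=language_model/english.pruned.arpa.gz \
     PRUNED_LM=language_model/english.more-pruned.arpa.gz \
     VOCAB=language_model/english.dict \
     <your-usual-output-target>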

@aolney aolney changed the title Adapt to English Attempting to adapt to English Dec 23, 2014
@aolney
Author

aolney commented Dec 23, 2014

I'm successfully running with Fisher using the setup described here:

http://kaldi.sourceforge.net/online_decoding.html

I still haven't been able to figure out how to merge this with your setup (as above).

@alumae
Owner

alumae commented Dec 28, 2014

I'm planning to add an English transcription system, and a language ID module so that the correct transcription path is selected automatically for each utterance (with an option to force a specific language for the whole recording). Hopefully next week.

@aolney
Author

aolney commented Dec 29, 2014

Happy to help if I can. I'm taking a look at an open-source Reverb system, which is based on Kaldi:

http://reverb2014.dereverberation.com/workshop/reverb2014-papers/1569884459.pdf

@devadvance

New to this, but looking to work on English as well, same as @aolney. You mention the language models are pruned; adding guidance on building the LMs would definitely help expand the supported language pool faster.

Happy to help if I can as well!

@nshmyrev

I have created a language model construction outline which might be interesting for you:

http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced

Overall, language model training is a pretty complex process with some specifics. To use those models with Fisher you have to recompile the graph, so it will take some preparation.
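
For a rough idea of what training and pruning an LM involves, here is a sketch using SRILM; the corpus name and pruning thresholds are illustrative, not the settings used for the models in this repository:

# assumes SRILM is on PATH and corpus.txt has one sentence per line
ngram-count -order 3 -kndiscount -interpolate -text corpus.txt -lm full.arpa.gz
# lightly pruned LM, e.g. for rescoring
ngram -lm full.arpa.gz -prune 1e-8 -write-lm pruned.arpa.gz
# more aggressively pruned LM, e.g. for first-pass decoding
ngram -lm full.arpa.gz -prune 1e-7 -write-lm pruned6.arpa.gz
# the decoding graph then has to be rebuilt against the new LM,
# e.g. with Kaldi's utils/mkgraph.sh <lang-dir> <model-dir> <graph-dir>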

@alumae
Owner

alumae commented Mar 10, 2015

Sorry for the delay. I started thinking that probably the English adaptation that I'm planning is not going to be satisfactory for most people. I think that those who wait for the English version want to use it in practice for transcribing actual data (interviews, speeches, whatever), and expect high accuracy. I'm not going to implement an English system that will be really usable for this because I don't have training data for English.

@aolney
Author

aolney commented Mar 10, 2015

So it's pretty straightforward to set up LIUM and Kaldi's pre-built Fisher English models to get a rough-and-ready transcription system going (I've basically already done this). What I don't have are the improvements in your system, for example multi-pass decoding. Would you be interested in providing some docs for how that would be accomplished using your setup?
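
For context, by multi-pass I mean the pattern your Makefile variables suggest: decoding with the aggressively pruned LM and then rescoring the lattices with the larger one. A sketch with standard Kaldi scripts (the directory names are placeholders, and this is not necessarily how your Makefile wires it up):

# first pass: decode against the graph built from the heavily pruned LM
steps/decode.sh --nj 4 exp/tri4/graph_pruned data/test exp/tri4/decode_pruned
# second pass: rescore the lattices with the larger LM
steps/lmrescore.sh data/lang_pruned data/lang_full data/test \
    exp/tri4/decode_pruned exp/tri4/decode_rescored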

@sukantaGit

@alumae, you have done good work with your system. It would be really good if English could be used with it. Have you thought about using VoxForge for English? It is freely available. Alternatively, you might consider Fisher English as @aolney suggested. Please advise what you think.

@riebling

Hey, guys. We've been working with this system too, and have some tutorials available, as well as an example of an English-adapted system based on the Kaldi tedlium experiment.

First, the language model building tutorial that applies to Kaldi experiments:

http://speechkitchen.org/kaldi-language-model-building/

The adapted Kaldi Offline Transcriber for English, also in a VM, using the tedlium models:

http://speechkitchen.org/tedlium-vm/

There are also Docker and Vagrant versions available on our GitHub here, including ones based on switchboard and tedlium:

https://github.com/srvk/srvk-sandbox

We'd be thrilled to have people take a look, try out, and provide feedback on any of these VMs!

@vince62s

Folks, did you end up figuring out how to set this up with other models?
speechkitchen could be a solution, but it does not teach you how to do things step by step.

What exactly is needed for LM / PRUNED_LM / COMPOUNDER_LM and VOCAB?
What is the difference between PRUNED and COMPOUNDER?

Is this all we need to modify?

Cheers,

@riebling

We have changed to use other models by brute force: taking out much of the Estonian and replacing it with parts of Kaldi recipes that do decoding (for example the tedlium recipe). It mostly requires performing surgery on the Makefile. :)

In particular, for English we do only one pass of decoding, with only one LM and decoding graph, and skip compounding.

I recently updated a system to use a quite different kind of decoding: neural net decoding based on Yajie Miao's EESEN (https://github.com/yajiemiao/eesen).

You can find the resulting code in the SRVK repo here:
https://github.com/srvk/eesen-transcriber

The changes occur primarily in the Makefile. I have copied the two sections which do the decoding:

GRAPH_DIR?=$(EESEN_ROOT)/asr_egs/tedlium-fbank/data/lang_phn_test_pruned.lm3
MODEL_DIR?=$(EESEN_ROOT)/asr_egs/tedlium-fbank/exp/train_l4_c320

# FBANK calculation
# example target: make build/trans/HVC000037/fbank
# (the % pattern matches e.g. HVC000037)
build/trans/%/fbank: build/trans/%/spk2utt
	rm -rf $@
	steps_eesen/make_fbank.sh --fbank-config conf/fbank.conf --cmd "$$train_cmd" --nj 1 \
		build/trans/$* build/trans/$*/exp/make_fbank $@ || exit 1
	steps/compute_cmvn_stats.sh build/trans/$* \
		build/trans/$*/exp/make_fbank $@ || exit 1
	echo "feature generation done"
	date +%s%N | cut -b1-13

# Decode with Eesen & 8kHz models
# example target: make build/trans/HVC000037/eesen8/decode/log
build/trans/%/eesen8/decode/log: build/trans/%/spk2utt build/trans/%/fbank
	rm -rf build/trans/$*/eesen8 && mkdir -p build/trans/$*/eesen8
	(cd build/trans/$*/eesen8; for f in $(MODEL_DIR)/*; do ln -s $$f; done)
	ln -s $(GRAPH_DIR) `pwd`/build/trans/$*/eesen8/graph
	steps_eesen/decode_ctc_lat.sh --cmd "$$decode_cmd" --nj $(njobs) --beam 30.0 --max-active 5000 \
		--skip_scoring true \
		--acwt 1.0 $(GRAPH_DIR) build/trans/$* `dirname $@` || exit 1


@michierus

Hey @alumae,

Do you know if I can use the files already generated at http://www.openslr.org/11/ with your code?

I am looking to use pre-built files to just transcribe English audio files to text.

Thank you

@riebling

Let me see if I understand correctly: you want to use language models from OpenSLR instead of the ones included with the Eesen offline transcriber? If all you want to do is transcribe English audio to text, http://github.com/srvk/eesen-offline-transcriber includes models and is intended to do exactly this.

On the other hand, if you wish to build your own language model from OpenSLR sources, that would take some work, but is not impossible. Some instructions on adapting the Eesen Offline Transcriber language model are here: http://speechkitchen.org/kaldi-language-model-building/
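
If you do go the OpenSLR route, the raw ingredients there are an ARPA-format LM and a pronunciation lexicon. Something along these lines would fetch them as a starting point for the rebuild described in that tutorial (check the index page for the current file names, since they may change):

wget http://www.openslr.org/resources/11/3-gram.pruned.1e-7.arpa.gz
wget http://www.openslr.org/resources/11/librispeech-lexicon.txt
# these do not drop into the transcriber as-is: the lexicon and LM still
# have to be compiled into a new decoding graph before they can be used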

@riebling

riebling commented Feb 4, 2019

Please note that the URLs above have changed to speech-kitchen.org.
