E2E-NER-for-spoken-Finnish

This repository contains two approaches to end-to-end named entity recognition (NER) from speech.

The "augmented labels" approach is similar to standard attention-based encoder-decoder speech recognition, but instead of plain transcripts the model is trained on transcripts augmented with named entity tags, where each word is followed by its tag. Example: Canada LOC is O cold O , O said O Lucas PER
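The augmented transcript format in the example above can be sketched in a few lines of Python. This is only an illustration, not code from this repository: the function names `augment_transcript` and `split_augmented` are hypothetical, and the sketch assumes whitespace-separated tokens with exactly one tag after each word, as in the example.

```python
from typing import List, Tuple


def augment_transcript(words: List[str], tags: List[str]) -> str:
    """Interleave each word with its entity tag: 'word TAG word TAG ...'."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{word} {tag}" for word, tag in zip(words, tags))


def split_augmented(augmented: str) -> Tuple[List[str], List[str]]:
    """Recover (words, tags) from an augmented transcript.

    Assumes every word is followed by exactly one tag token.
    """
    tokens = augmented.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (word/tag pairs)")
    return tokens[0::2], tokens[1::2]


# The example sentence from the text above.
words = ["Canada", "is", "cold", ",", "said", "Lucas"]
tags = ["LOC", "O", "O", "O", "O", "PER"]

augmented = augment_transcript(words, tags)
print(augmented)  # Canada LOC is O cold O , O said O Lucas PER
```

Splitting the decoder output back into words and tags with `split_augmented` gives the plain transcript and the entity labels separately, which is one way such output could be scored for both recognition and tagging.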

The "multi-task" approach is an attention-based encoder-decoder model with two decoder branches, one for automatic speech recognition (ASR) and one for NER. The encoder is shared between the branches.

The models are evaluated on Finnish, Swedish and English data sets.

As input features, we used mean-normalized logarithmic filter banks with 40 filters, computed over 25 ms windows shifted every 10 ms (so consecutive windows overlap). An example of feature extraction from wav files can be found in helper_functions/extract_features.py.
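To make the feature description concrete, here is a minimal NumPy sketch of mean-normalized log mel filter bank features. It is not the code in helper_functions/extract_features.py, and several details are assumptions not stated above: a 16 kHz sample rate, a Hamming window, a 512-point FFT, and mean normalization per utterance over time. All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def log_fbank(signal, sample_rate=16000, win_ms=25.0, shift_ms=10.0,
              n_filters=40, n_fft=512):
    """Mean-normalized log filter bank features, shape (n_frames, n_filters)."""
    signal = np.asarray(signal, dtype=float)
    win_len = int(round(sample_rate * win_ms / 1000.0))   # 400 samples at 16 kHz
    shift = int(round(sample_rate * shift_ms / 1000.0))   # 160 samples at 16 kHz
    if win_len > n_fft:
        raise ValueError("n_fft must be at least the window length")
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one analysis window")

    # Slice the signal into overlapping frames and apply the window.
    n_frames = 1 + (len(signal) - win_len) // shift
    idx = np.arange(win_len)[None, :] + shift * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(win_len)

    # Power spectrum -> mel filter bank energies -> log.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    feats = np.log(energies + 1e-10)  # small floor avoids log(0)

    # Mean normalization: subtract the per-filter mean over time.
    return feats - feats.mean(axis=0, keepdims=True)


# One second of synthetic noise stands in for a real wav file.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
features = log_fbank(audio)
print(features.shape)  # (98, 40)
```

With a 25 ms window and a 10 ms shift, one second of 16 kHz audio yields 98 frames of 40 features each in this sketch, since trailing samples that do not fill a whole window are dropped.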

To install the requirements, run: pip install -r requirements.txt.

The multi-task models rely on pre-trained fastText embeddings. The English embeddings can be downloaded from: https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip . The Swedish embeddings can be downloaded from: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.sv.300.bin.gz . The Finnish embeddings can be downloaded from: https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.fi.300.bin.gz .

For each of the models, the embeddings need to be placed in weights/embeddings.
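As a rough sketch of the placement step, the snippet below maps each language to its download URL from the text above and computes where the downloaded archive would go. The helper names, the language keys, and the assumption that weights/embeddings sits directly inside each model's directory are hypothetical. The README also does not say whether the archives must be unpacked first, so the sketch only stores the file as downloaded.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URLs as listed above; the language keys are an illustrative choice.
EMBEDDING_URLS = {
    "en": "https://dl.fbaipublicfiles.com/fasttext/vectors-english/crawl-300d-2M-subword.zip",
    "sv": "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.sv.300.bin.gz",
    "fi": "https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.fi.300.bin.gz",
}


def embedding_target(language: str, model_dir: str) -> Path:
    """Path where the archive for `language` would be stored.

    Assumes the layout <model_dir>/weights/embeddings/<archive name>.
    """
    filename = Path(urlparse(EMBEDDING_URLS[language]).path).name
    return Path(model_dir) / "weights" / "embeddings" / filename


def download_embeddings(language: str, model_dir: str) -> Path:
    """Download the archive into the model's embeddings directory (large files)."""
    target = embedding_target(language, model_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(EMBEDDING_URLS[language], str(target))
    return target


# "some_model_dir" is a placeholder, not a directory name from this repository.
print(embedding_target("fi", "some_model_dir").as_posix())
```

Calling `download_embeddings("fi", "some_model_dir")` would fetch the Finnish archive; it is left out of the example run because the files are several gigabytes.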