

aalto-speech/lahjoita-puhetta-resources


Lahjoita puhetta resources

A collection of resources related to the Lahjoita puhetta speech corpus. The corpus is described in the paper "Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks".

The corpus on Kielipankki

Hybrid HMM/DNN ASR system

A hybrid HMM/DNN ASR system built with Kaldi, along with its language models:

Hybrid, semisupervised HMM/DNN ASR system

A semisupervised HMM/DNN ASR system built with Kaldi, trained on 100 hours of transcribed and 1,600 hours of untranscribed data:

AED recipe

The SpeechBrain recipe for the attention-based encoder-decoder (AED) model is available at: https://github.com/aalto-speech/speechbrain-lahjoita-puhetta-baseline

If you are interested in specific hyperparameter choices, the hyperparams/Full-B-50s.yaml file should be relatively easy to read, even if you are not familiar with SpeechBrain.

Wav2Vec2 fine-tuned with CTC

Metadata classification
