---
layout: page
title: Universal Phone Recognition
description: Recognizing phonetic units in a language-neutral fashion
img: assets/img/gruyere-tower-proj.jpg
importance: 1
category: speech
related_publications: true
---
Modern ASR systems typically recognize units larger than an individual sound. However, it is sometimes desirable to recognize individual sounds, whether as structural units of a particular language (phonemes) or as language-neutral idealizations of acoustic/articulatory units (phones). Recognizing phones is valuable for a variety of applications:

- Language documentation
- Very low-resource ASR
- Zero-shot language identification from speech
- Analysis of atypical speech (e.g., dysarthric or non-native speech)

However, existing universal ASR systems suffer from two major deficits:

- They display very high phone error rates (a sketch of how this metric is computed follows this list).
- They do not handle some important phenomena, such as tone.
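
For reference, the phone error rate (PER) mentioned above is the edit (Levenshtein) distance between the recognized phone sequence and a reference transcription, normalized by the reference length. Below is a minimal illustrative sketch; the phone sequences are made-up examples, not output from any particular system.

```python
def phone_error_rate(reference, hypothesis):
    """Levenshtein distance over phone sequences, normalized by reference length."""
    # dp[i][j] = minimum edits to turn reference[:i] into hypothesis[:j]
    dp = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        dp[i][0] = i
    for j in range(len(hypothesis) + 1):
        dp[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            substitution = dp[i - 1][j - 1] + (reference[i - 1] != hypothesis[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(reference)][len(hypothesis)] / len(reference)

# Hypothetical example: reference vs. recognized phones for English "cat"
reference = ["k", "æ", "t"]
hypothesis = ["k", "a", "t", "ə"]  # one substitution (æ → a) and one insertion (ə)
print(f"PER = {phone_error_rate(reference, hypothesis):.2f}")  # 2 edits / 3 phones ≈ 0.67
```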
Tone and other **suprasegmentals** are very challenging because they are phonological rather than purely phonetic. All speech, for example, displays acoustic variation in fundamental frequency, but only some languages use this variation to distinguish words from one another. Considerable work is therefore required to characterize tone in a language-neutral fashion.
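
To make the distinction concrete: the acoustic correlate of tone, the fundamental frequency (F0) contour, is straightforward to extract from any recording; the hard part is deciding which of its variation is phonologically contrastive in a given language. The sketch below performs only the easy, language-neutral step, using librosa's pYIN pitch tracker on a hypothetical audio file (`utterance.wav` is an assumed name).

```python
import librosa

# Load a (hypothetical) recording and extract its F0 contour with pYIN.
# This is the language-neutral, purely acoustic step: every utterance has
# a pitch contour, whether or not the language uses tone contrastively.
y, sr = librosa.load("utterance.wav", sr=16000)
f0, voiced_flag, voiced_prob = librosa.pyin(
    y,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz lower bound
    fmax=librosa.note_to_hz("C7"),  # generous upper bound
    sr=sr,
)

# Mapping this contour onto tone categories (high, rising, dipping, ...) is the
# phonological step, and it is language-specific: the same contour can be
# contrastive in one language and mere intonation in another.
print("frames:", len(f0), "voiced frames:", int(voiced_flag.sum()))
```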
Building both on past efforts at universal phone recognition {% cite li2020universal yan2021differentiable %} and on current self-supervised speech models, we aim to build high-accuracy models that can transcribe speech as IPA (International Phonetic Alphabet) with the same reliability as a human linguist.
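
As a point of departure, the Allosaurus recognizer released with the multilingual allophone work cited above already exposes this kind of interface: audio in, IPA phones out. A minimal sketch of calling it is shown below, assuming the `allosaurus` Python package is installed and that `sample.wav` is a hypothetical recording; its accuracy on unseen languages and phenomena like tone is precisely what this project aims to improve.

```python
# pip install allosaurus
from allosaurus.app import read_recognizer

# Load the pretrained universal phone recognizer.
model = read_recognizer()

# Transcribe a (hypothetical) recording into a space-separated string of IPA phones.
# An optional ISO 639-3 code restricts the inventory to one language,
# e.g. model.recognize("sample.wav", "eng"); omitting it keeps the universal inventory.
ipa_phones = model.recognize("sample.wav")
print(ipa_phones)  # e.g. "æ l u s ɔ ɹ s" -- raw phones, no tone marks
```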