Updated univeral
dmort27 committed Aug 16, 2024
1 parent 5fdd4fe commit 74700bd
Showing 1 changed file with 16 additions and 2 deletions.
18 changes: 16 additions & 2 deletions _projects/10_universal.md
@@ -1,11 +1,25 @@
---
layout: page
title: Universal Phone Recognition
description: recognizing phonetic units in a language-neutral fashion
description: Recognizing phonetic units in a language-neutral fashion
img: assets/img/gruyere-tower-proj.jpg
importance: 1
category: speech
related_publications: true
---

Building both on past efforts at universal phone recognition {% cite li2020universal yan2021differentiable %} and current self-supervised speech models, we aim to build high-accuracy models that can transcribe speech as IPA with the same reliability as a human linguist.
Modern ASR systems typically recognize units larger than an individual sound. However, it is sometimes desirable to recognize individual sounds, whether as structural units of a particular language (phonemes) or as language-neutral idealizations of acoustic/articulatory units (phones). Recognizing phones is valuable for a variety of applications:

- Language documentation
- Very low resource ASR
- Zero-shot language identification from speech
- Analysis of atypical speech (e.g., dysarthric or non-native speech)

However, existing universal ASR systems suffer from a couple of deficits:

- They display very high phone error rates.
- They do not handle some important phenomena, such as tone.
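
As a rough illustration of the first deficit, phone error rate (PER) is conventionally computed as the edit distance between a reference and a hypothesized phone sequence, divided by the reference length. The sketch below assumes unit costs for substitutions, insertions, and deletions; the function names and the example transcriptions are hypothetical and are not taken from any system cited here.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs assumed)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference phone
                curr[j - 1] + 1,     # insertion of a hypothesis phone
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the number of reference phones."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one substitution (æ -> a) and one insertion (s).
reference = ["k", "æ", "t"]
hypothesis = ["k", "a", "t", "s"]
per = phone_error_rate(reference, hypothesis)
```

Because insertions count as errors, PER can exceed 1.0 when the hypothesis is much longer than the reference.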

Tone and other **suprasegmentals** are very challenging because they are phonological rather than purely phonetic. All speech, for example, displays acoustic variation in frequency, but only some languages use this variation to distinguish words from each other. Thus, considerable work is required to determine how to characterize tone in a language-neutral fashion.

Building both on past efforts at universal phone recognition {% cite li2020universal yan2021differentiable %} and current self-supervised speech models, we aim to build high-accuracy models that can transcribe speech as IPA (International Phonetic Alphabet) with the same reliability as a human linguist.
