Proposal: OpenUtau g2p phonemizers for machine learning voicebanks #46

oxygen-dioxide (Maintainer) · 2023-08-07T07:32:35Z

Currently, for some languages supported by machine learning renderers such as ENUNU, users have to enter phonemes manually instead of words. This proposal suggests a set of G2P phonemizers for these languages that let users enter words and reuse existing ustx project files.

How does it work

Take French as an example:

1. Convert the lyrics into phonemes with G2P. rangée → rr en jj ei
2. Split the phoneme list into syllables. rangée → rr | en jj | ei (Here | marks the border between notes.)
3. Distribute the syllables to notes. Users can use + to place a syllable, or use +~ or +* to extend the current syllable, as with EN VCCV in OpenUtau.
4. Run the timing model to get the duration of each phoneme.
5. Align the syllables to the notes.
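
As a rough illustration of the splitting and distribution steps, here is a minimal Python sketch. It assumes that a note border falls just before each vowel, which is what the rangée example above suggests, and that the first note of a word carries the word itself while + advances to the next syllable and +~ or +* stay on the current one. The names VOWELS, split_at_vowels and assign_syllables are made up for this sketch and are not part of OpenUtau.

```python
from typing import List

# Vowel symbols taken from the example French dictionary in this proposal.
VOWELS = {"ii", "ei", "ai", "aa", "oo", "au", "ou", "uu",
          "ee", "oe", "in", "en", "on"}


def split_at_vowels(phonemes: List[str]) -> List[List[str]]:
    """Group phonemes so that every group after the first starts with a vowel.

    The first group holds the consonants before the first vowel and is
    empty when the word starts with a vowel.
    """
    groups: List[List[str]] = [[]]
    for p in phonemes:
        if p in VOWELS:
            groups.append([p])      # a vowel opens a new group
        else:
            groups[-1].append(p)    # a consonant stays in the current group
    return groups


def assign_syllables(lyrics: List[str]) -> List[int]:
    """Map each note of one word to a syllable index (0-based).

    Assumed reading of the markers: a "+" after the first note moves on to
    the next syllable; any other lyric ("+~", "+*") keeps the current one.
    """
    indices: List[int] = []
    current = 0
    for i, lyric in enumerate(lyrics):
        if i > 0 and lyric == "+":
            current += 1
        indices.append(current)
    return indices


print(split_at_vowels(["rr", "en", "jj", "ei"]))
# [['rr'], ['en', 'jj'], ['ei']]
print(assign_syllables(["rangée", "+~", "+"]))
# [0, 0, 1]
```

In this sketch, syllable index k corresponds to group k + 1 of split_at_vowels, because group 0 only holds the leading consonants. What should happen when a word has more syllables than notes is left open here.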

Dictionary format

Each phonemizer corresponds to a dictionary named <type>-<lang>.yaml. For example, the ENUNU French phonemizer uses "enudict-fr.yaml" in the voicebank. Voicebank developers should place this YAML dictionary in their voicebanks.

The dictionary consists of 3 parts: "replacements", "symbols" and "entries".

replacements

This part is optional. The voicebank may use a different phoneme set from the one used by the G2P in OpenUtau. This part tells OpenUtau how to convert the phonemes produced by the G2P into the phonemes supported by the voicebank.
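
A minimal Python sketch of how such a mapping could be applied is shown here. The rules are copied from the French example in this proposal; the function name apply_replacements and the input phoneme list are hypothetical, and the sketch assumes each rule maps exactly one phoneme to one phoneme.

```python
from typing import Dict, List

# Replacement rules copied from the example French dictionary.
REPLACEMENTS: List[Dict[str, str]] = [
    {"from": "an", "to": "en"},
    {"from": "eu", "to": "ee"},
    {"from": "un", "to": "in"},
]


def apply_replacements(phonemes: List[str],
                       rules: List[Dict[str, str]]) -> List[str]:
    """Rewrite G2P phonemes into the voicebank's phoneme set.

    Phonemes without a matching rule are passed through unchanged.
    """
    table = {rule["from"]: rule["to"] for rule in rules}
    return [table.get(p, p) for p in phonemes]


# Hypothetical raw G2P output for "rangée" before replacement.
print(apply_replacements(["rr", "an", "jj", "ei"], REPLACEMENTS))
# ['rr', 'en', 'jj', 'ei']
```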

symbols

This part is required. It tells OpenUtau which phonemes are vowels and which are consonants. OpenUtau requires this information to split words into syllables.
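
To make the role of this part concrete, here is a small Python sketch that derives the vowel set from a few symbol entries, counts the vowels in a phoneme list, and reports phonemes that the dictionary does not declare. The helper names vowel_set and undeclared are hypothetical, and only four symbols from the French example are included to keep it short.

```python
from typing import Dict, List, Set

# A few symbol entries in the shape used by the example dictionary.
SYMBOLS: List[Dict[str, str]] = [
    {"symbol": "en", "type": "vowel"},
    {"symbol": "ei", "type": "vowel"},
    {"symbol": "rr", "type": "liquid"},
    {"symbol": "jj", "type": "affricate"},
]


def vowel_set(symbols: List[Dict[str, str]]) -> Set[str]:
    """Collect every symbol whose type is 'vowel'; the rest are consonants."""
    return {s["symbol"] for s in symbols if s["type"] == "vowel"}


def undeclared(phonemes: List[str],
               symbols: List[Dict[str, str]]) -> List[str]:
    """Return the phonemes that do not appear in the symbols list."""
    known = {s["symbol"] for s in symbols}
    return [p for p in phonemes if p not in known]


vowels = vowel_set(SYMBOLS)
print(sorted(vowels))
# ['ei', 'en']
print(sum(p in vowels for p in ["rr", "en", "jj", "ei"]))
# 2
print(undeclared(["rr", "xx"], SYMBOLS))
# ['xx']
```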

entries

This part is optional. Voicebank developers can use it to define the pronunciation of special words.
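
One possible way to use these entries is sketched in Python here. It assumes that a dictionary entry takes precedence over the G2P and that lookup ignores case; neither is specified in this proposal. The names build_lookup, to_phonemes and fake_g2p are hypothetical, and fake_g2p is only a stand-in for a real G2P model.

```python
from typing import Callable, Dict, List

# The entry from the example dictionary, with the phonemes as a list.
ENTRIES = [
    {"grapheme": "openutau",
     "phoneme": ["oo", "pp", "ee", "nn", "uu", "tt", "au"]},
]


def build_lookup(entries: List[dict]) -> Dict[str, List[str]]:
    """Index entries by lower-cased grapheme (case folding is an assumption)."""
    return {e["grapheme"].lower(): list(e["phoneme"]) for e in entries}


def to_phonemes(word: str,
                lookup: Dict[str, List[str]],
                g2p: Callable[[str], List[str]]) -> List[str]:
    """Use the dictionary entry if there is one, otherwise fall back to G2P."""
    hit = lookup.get(word.lower())
    return hit if hit is not None else g2p(word)


def fake_g2p(word: str) -> List[str]:
    """Stand-in for a real G2P model; always returns the phonemes of 'rangée'."""
    return ["rr", "en", "jj", "ei"]


lookup = build_lookup(ENTRIES)
print(to_phonemes("OpenUtau", lookup, fake_g2p))
# ['oo', 'pp', 'ee', 'nn', 'uu', 'tt', 'au']
print(to_phonemes("rangée", lookup, fake_g2p))
# ['rr', 'en', 'jj', 'ei']
```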

example

Here is an example French dictionary:

replacements:
- {from: an, to: en}
- {from: eu, to: ee}
- {from: un, to: in}
symbols:
#vowels
- {symbol: ii, type: vowel}
- {symbol: ei, type: vowel}
- {symbol: ai, type: vowel}
- {symbol: aa, type: vowel}
- {symbol: oo, type: vowel}
- {symbol: au, type: vowel}
- {symbol: ou, type: vowel}
- {symbol: uu, type: vowel}
- {symbol: ee, type: vowel}
- {symbol: oe, type: vowel}
- {symbol: in, type: vowel}
- {symbol: en, type: vowel}
- {symbol: on, type: vowel}
#consonants
- {symbol: bb, type: stop}
- {symbol: ch, type: affricate}
- {symbol: dd, type: stop}
- {symbol: ff, type: fricative}
- {symbol: gg, type: stop}
- {symbol: jj, type: affricate}
- {symbol: kk, type: stop}
- {symbol: ll, type: liquid}
- {symbol: mm, type: nasal}
- {symbol: nn, type: nasal}
- {symbol: pp, type: stop}
- {symbol: rr, type: liquid}
- {symbol: ss, type: fricative}
- {symbol: tt, type: stop}
- {symbol: vv, type: fricative}
- {symbol: ww, type: semivowel}
- {symbol: yy, type: semivowel}
- {symbol: zz, type: fricative}
entries:
- grapheme: openutau
  phoneme: [oo, pp, ee, nn, uu, tt, au]

Note that I don't know French, so I wrote this documentation based on existing resources and code, including https://enunufr.carrd.co/ and FrenchVCCVPhonemizer.cs. If anything in it is wrong, feel free to tell me.
