
Update LTS rules from FreeTTS #435

Open
dhowe opened this issue May 9, 2017 · 6 comments

dhowe (Owner) commented May 9, 2017

Download source here and compare: https://freetts.sourceforge.io/

See also: https://sourceforge.net/p/freetts/discussion/137669/thread/ed3b643f/
https://sourceforge.net/p/cmusphinx/discussion/sphinx4/thread/76f4bdde/?limit=25

and the paper: "Letter-to-sound rules for automatic translation of English text to phonetics"

cqx931 (Collaborator) commented May 9, 2017

Other LTS options: MaryTTS, The Festival Speech Synthesis System, CMU Sphinx

cqx931 (Collaborator) commented May 10, 2017

1. The content in rita_lts.js is the same as the file freetts-1.2.2_src/freetts-1.2.2/com/sun/speech/freetts/lexicon/cmudict04_lts.txt, except that every "ax" is replaced with "ah" (a small comparison sketch follows at the end of this comment). No newer LTS file is included in the latest FreeTTS download.

Some clues about how the file was generated:
https://sourceforge.net/p/freetts/discussion/137669/thread/4621013d/
http://www.festvox.org/docs/manual-2.4.0/festival_13.html#Building-letter-to-sound-rules
The reason why FreeTTS didn't update the LTS file to the latest cmudict:
https://sourceforge.net/p/freetts/discussion/137669/thread/4e7c5229/?limit=25#5eb4

2. In the first post you mentioned, the author was trying to do the same thing we are. From my research, she wanted to use the LTS rules for this project: Poemage, a Visualization Tool in Support of Close Reading. I downloaded its source code but could only find cmudict04_lts.txt, so I guess she never generated a newer file in the end. I have still reached out to her by email and hopefully she can provide us with some useful information.

3. The rules from the paper "Letter-to-sound rules for automatic translation of English text to phonetics" could be used as a reference if we want to fix the LTS results with some extra rules.
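
For reference, a minimal Node.js sketch of the comparison mentioned in (1). It assumes both rule sets have been dumped to plain text, one rule per line (the exact internal format of rita_lts.js is not assumed here), and the file paths are illustrative:

```js
const fs = require('fs');

const readRules = path =>
  fs.readFileSync(path, 'utf8')
    .split('\n')
    .map(l => l.trim())
    .filter(l => l.length > 0);

// FreeTTS rules, with every "ax" phone rewritten to "ah", as rita_lts.js does
const freetts = readRules(
  'freetts-1.2.2/com/sun/speech/freetts/lexicon/cmudict04_lts.txt'
).map(l => l.replace(/\bax\b/g, 'ah'));

// hypothetical plain-text dump of the rule strings from rita_lts.js
const rita = readRules('rita_lts_rules.txt');

const diffs = freetts.filter((rule, i) => rule !== rita[i]);
console.log(diffs.length === 0
  ? 'identical after ax -> ah'
  : diffs.length + ' rules differ');
```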

cqx931 (Collaborator) commented May 16, 2017

One thing worth mentioning here is that updating to the latest CMU dictionary might not improve the quality of the LTS results (if the standard is the percentage of outputs that match the pronunciation given in the dictionary):

  1. The aim of LTS rules is not to generate perfect pronunciations, but acceptable ones. So in many cases the output is acceptable yet different from what we have in the dictionary.
  2. Because the algorithm evaluates the letter-to-sound rules with a context of the form [letter before]letter[letter after], the "e" in "betray" and in "abet" is handled the same way (a tiny illustration follows this comment). Cases like this won't be improved by an updated rule set. Adding extra phonetic rules based on "Letter-to-sound rules for automatic translation of English text to phonetics" should help improve the results in such cases.
  3. Sometimes the issue doesn't even lie in the LTS rules but in the original CMU dictionary. For example, the two most common differences in our last comparison, "-ness" and "-ed", already have inconsistent pronunciations in the CMU dictionary itself:
vanness  v ae0 n iy1 s
vastness v ae1 s t n ax s
vetted   v eh1 t ih0 d
vested   v eh1 s t ax d

Therefore, when the rules are learned probabilistically from such data, there will always be cases where the result does not match the dictionary, and these are also hard to fix with manual rules.
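
To make point (2) concrete, here is a tiny sketch that extracts the single-letter context described above (taking the comment's description of the rule format at face value; the real FreeTTS rules may use a wider window). The "#" boundary marker is just an assumption for illustration:

```js
// Build the [letter before]letter[letter after] context for position i
function letterContext(word, i) {
  const pad = '#'; // word-boundary marker (illustrative)
  const before = i > 0 ? word[i - 1] : pad;
  const after = i < word.length - 1 ? word[i + 1] : pad;
  return before + '[' + word[i] + ']' + after;
}

console.log(letterContext('betray', 1)); // b[e]t
console.log(letterContext('abet', 2));   // b[e]t -- identical context, different sound
```

Since both words yield the same context for the "e", no rule set of this form can give them different phones, no matter which dictionary it was trained on.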

dhowe (Owner, Author) commented May 16, 2017

I see -- so what do you recommend?

cqx931 (Collaborator) commented May 25, 2017

According to your aims in #97

Two aims: 1) to get the outputs matching; 2) to remove unnecessary entries from lexicons (entries whose pronunciation data matches the LTS engine, and whose POS can be correctly guessed, primarily 'nn')

  1. Updating the LTS rules to the latest CMU dict won't help with aim one; adding extra rules (prefixes, suffixes, ...) should be more helpful there (a sketch of such suffix overrides follows this comment).

  2. It makes sense to me to remove the nns/vb* entries, because we still keep the corresponding nn/vb forms in the dictionary. But I'm not sure we want to remove too many entries from the lexicon, since that could limit the results we can get from randomWord().
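
A minimal sketch of the extra-rules idea in (1): post-process the raw LTS output with a few orthographic-suffix overrides instead of retraining the whole rule set. The suffix-to-phone mappings below are illustrative assumptions, not rules verified against cmudict:

```js
// [orthographic suffix, number of trailing LTS phones to replace, replacement phones]
const suffixOverrides = [
  { suffix: 'ness', phoneCount: 3, phones: 'n ah s' }, // e.g. vastness -> ... n ah s
  { suffix: 'ed',   phoneCount: 2, phones: 'ih d' }    // e.g. vetted   -> ... ih d
];

// phones: the space-separated phone string produced by the LTS engine
function applySuffixOverrides(word, phones) {
  for (const rule of suffixOverrides) {
    if (word.endsWith(rule.suffix)) {
      const parts = phones.split(' ');
      // swap the trailing phones that cover the suffix for the override
      return parts.slice(0, -rule.phoneCount)
        .concat(rule.phones.split(' '))
        .join(' ');
    }
  }
  return phones;
}

console.log(applySuffixOverrides('vastness', 'v ae1 s t n ih1 s')); // v ae1 s t n ah s
```

A real rule table would of course need guards (e.g. "need" also ends in "ed"), so this only shows the shape of the post-processing step, not a finished rule set.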

dhowe (Owner, Author) commented May 25, 2017

Are there some generic rules we can try to improve the LTS? Alternatively, we might want to update the pronunciations in the dictionary to match what the LTS generates, when that output is also correct (just not identical).

Am I correct that the current dictionary has vb* but not nns? I think it would require too much work on the analysis side (containsWord, for example) to remove the vb* as well...
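
For what it's worth, a rough sketch of the pruning described in aim (2) above, assuming the lexicon is available as a plain word -> { phones, pos } map and that ltsPhones() wraps the LTS engine (both are assumptions standing in for RiTa's internals, not its actual API):

```js
// Drop entries whose LTS output already matches the dictionary and whose
// POS is the trivially guessable 'nn'; keep everything else.
function pruneLexicon(lexicon, ltsPhones) {
  const pruned = {};
  for (const word of Object.keys(lexicon)) {
    const entry = lexicon[word];
    const ltsMatches = ltsPhones(word) === entry.phones;
    const posGuessable = entry.pos === 'nn';
    if (!(ltsMatches && posGuessable)) {
      pruned[word] = entry;
    }
  }
  return pruned;
}
```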
