The listed pronunciations of "for" and "four" both sound like "far" #37
As far as I know, this really is intentional. I have always assumed that AO is /ɔ/ whereas OW is /oʊ/, and I am not alone in this assumption. Those are not the same vowel, even in Canadian/Midwestern English with the cot-caught merger. In fact, the "far" sound is precisely because of the cot-caught merger: if "caught" (or the other one; I have no idea which is which) is transcribed "K AO T", then obviously, if you train a TTS system on cot-caught-merged English (which, frankly, I suspect the younger generation speaks nearly everywhere in North America), "for" is going to sound like "far"...
I would add, though, that perhaps it is a distinction without a difference, as I can't think of a minimal pair for /ɔ/ ~ /oʊ/ off the top of my head. But phonetically they are definitely not the same; see, e.g., https://en.wiktionary.org/wiki/for#Pronunciation
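To make the notation concrete, here is a minimal sketch that renders a cmudict-style ARPAbet transcription in IPA, assuming the conventional mapping discussed in this thread (AO = /ɔ/, OW = /oʊ/, AA = /ɑ/). The table is deliberately partial, and `ARPABET_TO_IPA` and `arpabet_to_ipa` are hypothetical names, not part of cmudict or any library. The stress digits (0, 1, 2) that cmudict attaches to vowels are simply dropped.

```python
import re

# Partial ARPAbet -> IPA table (an assumption: it covers only the symbols used
# in this thread's examples and is not shipped with cmudict).
ARPABET_TO_IPA = {
    "AA": "ɑ",   # vowel of "far"
    "AO": "ɔ",   # vowel of "caught" for speakers without the merger
    "OW": "oʊ",  # vowel of "oat"
    "F": "f",
    "R": "ɹ",    # often written /r/ in broad transcription
}

def arpabet_to_ipa(transcription: str) -> str:
    """Render e.g. 'F AO1 R' as IPA, dropping cmudict's 0/1/2 stress digits.

    Raises KeyError for symbols missing from the partial table above.
    """
    phones = []
    for token in transcription.split():
        base = re.sub(r"[012]$", "", token)  # strip the trailing stress digit
        phones.append(ARPABET_TO_IPA[base])
    return "/" + "".join(phones) + "/"

for arpabet in ("F AO1 R", "F AA1 R", "F OW1 R"):
    print(arpabet, "->", arpabet_to_ipa(arpabet))
# F AO1 R -> /fɔɹ/
# F AA1 R -> /fɑɹ/
# F OW1 R -> /foʊɹ/
```

Under this mapping, "F AO R" and "F AA R" differ only in /ɔ/ versus /ɑ/, so a synthesizer trained on speakers who merge those two vowels could plausibly render "for" like "far", which is the mechanism suggested above.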
Nobody says "one, two, three, four" in a way that rhymes with "far", do they?
I don't really understand your question. Perhaps I wasn't clear in what I said? Are you under the impression that the vowel in "far" is /ɔ/ (that is,
I'll refer you to this table; tell me if there are pronunciations in cmudict that don't correspond to "General American":
Note also this, for "for" versus "four", which have the same vowel nearly everywhere in the US (I think they're the same vowel too, but many Canadians may disagree):
Also, note that "caught" is transcribed with two alternatives to cover dialects of US English with and without the merger:
Again, AO is /ɔ/. Transcribing "four" as "F AO R" is not an error. In the case of "for", the vowel can be reduced to "ER", which, again, is also present in the dictionary (not sure what
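Alternative pronunciations, such as the two for "caught", are stored in cmudict as separate lines whose headword carries a parenthesized variant number. A minimal parser sketch, assuming the whitespace-separated `WORD  PHONES` line format (older releases use upper case, newer ones lower case, and the variant numbering differs between releases), groups every variant under one headword. The sample lines are illustrative placeholders, not quoted from any release, and `parse_cmudict` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Illustrative sample in the cmudict line format (headword, whitespace, phones).
# These lines are placeholders for demonstration, not quoted from a release.
SAMPLE = """\
;;; comment lines like this one are skipped
FOR  F AO1 R
FOR(1)  F ER0
FOUR  F AO1 R
"""

VARIANT = re.compile(r"\(\d+\)$")  # trailing variant marker such as (1) or (2)

def parse_cmudict(text: str) -> dict[str, list[list[str]]]:
    """Group every pronunciation under its lower-cased headword."""
    entries: dict[str, list[list[str]]] = defaultdict(list)
    for line in text.splitlines():
        # Assumed comment conventions: ';;;' at line start, ' #' inline.
        line = line.split(" #", 1)[0].strip()
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        entries[VARIANT.sub("", word).lower()].append(phones)
    return dict(entries)

for word, prons in parse_cmudict(SAMPLE).items():
    print(word, [" ".join(p) for p in prons])
# for ['F AO1 R', 'F ER0']
# four ['F AO1 R']
```

Keying on the bare headword makes it easy to see every transcription a lookup-based system could choose from, which matters when deciding whether a "wrong" pronunciation comes from the dictionary or from the system that consumes it.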
Isn't the issue here that the Microsoft/Google pronunciations are not following what's in cmudict? Wondering whether they might be using some older version in which the entries were incorrect, I went all the way back to the original 0.7a commit from 11 years ago; the entries mentioned are still as stated above, i.e.,
It seems to me that this isn't an instance of incorrect cmudict transcriptions. The question, then, is why those pronunciations seemingly treat "for" like "far". I'm struggling to think of any North American dialect in which this is even the case. Regarding the -or table linked above: I do not believe that anyone with the particular North American dialectal tendency to pronounce "Florida" with the vowel of "far" would ever pronounce "for" or "four" like "far" when it's a free morpheme. (So, yes, to my knowledge, no one with such a dialectal tendency would say, "one, two, three, far.")
Here's what Merriam-Webster.com has, including on its pronunciation key:
So this is not just a CMUDICT problem! Were CMUDICT pronunciations originally taken from a revision of Webster?
@jsalsman What change(s) would you suggest to how these sounds are encoded, then?
I think AO should be changed to OW anywhere it appears as /ɔ/ in the OED.com pronunciations for U.S. English. I'm not a fan of diphthongs, but otherwise it's just wrong.
What vowel is AO supposed to represent, then, if it's not /ɔ/? If it's /ɒ/, that's not a phoneme of General American English. I don't understand your comment about diphthongs at all. It's not a question of whether you like them but:
I can't think of a minimal pair for /ɔ/ and /oʊ/ in GAE, so yes, it might make sense to merge them, but they are quite phonetically distinct...
I think I must be missing the point. @jsalsman, if you would, I'd like more explanation of the rationale for such changes. My thinking is that if the concern arises from a speech analyzer determining that (for example) "door" rhymes with "far", the most likely applicable scenarios (in General American) are 1) the treatment of /ɔ/ as /ɑ/ ~ /ɒ/ (unlikely) and/or 2) /ɔ/ ~ /oʊ/. We can easily disregard /ɑ/ ~ /ɒ/. Some North American speakers do realize a non-coda /ɔ/ as /ɑ/ ~ /ɒ/, but this is allophonic. Regardless, as this trait only applies to non-coda positions, if the analyzer's rationale stems from it in some way, it's simply incorrect. In any case, this would have nothing to do with cmudict. As to whether this is between /ɔ/ and /oʊ/, or perhaps /o/, there are several things to consider. Are we only talking about vowels that occur before /r/? What mergers, if any, are we accounting for? (Several concern these sounds.) Lastly, are we even sure this software is using cmudict in the first place? Thanks.
This has wide-ranging implications:
Microsoft/Duolingo: https://www.youtube.com/watch?v=DTj7VILryRo
Google: https://www.youtube.com/watch?v=K-tEkivp_YM
This happens because CMUDICT lists "for" as F AO R instead of F OW R, using the "ah" vowel sound in "caught" instead of the "oh" sound in "oat."
This is NOT because of the cot-caught merger, or any other linguistic reason. It is a bona fide coding error which occurs in over 50 similar entries.
I have repeatedly attempted to raise this issue with Alex Rudnicky and others, to no avail. Recommendations for a reasonable plan to approach this issue are both sorely needed and welcome.
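The claim that the problem recurs in "over 50 similar entries" is the kind of thing that can be checked mechanically. The sketch below assumes the entries in question are those in which AO (with any stress digit) is immediately followed by R; that criterion is only an approximation of the proposal to change AO to OW wherever OED gives /ɔ/, so its count need not match the figure above. It counts matching pronunciation lines and shows what the proposed AO-to-OW rewrite would produce. All function names are hypothetical, the sample lines are illustrative, and whether the rewrite is correct is exactly what this thread disputes.

```python
import re

# Tokens like AO0, AO1, AO2: the AO vowel with a cmudict stress digit.
AO = re.compile(r"^AO([012])$")

def ao_before_r(phones: list[str]) -> bool:
    """True if an AO vowel is immediately followed by R (assumed criterion)."""
    return any(AO.match(a) and b == "R" for a, b in zip(phones, phones[1:]))

def ao_to_ow_before_r(phones: list[str]) -> list[str]:
    """Apply the proposed AO -> OW change before R, keeping the stress digit."""
    out = list(phones)
    for i in range(len(out) - 1):
        m = AO.match(out[i])
        if m and out[i + 1] == "R":
            out[i] = "OW" + m.group(1)
    return out

def count_ao_r(lines: list[str]) -> int:
    """Count pronunciation lines (variants included) matching ao_before_r."""
    total = 0
    for line in lines:
        fields = line.split(" #", 1)[0].split()  # drop assumed inline comments
        if len(fields) > 1 and not fields[0].startswith(";;;") and ao_before_r(fields[1:]):
            total += 1
    return total

sample = ["FOR  F AO1 R", "FOUR  F AO1 R", "FAR  F AA1 R"]  # illustrative lines
print(count_ao_r(sample))                    # 2
print(ao_to_ow_before_r(["F", "AO1", "R"]))  # ['F', 'OW1', 'R']
```

Pointed at the lines of a local copy of the dictionary (file name and encoding depend on the release), `count_ao_r` would give a concrete number to discuss, and tightening or loosening `ao_before_r` makes the chosen criterion explicit rather than implicit.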