Hi, is there any Chinese support? #1
Yes, the model supports 3 accents/dialects of Chinese (the espeak-ng identifiers), namely:

- Mandarin (`cmn`)
- Cantonese (`yue`)
- Hakka (`hak`)

To use one of these languages, you need to pass its identifier to the phonemizer:

```python
from typing import List

from phonemizer import phonemize


def phonemize_text(text: List[str] | str):
    # Replace "your identifier" with one of the codes above, e.g. "cmn"
    return phonemize(text, language="your identifier", backend="espeak",
                     strip=True, preserve_punctuation=True, with_stress=True,
                     tie=True, njobs=8)
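```

As a quick sanity check, here is a minimal usage sketch (an illustration only, assuming espeak-ng is installed and that the placeholder has been replaced with `cmn`, the Mandarin identifier used later in this thread):

```python
# Usage sketch: assumes language="your identifier" above was replaced
# with a real espeak-ng code such as "cmn" (Mandarin).
sentences = ["语音合成技术是实现人机语音通信关键技术之一"]
phonemes = phonemize_text(sentences)
print(phonemes)  # one IPA phoneme string per input sentence
```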
@daniilrobnikov looks like the code gives potential support for Chinese, but are there any trained models we can use to test how well it works for Chinese?
Good question! I haven't tested the model on a Chinese dataset yet. At this point, I have trained the model on the LJSpeech dataset for 18k steps out of 800k, and here are the results:

download.mp4

If you are interested, we can collaborate to train the model for Chinese.
@daniilrobnikov for sure, I would like to help. VITS is the most elegant TTS model I have ever seen; a multilingual version of it, especially one covering Chinese, would be very useful. For now, I can contribute a Chinese dataset to get started. For Chinese, most people use the Biaobei dataset, which can be downloaded from: https://weixinxcxdb.oss-cn-beijing.aliyuncs.com/gwYinPinKu/BZNSYP.rar Here are the Chinese lexicons, which are special to Chinese in that they carry the tones 1, 2, 3, 4 of Chinese pinyin.

Would you like me to provide more assistance for Chinese?
Thanks for sharing the code, I would greatly appreciate any support.
Great, hoping for your new updates. Yes, Chinese characters need to be converted to pinyin first; this should be labeled in the ground-truth data.
Heads-up on the Chinese support: all you need is to assign the language identifier (found here) in the data config:

```yaml
...
language: "id"
...
```

Also, I am currently working on a sub-word tokenizer for phonemes.
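For Mandarin, for instance, that stanza would presumably read as follows (the `cmn` identifier is taken from the later comments in this thread):

```yaml
...
language: "cmn"
...
```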
@daniilrobnikov thanks for the updates, looks very promising. So for Chinese it should be using `language: "cmn"`. Using the symbols I provided above, the model should work out of the box as expected (although we can tune the performance slightly further).
As far as I understand, for Mandarin Chinese you would use:

```python
from phonemizer import phonemize

text = "语音合成技术是实现人机语音通信关键技术之一"
phonemes = phonemize(text, language="cmn", backend="espeak",
                     strip=True, preserve_punctuation=True, with_stress=True)
print("text: \t\t", text)
print("phonemes: \t", phonemes)
```

And the output from the phonemizer would look something like this:
NOTE: Here you can see that it not only converts the characters into phonemes, but also keeps the tone information.

For now, you can use the following symbol set:

```python
_pad = "_"
tones = "12345"
_symbols = " !\"',-.:;?abcdefhijklmnopqrstuvwxyz¡«»¿æçðøħŋœǀǁǂǃɐɑɒɓɔɕɖɗɘəɚɛɜɝɞɟɠɡɢɣɤɥɦɧɨɪɫɬɭɮɯɰɱɲɳɴɵɶɸɹɺɻɽɾʀʁʂʃʄʈʉʊʋʌʍʎʏʐʑʒʔʕʘʙʛʜʝʟʡʢʰʲʷʼˈˌːˑ˔˞ˠˡˤ˥˦˧˨˩̴̘̙̜̝̞̟̠̤̥̩̪̬̮̯̰̹̺̻̼͈͉̃̆̈̊̽͆̚͡βθχᵝᶣ—‖“”…‿ⁿ↑↓↗↘◌ⱱꜛꜜ︎ᵻ"
symbols = list(_pad) + list(_symbols) + list(tones)
```

I tried to tokenize this text after updating the symbols:

```python
import torch

from utils.hparams import HParams
from utils.model import intersperse
from text import text_to_sequence, sequence_to_text, PAD_ID


def get_text(text: str, hps) -> torch.LongTensor:
    # Phonemize and tokenize according to the data config; optionally
    # intersperse the pad token between ids when add_blank is set.
    text_norm = text_to_sequence(text, hps.data.text_cleaners, language=hps.data.language)
    if hps.data.add_blank:
        text_norm = intersperse(text_norm, PAD_ID)
    text_norm = torch.LongTensor(text_norm)
    return text_norm


hps = {
    "data": {
        "text_cleaners": ["phonemize_text"],
        "language": "cmn",
        "add_blank": False,
    }
}
hps = HParams(**hps)

text = "语音合成技术是实现人机语音通信关键技术之一"
text = get_text(text, hps)
print(sequence_to_text(text.numpy()))
```

For now, add tones to text/cleaners.py and it should be enough to start training on Mandarin.
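A rough sketch of what such a tone-aware cleaner could look like; the function name below is hypothetical, and it simply reuses the phonemizer call from earlier, on the assumption that keeping stress/tone marks is what lets the `12345` tone symbols survive into the vocabulary:

```python
from phonemizer import phonemize


def phonemize_text_cmn(text):
    # Hypothetical cleaner for text/cleaners.py: phonemize Mandarin with
    # espeak and keep punctuation and stress/tone marks, so the tone
    # symbols defined in the vocabulary above are not discarded.
    return phonemize(text, language="cmn", backend="espeak", strip=True,
                     preserve_punctuation=True, with_stress=True)
```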
Nice. For Chinese, things are actually a little bit more complicated: there is a special situation where the same character is pronounced differently in different sentences; this is called a polyphone. For instance, the example sentence means "since it's going to rain, I am going to take my clothes inside", and one of its characters takes a different pronunciation than usual in that context.

This should not affect training, since the pinyin should already exist in the labeled data, but at inference time the phonemizer might not handle it correctly. This is a special issue for Chinese only; many people have to use another model such as BERT to predict the correct pinyin for polyphones, and then send that to the tokenizer. I am not sure whether phonemizer can handle this. I have tried some multilingual TTS systems like PIPE, and they actually cannot handle it, so to a native speaker their results sound very weird.
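To make the polyphone problem concrete, here is a small illustration using the third-party pypinyin library (not part of this repo), whose phrase dictionary disambiguates some common cases:

```python
from pypinyin import pinyin, Style

# 长 is a classic polyphone: chang2 in 长城 (Great Wall),
# but zhang3 in 长大 (to grow up).
print(pinyin("长城", style=Style.TONE3))  # [['chang2'], ['cheng2']]
print(pinyin("长大", style=Style.TONE3))  # [['zhang3'], ['da4']]
```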
I see, so for the Chinese language it is way more complicated. As I understand it, in the context of speech synthesis, using phonemizer or any other g2p conversion is just a bridge to make the audio more correct. Meaning that during training, the model should account for such mistakes if it sees them in the training dataset. I tested the model on English and Bengali, and the results are almost indistinguishable from the source:

TTS.mp4

Compared to the ground truth:

GT.mp4

This is an audio file from the test set, so the model didn't see it during training. Considering the results so far, I think it should work fine for Chinese as well.
Also, I updated the tokenizer and vocabulary symbols, so the model should handle Mandarin out of the box. But if you want to improve the results of g2p conversion, there is a paper on that topic worth looking at.