About the speech rate of generated voice #2
Thank you for your interest in our research. You asked about two things.
Thank you~
@Charlottecuc @intory89 Then does it make sense to introduce audiomentations during vocoder training? In yl4579/StarGANv2-VC#21 there was a suggestion that it is the VC model, not the vocoder, that should be supplied with corrupted inputs.
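As an illustration of the second option (corrupting the VC model's input rather than the vocoder's), here is a minimal stdlib-only sketch that mixes background noise into a clean waveform at a chosen signal-to-noise ratio. It assumes waveforms are plain lists of float samples at the same sample rate; `mix_at_snr` and `make_training_pair` are hypothetical names, not part of this repository or of audiomentations (which offers ready-made noise transforms for the same purpose). How the clean reference would be wired into the training objective is an assumption, not something confirmed in this thread.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` so the mixture has the requested SNR in dB.

    Both inputs are lists of float samples; `noise` is tiled/trimmed to fit.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Tile or trim the noise so it covers the whole clean signal.
    noise = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0.0:
        return list(clean)  # silent noise: nothing to add
    # Pick gain g so that p_clean / (g^2 * p_noise) == 10 ** (snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + gain * n for x, n in zip(clean, noise)]


def make_training_pair(clean, noise_bank, snr_range=(5.0, 20.0), rng=random):
    """Hypothetical helper: (corrupted model input, clean reference).

    A random noise clip and a random SNR are drawn for every example.
    """
    noise = rng.choice(noise_bank)
    snr_db = rng.uniform(*snr_range)
    return mix_at_snr(clean, noise, snr_db), clean
```

The corrupted waveform would then go through the usual mel-spectrogram front end as the model input, while the vocoder keeps training on clean audio. Drawing a fresh SNR per example is one common choice for exposing the model to a range of noise levels; the 5 to 20 dB range here is an arbitrary placeholder.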
@Charlottecuc is right here: the model doesn't follow the speed of the source speech for UNSEEN speakers.
Our model does not treat rhythm as one of the speaker's characteristics. Please refer to SpeechSplit for related research.
Hi, I also trained the model on a Mandarin dataset, so I have these questions too.
Hi. I tested the model with the inference Jupyter notebook you provided. It's amazing that the model can still generate good voice even when a Mandarin source file is fed as input.
However, I notice that if the source speaks slowly while the target speaker speaks very fast, the generated voice is also fast. I was wondering whether it is possible to tune the speech rate so that the generated voice matches the speech rate of the source. Or is the difference in speech rate caused by the language mismatch of the source (Mandarin vs. English, for the pretrained ASR model)?
Also, I notice that if I run inference on a noisy source file (e.g. with air-conditioning noise in the background), the generated voice also contains noise. Is there a way to remove the noise? Or could you give any advice on noise-robust training/inference?
Thank you very much~ :)