Questions and documentation #7
And regarding speech upsampling or speech super-resolution:
I'm not sure, but I'm guessing it's a GAN: it has a generator, a discriminator, and an adversarial objective. I'm new to this stuff, though, and just playing with it as a hobby while I learn.
It's working for me with 24,000 Hz, 16-bit WAVs made in Audacity. The audio pairs should each be around 15 seconds or less (it seems okay to go slightly over that, as long as your system has enough RAM).
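For what it's worth, here's a rough Python sketch of that kind of sanity check before training, assuming the soundfile package and a hypothetical dataset directory; the 24 kHz / 16-bit / 15-second numbers just mirror what worked for me above:

```python
# Rough sanity check over a directory of clips: flag anything that isn't a
# 24 kHz, 16-bit PCM WAV of about 15 seconds or less. Paths and the cutoff
# are assumptions, not anything the yukarin projects require by name.
from pathlib import Path
import soundfile as sf

DATASET_DIR = Path("dataset/own_voice")   # hypothetical location
MAX_SECONDS = 15.0

for wav_path in sorted(DATASET_DIR.glob("*.wav")):
    info = sf.info(str(wav_path))
    problems = []
    if info.samplerate != 24000:
        problems.append(f"sample rate {info.samplerate} Hz (expected 24000)")
    if info.subtype != "PCM_16":
        problems.append(f"subtype {info.subtype} (expected PCM_16)")
    if info.duration > MAX_SECONDS:
        problems.append(f"duration {info.duration:.1f} s (> {MAX_SECONDS} s)")
    if problems:
        print(f"{wav_path.name}: " + "; ".join(problems))
```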
You could use those if you like. I tried out the JSV corpus, I think it's called, and it worked well; I just removed any very short clips. Eventually I switched to using audiobooks and used Audacity to label the sounds with a minimum length of 6 seconds (short clips can cause the process to crash). You just need to build a parallel audio dataset of your own voice and the target voice.
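If it helps, here's a rough sketch of that pruning step (the directory names and the 6-second minimum are just placeholders). The important detail is that a pair gets dropped from both sides so the source and target files stay aligned:

```python
# Sketch of pruning short clips from a parallel dataset; directory names
# and the threshold are assumptions for illustration only.
from pathlib import Path
import soundfile as sf

SOURCE_DIR = Path("dataset/own_voice")     # hypothetical
TARGET_DIR = Path("dataset/target_voice")  # hypothetical
MIN_SECONDS = 6.0

for src in sorted(SOURCE_DIR.glob("*.wav")):
    tgt = TARGET_DIR / src.name
    if not tgt.exists():
        print(f"no matching target clip for {src.name}")
        continue
    # Drop the pair from BOTH sides so the dataset stays parallel.
    if min(sf.info(str(src)).duration, sf.info(str(tgt)).duration) < MIN_SECONDS:
        print(f"dropping short pair: {src.name}")
        src.unlink()
        tgt.unlink()
```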
Yes, I can confirm it does. If you want to hear a sample, I'll be sharing my English results in the yukarin Discord. I had decent results with 212 audio pairs (some phonemes were silent or missing and the audio was more wobbly), and noticeably better results with 512. I might try 1,000 in the future.
It might have been because it was only showing the stage 1 training; I'm not sure. To me, the second stage of training (using pix2pix, I think, where it generates higher-quality sound by turning the audio into a picture) seems to really bring the quality and naturalness back. I learned not to judge it too much on the stage 1 quality; wait for the second stage to truly appreciate what it can do. It's very impressive, IMO.

I haven't tried the real-time conversion yet, but I will soon. It could be that the real-time conversion trades some quality for processing speed. I'm hoping I can achieve the quality I've seen in my test output WAVs without too much delay, but I'll find out soon. Those repositories you linked are all very cool and interesting, but this was the only series of projects that seemed to offer real-time conversion. Does anyone know if it's possible to adapt any of those other projects to run in real time? Or did I miss one that actually does offer real-time conversion?
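To illustrate the "turning the audio into a picture" idea, here's a minimal sketch using librosa that converts a clip into a mel spectrogram, i.e. the kind of 2-D image an image-to-image model like pix2pix can operate on. The parameters and file name are illustrative, not whatever the yukarin projects actually use:

```python
# Convert a waveform into a log-mel spectrogram, a 2-D "picture" of the audio.
import numpy as np
import librosa

y, sr = librosa.load("sample.wav", sr=24000)           # hypothetical input clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
mel_db = librosa.power_to_db(mel, ref=np.max)          # log scale, like image brightness
print(mel_db.shape)  # (n_mels, frames): the image-like representation
```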