How to combine melGAN with feature predictor like FastSpeech or tacotron2? #17

Open
nikawool opened this issue Feb 25, 2020 · 2 comments

Comments

@nikawool

FastSpeech: https://github.com/xcmyz/FastSpeech
How can I combine MelGAN with a feature predictor such as FastSpeech or Tacotron 2?

@Liujingxiu23

Have you tried FastSpeech combined with MelGAN? How were the results?

@Teravus

Teravus commented Sep 29, 2020

I've been experimenting with Tacotron 2's inference notebook, but so far the output is just noise for me.
I copied the mel2wav folder and my checkpoint log directory into the tacotron2 directory.
I ended up adding a cell after the "Remove WaveGlow bias" section of the notebook:

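# Load the trained MelGAN generator from the copied checkpoint directory
vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
# Invert the Tacotron 2 mel output to a waveform and play it back
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()
ipd.Audio(recons, rate=22050)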

I've also tried:

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()

meldata = mel_outputs.float()
meldata.shape    # torch.Size([1, 80, 503])
rev_wav = vocoder.inverse(meldata.float())  # .squeeze().cpu().numpy()
rev_wav.shape    # torch.Size([1, 128768])
rev_wav.dtype    # torch.float32
rev_wav2 = rev_wav.cpu().numpy()
rev_wav2.shape   # (1, 128768)
ipd.Audio((rev_wav2.reshape(-1) * 2**15).astype(np.int16), rate=22050)

Same result: just noise.
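
One thing that may be worth ruling out, offered as an assumption and not a confirmed diagnosis: the shapes above are consistent with each other (503 frames × 256 samples per frame = 128768 samples), so the tensor layout looks fine, but the two models may not agree on how the mel values are scaled. If the Tacotron 2 notebook produces natural-log mels while the MelGAN checkpoint was trained on log10 mels, the vocoder sees inputs about 2.3 times larger than expected. The sketch below shows the rescaling and a clipped int16 conversion; `HOP_LENGTH`, `natural_log_to_log10`, and `float_to_int16` are hypothetical names introduced here for illustration.

```python
import numpy as np

# Assumed hop length: 128768 samples / 503 frames == 256 for the shapes above.
HOP_LENGTH = 256


def natural_log_to_log10(mel_ln):
    """Rescale a natural-log mel spectrogram to log10 via log10(x) = ln(x) / ln(10).

    Assumption: the acoustic model emits ln-scaled mels and the vocoder
    expects log10-scaled mels. Verify both against the training configs.
    """
    return np.asarray(mel_ln, dtype=np.float32) / np.log(10.0)


def float_to_int16(wav):
    """Flatten, clip to [-1, 1], then scale to int16.

    Clipping first means a full-scale sample cannot wrap around when cast.
    """
    wav = np.clip(np.asarray(wav, dtype=np.float32).reshape(-1), -1.0, 1.0)
    return (wav * 32767.0).astype(np.int16)


# Sanity check on the reported shapes: frames * hop length == samples.
n_frames, n_samples = 503, 128768
assert n_frames * HOP_LENGTH == n_samples
```

In the notebook, the equivalent step would be dividing `mel_outputs.float()` by `math.log(10)` before passing it to `vocoder.inverse`. Rescaling only addresses the log base; if the two models also use different mel frequency ranges or FFT settings, the vocoder would likely need to be trained on mels extracted with the acoustic model's settings.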

3 participants