Help Wanted For Stage-1 #239

Open
xujzouyyz opened this issue May 18, 2024 · 3 comments

@xujzouyyz

I tried to train the first stage on the LJSpeech dataset provided by the developers, with the config file left at its defaults. However, the mel loss decreases to 0.5 and then becomes NaN after 25 epochs. What could cause this?

@Karesto

Karesto commented May 23, 2024

After 25 epochs?

What's your batch size, and what data are you using?
Is it possible that training has reached the TMA stage? (This should be set in your config file.)
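To check whether the divergence coincides with the start of the TMA stage, it can help to record the exact step at which each loss first spikes or turns NaN, then compare that with the TMA start epoch in the config. Below is a minimal, framework-agnostic sketch. `LossMonitor`, `spike_factor` and `ema_decay` are hypothetical names, not part of this repository, and it assumes the training loop can pass its per-step losses as a dict of Python floats (for example via `loss.item()` in PyTorch).

```python
import math
from typing import Dict, List, Optional, Tuple


class LossMonitor:
    """Hypothetical helper (not part of the repo): flags the first step at which
    any logged loss becomes non-finite (NaN/inf) or jumps far above its running average."""

    def __init__(self, spike_factor: float = 10.0, ema_decay: float = 0.99):
        self.spike_factor = spike_factor  # how many times the running average counts as a spike
        self.ema_decay = ema_decay        # smoothing for the exponential moving average
        self.ema: Dict[str, float] = {}
        self.events: List[Tuple[int, str, str, float]] = []

    def update(self, step: int, losses: Dict[str, float]) -> Optional[str]:
        """Record one step of losses; return a warning for the first problem seen at this step."""
        warning = None
        for name, value in losses.items():
            if not math.isfinite(value):
                self.events.append((step, name, "non-finite", value))
                warning = warning or f"step {step}: {name} is {value}"
                continue  # never fold NaN/inf into the running average
            avg = self.ema.get(name)
            if avg is not None and abs(value) > self.spike_factor * max(abs(avg), 1e-8):
                self.events.append((step, name, "spike", value))
                warning = warning or f"step {step}: {name}={value:.4g} vs avg {avg:.4g}"
            # Update the exponential moving average with finite values only.
            self.ema[name] = value if avg is None else self.ema_decay * avg + (1 - self.ema_decay) * value
        return warning

    def first_event(self) -> Optional[Tuple[int, str, str, float]]:
        """Earliest recorded spike or non-finite value, if any."""
        return self.events[0] if self.events else None


# Usage with made-up loss values: the generator loss spikes before everything turns NaN.
monitor = LossMonitor(spike_factor=10.0)
history = [
    {"mel": 0.9, "gen": 2.0},
    {"mel": 0.7, "gen": 2.1},
    {"mel": 0.5, "gen": 45.0},
    {"mel": float("nan"), "gen": float("nan")},
]
for step, losses in enumerate(history):
    msg = monitor.update(step, losses)
    if msg:
        print(msg)
print(monitor.first_event())
```

Because the running average is an exponential moving average, slow drifts do not trigger warnings; only sudden jumps or NaN/inf values do. The first recorded event shows which loss (mel, generator or discriminator) diverged first and at which step.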

@kushbhatia

I am facing a similar issue. I am trying to reproduce the paper's results by training on LJSpeech with a single GPU. As soon as training enters the TMA stage, the Gen and Dis losses start blowing up within 1-2 epochs and eventually become NaN. I am using a batch size of 16 and a learning rate of 1e-4. This is in the first stage of training.

Can you let me know how to stabilize this part of the training?

@martinambrus

Perhaps issue #254 and its linked PR #253 could solve this. They fixed the NaN errors for me, although that was for 2nd-stage training on a single GPU.
