Help Wanted For Stage-1 #239

xujzouyyz · 2024-05-18T10:33:09Z

I tried to train the first stage on the LJSpeech dataset provided by the developer, with the config file left at its defaults. However, the mel loss decreases to 0.5 and then becomes NaN after 25 epochs. Why does this happen?

Karesto · 2024-05-23T08:34:09Z

After 25 epochs?

What's your batch size and data?
Is it possible that the TMA stage of training starts at that point? (The starting epoch should be in your config file.)
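
As a rough sketch of that check, assuming the parsed config is a nested dict and that the TMA start epoch lives at `config["loss_params"]["TMA_epoch"]` (both key names are assumptions here; use whatever your config file actually calls it), and assuming TMA is enabled once the current epoch reaches that value:

```python
def tma_active(epoch, config):
    """Return True if the TMA stage has started by the given epoch.

    Assumes a hypothetical key layout config["loss_params"]["TMA_epoch"]
    and that TMA is switched on once epoch >= that value.
    """
    tma_epoch = config["loss_params"]["TMA_epoch"]
    return epoch >= tma_epoch


# Example values only, not taken from any real config file.
example_config = {"loss_params": {"TMA_epoch": 20}}
print(tma_active(25, example_config))
print(tma_active(10, example_config))
```

If the epoch where the loss turned NaN is at or just past that value, the TMA stage is the likely trigger.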

kushbhatia · 2024-06-21T19:00:25Z

I am facing a similar issue. I am trying to reproduce the paper's results by training on LJSpeech with a single GPU. As soon as training enters the TMA stage, the Gen and Dis losses start blowing up within 1-2 epochs and eventually become NaN. I am using a batch size of 16 and a learning rate of 1e-4. This is in the first stage of training.

Can you let me know how to stabilize this part of the training?
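
To pin down when the losses first go bad, a minimal sketch like the following could be run over logged values. It assumes each training step's losses are available as a dict of plain floats; the loss names `gen` and `dis` and both helper functions are hypothetical, and this only locates the blow-up rather than fixing its cause.

```python
import math


def has_non_finite(losses):
    """Return True if any loss value in the dict is NaN or infinite."""
    return any(not math.isfinite(value) for value in losses.values())


def first_bad_step(history):
    """Return the index of the first step with a non-finite loss, or None."""
    for step, losses in enumerate(history):
        if has_non_finite(losses):
            return step
    return None


# Made-up values for illustration, not real training logs.
history = [
    {"gen": 1.0, "dis": 0.5},
    {"gen": float("inf"), "dis": 0.4},
    {"gen": float("nan"), "dis": float("nan")},
]
print(first_bad_step(history))
```

Checking infinity as well as NaN matters here because a loss usually overflows to infinity a step or two before NaN shows up in the logs.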

martinambrus · 2024-09-01T18:48:37Z

Perhaps issue #254 and its connected PR #253 could solve this. They did resolve NaN errors for me, although that was for second-stage training on a single GPU.
