If I'm following along correctly, the model in train_tacotron is only used to extract the alignment layer, which is then saved and used by train_forward's model.
When using train_tacotron on a single-speaker dataset of ~100k English utterances, I'm seeing a divergence between Val/Loss and Val/Attention_Score around step 15,000 (batch size 22). Val/Loss keeps decreasing, but Val/Attention_Score starts to drop as well. This continues through my modified training schedule (which I created after seeing this behaviour with the original schedule).
It doesn't look to me like the alignments are cherry-picked from the checkpoint with the best Val/Attention_Score. Is there a downside to implementing that, or am I missing something?
Was the original schedule, with changes at 10k-step intervals, based on the ~10k utterances in LJSpeech, and would you suggest I dramatically increase the step counts for my data? Or was the original schedule just the result of tuning/experimentation?
Any ideas what might be causing the divergence around step 15k? Thinking it was simple overfitting, I tried increasing dropout significantly, but I still see the same overall phenomenon.
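To make the second question concrete, the "keep the alignments from the best checkpoint" idea could be sketched roughly like this. This is only an illustration of the selection logic; the function and dictionary names here are placeholders, not the repo's actual API:

```python
# Sketch: remember the training step with the highest Val/Attention_Score
# seen so far, so alignments can be extracted from that checkpoint instead
# of the latest one. `state` is a plain dict acting as a running record.

def track_best_attention(step, val_attention_score, state):
    """Update `state` if this step's validation attention score is a new best."""
    if val_attention_score > state.get('best_score', float('-inf')):
        state['best_score'] = val_attention_score
        state['best_step'] = step
        # At this point one would also save the model weights, e.g.
        # torch.save(model.state_dict(), 'best_attention.pt')  # placeholder
    return state

state = {}
state = track_best_attention(10_000, 0.85, state)
state = track_best_attention(15_000, 0.79, state)  # score dropped; best unchanged
```

After the second call, `state['best_step']` is still 10_000, so alignment extraction would use that earlier checkpoint.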
Further testing shows that Val/Attention_Score starts to deviate when r drops according to the schedule (e.g. from 5 down to 3). It seems my dataset doesn't play well with the shifting outputs_per_step.
Hi, what's the exact schedule you are using? Usually I see a slight drop in the attention score over time, but not too much. It's also questionable whether a higher score is necessarily better; it just needs to be decent, imo. In my experience it is just important to get the Tacotron down to the r=1 reduction with no attention breaks. If you post your TensorBoard plot I could probably give a hint about the schedule to use. Sometimes it requires a bit of tweaking (also check whether your samples contain a lot of silences).
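For reference, stretching the reduction-factor schedule for a larger dataset could look something like the sketch below. The tuple layout (r, learning rate, end step, batch size) and all the numbers are illustrative assumptions for a ~100k-utterance dataset, not a tested recommendation or the repo's shipped defaults:

```python
# Hypothetical stretched schedule: each phase ends at `end_step`, and r is
# only reduced after the model has had far more steps per phase than the
# original ~10k-step intervals. Values are guesses for illustration only.
tts_schedule = [
    # (r, learning_rate, end_step, batch_size)
    (7, 1e-3,  30_000, 22),
    (5, 1e-4,  90_000, 22),
    (3, 1e-4, 180_000, 22),
    (1, 1e-4, 350_000, 22),
]

def current_params(step, schedule):
    """Return the (r, lr, batch_size) active at a given training step."""
    for r, lr, end_step, batch in schedule:
        if step <= end_step:
            return r, lr, batch
    # Past the last phase: stay at the final settings.
    r, lr, _, batch = schedule[-1]
    return r, lr, batch
```

With a schedule like this, step 15k would still be training at r=7 instead of already dropping toward r=3, which is one way to test whether the divergence is tied to the r transitions.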