Hi @cschaefer26,

thanks for your great repository.

Unfortunately I get really bad results, and I think the reason is bad alignment.

I am training the models on a German dataset containing 900 samples, each between 5 and 30 seconds long. The sampling rate is 22050 Hz and the audio is 16-bit mono. I ran your preprocessing step.

My TensorBoard looks like this (as you can see, there is no alignment).

What is the reason for this, and how can I solve it?

Any help is greatly appreciated, thanks in advance!
Hi, could you show the attention score? The generated attention does not matter; what's used for duration extraction is the ground-truth-aligned one. 900 samples is quite few for building attention with Tacotron - what language are the samples in, and are you using phonemes? For a small dataset like this, you could pretrain a Tacotron model on a different dataset until attention is built up and then continue training on the smaller dataset. It could also make sense to set trim_long_silences=True and vad_max_silence_length=6 or so to shorten the silent parts in the audios, which helps attention build up.
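For context, the idea behind trim_long_silences is to run a voice-activity detector over each waveform and drop silent stretches that exceed a maximum length, so the model sees tighter speech boundaries. Below is a minimal, self-contained sketch of that idea using the webrtcvad package; the function name and parameters mirror the settings mentioned above, but this is only an illustration under those assumptions, not the repository's actual implementation.

```python
# Illustration only: VAD-based trimming of long silences.
# Not the repo's code; webrtcvad and the parameter names are assumptions.
import numpy as np
import webrtcvad


def trim_long_silences(wav: np.ndarray,
                       sample_rate: int = 16000,
                       vad_window_ms: int = 30,
                       max_silence_frames: int = 6) -> np.ndarray:
    """Remove silent stretches longer than max_silence_frames VAD windows."""
    vad = webrtcvad.Vad(3)  # 3 = most aggressive speech/non-speech split
    samples_per_window = sample_rate * vad_window_ms // 1000

    # webrtcvad expects 16-bit mono PCM frames of 10/20/30 ms.
    pcm = (np.clip(wav, -1.0, 1.0) * 32767).astype(np.int16)

    voiced_flags = []
    for start in range(0, len(pcm) - samples_per_window + 1, samples_per_window):
        frame = pcm[start:start + samples_per_window]
        voiced_flags.append(vad.is_speech(frame.tobytes(), sample_rate))

    # Keep all voiced windows; keep silent windows only while the current
    # silent run is shorter than max_silence_frames.
    keep = np.zeros(len(pcm), dtype=bool)
    silent_run = 0
    for i, voiced in enumerate(voiced_flags):
        start = i * samples_per_window
        if voiced:
            silent_run = 0
            keep[start:start + samples_per_window] = True
        else:
            silent_run += 1
            if silent_run <= max_silence_frames:
                keep[start:start + samples_per_window] = True

    # Keep any trailing samples that did not fill a full VAD window.
    keep[len(voiced_flags) * samples_per_window:] = True
    return wav[keep]
```

With vad_max_silence_length=6 and 30 ms windows, anything beyond roughly 180 ms of continuous silence would be cut, which is the kind of tightening that tends to help attention form.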