Hi, I noticed that you also force the encoder attention to be diagonal for some steps, and I found that after training completes the alignment remains diagonal. My question is: why do we need more encoder layers if all they have to do is stay diagonal? Did you see any issues when not forcing the encoder attention to be diagonal? Any other observations?
Also, I have often seen papers where the mel outputs are well predicted in the higher-frequency region of the mel spectrogram, but in all my trainings the results come out a little blurry toward the top of the mel spectrogram. Does this have anything to do with convergence? Any ideas about what might be going wrong?
Hi,
in my experiments the encoder alignments are rather optional; that's why I set the forcing to a lower number of steps than for the decoder. You can probably safely set it to 0. I didn't experiment extensively, but I didn't notice a drawback. Also, without forcing this diagonality, almost all of the encoder heads in the aligner tend to become diagonal eventually (typically 0-1 per layer stay scattered).
With fewer layers, however, I did see a reduction in the quality of the predicted mels.
If you do perform a more complete analysis, it would be great to hear the results!
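For concreteness, here is a minimal sketch of what "forcing the attention to be diagonal for some steps" can look like: a guided-attention-style penalty that punishes attention mass far from the diagonal, applied to encoder self-attention only below a step threshold and to the decoder cross-attention for longer. The function names, tensor shapes, band width `g`, and step counts below are illustrative assumptions, not the repo's actual code or defaults.

```python
import numpy as np

def diagonality_loss(attention, g=0.2):
    """Guided-attention-style penalty: attention mass far from the diagonal is penalized.

    attention: array of shape (heads, T_out, T_in), rows summing to 1.
    g: width of the tolerated band around the diagonal (hypothetical default).
    """
    heads, t_out, t_in = attention.shape
    n = np.arange(t_out)[:, None] / max(t_out - 1, 1)   # normalized output positions
    m = np.arange(t_in)[None, :] / max(t_in - 1, 1)     # normalized input positions
    # penalty grows with distance from the diagonal n == m
    w = 1.0 - np.exp(-((n - m) ** 2) / (2.0 * g ** 2))  # shape (T_out, T_in)
    return float(np.mean(attention * w[None]))

def attention_loss(enc_attn, dec_attn, step,
                   encoder_diagonal_steps=10_000,
                   decoder_diagonal_steps=100_000):
    """Apply the diagonal penalty to encoder heads only for the first
    encoder_diagonal_steps steps, and to decoder cross-attention heads
    for a longer schedule (step counts are illustrative)."""
    loss = 0.0
    if step < encoder_diagonal_steps:
        loss += diagonality_loss(enc_attn)
    if step < decoder_diagonal_steps:
        loss += diagonality_loss(dec_attn)
    return loss
```

Setting the hypothetical `encoder_diagonal_steps` to 0 corresponds to not forcing encoder diagonality at all, which is what the comment above suggests should be safe.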