The sample-level prediction function could be incorrect? (correct me if I'm wrong) #18

Open
guozixunnicolas opened this issue Nov 3, 2019 · 0 comments


Hi there,

Thank you for your work! It has been a lot of help.

However, I think this code differs from the original paper and the original Theano implementation, and this may lead to errors. In the original paper and code, the input to the sample-level prediction is partitioned into overlapping frames of length frame_size. For example, if seq_input has shape (batch, seq_len), the sample-level input would consist of seq_input[:, 0:frame_size], seq_input[:, 1:frame_size+1], seq_input[:, 2:frame_size+2], and so on. As a result, the sample-level input would have shape [total_number_of_overlapping_frames (batch*seq_len), frame_size]. In the original Theano implementation, the function images2neibs does this work; you can find it here: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/2a3dbdf9eb00f03e64adf58e6780e2a48b9ff6dc/models/two_tier/two_tier.py#L394
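
To make the framing concrete, here is a minimal NumPy sketch of the stride-1 overlapping frames described above. It is not taken from this repository or from the Theano code; the function name `overlapping_frames` and the toy input are hypothetical. Note that plain framing like this yields length - frame_size + 1 frames per row, so reaching seq_len frames per row would presumably require frame_size - 1 extra context samples in the input, which is an assumption on my part.

```python
import numpy as np

def overlapping_frames(seq, frame_size):
    """Split each row of `seq` into overlapping frames with stride 1.

    seq: array of shape (batch, length).
    Returns an array of shape (batch * n_frames, frame_size),
    where n_frames = length - frame_size + 1.
    (Hypothetical helper for illustration only.)
    """
    batch, length = seq.shape
    n_frames = length - frame_size + 1
    # idx[i, j] = i + j, so row i of idx selects seq[:, i:i + frame_size].
    idx = np.arange(n_frames)[:, None] + np.arange(frame_size)[None, :]
    frames = seq[:, idx]  # shape (batch, n_frames, frame_size)
    return frames.reshape(batch * n_frames, frame_size)

# Toy input: 2 sequences of 5 samples each.
seq = np.arange(10).reshape(2, 5)
frames = overlapping_frames(seq, frame_size=3)
print(frames.shape)  # (6, 3)
print(frames[:3])    # frames of the first sequence: [0 1 2], [1 2 3], [2 3 4]
```

With frame_size = 3, consecutive frames share two samples, which is the overlap I would expect the sample-level input to have.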

I am not sure whether this has been implemented in the sample_level_prediction function. I noticed the issue because I cannot generate useful audio when frame_size is anything other than 2.

Also, please don't hesitate to correct me if I am wrong somewhere.

Best regards,

Nic
