Hi there,
Thank you for your work! It's a lot of help.
But I think this code differs from the original paper and the original Theano implementation in a way that may lead to errors. In the original paper and code, the sample-level prediction partitions the sample input into overlapping frames of length frame_size. For example, if seq_input has shape (batch, seq_len), the sample-level input consists of seq_input[:, 0:frame_size], seq_input[:, 1:frame_size+1], seq_input[:, 2:frame_size+2], and so on. As a result, the sample-level input has shape [total_number_of_overlapping_frames (batch*seq_len), frame_size]. In the original Theano implementation, the function images2neibs does this work; you can find it here: https://github.com/soroushmehr/sampleRNN_ICLR2017/blob/2a3dbdf9eb00f03e64adf58e6780e2a48b9ff6dc/models/two_tier/two_tier.py#L394
I am not sure whether this overlapping framing has been implemented in the sample_level_prediction function. I noticed the issue because I cannot generate useful audio when frame_size is anything other than 2.
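To make the indexing concrete, here is a minimal pure-Python sketch of the overlapping framing I mean. It is not the code from either repository: the function name overlapping_frames and the toy values are my own, chosen only for illustration, and it uses plain lists rather than Theano or TensorFlow tensors.

```python
def overlapping_frames(seq_input, frame_size):
    """Slice each row into stride-1 windows of length frame_size.

    seq_input: list of rows (the batch), each row a list of samples.
    Returns a flat list of frames with shape
    [batch * (row_len - frame_size + 1), frame_size].
    """
    frames = []
    for row in seq_input:
        # A row of length n yields n - frame_size + 1 overlapping windows.
        for start in range(len(row) - frame_size + 1):
            frames.append(row[start:start + frame_size])
    return frames


# Toy batch: 2 sequences of 5 samples each (values are made up).
batch = [[0, 1, 2, 3, 4],
         [10, 11, 12, 13, 14]]
frames = overlapping_frames(batch, frame_size=3)
# Consecutive frames overlap by frame_size - 1 samples:
# frames[0] == [0, 1, 2], frames[1] == [1, 2, 3], frames[2] == [2, 3, 4]
```

Note that a row of length n only gives n - frame_size + 1 frames, so to end up with batch*seq_len frames I am assuming each row carries frame_size - 1 extra samples of context in addition to the seq_len samples being predicted.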
Also, please don't hesitate to correct me if I am wrong somewhere.
Best regards,
Nic