different results output during training compared to test.py #15
I'm trying to reproduce some of the results I obtained during training by using the test.py script. Continuing to dig into this, but wondering if anyone else has come across the same issue?
Comments
So I think I have tracked it down to the helper function (decoder_helper) in the decoder part of Tacotron, but I'm still a bit of a novice so I don't really understand how to fix it. I've isolated it here by running test.py with the training data and selectively turning off specific areas that utilize the
I opened an issue regarding the same problem and was told that in evaluation, unlike training, the output of the decoder is fed back as the decoder input at the next time step, so the result can be different. I closed the issue after that.
My concern is that the InferenceHelper might not be doing what we think... Maybe the author can help explain why the EmbeddingHelper (either greedy or sampling) wasn't used? Essentially I'm struggling to understand how it can go from sounding very good during training to unintelligible during inference... Edit: Of course it's not using an embedding layer... the output is a float 😝
@onyedikilo Ok, now I see your closed issue and how you ran into the same problem. This level of drop in audio quality just feels wrong. If you (or anyone else) looking into this find anything, I would be super interested to know.
Hi, I ran into the same problem. Do you all have any suggestions? Thanks a lot.
Hi @jpdz, I have stepped away from working on this for the moment, but will return in a few weeks. I don't yet fully understand what the CustomHelper function used in the decoder actually does. In the paper, it says it passes the information at timestep t to timestep t+1, but to me it looks like at timestep t+1 it can see the entire input... but I'm probably missing something. One idea I had is to use the standard decoder, which has an embedding layer. One could multiply the mel spectrogram coefficients by a large number and then cast to an int. This would allow the use of an embedding layer, which is implemented in TensorFlow and has some documentation. I will report back after trying this, but it will likely take me a month to get to it.
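To make that quantization idea concrete, here is a rough, hypothetical sketch; it only illustrates the suggestion above and is not anything the repo does. The scale factor and resulting "vocabulary" size are arbitrary assumptions.

```python
# Hypothetical sketch of the quantization idea above: scale the continuous mel
# values and cast to int so a standard embedding-based decoder could look them
# up. The scale factor (1000) and the clipping range are arbitrary assumptions.
import numpy as np

mel_frame = np.random.uniform(0.0, 1.0, size=(80,)).astype(np.float32)  # stand-in mel frame
tokens = np.clip((mel_frame * 1000).astype(np.int32), 0, 999)           # integer ids for an embedding lookup
print(tokens[:5])
```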
Hey, very sorry I'm only just getting back to you all on this. Although I agree it's annoying, it does make sense that there's a big drop-off in quality, even on the same prompt, when running train.py vs test.py.

As in the original paper, the repo does not use scheduled sampling (although it is a configurable parameter in tacotron.py). This means that in training, the decoder is given the ground truth input at every time step. When we test we don't have access to the ground truth, so the next input at each time step is the output of the previous time step. This will be much noisier since we are unlikely to have perfectly synthesized the previous time step.

I highly encourage you to try changing the scheduled sampling parameter and see if it improves performance, particularly on smaller datasets. I ran a few experiments with it but didn't have the compute to explore it properly. With scheduled sampling probability 1, the output is fed to the input at every time step in both training and testing, so the above problem should go away. The downside is that training will be more difficult, so you may want to reduce the dropout value concurrently.

I wrote InferenceHelper because the TensorFlow seq2seq API is geared towards NLP and so only provides inference helpers which sample from an embedding -- they pick the most likely next discrete word. Here the decoder outputs a continuous function (the mel filters), so we directly pass the previous output into the next input. That's all that the InferenceHelper class does in next_inputs_fn. Each time step does have access to the whole sequence, but that logic is handled in the attention mechanism (AttentionWrapper).

I'll put this in the README this week, but together with the above, the best way to tell if your model is generalizing is to look for monotonicity in the attention plots. You can see these in TensorBoard under the images tab.
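For readers unfamiliar with the seq2seq helper machinery, here is a minimal sketch of the feed-the-previous-output-back mechanism described above, written against TensorFlow 1.x's `tf.contrib.seq2seq.CustomHelper`. It is an illustration only, not the repo's actual InferenceHelper; `batch_size` and `output_dim` are placeholder names.

```python
# Minimal sketch (TF 1.x) of feeding the previous continuous decoder output
# back in as the next input. Illustrative only -- not the repo's InferenceHelper.
# `output_dim` would be something like num_mels * r.
import tensorflow as tf

def make_inference_helper(batch_size, output_dim):
    def initialize_fn():
        # Start decoding from an all-zeros <GO> frame; nothing is finished yet.
        finished = tf.tile([False], [batch_size])
        go_frame = tf.zeros([batch_size, output_dim])
        return finished, go_frame

    def sample_fn(time, outputs, state):
        # Outputs are continuous mel frames, so there is nothing discrete to
        # sample; return dummy sample ids.
        return tf.zeros([batch_size], dtype=tf.int32)

    def next_inputs_fn(time, outputs, state, sample_ids):
        # The key step: this time step's raw output becomes the next input,
        # instead of an embedding lookup of a predicted word id.
        finished = tf.tile([False], [batch_size])
        return finished, outputs, state

    return tf.contrib.seq2seq.CustomHelper(initialize_fn, sample_fn, next_inputs_fn)
```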
@barronalex Hi, I tried changing the scheduled sampling parameter to 0.5 and trained for three days on one GPU; however, the results are not good. BTW, I am a little confused about the sampling parameter: what's the difference between 0.5 and 1? Thanks a lot.
Which dataset are you training on? I just uploaded some weights trained on Nancy with r=2 and scheduled sampling 0.5, which might be a good starting point. With scheduled sampling 0.5, we use the ground truth as the next decoder input half the time, and the previous output half the time. With scheduled sampling 1, we always use the previous output and never the ground truth. This means you should get the same results for training and testing on the same input with scheduled sampling 1, but it will be harder to train the model.
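In TF 1.x terms, that per-step coin flip corresponds to something like the helper below. This is only to make the parameter concrete; the repo may implement the switch differently, and the tensor shapes here are placeholder assumptions.

```python
# Hedged illustration of what the scheduled sampling probability means at each
# decoder step, using tf.contrib.seq2seq.ScheduledOutputTrainingHelper (TF 1.x).
# The repo may implement this differently; shapes below are assumptions.
import tensorflow as tf

num_mels, r = 80, 2
ground_truth_frames = tf.placeholder(tf.float32, [None, None, num_mels * r])  # teacher-forcing frames
frame_lengths = tf.placeholder(tf.int32, [None])                              # valid decoder lengths

helper = tf.contrib.seq2seq.ScheduledOutputTrainingHelper(
    inputs=ground_truth_frames,
    sequence_length=frame_lengths,
    sampling_probability=0.5)  # 0.0 = always ground truth, 1.0 = always previous output
```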
@barronalex I am also using the Nancy dataset, based on your previous code with scheduled sampling 0.5. It has been training for two weeks and still doesn't converge. Did you get some nice results? Thanks a lot!
So on the training set it still sounds poor and there's no alignment? I ended up getting better results with r=2 rather than r=5 and so maybe try that or just pull the repo, restore my weights and continue training? The alignment with the weights I posted is quite good but it could use more training to remove some of the noise. The samples have been updated too so you can get a sense of their quality from that.
@barronalex Thank you so much! I will have a look at it!
Hi @barronalex, the audio clips do indeed sound much better. Are these from inference or from during training? I look forward to getting back to this in a few more weeks after wrapping up some other projects. Thanks again for the extra work and for uploading your examples.
No worries at all! Sorry it's been a while. Those clips are from inference on unseen examples (mostly taken from Arctic and the paper examples). It sounds much better during training.
@barronalex I listened to your updated results. They sound good, I think. I am now using your model and have begun to continue training it.
Which version of TensorFlow are you running?
I have solved this problem: when I run test.py, I merge the two commands into one command.
InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [400] rhs shape= [160]
Secondly, I can successfully retrain the model from global step 0; however, when it comes to saving samples during training, a problem occurs. Did I make some mistakes? Thanks a lot.
It seems like you might have the spectrograms saved with r=5. It should work if you rerun 'preprocess.py nancy' with r=2 (which is now the default in audio.py) and then try training again. It's not the best design currently that you have to rerun it, so I'll try and fix that soon.
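For anyone hitting the same mismatch: assuming 80 mel bins as in the paper, the tensor widths line up with the reduction factor, which is presumably why spectrograms preprocessed with r=5 clash with an r=2 setup.

```python
# Hedged explanation of the [400] vs [160] mismatch above, assuming 80 mel
# bins: the decoder emits r frames per step, so the relevant width is 80 * r.
num_mels = 80
print(num_mels * 5)  # 400 -> data/graph built with r=5
print(num_mels * 2)  # 160 -> expected with r=2
```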
@barronalex That's the problem, thanks a lot!