Should the residual connections really be handled inside the `SequenceLabeler`? Shouldn't the underlying encoder be in charge of that?
Right now, the labeler creates two matrices, `enc_out_proj_M` and `enc_in_proj_M`, to project both the encoder output states and the input sequence to the output vocabulary (in order to later compute the distribution from the logits).
So, to get the logits, we compute `(enc_out_proj_M * enc_out) + (enc_in_proj_M * enc_in)`.
If we handled the residual connection in the encoder (as an optional feature of the encoder), we could reduce the computation to a single matrix multiplication, `(enc_out + enc_in) * enc_out_proj_M`. This would also simplify the code of the `SequenceLabeler` and allow us to use any `TemporalStateful` object as an input (instead of the current list of `RecurrentEncoder`, `SentenceEncoder`).
This change should not affect the gradient flow during training, or am I missing something?
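A minimal NumPy sketch of the two formulations, assuming hypothetical shapes and that `enc_in` and `enc_out` share the same dimensionality (none of the sizes below come from the actual code):

```python
import numpy as np

# Hypothetical sizes: T time steps, state size H, output vocabulary size V.
T, H, V = 5, 8, 11
rng = np.random.default_rng(0)
enc_in = rng.normal(size=(T, H))   # embedded input sequence
enc_out = rng.normal(size=(T, H))  # encoder output states

# Current SequenceLabeler: two separate projections, summed at the logit level.
enc_out_proj_M = rng.normal(size=(H, V))
enc_in_proj_M = rng.normal(size=(H, V))
logits_current = enc_out @ enc_out_proj_M + enc_in @ enc_in_proj_M

# Proposed: sum inside the encoder, a single projection in the labeler.
logits_proposed = (enc_out + enc_in) @ enc_out_proj_M
```

(The two versions produce identical logits only if `enc_in_proj_M` and `enc_out_proj_M` are tied, so the proposal effectively merges the two projections into one.)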
This is actually not a residual connection, because there is an extra projection involved; a residual connection would be just the sum. It used to be called a skip connection, and nowadays it is sometimes called a dense connection, after DenseNet. I already have it refactored away in some branch. I'll open a PR once `tf.dataset` is done.
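Schematically (an illustrative sketch of the terminology, not the actual implementation):

```python
def residual(f, x):
    # True residual connection: identity shortcut, a plain sum with no
    # extra parameters.
    return f(x) + x

def skip_projected(f, x, W_out, W_in):
    # What the labeler currently computes: each stream gets its own
    # projection before the sum -- a skip ("dense") connection rather
    # than a residual one.
    return f(x) @ W_out + x @ W_in
```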