Should the residual connections really be handled inside the `SequenceLabeler`? Shouldn't the underlying encoder be in charge of that?
Right now, the labeler creates two matrices, `enc_out_proj_M` and `enc_in_proj_M`, to project both the encoder output states and the input sequence to the output vocabulary (in order to later compute the distribution from the logits).
So, to get the logits, we compute `(enc_out_proj_M * enc_out) + (enc_in_proj_M * enc_in)`.
If we handled the residual connection in the encoder (as an optional feature of the encoder), we could reduce the computation to a single matrix multiplication, `(enc_out + enc_in) * enc_out_proj_M`. This would also simplify the code of the `SequenceLabeler` and allow us to use any `TemporalStateful` object as an input (instead of the current list of `RecurrentEncoder`, `SentenceEncoder`).
This change should not affect the gradient flow during training, or am I missing something?
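A minimal NumPy sketch of the two formulations, assuming hypothetical shapes and that `enc_in` and `enc_out` share the same dimensionality (none of the sizes below come from the actual code):

```python
import numpy as np

# Hypothetical sizes: T time steps, state size H, output vocabulary size V.
T, H, V = 5, 8, 11
rng = np.random.default_rng(0)
enc_in = rng.normal(size=(T, H))   # embedded input sequence
enc_out = rng.normal(size=(T, H))  # encoder output states

# Current SequenceLabeler: two separate projections, summed at the logit level.
enc_out_proj_M = rng.normal(size=(H, V))
enc_in_proj_M = rng.normal(size=(H, V))
logits_current = enc_out @ enc_out_proj_M + enc_in @ enc_in_proj_M

# Proposed: sum inside the encoder, a single projection in the labeler.
logits_proposed = (enc_out + enc_in) @ enc_out_proj_M
```

(The two versions produce identical logits only if `enc_in_proj_M` and `enc_out_proj_M` are tied, so the proposal effectively merges the two projections into one.)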
This is actually not a residual connection, because there is an extra projection involved; a residual connection would be just the sum. It used to be called a skip connection, and nowadays it is sometimes called a dense connection, after DenseNet. I already have it refactored away in some branch. I'll open a PR once `tf.dataset` is done.
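Schematically (an illustrative sketch of the terminology, not the actual implementation):

```python
def residual(f, x):
    # True residual connection: identity shortcut, a plain sum with no
    # extra parameters.
    return f(x) + x

def skip_projected(f, x, W_out, W_in):
    # What the labeler currently computes: each stream gets its own
    # projection before the sum -- a skip ("dense") connection rather
    # than a residual one.
    return f(x) @ W_out + x @ W_in
```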