
Generalize sequence labeler and allow re-use embeddings for labeling #798

Merged (6 commits, Mar 14, 2019)

Conversation

jlibovicky (Contributor) opened this pull request.

@jindrahelcl (Member) left a comment:

Fix the tests + a few nitpicks.

Inline review threads: neuralmonkey/decoders/sequence_labeler.py (2 threads, resolved), neuralmonkey/runners/label_runner.py (outdated, resolved).
@varisd (Member) left a comment:

Looks fine, I'm just not sure about tf.stop_gradient. I've never played with it myself, though, so I don't know.

If you have also used this with BERT, an example would be handy.

Inline thread on neuralmonkey/decoders/sequence_labeler.py (resolved):

reshaped_states = tf.reshape(states, [-1, embedding_dim])
reshaped_logits = tf.matmul(
    reshaped_states, embeddings, transpose_b=True, name="logits")
Member:
Don't we want biases?

Member:
We don't add them anywhere else with tie_embeddings either, see autoregressive.py:225.

Member:
Right, I hadn't noticed that EmbeddingsLabeler always ties the embeddings.
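As a side note for readers of this thread, here is a minimal, self-contained sketch (illustrative shapes and names, not the PR's actual code) of the tied-embedding projection being discussed: the logits come from multiplying the states with the transposed embedding matrix, with no bias term.

import tensorflow as tf  # TF 1.x

batch, time, dim, vocab = 4, 7, 16, 100
states = tf.random_normal([batch, time, dim])       # labeler input states
embeddings = tf.get_variable("word_embeddings", [vocab, dim])

reshaped_states = tf.reshape(states, [-1, dim])     # [batch * time, dim]
reshaped_logits = tf.matmul(
    reshaped_states, embeddings, transpose_b=True)  # [batch * time, vocab]
logits = tf.reshape(reshaped_logits, [batch, time, vocab])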

def logits(self) -> tf.Tensor:
    embeddings = self.embedded_sequence.embedding_matrix
    if not self.train_embeddings:
        embeddings = tf.stop_gradient(embeddings)
Member:
Can you really do this? Given that this is effectively the last layer, won't it happen that during backprop no information flows into the rest of the network (and so nothing gets learned)?

Member:
It won't flow through this local embeddings variable, but it will flow through states.

Member:
Right, now I see it.
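A minimal sketch of the point above (illustrative variable names, not the PR's code): with tf.stop_gradient applied to the embedding matrix, the loss still backpropagates into the states, and hence into the rest of the network; only the embeddings themselves receive no gradient.

import tensorflow as tf  # TF 1.x

states = tf.get_variable("states", [2, 8])
embeddings = tf.get_variable("embeddings", [5, 8])

frozen = tf.stop_gradient(embeddings)
logits = tf.matmul(states, frozen, transpose_b=True)
loss = tf.reduce_sum(logits)

grad_states, grad_emb = tf.gradients(loss, [states, embeddings])
# grad_states is a real gradient tensor; grad_emb is None, because the only
# path to `embeddings` goes through tf.stop_gradient.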

name="state_to_word_b",
shape=[len(self.vocabulary)],
initializer=tf.zeros_initializer())
def concatenated_inputs(self) -> tf.Tensor:
Member:
If I have multiple encoders working over sequences of different lengths, this will crash on the concat, won't it?
Shouldn't such a situation rather be handled via FactoredSequence/FactoredEncoder?

Member:
That's true.
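For illustration only (hypothetical shapes, not the PR's code): concatenating encoder outputs along the feature axis requires the time dimensions to match, which is what the objection above is about.

import tensorflow as tf  # TF 1.x

enc_a = tf.zeros([4, 10, 32])   # batch 4, 10 time steps
enc_b = tf.zeros([4, 10, 64])   # same length, concat is fine
enc_c = tf.zeros([4, 12, 64])   # different length

both = tf.concat([enc_a, enc_b], axis=2)   # shape [4, 10, 96]
# tf.concat([enc_a, enc_c], axis=2)        # ValueError: mismatched time dimension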

@jindrahelcl requested a review from varisd (March 13, 2019, 15:07)
@varisd (Member) left a comment:

The tests are still failing + I didn't get an answer to the BERT question (it's enough to say no, you don't have to add the examples).

@jindrahelcl (Member):

I missed the note about BERT; we haven't used this with BERT... We have an ordinary language model that we stack this labeler on, and then it's almost the same as a recurrent decoder.

@jindrahelcl requested a review from varisd (March 13, 2019, 15:54)
@jindrahelcl (Member):

Hopefully fixed now.

@varisd (Member) left a comment:

One more thing I hadn't noticed before (train_xents):

  loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
      labels=self.train_targets, logits=self.logits)

  # loss is now of shape [batch, time]. Need to mask it now by
  # element-wise multiplication with weights placeholder
- weighted_loss = loss * self.train_mask
- return tf.reduce_sum(weighted_loss)
+ return loss * self.train_mask
Member:
Yes, the format of train_xents also seems off to me here. Here it returns shape (batch, time), but in the autoregressive decoders it returns shape (batch).

It would be good to agree on what is returned (either return a per-sequence mean here, or in autoregressive it should be enough to turn off average_across_timesteps in self.train_xents).

In the long run there should of course be a single common "decoder" ancestor with abstract methods such as loss, xents and the like, but I would happily leave that for another PR.

Member:
In Marian this is configurable ("ce-mean" vs. "ce-mean-words"): new models use a loss averaged over all words in the batch, but the default is the per-sentence mean.

For the labeler it makes more sense to have the loss per label, whereas for the decoder probably per sentence, but I agree it should be unified. So it is probably better to return (batch, time), which you can then average however you like.

Member:
I'm also in favour of the second option, but that now also means turning off the switch in AutoregressiveDecoder. I don't know how much that affects existing training behaviour (I assume minimally, but I haven't measured it). If you run the comparison, feel free to merge it this way.

On the other hand, I have already verified that averaging over sentences and then over the batch works in the seq. labeler, so in this PR I would rather do that (and open an issue for resolving it properly).

In any case it would be good to have this unified in master already, before it gets forgotten. The key point is that it substantially reduces the amount of hackery in other components, which otherwise need odd workarounds like shape checks, "supported decoder" lists and so on.

Member:
Does the autoregressive decoder have some switch? I only see that it does reduce_mean on {train,runtime}_xents (instead of taking the mean only over valid positions, but otherwise it's the same).

Member:
I mean this:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/decoders/autoregressive.py#L291

seq2seq.sequence_loss has an average_across_timesteps switch, which is currently True. If it were turned off, the result would match the train_xents format in the labeler.
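For reference, a minimal sketch of that switch (assuming TF 1.x with tf.contrib; shapes are illustrative): with both averaging flags turned off, sequence_loss returns a [batch, time] tensor of per-token cross-entropies, i.e. the same format as the labeler's train_xents.

import tensorflow as tf  # TF 1.x

batch, time, vocab = 4, 7, 100
logits = tf.random_normal([batch, time, vocab])
targets = tf.zeros([batch, time], dtype=tf.int32)
weights = tf.ones([batch, time])               # the train mask

per_token_xent = tf.contrib.seq2seq.sequence_loss(
    logits, targets, weights,
    average_across_timesteps=False,
    average_across_batch=False)                # shape [batch, time]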

Member:
Ah, now I see it...

Member:
Done. The perplexity runner also had to be changed, since it works with averages over time.
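A sketch of the corresponding perplexity computation (illustrative only, not the actual PerplexityRunner code): given per-token cross-entropies of shape [batch, time] plus a mask, the perplexity is the exponential of the masked mean, instead of a mean of per-sentence averages over time.

import tensorflow as tf  # TF 1.x

xents = tf.placeholder(tf.float32, [None, None])   # [batch, time]
mask = tf.placeholder(tf.float32, [None, None])    # 1.0 on valid positions

mean_xent = tf.reduce_sum(xents * mask) / tf.reduce_sum(mask)
perplexity = tf.exp(mean_xent)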

@jindrahelcl requested a review from varisd (March 14, 2019, 14:53)
@varisd (Member) left a comment:

I don't really like the solution in PerplexityRunner, but given that PR #801 will remove it entirely, OK.
