Generalize sequence labeler and allow re-use of embeddings for labeling #798
Conversation
jlibovicky commented Feb 22, 2019
- sequence labeler can now use multiple inputs
- fixed bug with max length (max_length must be undefined when using SequenceLabeler, #790)
- labeler can reuse word embeddings for classification
Fix the tests + a few nitpicks.
Looks fine, I'm just not sure about tf.stop_gradient. I've never played with it myself, so I don't know.
If you've also used this with BERT, an example would be nice.
reshaped_states = tf.reshape(states, [-1, embedding_dim])
reshaped_logits = tf.matmul(
    reshaped_states, embeddings, transpose_b=True, name="logits")
Don't we want biases?
We don't add them elsewhere with tie_embeddings either, see autoregressive.py:225.
Right, I hadn't noticed that EmbeddingsLabeler always ties the embeddings.
def logits(self) -> tf.Tensor:
    embeddings = self.embedded_sequence.embedding_matrix
    if not self.train_embeddings:
        embeddings = tf.stop_gradient(embeddings)
Can you really do this? Given that this is effectively the last layer, won't it happen that no information flows back into the rest of the network during backprop (and so nothing gets learned)?
It won't flow through this local embeddings variable, but it does flow through states.
Right, now I see it.
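A minimal sketch of the point above (toy shapes, illustrative names; not the actual labeler code): tf.stop_gradient only cuts the path through the embedding matrix, while the same logits still backpropagate into states and thus into the encoder.

import tensorflow as tf

states = tf.random_normal([8 * 20, 512])      # [batch * time, dim], stands in for encoder states
embeddings = tf.random_normal([30000, 512])   # [vocab, dim], stands in for the embedding matrix

frozen = tf.stop_gradient(embeddings)         # block gradients into the embeddings only
logits = tf.matmul(states, frozen, transpose_b=True)
loss = tf.reduce_sum(logits)

d_states, d_embeddings = tf.gradients(loss, [states, embeddings])
# d_states is an ordinary tensor: the encoder still receives gradients through `states`.
# d_embeddings is None: the path through the embedding matrix is cut.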
name="state_to_word_b", | ||
shape=[len(self.vocabulary)], | ||
initializer=tf.zeros_initializer()) | ||
def concatenated_inputs(self) -> tf.Tensor: |
If I have multiple encoders that work over sequences of different lengths, this will crash on the concat, won't it?
Shouldn't such a situation rather be handled via FactoredSequence/FactoredEncoder?
That's true.
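A minimal illustration of the concern above (made-up shapes): concatenating encoder states along the feature axis only works when the time dimensions agree.

import tensorflow as tf

enc_a = tf.zeros([8, 20, 256])   # [batch, time=20, dim]
enc_b = tf.zeros([8, 20, 128])   # same time dimension
enc_c = tf.zeros([8, 17, 128])   # different time dimension

ok = tf.concat([enc_a, enc_b], axis=2)   # fine, shape [8, 20, 384]
# tf.concat([enc_a, enc_c], axis=2)      # ValueError at graph construction: 20 vs. 17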
The tests are still failing + I didn't get an answer to the BERT question (it's enough to say no, you don't have to add the examples).
I overlooked the note about BERT; we haven't used this with BERT... We have an ordinary language model that we put this labeler on top of, and it is then almost the same thing as a recurrent decoder.
So, hopefully fixed now.
One more thing I hadn't noticed before (train_xents):
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=self.train_targets, logits=self.logits)

# loss is now of shape [batch, time]. Need to mask it now by
# element-wise multiplication with weights placeholder
- weighted_loss = loss * self.train_mask
- return tf.reduce_sum(weighted_loss)
+ return loss * self.train_mask
Right, I'm also not happy with the format of train_xents here.
Here it returns shape (batch, time), but in the autoregressive decoders it returns shape (batch).
It would be good to agree on what should be returned (either return the per-sequence mean here, or in autoregressive it should be enough to turn off average_across_timesteps in self.train_xents).
In the long run there should of course be a single common "decoder" ancestor with abstract methods like loss, xents and so on, but I'd leave that for another PR.
In Marian this is configurable ("ce-mean" vs "ce-mean-words"); newer models use the loss averaged over all words in the batch, but the default is the per-sentence average.
For the labeler it makes more sense to have the loss for each label separately, whereas for a decoder it is probably better per sentence, but I agree it should be unified. So it is probably better to return (batch, time), which you can then average however you like.
I'm also for the second option, but that means also turning off the switch in AutoregressiveDecoder now. I don't know how that will affect established training setups (I assume only minimally, but I haven't measured it). If you run a comparison, feel free to merge it that way.
On the other hand, I have already verified that taking the mean over sentences and then over the batch works in the sequence labeler, so in this PR I'd rather do that (and open an issue for resolving it properly).
In any case it would be good to have this unified in master already, before it gets forgotten. The key point is that it significantly reduces the amount of hackery in other components, which otherwise need strange workarounds like shape checks, supported-decoder lists and the like.
Does the autoregressive decoder have some switch? I only see that it does a reduce_mean on {train,runtime}_xents (instead of taking the mean only over valid positions, but otherwise it's the same).
I mean this:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/decoders/autoregressive.py#L291
seq2seq.sequence_loss has an average_across_timesteps switch, which is currently True. If it were turned off, it would match the train_xents format in the labeler.
Now I see it...
Done. The perplexity runner also had to be changed, because it relied on the averages over time.
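A sketch of the corresponding perplexity computation over per-position cross-entropies (illustrative only, not the actual PerplexityRunner code): sum the masked cross-entropies, normalize by the number of valid tokens, then exponentiate.

import tensorflow as tf

xents = tf.placeholder(tf.float32, [None, None])  # [batch, time]
mask = tf.placeholder(tf.float32, [None, None])   # [batch, time]

# mean cross-entropy per valid token, then exponentiate to get perplexity
token_mean_xent = tf.reduce_sum(xents * mask) / tf.reduce_sum(mask)
perplexity = tf.exp(token_mean_xent)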
I don't much like the solution in PerplexityRunner, but given that PR #801 will remove it entirely, OK.