Elastic Weight Consolidation + regularization refactor #742
base: master
Conversation
neuralmonkey/learning_utils.py
@@ -423,7 +423,7 @@ def _check_savable_dict(data):
         return False

     supported_type = Union[
-        List[Dict[str, np.ndarray]],
+        List[Dict[str, Union[np.ndarray, np.float32]]],
Is this really supposed to be np.float32? Shouldn't it be a plain float?
It is because of the gradient runner, which returns a dict of gradients, and those are the output of a session.run call, so I think np.float32 is appropriate.
@varisd is right. session.run() returns numpy floats, which are not subclasses of Python floats, so it would crash.
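A minimal standalone sketch of why this matters (not part of the PR): the np.float32 scalars coming out of session.run() are not a subclass of Python's float, so a type declared as plain float would not match them.

```python
import numpy as np

x = np.float32(0.5)
print(isinstance(x, float))        # False -- np.float32 is not a Python float
print(isinstance(x, np.floating))  # True  -- it is a numpy floating scalar
```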
    image_summaries=None)


class GradientRunner(BaseRunner[SupportedDecoder]):
Documentation: what it does, what it is potentially good for.
            gradient_dict[name] = val

        self.result = ExecutionResult(
            outputs=[gradient_dict],
Does it have to be inside that list?
Yes, otherwise it would return only the names of the variables and not the dicts. This line is to blame:
https://github.com/ufal/neuralmonkey/blob/master/neuralmonkey/runners/base_runner.py#L77
@@ -25,10 +26,9 @@ class CrossEntropyTrainer(GenericTrainer):
    def __init__(self,
                 decoders: List[Any],
                 decoder_weights: List[ObjectiveWeight] = None,
                 l1_weight: float = 0.,
                 l2_weight: float = 0.,
I would keep the L1 and L2 arguments here as syntactic sugar and have the constructor build the regularizers and append them to the list.
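A minimal sketch of that suggestion (the helper name build_regularizers is hypothetical; the regularizer classes are the ones from this branch):

```python
from typing import List, Optional

from neuralmonkey.trainers.regularizers import (
    Regularizer, L1Regularizer, L2Regularizer)


def build_regularizers(l1_weight: float = 0.,
                       l2_weight: float = 0.,
                       regularizers: Optional[List[Regularizer]] = None
                       ) -> List[Regularizer]:
    """Turn the l1_weight/l2_weight sugar into regularizer objects."""
    result = list(regularizers) if regularizers else []
    if l1_weight > 0.:
        result.append(L1Regularizer(weight=l1_weight))
    if l2_weight > 0.:
        result.append(L2Regularizer(weight=l2_weight))
    return result
```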
class EWCRegularizer(BaseRegularizer):
    """Regularizer based on the Elastic Weight Consolidation.

    TODO description
TODO
from neuralmonkey.logging import log


class BaseRegularizer:
I would name it just Regularizer.
Just without the typo :-D
Note that this branch currently removes the default L1/L2 logging into the summaries.
Oh, I didn't even notice. I'd like to keep the L2 plot in TensorBoard.
The L2 plot is back.
Even though you changed a million tests, the functionality you introduce here (namely the gradient runner and the EWC regularizer) remains untested.
The refactor to regularizers seems useful to me, but I would still think about whether the regularizer weight is really part of the regularizer. If it were not, the trouble with default weights that I go into in this review would go away.
        regularizers = []
        if l1_weight > 0.:
            if L1Regularizer in [type(r) for r in regularizers]:
                warn("You specified both trainer l1_weight "
I wouldn't warn here, I'd fail right away.
... and as I look at it, it is rather messy, because you impose no constraints on the regularizers list, so there can easily be two L1 regularizers in it and nothing will complain.
... and what if somebody subclasses the L1 regularizer (say, to give it a predefined weight)? Then this passes without a warning, because type(r) will not be L1Regularizer.
That's why there is a warn (in case the user did not notice they defined it in two ways)... I don't want to tie the user's hands; let them put as many L1Regularizers into the list as they like.
Putting in several identical regularizers is nonsense and it should crash, because it is almost certainly something you don't want. Just sum the weights and a single one will do.
I can just as well drop the check entirely, for all I care.
        if l2_weight > 0.:
            if L2Regularizer in [type(r) for r in regularizers]:
                warn("You specified both trainer l2_weight "
same here
from neuralmonkey.model.model_part import ModelPart
from neuralmonkey.runners.base_runner import (
    Executable, ExecutionResult, NextExecute)
from neuralmonkey.trainers.regularizers import (
    Regularizer, L2Regularizer)
Won't this fit on one line?
            collections=["summary_train"])
        self.losses = [o.loss for o in objectives] + reg_values

        # we always want to include l2 values in the summary
L1 too!
L2 is enough.
I don't follow. The plots are not the same, and every bit of diagnostic information is good, isn't it?
I think they correlate quite a bit + have you ever actually used them? (Most of the time L2 is enough.)
The fact that two quantities correlate does not mean only one of them is interesting. We have already seen plots where L1 went up while L2 went down.
Give me a use case where analyzing L1 has helped you in the past.
Give me proof that it cannot help me in the future.
OK, I'll add it there.
        # we always want to include l2 values in the summary
        if L2Regularizer not in [type(r) for r in self.regularizers]:
            reg_values.append(L2Regularizer().value(regularizable))
Couldn't this be done with some static methods? Constructing something here that is normally constructed from the config strikes me as hideous.
Besides, what kind of mess is this anyway? :-) You append to reg_values, which you then zip with something you did not append to, so as far as I know it will not even show up there. reg_values must not be modified (also for semantic reasons: it contains the values of the regularizers, but I am not using L2 as a regularizer, so its value should not be in this list). I would put a separate if here and call tf.summary.scalar directly inside it.
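A minimal sketch of that suggestion, reusing the names from the diff above (self.regularizers, regularizable); this is a trainer fragment, not the actual implementation:

```python
# Log the plain L2 value into the training summaries without pretending it
# is one of the configured regularizers; reg_values stays untouched.
if L2Regularizer not in [type(r) for r in self.regularizers]:
    l2_value = L2Regularizer().value(regularizable)
    tf.summary.scalar("train_l2", l2_value,
                      collections=["summary_train"])
```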
The L2Regularizer should then also be added to self.regularizers.
After reg_values are added to the summaries, they are not used anywhere else, so it is not a problem to add this "default" regularizer there.
Except that it is not a regularizer. So when somebody comes back to this in a month and wants to do something with the regularizers, this semantic bug will surface.
All I want is for variables with a given name to contain things that match that name.
And you have not addressed my note that the constructor should not be called; it should be either a function or a static method.
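A minimal sketch of the "function or static method" variant (the helper name l2_value is an assumption, not part of the PR):

```python
from typing import List

import tensorflow as tf


def l2_value(variables: List[tf.Tensor]) -> tf.Tensor:
    """Plain, unweighted L2 value of the given variables."""
    return tf.add_n([tf.reduce_sum(tf.square(v)) for v in variables])

# The trainer could then log it without constructing a regularizer:
#     tf.summary.scalar("train_l2", l2_value(regularizable),
#                       collections=["summary_train"])
```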
    def __init__(self,
                 name: str = "train_l1",
                 weight: float = 1.0e-8) -> None:
I already wrote this below: drop the unjustified default values for both name and weight.
If some value is recommended (which is not the case for either L1 or L2), I would put it in the docstring.
        log("Loading gradient estimates from {}".format(gradients_file))
        self.gradients = np.load(gradients_file)
        log("Gradient estimates loaded")
Periods at the ends of sentences in the log messages (4x) :-)
OK, but this does not hold for the whole codebase.
Well yeah, but when I see it, I see that it should probably be there...
        self.gradients = np.load(gradients_file)
        log("Gradient estimates loaded")

    def value(self, variables: List[tf.Tensor]) -> float:
This function needs a few comments or a docstring explaining what is going on here.
            if (var_name in self.gradients.files
                    and self.init_vars.has_tensor(var_name)):
                init_var = self.init_vars.get_tensor(var_name)
                gradient = tf.constant(
Why do you wrap it in a constant here? Doesn't tf.square accept plain numbers? And what's more, can't you precompute the square outside TensorFlow?
Re the constant: I want the gradients named in the graph.
Re precomputation: I can, good idea.
If it is for the name, then it is probably fine as it is, because if you precomputed the square on the side, you would not have the gradient in the graph at all.
I'm dealing with it right now... it is good to have it named, and the square gets precomputed one step earlier in numpy... I have not pushed it yet.
        return ewc_value


L1 = L1Regularizer()
Here I would probably keep the default weight, but I would document it in the module docstring. Even so, I am afraid that making a default value available for these regularizers will give users the impression that the default is what they should use, which is not true at all; this 1e-8 is based on some other default value for which we have no explanation.
OK, I'll drop these shortcuts.
I'm for it.
Precisely because of that semantics I would not do it... regularizers are regularizers and new regularizers and then some...
s/new/not/
@@ -7,8 +7,7 @@
 from neuralmonkey.model.model_part import ModelPart
 from neuralmonkey.runners.base_runner import (
     Executable, ExecutionResult, NextExecute)
-from neuralmonkey.trainers.regularizers import (
-    Regularizer, L2Regularizer)
+from neuralmonkey.trainers.regularizers import (Regularizer, L2Regularizer)
Without the parentheses.
@@ -40,6 +39,7 @@ class Objective(NamedTuple(


# pylint: disable=too-few-public-methods,too-many-locals,too-many-branches
# pylint: disable=too-many-statements
too-many-statements means it should be broken up into functions. Isn't there some way to do that?
I agree, I just did not feel like digging into it yesterday... in any case I will try to work it into this PR.
-class Regularizer:
-    """Base class for the regularizers."""
+class Regularizer(metaclass=ABCMeta):
+    """Base clas s for regularizers.
A stray tab slipped in here.
"""Base clas s for regularizers. | ||
|
||
Regularizer objects are used to introduce additional loss terms to | ||
the trainerthus constraining the model variable during training. These |
s/trainerthus/trainer, thus/
@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

    Implements Elastic Weight Consolidation from the "Overcoming catastrophic
    forgetting in neural networks" paper.
    The regularizer applies separate regularization weight to each trainable
a separate
@@ -84,15 +95,18 @@ class EWCRegularizer(Regularizer):

    Implements Elastic Weight Consolidation from the "Overcoming catastrophic
    forgetting in neural networks" paper.
    The regularizer applies separate regularization weight to each trainable
    variable based on how important the variable was for the previously
based on its importance for the previously learned task
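For reference (this is the penalty from the cited paper, not code from this diff): the EWC term is $L_{\mathrm{EWC}}(\theta) = \sum_i F_i\,(\theta_i - \theta^{*}_i)^2$, where $\theta^{*}$ are the variables learned on the previous task and $F_i$ is the diagonal Fisher information of variable $i$, estimated here from squared gradients. The per-variable $F_i$ is what makes the regularization weight separate for each trainable variable; the regularizer's weight then scales the whole sum.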
-        log("Loading initial variables for EWC from {}".format(variables_file))
+        log("Loading initial variables for EWC from "
+            "{}.".format(variables_file))
Don't do string concatenation, and start the second line (if it really does not fit on one) with .format.
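A minimal sketch of the requested layout (hypothetical, reusing the call from the diff):

```python
log("Loading initial variables for EWC from {}.".format(variables_file))
# or, if the call really does not fit on one line:
log("Loading initial variables for EWC from {}."
    .format(variables_file))
```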
         ewc_value = tf.constant(0.0)
         for var in variables:
-            var_name = var.name.split(":")[0]
+            var_name = var.name
Do you realize that attribute access via the dot costs nothing, and that this is effectively copying one local variable into another whose name differs by a single character?
name="{}_init_value".format(init_var_name)) | ||
grad_squared = tf.constant( | ||
np.square(self.gradients[var_name]), | ||
name="{}_ewc_weight".format(init_var_name)) |
But now you are not naming the gradients in the graph, you are naming their squares. You could have achieved that simply by writing tf.square(gradient, name="kráva").
I don't know; this way I am adding precomputed squares to the graph... in the other case (with tf.square) the computation would run every time you needed the result.
In the end the speed difference probably won't be noticeable, but it seems more reasonable to me to precompute constants than to blow them into the graph.
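A minimal sketch of the pattern being discussed, reusing the names from the diff (the way the per-variable term enters ewc_value is an assumption consistent with EWC, not a quote from the branch):

```python
# Precompute the squared gradient estimate in numpy and put it into the
# graph as a named constant, so the square is not recomputed at run time.
grad_squared = tf.constant(
    np.square(self.gradients[var_name]),
    name="{}_ewc_weight".format(init_var_name))

# Per-variable EWC term: squared-gradient weight times the squared
# deviation from the variables learned on the previous task.
ewc_value += tf.reduce_sum(grad_squared * tf.square(var - init_var))
```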
Do I understand correctly that GradientRunner is used with a trained model, to which the trainer is applied on some single batch, and it returns some gradients?
Btw. it is still not exercised by any test. I suggest that one test run that runner, which saves the result somewhere, and a second test run the EWC regularizer. Or just slyly add it to a test that already has saving sorted out. In short, so that both the EWC regularizer and the gradient runner get used.
Otherwise I like how this is nicely converging towards mergeability. :-)
tests/nematus.ini
@@ -87,7 +87,7 @@ rnn_cell="NematusGRU"
 ; This block just fills the arguments of the trainer __init__ method.
 class=trainers.cross_entropy_trainer.CrossEntropyTrainer
 decoders=[<decoder>]
-l2_weight=1.0e-8
+regularizers=[trainers.regularizers.L2]
I'd happily drop this from these tests and test more of them in a single one.
Yeah, we can throw those out.
Yeah, I have not dealt with the tests yet. I use the gradient runner to print out the gradients over a whole dataset (usually the validation set, because it is not that big). From those I then separately compute the average gradient for each model weight; these can then be used in EWC.
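A minimal standalone sketch of that offline step, assuming the runner's output format described above (one dict of per-variable gradient arrays per batch); the helper and file name are hypothetical:

```python
import numpy as np


def average_gradients(batch_grads):
    """Average per-variable gradients collected over a whole dataset.

    `batch_grads` is a list of dicts mapping variable names to numpy
    arrays, one dict per batch.
    """
    names = batch_grads[0].keys()
    return {name: np.mean([grads[name] for grads in batch_grads], axis=0)
            for name in names}


# The averaged estimates can then be stored for the EWC regularizer,
# which loads them with np.load:
# np.savez("gradients.npz", **average_gradients(collected))
```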
So if you managed to simulate this pipeline in ...
Ping, I have already rebased this branch at least three times... if you do not plan to merge it, let me know.
Well, I thought it would get merged once the little refactor with the tf dataset is done (just like the pile of commits I am hoarding on a shelf and not turning into a PR).
NOTE: currently this is not working with DelayedUpdateTrainer due to a problem with trainer variable restoration.