[TOC]
Monitors allow user instrumentation of the training process.
Monitors are useful to track training, report progress, request early stopping and more. Monitors use the observer pattern and notify at the following points:
- when training begins
- before a training step
- after a training step
- when training ends
Monitors are not intended to be reusable.
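The notification sequence above can be sketched as a plain-Python training loop driving a monitor. `TraceMonitor` and `run_training` are illustrative stand-ins, not part of the `tf.contrib.learn` API:

```python
# Illustrative only: a minimal training loop showing the order in which a
# monitor's callbacks fire. The method names mirror the Monitor interface
# described here, but this is plain Python, not tf.contrib.learn.

class TraceMonitor(object):
    def __init__(self):
        self.events = []

    def begin(self, max_steps):
        self.events.append(('begin', max_steps))

    def step_begin(self, step):
        self.events.append(('step_begin', step))
        return []  # no extra tensors requested

    def step_end(self, step, outputs):
        self.events.append(('step_end', step))
        return False  # do not request early stopping

    def end(self):
        self.events.append(('end', None))

def run_training(monitor, max_steps):
    monitor.begin(max_steps)
    for step in range(max_steps):
        monitor.step_begin(step)
        # ... the trainer would run the graph here ...
        if monitor.step_end(step, outputs={}):
            break  # the monitor requested early stopping
    monitor.end()

m = TraceMonitor()
run_training(m, max_steps=2)
print([e[0] for e in m.events])
# prints ['begin', 'step_begin', 'step_end', 'step_begin', 'step_end', 'end']
```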
There are a few pre-defined monitors:
- `CaptureVariable`: saves a variable's values
- `GraphDump`: intended for debugging only; saves all tensor values
- `PrintTensor`: outputs one or more tensor values to the log
- `SummarySaver`: saves summaries to a summary writer
- `ValidationMonitor`: runs model validation, by periodically calculating eval metrics on a separate data set; supports optional early stopping

For more specific needs, you can create custom monitors by extending one of the following classes:

- `BaseMonitor`: the base class for all monitors
- `EveryN`: triggers a callback every N training steps
Example:
```python
class ExampleMonitor(monitors.BaseMonitor):
    def __init__(self):
        print('Init')

    def begin(self, max_steps):
        print('Starting run. Will train until step %d.' % max_steps)

    def end(self):
        print('Completed run.')

    def step_begin(self, step):
        print('About to run step %d...' % step)
        return ['loss_1:0']

    def step_end(self, step, outputs):
        print('Done running step %d. The value of "loss" tensor: %s' % (
            step, outputs['loss_1:0']))

linear_regressor = LinearRegressor()
example_monitor = ExampleMonitor()
linear_regressor.fit(
    x, y, steps=2, batch_size=1, monitors=[example_monitor])
```
`tf.contrib.learn.monitors.get_default_monitors(loss_op=None, summary_op=None, save_summary_steps=100, output_dir=None, summary_writer=None)` {#get_default_monitors}

Returns a default set of typically-used monitors.

- `loss_op`: `Tensor`, the loss tensor. This will be printed using `PrintTensor` at the default interval.
- `summary_op`: See `SummarySaver`.
- `save_summary_steps`: See `SummarySaver`.
- `output_dir`: See `SummarySaver`.
- `summary_writer`: See `SummarySaver`.

Returns a `list` of monitors.
Base class for Monitors.
Defines basic interfaces of Monitors. Monitors can either be run on all workers or, more commonly, restricted to run exclusively on the elected chief worker.
DEPRECATED FUNCTION
THIS FUNCTION IS DEPRECATED. It will be removed after 2016-12-05. Instructions for updating: Monitors are deprecated. Please use tf.train.SessionRunHook.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Callback at the end of training/evaluation.

- `session`: A `tf.Session` object that can be used to run ops.

Raises `ValueError` if we've not begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

Callback after the step is finished.

Called after `step_end`, and receives a session in which to perform extra `session.run` calls. This callback is also invoked if a failure occurred during the step.

- `step`: `int`, global step of the model.
- `session`: `Session` object.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.
Callback before training step begins.
You may use this callback to request evaluation of additional tensors in the graph.
- `step`: `int`, the current value of the global step.

Returns a list of `Tensor` objects or string tensor names to be run.

Raises `ValueError` if we've already begun a step, or `step` < 0, or `step` > `max_steps`.

Callback after training step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.

In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.

Note that this method is not called if the call to `Session.run()` that followed the last call to `step_begin()` failed.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: True if training should stop.

Raises `ValueError` if we've not begun a step, or the `step` number does not match.
Captures a variable's values into a collection.
This monitor is useful for unit testing. You should exercise caution when using this monitor in production, since it never discards values.
This is an `EveryN` monitor and has consistent semantics for `every_n` and `first_n`.

`tf.contrib.learn.monitors.CaptureVariable.__init__(var_name, every_n=100, first_n=1)` {#CaptureVariable.init}

Initializes a CaptureVariable monitor.

- `var_name`: `string`. The variable name, including suffix (typically ":0").
- `every_n`: `int`, print every N steps. See `PrintN`.
- `first_n`: `int`, also print the first N steps. See `PrintN`.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.CaptureVariable.every_n_post_step(step, session)` {#CaptureVariable.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.CaptureVariable.every_n_step_begin(step)` {#CaptureVariable.every_n_step_begin}

`tf.contrib.learn.monitors.CaptureVariable.every_n_step_end(step, outputs)` {#CaptureVariable.every_n_step_end}

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.

Returns the values captured so far.

Returns a `dict` mapping `int` step numbers to the values of the variable at the respective step.
Saves checkpoints every N steps or N seconds.
`tf.contrib.learn.monitors.CheckpointSaver.__init__(checkpoint_dir, save_secs=None, save_steps=None, saver=None, checkpoint_basename='model.ckpt', scaffold=None)` {#CheckpointSaver.init}

Initialize CheckpointSaver monitor.

- `checkpoint_dir`: `str`, base directory for the checkpoint files.
- `save_secs`: `int`, save every N secs.
- `save_steps`: `int`, save every N steps.
- `saver`: `Saver` object, used for saving.
- `checkpoint_basename`: `str`, base name for the checkpoint files.
- `scaffold`: `Scaffold`, use to get saver object.

Raises `ValueError` if both `save_steps` and `save_secs` are not `None`, or if both are `None`; exactly one must be provided.
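The mutual-exclusion check and a step-based save schedule can be sketched in plain Python. `validate_save_args` and `should_save` are hypothetical helpers, not the real `CheckpointSaver` internals:

```python
# Sketch of the argument check and a "save every N steps" schedule.
# Hypothetical helper names, for illustration only.

def validate_save_args(save_secs=None, save_steps=None):
    # Exactly one of save_secs / save_steps must be provided.
    if save_secs is not None and save_steps is not None:
        raise ValueError("Can not provide both save_secs and save_steps.")
    if save_secs is None and save_steps is None:
        raise ValueError("Either save_secs or save_steps must be provided.")

def should_save(step, save_steps):
    # Save on step 0 and on every multiple of save_steps thereafter.
    return step % save_steps == 0

validate_save_args(save_steps=100)
print([s for s in range(0, 301, 50) if should_save(s, save_steps=100)])
# prints [0, 100, 200, 300]
```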
Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Callback after training step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.

In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.

Note that this method is not called if the call to `Session.run()` that followed the last call to `step_begin()` failed.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: True if training should stop.

Raises `ValueError` if we've not begun a step, or the `step` number does not match.
Base class for monitors that execute callbacks every N steps.
This class adds three new callbacks:
- every_n_step_begin
- every_n_step_end
- every_n_post_step
The callbacks are executed every n steps, or optionally every step for the first m steps, where m and n can both be user-specified.
When extending this class, note that if you wish to use any of the `BaseMonitor` callbacks, you must call their respective super implementation:

```python
def step_begin(self, step):
    super(ExampleMonitor, self).step_begin(step)
    return []
```

Failing to call the super implementation will cause unpredictable behavior.

The `every_n_post_step()` callback is also called after the last step if it was not already called through the regular conditions. Note that `every_n_step_begin()` and `every_n_step_end()` do not receive that special treatment.
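The trigger rule can be sketched as follows. `everyn_schedule` is an illustrative stand-in for the documented semantics (fire on every step up to `first_n_steps`, then every `every_n_steps` after the last firing), not the actual `EveryN` implementation:

```python
# Sketch of the every_n / first_n trigger rule. Illustration only.

def everyn_schedule(max_step, every_n_steps, first_n_steps):
    fired = []
    last_fired = None
    for step in range(max_step + 1):
        if (step <= first_n_steps or last_fired is None
                or step >= last_fired + every_n_steps):
            fired.append(step)
            last_fired = step
    return fired

# Always fire for the first 2 steps, then every 3 steps afterwards.
print(everyn_schedule(10, every_n_steps=3, first_n_steps=2))
# prints [0, 1, 2, 5, 8]
```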
Initializes an `EveryN` monitor.

- `every_n_steps`: `int`, the number of steps to allow between callbacks.
- `first_n_steps`: `int`, specifying the number of initial steps during which the callbacks will always be executed, regardless of the value of `every_n_steps`. Note that this value is relative to the global step.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

Callback before every n'th step begins.

- `step`: `int`, the current value of the global step.

Returns a `list` of tensors that will be evaluated at this step.

Callback after every n'th step finished.

This callback provides access to the tensors/ops evaluated at this step, including the additional tensors for which evaluation was requested in `step_begin`.

In addition, the callback has the opportunity to stop training by returning `True`. This is useful for early stopping, for example.

- `step`: `int`, the current value of the global step.
- `outputs`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: True if training should stop.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Monitor that exports Estimator every N steps.
Initializes ExportMonitor. (deprecated arguments)
SOME ARGUMENTS ARE DEPRECATED. They will be removed after 2016-09-23. Instructions for updating: The signature of the input_fn accepted by export is changing to be consistent with what's used by tf.Learn Estimator's train/evaluate. input_fn (and in most cases, input_feature_key) will both become required args.
- `every_n_steps`: Run monitor every N steps.
- `export_dir`: `str`, folder to export.
- `input_fn`: A function that takes no argument and returns a tuple of (features, labels), where features is a dict of string key to `Tensor` and labels is a `Tensor` that's currently not used (and so can be `None`).
- `input_feature_key`: String key into the features dict returned by `input_fn` that corresponds to the raw `Example` strings `Tensor` that the exported model will take as input. Should be `None` if and only if you're passing in a `signature_fn` that does not use the first arg (`Tensor` of `Example` strings).
- `exports_to_keep`: `int`, number of exports to keep.
- `signature_fn`: Function that returns a default signature and a named signature map, given `Tensor` of `Example` strings, `dict` of `Tensor`s for features and `dict` of `Tensor`s for predictions.
- `default_batch_size`: Default batch size of the `Example` placeholder.

Raises `ValueError` if `input_fn` and `input_feature_key` are not both defined or are not both `None`.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.ExportMonitor.every_n_post_step(step, session)` {#ExportMonitor.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.ExportMonitor.every_n_step_begin(step)` {#ExportMonitor.every_n_step_begin}

Callback before every n'th step begins.

- `step`: `int`, the current value of the global step.

Returns a `list` of tensors that will be evaluated at this step.

`tf.contrib.learn.monitors.ExportMonitor.every_n_step_end(step, outputs)` {#ExportMonitor.every_n_step_end}

Returns the directory containing the last completed export.

Returns the string path to the exported directory. NB: this functionality was added on 2016/09/25; clients that depend on the return value may need to handle the case where this function returns None because the estimator being fitted does not yet return a value during export.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Dumps almost all tensors in the graph at every step.
Note, this is very expensive; prefer `PrintTensor` in production.

Initializes GraphDump monitor.

- `ignore_ops`: `list` of `string`. Names of ops to ignore. If None, `GraphDump.IGNORE_OPS` is used.

Compares two `GraphDump` monitors and returns differences.

- `other_dump`: Another `GraphDump` monitor.
- `step`: `int`, step to compare on.
- `atol`: `float`, absolute tolerance in comparison of floating arrays.

Returns tuple:

- `matched`: `list` of keys that matched.
- `non_matched`: `dict` of keys to tuple of 2 mismatched values.

Raises `ValueError` if a key in `data` is missing from `other_dump` at `step`.
Callback at the end of training/evaluation.
- `session`: A `tf.Session` object that can be used to run ops.

Raises `ValueError` if we've not begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

Callback after the step is finished.

Called after `step_end`, and receives a session in which to perform extra `session.run` calls. This callback is also invoked if a failure occurred during the step.

- `step`: `int`, global step of the model.
- `session`: `Session` object.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.
Writes trainable variable values into log every N steps.
Writes the tensors in trainable variables every `every_n` steps, starting with the `first_n`th step.

`tf.contrib.learn.monitors.LoggingTrainable.__init__(scope=None, every_n=100, first_n=1)` {#LoggingTrainable.init}

Initializes LoggingTrainable monitor.

- `scope`: An optional string to match variable names using re.match.
- `every_n`: Print every N steps.
- `first_n`: Print first N steps.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.LoggingTrainable.every_n_post_step(step, session)` {#LoggingTrainable.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.LoggingTrainable.every_n_step_begin(step)` {#LoggingTrainable.every_n_step_begin}

`tf.contrib.learn.monitors.LoggingTrainable.every_n_step_end(step, outputs)` {#LoggingTrainable.every_n_step_end}

`tf.contrib.learn.monitors.LoggingTrainable.run_on_all_workers` {#LoggingTrainable.run_on_all_workers}

`tf.contrib.learn.monitors.LoggingTrainable.set_estimator(estimator)` {#LoggingTrainable.set_estimator}

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
NaN Loss monitor.
Monitors loss and stops training if loss is NaN. Can either fail with exception or just stop training.
`tf.contrib.learn.monitors.NanLoss.__init__(loss_tensor, every_n_steps=100, fail_on_nan_loss=True)` {#NanLoss.init}

Initializes NanLoss monitor.

- `loss_tensor`: `Tensor`, the loss tensor.
- `every_n_steps`: `int`, run check every this many steps.
- `fail_on_nan_loss`: `bool`, whether to raise exception when loss is NaN.
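The check performed at each interval can be sketched in plain Python. `check_loss` and the error class below are hypothetical stand-ins for the monitor's internals, showing only the NaN decision:

```python
# Sketch of the NaN-loss decision: either raise, or request a stop.
import math

class NanLossDuringTrainingError(RuntimeError):
    """Stand-in error type for a diverged (NaN) loss."""

def check_loss(loss_value, fail_on_nan_loss=True):
    """Returns True if training should stop because loss is NaN."""
    if math.isnan(loss_value):
        if fail_on_nan_loss:
            raise NanLossDuringTrainingError("Model diverged with loss = NaN.")
        return True  # stop training without raising
    return False

print(check_loss(0.25))                                  # prints False
print(check_loss(float('nan'), fail_on_nan_loss=False))  # prints True
```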
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Prints given tensors every N steps.
This is an `EveryN` monitor and has consistent semantics for `every_n` and `first_n`.

The tensors will be printed to the log, with `INFO` severity.

`tf.contrib.learn.monitors.PrintTensor.__init__(tensor_names, every_n=100, first_n=1)` {#PrintTensor.init}

Initializes a PrintTensor monitor.

- `tensor_names`: `dict` of tag to tensor names or `iterable` of tensor names (strings).
- `every_n`: `int`, print every N steps. See `PrintN`.
- `first_n`: `int`, also print the first N steps. See `PrintN`.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.PrintTensor.every_n_post_step(step, session)` {#PrintTensor.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.PrintTensor.every_n_step_end(step, outputs)` {#PrintTensor.every_n_step_end}

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Steps per second monitor.
`tf.contrib.learn.monitors.StepCounter.__init__(every_n_steps=100, output_dir=None, summary_writer=None)` {#StepCounter.init}
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.StepCounter.every_n_post_step(step, session)` {#StepCounter.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

Callback before every n'th step begins.

- `step`: `int`, the current value of the global step.

Returns a `list` of tensors that will be evaluated at this step.

`tf.contrib.learn.monitors.StepCounter.every_n_step_end(current_step, outputs)` {#StepCounter.every_n_step_end}

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Monitor to request stop at a specified step.
Create a StopAtStep monitor.
This monitor requests stop after either a number of steps have been executed or a last step has been reached. Only one of the two options can be specified.

If `num_steps` is specified, it indicates the number of steps to execute after `begin()` is called. If instead `last_step` is specified, it indicates the last step we want to execute, as passed to the `step_begin()` call.

- `num_steps`: Number of steps to execute.
- `last_step`: Step after which to stop.

Raises `ValueError` if one of the arguments is invalid.
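The stopping arithmetic can be sketched as follows. `StopRule` is a hypothetical stand-in showing one plausible convention for converting `num_steps` into a last step, not the actual `StopAtStep` code:

```python
# Sketch of the stop rule: with num_steps, stop num_steps steps after the
# step seen at begin(); with last_step, stop once that global step is reached.

class StopRule(object):
    def __init__(self, num_steps=None, last_step=None):
        if (num_steps is None) == (last_step is None):
            raise ValueError("Specify exactly one of num_steps and last_step.")
        self._num_steps = num_steps
        self._last_step = last_step

    def begin(self, first_step):
        # Resolve num_steps against the step training resumes at.
        if self._last_step is None:
            self._last_step = first_step + self._num_steps - 1

    def should_stop(self, step):
        return step >= self._last_step

rule = StopRule(num_steps=5)
rule.begin(first_step=10)   # training resumes at global step 10
print(rule.should_stop(13), rule.should_stop(14))
# prints False True
```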
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Callback at the end of training/evaluation.

- `session`: A `tf.Session` object that can be used to run ops.

Raises `ValueError` if we've not begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

Callback after the step is finished.

Called after `step_end`, and receives a session in which to perform extra `session.run` calls. This callback is also invoked if a failure occurred during the step.

- `step`: `int`, global step of the model.
- `session`: `Session` object.

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.
Saves summaries every N steps.
`tf.contrib.learn.monitors.SummarySaver.__init__(summary_op, save_steps=100, output_dir=None, summary_writer=None, scaffold=None)` {#SummarySaver.init}

Initializes a `SummarySaver` monitor.

- `summary_op`: `Tensor` of type `string`. A serialized `Summary` protocol buffer, as output by TF summary methods like `summary.scalar` or `summary.merge_all`.
- `save_steps`: `int`, save summaries every N steps. See `EveryN`.
- `output_dir`: `string`, the directory to save the summaries to. Only used if no `summary_writer` is supplied.
- `summary_writer`: `SummaryWriter`. If `None` and an `output_dir` was passed, one will be created accordingly.
- `scaffold`: `Scaffold` to get `summary_op` if it's not provided.
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.

Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.

`tf.contrib.learn.monitors.SummarySaver.every_n_post_step(step, session)` {#SummarySaver.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.SummarySaver.every_n_step_end(step, outputs)` {#SummarySaver.every_n_step_end}

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Runs evaluation of a given estimator, at most every N steps.
Note that the evaluation is done based on the saved checkpoint, which will usually be older than the current step.
Can do early stopping on validation metrics if `early_stopping_rounds` is provided.
`tf.contrib.learn.monitors.ValidationMonitor.__init__(x=None, y=None, input_fn=None, batch_size=None, eval_steps=None, every_n_steps=100, metrics=None, early_stopping_rounds=None, early_stopping_metric='loss', early_stopping_metric_minimize=True, name=None)` {#ValidationMonitor.init}

Initializes a ValidationMonitor.

- `x`: See `BaseEstimator.evaluate`.
- `y`: See `BaseEstimator.evaluate`.
- `input_fn`: See `BaseEstimator.evaluate`.
- `batch_size`: See `BaseEstimator.evaluate`.
- `eval_steps`: See `BaseEstimator.evaluate`.
- `every_n_steps`: Check for new checkpoints to evaluate every N steps. If a new checkpoint is found, it is evaluated. See `EveryN`.
- `metrics`: See `BaseEstimator.evaluate`.
- `early_stopping_rounds`: `int`. If the metric indicated by `early_stopping_metric` does not change according to `early_stopping_metric_minimize` for this many steps, then training will be stopped.
- `early_stopping_metric`: `string`, name of the metric to check for early stopping.
- `early_stopping_metric_minimize`: `bool`, True if `early_stopping_metric` is expected to decrease (thus early stopping occurs when this metric stops decreasing), False if `early_stopping_metric` is expected to increase. Typically, `early_stopping_metric_minimize` is True for loss metrics like mean squared error, and False for performance metrics like accuracy.
- `name`: See `BaseEstimator.evaluate`.

Raises `ValueError` if both `x` and `input_fn` are provided.
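The `early_stopping_rounds` rule can be sketched in plain Python. `EarlyStopper` is an illustrative stand-in for the behavior described above, not the monitor's actual implementation:

```python
# Sketch of early stopping: track the best validation metric seen so far;
# if it has not improved for `rounds` steps, request a stop.

class EarlyStopper(object):
    def __init__(self, rounds, minimize=True):
        self._rounds = rounds
        self._minimize = minimize
        self._best_value = None
        self._best_step = None

    def update(self, step, value):
        """Records a metric value; returns True if training should stop."""
        improved = (self._best_value is None or
                    (value < self._best_value if self._minimize
                     else value > self._best_value))
        if improved:
            self._best_value, self._best_step = value, step
        return step - self._best_step >= self._rounds

stopper = EarlyStopper(rounds=3, minimize=True)
losses = [(0, 1.0), (1, 0.8), (2, 0.9), (3, 0.85), (4, 0.81)]
print([stopper.update(s, v) for s, v in losses])
# prints [False, False, False, False, True]
```

The loss last improved at step 1, so the stop is requested at step 4, three rounds later.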
Called at the beginning of training.
When called, the default graph is the one we are executing.
- `max_steps`: `int`, the maximum global step this training will run until.

Raises `ValueError` if we've already begun a run.
Returns the step at which the best early stopping metric was found.
Returns the best early stopping metric value found so far.
Returns True if this monitor caused an early stop.
Begin epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've already begun an epoch, or `epoch` < 0.

End epoch.

- `epoch`: `int`, the epoch number.

Raises `ValueError` if we've not begun an epoch, or the `epoch` number does not match.
`tf.contrib.learn.monitors.ValidationMonitor.every_n_post_step(step, session)` {#ValidationMonitor.every_n_post_step}

Callback after a step is finished or `end()` is called.

- `step`: `int`, the current value of the global step.
- `session`: `Session` object.

`tf.contrib.learn.monitors.ValidationMonitor.every_n_step_begin(step)` {#ValidationMonitor.every_n_step_begin}

Callback before every n'th step begins.

- `step`: `int`, the current value of the global step.

Returns a `list` of tensors that will be evaluated at this step.

`tf.contrib.learn.monitors.ValidationMonitor.every_n_step_end(step, outputs)` {#ValidationMonitor.every_n_step_end}

`tf.contrib.learn.monitors.ValidationMonitor.run_on_all_workers` {#ValidationMonitor.run_on_all_workers}

`tf.contrib.learn.monitors.ValidationMonitor.set_estimator(estimator)` {#ValidationMonitor.set_estimator}

A setter called automatically by the target estimator.

If the estimator is locked, this method does nothing.

- `estimator`: the estimator that this monitor monitors.

Raises `ValueError` if the estimator is `None`.

Overrides `BaseMonitor.step_begin`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.

Returns a `list`: the result of `every_n_step_begin`, if that was called this step, or an empty list otherwise.

Raises `ValueError` if called more than once during a step.

Overrides `BaseMonitor.step_end`.

When overriding this method, you must call the super implementation.

- `step`: `int`, the current value of the global step.
- `output`: `dict` mapping `string` tensor names to the values resulting from running those tensors. Values may be either scalars (for scalar tensors) or Numpy `array`s (for non-scalar tensors).

Returns a `bool`: the result of `every_n_step_end`, if that was called this step, or `False` otherwise.
Wraps monitors into a SessionRunHook.
`tf.contrib.learn.monitors.RunHookAdapterForMonitors.__init__(monitors)` {#RunHookAdapterForMonitors.init}

`tf.contrib.learn.monitors.RunHookAdapterForMonitors.after_create_session(session, coord)` {#RunHookAdapterForMonitors.after_create_session}
Called when new TensorFlow session is created.
This is called to signal the hooks that a new session has been created. This has two essential differences with the situation in which `begin` is called:

- When this is called, the graph is finalized and ops can no longer be added to the graph.
- This method will also be called as a result of recovering a wrapped session, not only at the beginning of the overall session.

- `session`: A TensorFlow Session that has been created.
- `coord`: A Coordinator object which keeps track of all threads.

`tf.contrib.learn.monitors.RunHookAdapterForMonitors.after_run(run_context, run_values)` {#RunHookAdapterForMonitors.after_run}

`tf.contrib.learn.monitors.RunHookAdapterForMonitors.before_run(run_context)` {#RunHookAdapterForMonitors.before_run}
Cache for file writers.
This class caches file writers, one per directory.
Clear cached summary writers. Currently only used for unit tests.
Returns the FileWriter for the specified directory.

- `logdir`: `str`, name of the directory.

Returns a `FileWriter`.
`tf.contrib.learn.monitors.replace_monitors_with_hooks(monitors_or_hooks, estimator)` {#replace_monitors_with_hooks}

Wraps monitors with a hook.

`Monitor` is deprecated in favor of `SessionRunHook`. If you're using a monitor, you can wrap it with a hook using this function. It is recommended to implement a hook version of your monitor.

- `monitors_or_hooks`: A `list` that may contain both monitors and hooks.
- `estimator`: An `Estimator` that the monitors will be used with.

Returns a list of hooks. If there is any monitor in the given list, it is replaced by a hook.
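The replacement logic can be sketched with dummy classes standing in for `BaseMonitor`, `SessionRunHook`, and `RunHookAdapterForMonitors`: hooks pass through unchanged, and all monitors are wrapped together into a single adapter hook. (The real function also attaches the estimator to each monitor; that detail is omitted here.)

```python
# Illustration only: Fake* classes stand in for the TensorFlow types.

class FakeMonitor(object):
    pass

class FakeHook(object):
    pass

class FakeAdapterHook(FakeHook):
    """Stand-in for RunHookAdapterForMonitors: one hook wrapping monitors."""
    def __init__(self, monitors):
        self.monitors = monitors

def replace_monitors_with_hooks(monitors_or_hooks):
    hooks = [h for h in monitors_or_hooks if isinstance(h, FakeHook)]
    monitors = [m for m in monitors_or_hooks if not isinstance(m, FakeHook)]
    if monitors:
        hooks.append(FakeAdapterHook(monitors))
    return hooks

mixed = [FakeMonitor(), FakeHook(), FakeMonitor()]
result = replace_monitors_with_hooks(mixed)
print(len(result))  # prints 2: the original hook plus one adapter wrapping both monitors
```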