From fc19a019c2a5c8d69e88463a6c21b9cf38a749d0 Mon Sep 17 00:00:00 2001 From: Annabelle Kelechi Nwankwo <33545211+Kanegraffiti@users.noreply.github.com> Date: Fri, 9 Aug 2024 17:11:29 +0100 Subject: [PATCH] Update checkpoints.rst --- docs/userguide/checkpoints.rst | 106 ++++++++++++++++----------------- 1 file changed, 50 insertions(+), 56 deletions(-) diff --git a/docs/userguide/checkpoints.rst b/docs/userguide/checkpoints.rst index 8867107b7a..31597cafbf 100644 --- a/docs/userguide/checkpoints.rst +++ b/docs/userguide/checkpoints.rst @@ -3,8 +3,8 @@ Memoization and checkpointing ----------------------------- -When an app is invoked several times with the same parameters, Parsl can -reuse the result from the first invocation without executing the app again. +When an :ref:`App ` is invoked several times with the same parameters, Parsl can +reuse the result from the first invocation without executing the :ref:`App ` again. This can save time and computational resources. @@ -12,7 +12,7 @@ This is done in two ways: * Firstly, *app caching* will allow reuse of results within the same run. -* Building on top of that, *checkpointing* will store results on the filesystem +* Building on top of that, :ref:`Checkpointing ` will store results on the filesystem and reuse those results in later runs. .. _label-appcaching: @@ -20,15 +20,14 @@ This is done in two ways: App caching =========== - There are many situations in which a program may be re-executed over time. Often, large fragments of the program will not have changed -and therefore, re-execution of apps will waste valuable time and -computation resources. Parsl's app caching solves this problem by -storing results from apps that have successfully completed +and therefore, re-execution of :ref:`Apps ` will waste valuable time and +computation resources. Parsl's :ref:`App caching ` solves this problem by +storing results from :ref:`Apps ` that have successfully completed so that they can be re-used. 
-App caching is enabled by setting the ``cache`` +:ref:`App caching ` is enabled by setting the ``cache`` argument in the :func:`~parsl.app.app.python_app` or :func:`~parsl.app.app.bash_app` decorator to ``True`` (by default it is ``False``). @@ -38,33 +37,31 @@ decorator to ``True`` (by default it is ``False``). def hello (msg, stdout=None): return 'echo {}'.format(msg) -App caching can be globally disabled by setting ``app_cache=False`` +:ref:`App caching ` can be globally disabled by setting ``app_cache=False`` in the :class:`~parsl.config.Config`. -App caching can be particularly useful when developing interactive programs such as when -using a Jupyter notebook. In this case, cells containing apps are often re-executed -during development. Using app caching will ensure that only modified apps are re-executed. - +:ref:`App caching ` can be particularly useful when developing interactive programs such as when +using a Jupyter notebook. In this case, cells containing :ref:`Apps ` are often re-executed +during development. Using :ref:`App caching ` will ensure that only modified :ref:`Apps ` are re-executed. App equivalence ^^^^^^^^^^^^^^^ -Parsl determines app equivalence using the name of the app function: -if two apps have the same name, then they are equivalent under this +Parsl determines :ref:`App ` equivalence using the name of the :ref:`App ` function: +if two :ref:`Apps ` have the same name, then they are equivalent under this relation. -Changes inside the app, or by functions called by an app will not invalidate +Changes inside the :ref:`App `, or by functions called by an :ref:`App ` will not invalidate cached values. There are lots of other ways functions might be compared for equivalence, and `parsl.dataflow.memoization.id_for_memo` provides a hook to plug in alternate application-specific implementations. 
- Invocation equivalence ^^^^^^^^^^^^^^^^^^^^^^ -Two app invocations are determined to be equivalent if their +Two :ref:`App ` invocations are determined to be equivalent if their input arguments are identical. In simple cases, this follows obvious rules: @@ -79,9 +76,8 @@ In simple cases, this follows obvious rules: y = 7 f(y).result() - Internally, equivalence is determined by hashing the input arguments, and -comparing the hash to hashes from previous app executions. +comparing the hash to hashes from previous :ref:`App ` executions. This approach can only be applied to data types for which a deterministic hash can be computed. @@ -90,7 +86,7 @@ By default Parsl can compute sensible hashes for basic data types: str, int, float, None, as well as more some complex types: functions, and dictionaries and lists containing hashable types. -Attempting to cache apps invoked with other, non-hashable, data types will +Attempting to cache :ref:`Apps ` invoked with other, non-hashable, data types will lead to an exception at invocation. In that case, mechanisms to hash new types can be registered by a program by @@ -101,10 +97,10 @@ Ignoring arguments ^^^^^^^^^^^^^^^^^^ On occasion one may wish to ignore particular arguments when determining -app invocation equivalence - for example, when generating log file +:ref:`App ` invocation equivalence - for example, when generating log file names automatically based on time or run information. Parsl allows developers to list the arguments to be ignored -in the ``ignore_for_cache`` app decorator parameter: +in the ``ignore_for_cache`` :ref:`App ` decorator parameter: .. 
code-block:: python @@ -112,23 +108,22 @@ in the ``ignore_for_cache`` app decorator parameter: def hello (msg, stdout=None): return 'echo {}'.format(msg) - Caveats ^^^^^^^ -It is important to consider several important issues when using app caching: +It is important to consider several important issues when using :ref:`App caching `: -- Determinism: App caching is generally useful only when the apps are deterministic. - If the outputs may be different for identical inputs, app caching will obscure - this non-deterministic behavior. For instance, caching an app that returns +- Determinism: :ref:`App caching ` is generally useful only when the :ref:`Apps ` are deterministic. + If the outputs may be different for identical inputs, :ref:`App caching ` will obscure + this non-deterministic behavior. For instance, caching an :ref:`App ` that returns a random number will result in every invocation returning the same result. -- Timing: If several identical calls to an app are made concurrently having - not yet cached a result, many instances of the app will be launched. +- Timing: If several identical calls to an :ref:`App ` are made concurrently having + not yet cached a result, many instances of the :ref:`App ` will be launched. Once one invocation completes and the result is cached all subsequent calls will return immediately with the cached result. -- Performance: If app caching is enabled, there may be some performance +- Performance: If :ref:`App caching ` is enabled, there may be some performance overhead especially if a large number of short duration tasks are launched rapidly. This overhead has not been quantified. @@ -139,27 +134,27 @@ Checkpointing Large-scale Parsl programs are likely to encounter errors due to node failures, application or environment errors, and myriad other issues. 
Parsl offers an -application-level checkpointing model to improve resilience, fault tolerance, and +application-level :ref:`Checkpointing ` model to improve resilience, fault tolerance, and efficiency. .. note:: - Checkpointing builds on top of app caching, and so app caching must be - enabled. If app caching is disabled in the config ``Config.app_cache``, checkpointing will + :ref:`Checkpointing ` builds on top of :ref:`App caching `, and so :ref:`App caching ` must be + enabled. If :ref:`App caching ` is disabled in the config ``Config.app_cache``, :ref:`Checkpointing ` will not work. -Parsl follows an incremental checkpointing model, where each checkpoint file contains +Parsl follows an incremental :ref:`Checkpointing ` model, where each checkpoint file contains all results that have been updated since the last checkpoint. When a Parsl program loads a checkpoint file and is executed, it will use -checkpointed results for any apps that have been previously executed. -Like app caching, checkpoints -use the hash of the app and the invocation input parameters to identify previously computed -results. If multiple checkpoints exist for an app (with the same hash) +checkpointed results for any :ref:`Apps ` that have been previously executed. +Like :ref:`App caching `, checkpoints +use the hash of the :ref:`App ` and the invocation input parameters to identify previously computed +results. If multiple checkpoints exist for an :ref:`App ` (with the same hash) the most recent entry will be used. -Parsl provides four checkpointing modes: +Parsl provides four :ref:`Checkpointing ` modes: -1. ``task_exit``: a checkpoint is created each time an app completes or fails +1. ``task_exit``: a checkpoint is created each time an :ref:`App ` completes or fails (after retries if enabled). This mode minimizes the risk of losing information from completed tasks. 
@@ -189,7 +184,7 @@ Parsl provides four checkpointing modes: from parsl.configs.local_threads import config config.checkpoint_mode = 'dfk_exit' -4. ``manual``: in addition to these automated checkpointing modes, it is also possible +4. ``manual``: in addition to these automated :ref:`Checkpointing ` modes, it is also possible to manually initiate a checkpoint by calling ``DataFlowKernel.checkpoint()`` in the Parsl program code. @@ -204,16 +199,15 @@ Parsl provides four checkpointing modes: In all cases the checkpoint file is written out to the ``runinfo/RUN_ID/checkpoint/`` directory. .. Note:: Checkpoint modes ``periodic``, ``dfk_exit``, and ``manual`` can interfere with garbage collection. - In these modes task information will be retained after completion, until checkpointing events are triggered. - + In these modes task information will be retained after completion, until :ref:`Checkpointing ` events are triggered. Creating a checkpoint ^^^^^^^^^^^^^^^^^^^^^ -Automated checkpointing must be explicitly enabled in the Parsl configuration. -There is no need to modify a Parsl program as checkpointing will occur transparently. -In the following example, checkpointing is enabled at task exit. The results of -each invocation of the ``slow_double`` app will be stored in the checkpoint file. +Automated :ref:`Checkpointing ` must be explicitly enabled in the Parsl configuration. +There is no need to modify a Parsl program as :ref:`Checkpointing ` will occur transparently. +In the following example, :ref:`Checkpointing ` is enabled at task exit. The results of +each invocation of the ``slow_double`` :ref:`App ` will be stored in the checkpoint file. .. code-block:: python @@ -237,10 +231,10 @@ each invocation of the ``slow_double`` app will be stored in the checkpoint file print([d[i].result() for i in range(5)]) -Alternatively, manual checkpointing can be used to explictly specify when the checkpoint -file should be saved. 
The following example shows how manual checkpointing can be used. +Alternatively, manual :ref:`Checkpointing ` can be used to explicitly specify when the checkpoint +file should be saved. The following example shows how manual :ref:`Checkpointing ` can be used. Here, the ``dfk.checkpoint()`` function will save the results of the prior invocations -of the ``slow_double`` app. +of the ``slow_double`` :ref:`App `. .. code-block:: python @@ -271,14 +265,14 @@ of the ``slow_double`` app. Resuming from a checkpoint ^^^^^^^^^^^^^^^^^^^^^^^^^^ -When resuming a program from a checkpoint Parsl allows the user to select +When resuming a program from a checkpoint, Parsl allows the user to select which checkpoint file(s) to use. Checkpoint files are stored in the ``runinfo/RUNID/checkpoint`` directory. The example below shows how to resume using all available checkpoints. -Here, the program re-executes the same calls to the ``slow_double`` app +Here, the program re-executes the same calls to the ``slow_double`` :ref:`App ` as above and instead of waiting for results to be computed, the values -from the checkpoint file are are immediately returned. +from the checkpoint file are immediately returned. .. code-block:: python @@ -292,8 +286,8 @@ from the checkpoint file are are immediately returned. # Rerun the same workflow d = [] - for i in range(5): + for i in range(5): d.append(slow_double(i)) - # wait for results + # Wait for results print([d[i].result() for i in range(5)])