From e317e8ef896fdcf3406b791630192e6e8cc82398 Mon Sep 17 00:00:00 2001 From: Tom Close Date: Sun, 29 Dec 2024 22:27:43 +1100 Subject: [PATCH] finished getting-started (apart from debugging) tutorial --- .../source/tutorial/getting-started.ipynb | 111 ++++++++++++++++-- 1 file changed, 102 insertions(+), 9 deletions(-) diff --git a/new-docs/source/tutorial/getting-started.ipynb b/new-docs/source/tutorial/getting-started.ipynb index 1e91d0960..4ae22d0ed 100644 --- a/new-docs/source/tutorial/getting-started.ipynb +++ b/new-docs/source/tutorial/getting-started.ipynb @@ -6,15 +6,18 @@ "source": [ "# Getting started\n", "\n", - "## Running your first task\n", - "\n", "The basic runnable component of Pydra is a *task*. Tasks are conceptually similar to\n", - "functions, in that they take inputs, process them and then return results. However,\n", + "functions, in that they take inputs, operate on them and then return results. However,\n", "unlike functions, tasks are parameterised before they are executed in a separate step.\n", "This enables parameterised tasks to be linked together into workflows that are checked for\n", "errors before they are executed, and modular execution workers and environments to be specified\n", "independently of the task being performed.\n", "\n", + "Tasks can encapsulate Python functions, shell commands or workflows constructed from\n", + "task components.\n", + "\n", + "## Running your first task\n", + "\n", "Pre-defined task definitions are installed under the `pydra.tasks.*` namespace by separate\n", "task packages (e.g. `pydra-fsl`, `pydra-ants`, ...). Pre-defined task definitions are run by\n", "\n", @@ -22,12 +25,12 @@ "* instantiate the class with the parameters of the task\n", "* \"call\" the resulting object to execute it as you would a function (i.e. 
with `my_task(...)`)\n", "\n", - "To demonstrate with a toy example, of loading a JSON file with the `pydra.tasks.common.LoadJson` task, this we first create an example JSON file" + "To demonstrate with a toy example of loading a JSON file with the `pydra.tasks.common.LoadJson` task, we first create an example JSON file to test with" ] }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "from tempfile import mkdtemp\n", "import json\n", "\n", - "JSON_CONTENTS = {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.5598136790149003, 6]}\n", + "JSON_CONTENTS = {'a': True, 'b': 'two', 'c': 3, 'd': [7, 0.55, 6]}\n", "\n", "test_dir = Path(mkdtemp())\n", "json_file = test_dir / \"test.json\"\n", @@ -70,6 +73,19 @@ "assert result.output.out == JSON_CONTENTS" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The `Result` object contains\n", + "\n", + "* `output`: the outputs of the task (if there is only one output it is called `out` by default)\n", + "* `runtime`: information about the peak memory and CPU usage\n", + "* `errored`: the error status of the task\n", + "* `task`: the task object that generated the results\n", + "* `output_dir`: the output directory the results are stored in" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ @@ -143,14 +159,14 @@ "# as the number of NIfTI files\n", "VOXEL_SIZES = [0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 1.0, 1.0, 1.0, 1.25]\n", "\n", - "mrgrid_varying_sizes = MrGrid().split(\n", + "mrgrid_varying_vox_sizes = MrGrid().split(\n", " (\"input\", \"voxel\"),\n", " input=nifti_dir.iterdir(),\n", " voxel=VOXEL_SIZES\n", ")\n", "\n", "# Run the task to resample all NIfTI files with different voxel sizes\n", - "result = mrgrid()" + "result = mrgrid_varying_vox_sizes(cache_dir=test_dir / \"cache\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Cache directories\n", "\n", - "When a task runs, a hash is generated by the combination of all the 
inputs to the task and the task to be run." + "When a task runs, a unique hash is generated from the combination of all the inputs to the\n", + "task and the operation to be performed. This hash is used to name the output directory for\n", + "the task within the specified cache directory. Therefore, if the same cache directory is\n", + "used between runs and a subsequent run executes the same task with the same inputs, the\n", + "location of its output directory will also be the same, and the outputs generated by the\n", + "previous run are reused." ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "mrgrid_varying_vox_sizes2 = MrGrid().split(\n", + " (\"input\", \"voxel\"),\n", + " input=nifti_dir.iterdir(),\n", + " voxel=VOXEL_SIZES\n", + ")\n", + "\n", + "# Result from previous run is reused as the task and inputs are identical\n", + "result1 = mrgrid_varying_vox_sizes2(cache_dir=test_dir / \"cache\")\n", + "\n", + "# Check that the output directory is the same for both runs\n", + "assert result1.output_dir == result.output_dir\n", + "\n", + "# Change the voxel size to resample one of the NIfTI files to\n", + "mrgrid_varying_vox_sizes2.inputs.voxel[2] = [0.25]\n", + "\n", + "# The task is rerun as the inputs are no longer identical\n", + "result2 = mrgrid_varying_vox_sizes2(cache_dir=test_dir / \"cache\")\n", + "\n", + "# The output directory will be different as the inputs are now different\n", + "assert result2.output_dir != result.output_dir" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Note that for file objects, the contents of the files are used to calculate the hash,\n", + "not their paths. Therefore, when inputting large files there might be some additional\n", + "overhead on the first run (the file hashes themselves are cached by path and mtime so\n", + "shouldn't need to be recalculated unless they are modified). 
However, this makes the\n", + "hashes invariant to file-system movement. For example, changing the name of one of the\n", + "files in the nifti directory won't invalidate the hash." ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Rename a NIfTI file within the test directory\n", + "first_file = next(nifti_dir.iterdir())\n", + "first_file.rename(first_file.with_name(\"first.nii.gz\"))\n", + "\n", + "mrgrid_varying_vox_sizes3 = MrGrid().split(\n", + " (\"input\", \"voxel\"),\n", + " input=nifti_dir.iterdir(),\n", + " voxel=VOXEL_SIZES\n", + ")\n", + "\n", + "# Result from the original run is reused as the file contents (and therefore the\n", + "# hashes) are unchanged, only the file name is different\n", + "result3 = mrgrid_varying_vox_sizes3(cache_dir=test_dir / \"cache\")\n", + "\n", + "# Check that the output directory is the same for both runs\n", + "assert result3.output_dir == result.output_dir" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Debugging\n", + "\n", + "Work in progress..." ] }, {