diff --git a/modules/unix-linux-introduction/_posts/2024-02-02-task-two.md b/modules/unix-linux-introduction/_posts/2024-02-02-task-two.md
index e39d7b2c2..4d6d60471 100644
--- a/modules/unix-linux-introduction/_posts/2024-02-02-task-two.md
+++ b/modules/unix-linux-introduction/_posts/2024-02-02-task-two.md
@@ -38,6 +38,10 @@ Shell variables can be specified using braces e.g. `echo ${MYVAR}`. These braces
variable name starts and ends so that you can do things like `echo ${MYVAR}_word`. Without the braces `echo $MYVAR_word` would
be interpreted as a reference to a variable called `MYVAR_word`.
+The variables that you set are only available in the shell session where you set them unless you use the `export` command. If
+you say `export MYVAR` then `MYVAR` is available to any commands that you run in your shell. For more on `export`,
+read [this guide](https://www.digitalocean.com/community/tutorials/export-command-linux) from DigitalOcean.
+
##### Advanced shell variable expansion
The [GNU Bash manual](https://www.gnu.org/software/bash/manual/html_node/Shell-Parameter-Expansion.html) has a section on shell
@@ -56,11 +60,13 @@ echo ${MYVAR#*/}
# this will print patrice
echo ${MYVAR##*/}
-# print the part of the variable's value that comes after the last pattern match
+# NOTE: the pattern now changes from /* to */ for the next two examples
+
+# print the part of the variable's value that comes before the last pattern match
# this will print /home/user
echo ${MYVAR%/*}
-# print the part of the variable's value that comes after the first pattern match
+# print the part of the variable's value that comes before the first pattern match
# this will print a blank line
echo ${MYVAR%%/*}
```
@@ -165,15 +171,162 @@ three
```
The Bash shell `$()` syntax provides an effective way to create lists to loop over. E.g. `$(cat myfile.txt)` creates a list of all the
-lines in the file and runs the loop once for each line (and for each word in a line if the lines have spaces in them).
+lines in the file and runs the loop once for each line (and for each word in a line if the lines have spaces in them). This is
+called _command substitution_ and what it does is to replace the `$()` with the output (`stdout`) of the command inside the brackets.
+
+For example, using an example from Task 1, `$(tail -n+2 cases.csv |cut -d, -f3)` would give a list of all the dates in the `cases.csv` file. When
+this is used with a `for` loop, all whitespace (spaces, tabs and newlines) acts to separate _items_ and then loop runs once for each item in the
+list. For a more formal description of what command substitution is doing, look in the [GNU Bash manual](https://www.gnu.org/software/bash/manual/html_node/Command-Substitution.html).
+
+
+#### Shell conditionals (If statements)
+
+One final component of Bash shell scripts is `if statements`. The basic structure of a conditional (i.e. `if statement`) is:
+
+```bash
+if SOMETEST
+then
+ # do something
+fi
+```
+
+For an introductory tutorial see [here](https://ryanstutorials.net/bash-scripting-tutorial/bash-if-statements.php) and for a more complete
+reference consult the [GNU Bash manual](https://www.gnu.org/software/bash/manual/html_node/Conditional-Constructs.html).
+
+An important point about the `SOMETEST` part listed above is that it is actually run as a command. Most commonly one uses the test command,
+which is expressed using square brackets (`[[ ]]`) but any command can be used. Any command that returns an exit status of 0 counts as True,
+and any other exit status counts as False.
+
+The square bracket tests (`[[ ]]`) allow [conditional expressions](https://www.gnu.org/software/bash/manual/html_node/Bash-Conditional-Expressions.html)
+and combinations of these conditional expressions. Some of the most common test are `-f` to test if a file exists and `-d` to test if a path is a
+directory, `=` to test if two strings are equal and `-eq` to test if two numbers are equal. There are many others, please refer to the linked tutorials.
+
+Expressions can be combined with logical operators: `&&` for AND, `||` for OR, `!` for NOT (changing True to False or False to True) and brackets `( )`
+for grouping.
+
+#### A practical shell script
+
+Here is an example shell script that fetches files using a file like the `file_list1.csv` mentioned previously:
+
+{% highlight bash linenos %}
+#!/bin/bash
+
+set -u
+set -e
+
+LISTFILE=$1
+
+if [[ ! -f $LISTFILE ]] ; then
+ echo "Incorrect filename, $LISTFILE not found" >&2
+ exit 1
+fi
+
+if [[ ! -d tracking ]] ; then
+ mkdir tracking
+fi
+
+for URL in $(cut -d, -f2 $LISTFILE); do
+ # links look like
+ # https://pathogen-genomics-march-2024.sanbi.ac.za/data/shell/reads/SRR8364252_1.fastq.gz
+ OUTPUT=${URL##*/}
+ if [[ ! -f tracking/$OUTPUT ]] ; then
+ wget -c -O $OUTPUT $URL
+ # curl -L -O $OUTPUT $URL
+ touch tracking/$OUTPUT
+ fi
+done
+{% endhighlight %}
+
+1. Line 1 has the `#!/bin/bash`, the so-called `shebang line`, that tells the system how to run this script
+
+2. Line 3 (`set -u`) tells the Bash shell that referring to uninitialised values should trigger an error. This will force the user to provide
+a command line parameter (i.e. the $1 variable)
+
+3. Line 4 (`set -e`) tells the Bash shell to exit if there is any error. This means that if e.g. a download fails and `wget` exits with an error,
+the script won't carry on running.
+
+4. Line 6 sets the `LISTFILE` variable to the value of `$1` and lines 8 through 11 check if the file exists. If the file does not exist
+(the `! -f` means "file does not exist") then a message is printed to `stderr` (that's what the `>&2` redirect does) and the program exits
+with an error (`exit 1`).
+
+5. Lines 13 through check for the existing of a directory (a subdirectory of the directory in which this script is run) called `tracking` and
+create it if it doesn't exist. This directory will be used to keep track fo which file has been downloaded.
+
+6. Line 17 sets up the loop. `cut -d, -f2` is used to extract the URL part from the LISTFILE and the list of URLs (links) is returned using a `$()`
+operator.
+
+7. Line 20 uses a pattern match to remove everything before the last match of the shell (i.e. glob) pattern `*/`. This leaves only the filename part of
+the URL. The OUTPUT variable is set to the destination filename.
+
+8. Line 21 tests to see if there is a file in the `tracking` directory with the `OUTPUT` filename. If no such file exists, the download will proceed
+
+9. Line 22 downloads the file, storing it in the `OUTPUT` filename. The `-c` flag of `wget` will continue a download if it finds a partially downloaded
+file.
+
+10. Line 23 is commented out but shows how to do a download using `curl`. The `-L` flag to `curl` ensures that if the server tells `curl` that the file has
+move, `curl` will download from the new link. Note that this `curl` command line will not resume partial downloads. The `-C -` option of `curl` does
+support resuming downloads but it has some technical differences from the way `wget` does resume of partial downloads that make it a bit trickier to use.
+
+11. Line 24 uses `touch` to make an empty file in the `tracking` directory with the same name as the `OUTPUT` filename.
+
+12. Lines 25 and 26 end the `if` statement and the `for` loop.
+
+In summary, this script downloads files as specified in `file_list1.csv` and uses a tracking system to ensure that already-downloaded files are not downloaded
+a second time. If it turns out to be necessary to re-download a file, the corresponding tracking file can be deleted from the `tracking` directory.
+
+
+#### Examining the downloaded files
+
+The files are gzip-compressed FASTQ data files from a DNA sequencer. The presumption is that each file is the output of a sequencing run. Let us examine
+the downloaded files. One simple test is to check the sizes of the files. Are there any files with unusual sizes?
+
+
+Show answer
+Sort the listing of file sizes with `ls -l *.fastq.gz |sort -k5nr`. The `-k5` specificies that `sort` must use the 5th column for sorting and the `n` specifies that
+this must be a numeric sort and finally the `r` specifies that the sort order must be reversed, with the results in descending, not ascending, order.
+
+Looking at the files sorted this way shows that `SRR8364257_1.fastq.gz` and `SRR8364257_2.fastq.gz` are dramatically smaller than the other files. This could be
+a signal that there is something wrong. Perhaps there were problems with the sequencing run or the sample being sequenced?
+
+
+##### Checksums and using them
+
+[Checksums](https://en.wikipedia.org/wiki/Checksum) are calculations used to verify that a file has not been altered or has been downloaded correctly. The
+checksums for the downloaded files are available at [this link](https://pathogen-genomics-march-2024.sanbi.ac.za/data/shell/reads/checksums.txt) and can
+be downloaded using e.g. `wget https://pathogen-genomics-march-2024.sanbi.ac.za/data/shell/reads/checksums.txt`. The checksums in this file are
+[SHA256](https://en.wikipedia.org/wiki/SHA-2) checksums generated using the `sha256sum` tool. Another common checksum type is [MD5](https://en.wikipedia.org/wiki/MD5)
+which can be generated using the `md5sum` tool.
+
+The `checksums.txt` file contains lines like:
+
+```
+d4d7dac720ae285317f8d8adba2ba9077a9e5e4f6099bc9aee3f9900c77a518a SRR8364252_1.fastq.gz
+38827e72486e2b1025270c619ea110ead5d80901a1f7f7a7ecc5a5f3f604b69b SRR8364252_2.fastq.gz
+```
+
+where the first column is the checksum and the second is the filename that the checksum applies to. With this file in hand and the files already downloaded,
+we can check if the files match their checksums:
+
+```
+sha256sum -c checksums.txt
+```
+
+What does this show? What might be going on?
+
+
+Show answer
+For most of the files we get the response that the file is OK and matches it's checksum. For sample `SRR8364261_2.fastq.gz` we get the message `FAILED`.
+
+Our download script did not show any errors, and we can verify this by manually re-downloading the file using `wget` or `curl` and checking the checksums
+again. This shows that either there is a problem with the checksum provided or the with the file on the remote server. This is a problem that you would raise
+with the people who are providing the downloadable files and checksums.
+
+
+##### Resources
+[Introduction to Bash Shell Scripting](https://www.linode.com/docs/guides/intro-bash-shell-scripting/)
+[Advanced Bash Scripting Guide](https://tldp.org/LDP/abs/html/)
+[Introduction to Shell scripting](https://sambitsinha.hashnode.dev/linux-shell-scripting-basics-an-introduction-to-shell-scripting-in-linux) and [Intermediate Shell Scripting](https://sambitsinha.hashnode.dev/intermediate-shell-scripting-in-linux) from [Sambit Sinha](https://sambitsinha.hashnode.dev/?source=top_nav_blog_home)
-1. Look at file_list1
-2. Write a loop to fetch data
- 2.1. dealing with incomplete fetch
-3. Checksums - check if the data is right
-4. Which is the smallest sample?
-5. zcat / zmore - looking inside a file
-6. count the number of reads in the files
\ No newline at end of file