Merge pull request #2 from ajodeh-juma/nf-branch
Add nextflow training module
pvanheus authored Mar 1, 2024
2 parents 5bb500c + 7ceb3e1 commit 06c1189
Showing 41 changed files with 565 additions and 1 deletion.
3 changes: 2 additions & 1 deletion _data/course.yml
@@ -1,3 +1,4 @@

title: Bioinformatics for Genomic Epidemiology March 2024
description: An introductory course on Pathogen Bioinformatics for Genomic Epidemiology
modules:
@@ -7,4 +8,4 @@ modules:
- "nextflow"
- "r-programming"
- "galaxy-introduction"
- "cholera-case-study"
- "cholera-case-study"
Binary file added img/nextflow-Figure1.jpeg
Binary file added img/nextflow-Figure10.png
Binary file added img/nextflow-Figure11-2.png
Binary file added img/nextflow-Figure11.png
Binary file added img/nextflow-Figure12.png
Binary file added img/nextflow-Figure13.1.png
Binary file added img/nextflow-Figure13.2.png
Binary file added img/nextflow-Figure14.1.png
Binary file added img/nextflow-Figure14.2.png
Binary file added img/nextflow-Figure15.1.png
Binary file added img/nextflow-Figure15.2.png
Binary file added img/nextflow-Figure16.1.png
Binary file added img/nextflow-Figure16.2.png
Binary file added img/nextflow-Figure17.1.png
Binary file added img/nextflow-Figure17.2.png
Binary file added img/nextflow-Figure18.1.png
Binary file added img/nextflow-Figure18.2.png
Binary file added img/nextflow-Figure19.png
Binary file added img/nextflow-Figure2.jpeg
Binary file added img/nextflow-Figure20.png
Binary file added img/nextflow-Figure21.png
Binary file added img/nextflow-Figure23.png
Binary file added img/nextflow-Figure24.png
Binary file added img/nextflow-Figure25.png
Binary file added img/nextflow-Figure26.png
Binary file added img/nextflow-Figure3.png
Binary file added img/nextflow-Figure4.png
Binary file added img/nextflow-Figure5.png
Binary file added img/nextflow-Figure6.png
Binary file added img/nextflow-Figure7.png
Binary file added img/nextflow-Figure8.png
Binary file added img/nextflow-Figure9.png
21 changes: 21 additions & 0 deletions modules/nextflow/_posts/2000-01-01-nextflow.md
@@ -0,0 +1,21 @@
---
title: Introduction to Bioinformatics workflows with Nextflow
published: true
tags: ["nextflow"]
---

## Introduction to Bioinformatics workflows with Nextflow


<br>

#### Objectives

* Understand what a workflow management system is.
* Understand the benefits of using a workflow management system (WfMS).
* Explain the benefits of using Nextflow as part of your bioinformatics
workflow.
* Explain the components of a Nextflow script.
* Run a Nextflow script.

<br>
29 changes: 29 additions & 0 deletions modules/nextflow/_posts/2000-01-02-open-science.md
@@ -0,0 +1,29 @@
---
title: Open science
---

## Open science

"Open science is the framework with a focus on making science more accessible, inclusive and equitable for the benefit of all" – UNESCO

1. Open data (ensuring data is freely available and adheres to FAIR data
principles).
2. Open access publishing (making peer-reviewed scholarly research articles and
literature freely available).
3. Open source (software code is easily accessible for inspection, modification
and improvement/enhancement).
4. Open peer review, open notebooks and open standards enable reproducible science.

<br>


## Reproducibility in open science

<center><img src="/img/nextflow-Figure1.jpeg" alt="Figure 1. " width="75%"/></center>
Yang-Min Kim, et al. (2018), _GigaScience_, <https://doi.org/10.1093/gigascience/giy077>
<br>

“The collective endeavor of science depends on researchers being able to replicate the work of others”.

### Statements
* Using customized scripts, we computed/estimated ……..
79 changes: 79 additions & 0 deletions modules/nextflow/_posts/2000-01-03-workflows.md
@@ -0,0 +1,79 @@
---
title: Workflows
---

## Workflows

Analyzing data involves a sequence of tasks, including gathering, cleaning, and
processing data. This sequence of tasks is called a **workflow** or a
**pipeline**.

<br>


<center><img src="/img/nextflow-Figure2.jpeg" alt="Figure 2. " width="75%"/></center>
Steinbiss, S., et al., (2016). Companion: a web server for annotation and analysis of parasite genomes. _Nucleic acids research_, 44(W1), W29–W34. <https://doi.org/10.1093/nar/gkw292>

<br>

## Workflow management systems

Workflow management systems (WfMS) provide features that simplify the development, monitoring, execution and sharing of pipelines. Key features include:

1. **Run time management**: Manages program execution on the operating
system, and splits tasks and data so that they run at the same time
(parallelization).
2. **Software management**: Use of container technologies, such as Docker
or Singularity, that package up code and all its dependencies, so the
application runs reliably from one computing environment to another.
3. **Portability & Interoperability**: Workflows written on one system can be
run on another computing infrastructure e.g., local computer, compute
cluster, or cloud infrastructure.
4. **Reproducibility**: The use of software management systems and a pipeline
specification means that the workflow will produce the same results when
re-run, including on different computing platforms.
5. **Reentrancy**: Continuous checkpoints allow a workflow to resume from the
last successfully executed step.
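
The reentrancy idea in the list above can be illustrated with a small Python analogy. This is a sketch only and not how Nextflow is implemented: the helper names (`task_key`, `run_task`, `pipeline`) and the in-memory `cache` dictionary are hypothetical stand-ins for a workflow engine's checkpoint store. Each step's result is stored under a key derived from the step name and its inputs, so a re-run with the same inputs reuses the stored result instead of repeating the work.

```python
import hashlib
import json


def task_key(name, inputs):
    """Build a stable key from the task name and its inputs."""
    payload = json.dumps({"name": name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_task(name, inputs, func, cache, log):
    """Run func(inputs) unless a result for the same name and inputs is cached."""
    key = task_key(name, inputs)
    if key in cache:
        log.append(f"cached: {name}")
        return cache[key]
    log.append(f"ran: {name}")
    result = func(inputs)
    cache[key] = result
    return result


def pipeline(sample, cache, log):
    """Two dependent steps; each one is checkpointed separately."""
    cleaned = run_task("clean", sample, lambda s: s.strip().upper(), cache, log)
    return run_task("count", cleaned, lambda s: len(s), cache, log)


cache = {}  # hypothetical stand-in for an on-disk checkpoint store
log = []
first = pipeline("  acgt\n", cache, log)
second = pipeline("  acgt\n", cache, log)  # re-run: both steps are reused
```

On the second call, `log` records that both steps were served from the cache. Changing the input string would change the key and cause the steps to run again.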

<br>

<center><img src="/img/nextflow-Figure3.png" alt="Figure 3. " width="75%"/></center>

<br>

## Language

- Nextflow scripts are written using a language intended to simplify the writing
of workflows. Languages written for a specific field are called __*Domain
Specific Languages (DSL)*__, e.g., SQL is used to work with databases, and AWK
is designed for text processing.

- In practical terms, the Nextflow scripting language is an extension of the
[Groovy programming language](https://groovy-lang.org/), which in turn is a
super-set of the Java programming language. Groovy simplifies the writing of
code and is more approachable than Java. Groovy semantics (syntax, control
structures, etc.) are documented
[here](https://groovy-lang.org/semantics.html).

- The approach of having a simple DSL built on top of a more powerful general
purpose programming language makes Nextflow very flexible. The Nextflow syntax
can handle most workflow use cases with ease, and then Groovy can be used to
handle corner cases which may be difficult to implement using the DSL.

<br>

## DSL2 syntax

Nextflow (version > 20.07.1) provides a revised syntax to the original DSL,
known as DSL2. The DSL2 syntax introduces several improvements such as
**modularity** (separating components to provide flexibility and enable
reuse), and improved data flow manipulation. This further simplifies the writing
of complex data analysis pipelines, and enhances workflow **readability**, and
**reusability**.

This feature is enabled by the following directive at the beginning of a workflow script:

`nextflow.enable.dsl=2`

<br>


151 changes: 151 additions & 0 deletions modules/nextflow/_posts/2000-01-04-processes-channels.md
@@ -0,0 +1,151 @@
---
title: Processes, channels and workflows
---

## Processes, channels and workflows


Nextflow workflows have three main parts: **processes**, **channels**, and **workflows**.

- Processes describe a **task** to be run. A process script can be written in
any scripting language that can be executed by the Linux platform (Bash, Perl,
Ruby, Python, etc.). Processes define **inputs** and **outputs** for a task,
and spawn a task for each complete input set. Each task is
**executed independently** and cannot interact with another task. The
only way data can be passed between process tasks is via
**asynchronous queues**, called channels.

- Channels are then used to manipulate the **flow of data** from one process to the
next.

- The interaction between processes, and ultimately the pipeline execution
flow itself, is then explicitly defined in a **workflow** section.
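
As a rough analogy for the channel idea described above, the following Python sketch connects two independent workers through a queue. It is not Nextflow code, and the names `producer`, `consumer` and `DONE` are illustrative assumptions. The point it shows is that the two "processes" never call each other directly: the only link between them is the queue, and the consumer handles values as they arrive.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def producer(items, channel):
    """First 'process': emit one value per item, then signal completion."""
    for item in items:
        channel.put(item.upper())
    channel.put(DONE)


def consumer(channel, results):
    """Second 'process': handle values as they arrive on the channel."""
    while True:
        value = channel.get()
        if value is DONE:
            break
        results.append(len(value))


channel = queue.Queue()  # plays the role of a channel between the two steps
results = []
workers = [
    threading.Thread(target=producer, args=(["acgt", "ttgaca", "gg"], channel)),
    threading.Thread(target=consumer, args=(channel, results)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

After both threads finish, `results` holds one length per item sent through the queue.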

<br>

## Processes and channels

In the example below, we have a channel containing three elements, e.g., 3 data
files. We have a process that takes the channel as input. Since the channel has
three elements, three independent instances (tasks) of that process are run in
parallel. Each task generates an output, which is passed to another channel and
used as input for the next process.
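
The situation just described (three channel elements, three independent tasks, outputs feeding the next process) can be mimicked in plain Python. This is an illustrative sketch rather than Nextflow itself: the functions `count_bases` and `summarise` and the short example sequences are made up for the illustration, and a thread pool stands in for the workflow engine's task scheduling.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bases(sequence):
    """One task: runs independently on a single channel element."""
    return len(sequence)


def summarise(length):
    """Next process: consumes one output of the previous process."""
    return f"{length} bases"


input_channel = ["ACGT", "TTGACA", "GG"]  # three elements, e.g. three data files

with ThreadPoolExecutor(max_workers=3) as pool:
    # One task per element; the three tasks may run at the same time.
    output_channel = list(pool.map(count_bases, input_channel))
    # The outputs become the input of the next process.
    summaries = list(pool.map(summarise, output_channel))
```

One difference worth keeping in mind: `pool.map` returns results in input order, whereas tasks in a real workflow engine may complete, and emit their outputs, in any order.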

<br>

<center><img src="/img/nextflow-Figure4.png" alt="Figure 4. " width="75%"/></center>

<br>

## Workflow execution

While a process defines what command or script must be executed, the **executor**
determines how that script is run in the target system.

If not otherwise specified, processes are executed on the **local** computer. The
local executor is very useful for pipeline development, testing, and small-scale
workflows, but for large-scale computational pipelines, a **High-Performance
Computing (HPC) cluster** or **Cloud platform** is often required.

<br>

## Process execution block

```
nextflow.enable.dsl=2

process <NAME> {
    [ directives ]

    input:
    < process inputs >

    output:
    < process outputs >

    when:
    < condition >

    [script|shell|exec]:
    < user script to be executed >
}
```
<br>

### Example of running a shell command

```
#!/usr/bin/env nextflow
nextflow.enable.dsl = 2

process INDEX {
    cpus 4

    input:
    path transcriptome

    output:
    path 'salmon_index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i salmon_index
    """
}
```
<br>

### Example of running a Python script
```
//process_python.nf
nextflow.enable.dsl=2

process PYSTUFF {
    script:
    """
    #!/usr/bin/env python
    import gzip

    reads = 0
    bases = 0

    # FASTQ stores each read as four lines: header, sequence, separator, quality
    with gzip.open('${projectDir}/data/yeast/reads/ref1_1.fq.gz', 'rb') as read:
        for header in read:
            seq = next(read)
            reads += 1
            bases += len(seq.strip())
            next(read)
            next(read)

    print("reads", reads)
    print("bases", bases)
    """
}

workflow {
    PYSTUFF()
}
```
<br>

## Running a Nextflow script

`nextflow run <script_name> <options/parameters>`

`nextflow run index_transcriptome.nf`

## Summary


<br>

<center><img src="/img/nextflow-Figure6.png" alt="Figure 6. " width="75%"/></center>

<br>

