forked from p2pu/course-in-a-box
Merge pull request #2 from ajodeh-juma/nf-branch
Add nextflow training module
Showing 41 changed files with 565 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
--- | ||
title: Introduction to Bioinformatics workflows with Nextflow | ||
published: true | ||
tags: ["nextflow"] | ||
--- | ||
|
||
## Introduction to Bioinformatics workflows with Nextflow | ||
|
||
|
||
<br> | ||
|
||
#### Objectives | ||
|
||
* Understand what a workflow management system is. | ||
* Understand the benefits of using a workflow management system (WfMS). | ||
* Explain the benefits of using Nextflow as part of your bioinformatics | ||
workflow. | ||
* Explain the components of a Nextflow script. | ||
* Run a Nextflow script. | ||
|
||
<br> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
title: Open science | ||
--- | ||
|
||
## Open science | ||
|
||
"Open science is the framework with a focus on making science more accessible, inclusive and equitable for the benefit of all" – UNESCO | ||
|
||
1. Open data (ensuring data is freely available and adheres to FAIR data | ||
principles). | ||
2. Open access publishing (making peer-reviewed scholarly research articles and | ||
literature freely available). | ||
3. Open source (software code is easily accessible for inspection, modification | ||
and improvement/enhancement). | ||
4. Open peer review, open notebooks and open standards enable reproducible science. | ||
|
||
<br> | ||
|
||
|
||
## Reproducibility in open science | ||
|
||
<center><img src="/img/nextflow-Figure1.jpeg" alt="Figure 1. " width="75%"/></center> | ||
Yang-Min Kim, et al. (2018), _GigaScience_, <https://doi.org/10.1093/gigascience/giy077> | ||
<br> | ||
|
||
“The collective endeavor of science depends on researchers being able to replicate the work of others”. | ||
|
||
### Statements | ||
* Using customized scripts, we computed/estimated …….. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
--- | ||
title: Workflows | ||
--- | ||
|
||
## Workflows | ||
|
||
Analyzing data involves a sequence of tasks, including gathering, cleaning, and | ||
processing data. This sequence of tasks is called a **workflow** or a | ||
**pipeline**. | ||
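
To make the idea concrete, here is a minimal Python sketch of a pipeline as a chain of steps, where each step consumes the output of the previous one. The function names (`gather`, `clean`, `process`, `run_pipeline`) and the sample records are hypothetical and chosen only for illustration; a workflow management system automates this kind of chaining at a much larger scale.

```python
# Conceptual sketch only: step names and sample data are hypothetical.

def gather():
    """Collect raw records (hard-coded strings standing in for input data)."""
    return [" ACGT ", "", "ggca", " TTAG"]


def clean(records):
    """Drop empty records, then normalise whitespace and case."""
    return [r.strip().upper() for r in records if r.strip()]


def process(records):
    """Summarise the cleaned records as a total base count."""
    return sum(len(r) for r in records)


def run_pipeline():
    """Run the steps in order; each step feeds the next one."""
    return process(clean(gather()))


result = run_pipeline()
print("total bases:", result)
```

Each step only needs to know the shape of its input and output, which is what makes steps easy to swap or reuse.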
|
||
<br> | ||
|
||
|
||
<center><img src="/img/nextflow-Figure2.jpeg" alt="Figure 2. " width="75%"/></center> | ||
Steinbiss, S., et al., (2016). Companion: a web server for annotation and analysis of parasite genomes. _Nucleic acids research_, 44(W1), W29–W34. <https://doi.org/10.1093/nar/gkw292> | ||
|
||
<br> | ||
|
||
## Workflow management systems | ||
|
||
Workflow management systems (WfMS) provide features that simplify the development, monitoring, execution and sharing of pipelines. Key features include: | ||
1. **Run time management**: Management of program execution on the operating | ||
system and splitting tasks and data to run at the same time in a process | ||
called parallelization. | ||
2. **Software management**: Use of technology like containers, such as Docker | ||
or Singularity, that packages up code and all its dependencies, so the | ||
application runs reliably from one computing environment to another. | ||
3. **Portability & Interoperability**: Workflows written on one system can be | ||
run on another computing infrastructure e.g., local computer, compute | ||
cluster, or cloud infrastructure. | ||
4. **Reproducibility**: The use of software management systems and a pipeline | ||
specification means that the workflow will produce the same results when | ||
re-run, including on different computing platforms. | ||
5. **Reentrancy**: Continuous checkpoints allow workflows to resume from the | ||
last successfully executed steps. | ||
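
The reentrancy idea in the list above can be sketched in a few lines of Python: a step is keyed by its name and inputs, and its result is reused when the same key has already completed. This is a conceptual illustration under simplifying assumptions (JSON-serialisable inputs, a local cache directory), not a description of how Nextflow implements resumption; `run_step` and `count_bases` are hypothetical names.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def run_step(name, inputs, func, cache_dir):
    """Run func(inputs) unless a cached result for (name, inputs) exists.

    Returns a (result, was_cached) tuple.
    """
    # Derive a stable key from the step name and its inputs.
    payload = json.dumps([name, inputs], sort_keys=True).encode()
    key = hashlib.sha256(payload).hexdigest()
    cached = Path(cache_dir) / f"{key}.json"

    if cached.exists():
        # Checkpoint found: skip the computation.
        return json.loads(cached.read_text()), True

    result = func(inputs)
    cached.write_text(json.dumps(result))  # Record the checkpoint.
    return result, False


calls = []  # Tracks how often the step body really runs.


def count_bases(seqs):
    calls.append(seqs)
    return sum(len(s) for s in seqs)


with tempfile.TemporaryDirectory() as cache_dir:
    first = run_step("count_bases", ["ACGT", "GG"], count_bases, cache_dir)
    second = run_step("count_bases", ["ACGT", "GG"], count_bases, cache_dir)

print(first, second, len(calls))
```

The second call returns the stored result without running `count_bases` again, which is the behaviour a resumed workflow relies on.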
|
||
<br> | ||
|
||
<center><img src="/img/nextflow-Figure3.png" alt="Figure 3. " width="75%"/></center> | ||
|
||
<br> | ||
|
||
## Language | ||
|
||
- Nextflow scripts are written using a language intended to simplify the writing | ||
of workflows. Languages written for a specific field are called __*Domain | ||
Specific Languages (DSL)*__, e.g., SQL is used to work with databases, and AWK | ||
is designed for text processing. | ||
|
||
- In practical terms the Nextflow scripting language is an extension of the | ||
[Groovy programming language](https://groovy-lang.org/), which in turn is a | ||
superset of the Java programming language. Groovy simplifies the writing of | ||
code and is more approachable than Java. Groovy semantics (syntax, control | ||
structures, etc.) are documented | ||
[here](https://groovy-lang.org/semantics.html). | ||
|
||
- The approach of having a simple DSL built on top of a more powerful general | ||
purpose programming language makes Nextflow very flexible. The Nextflow syntax | ||
can handle most workflow use cases with ease, and then Groovy can be used to | ||
handle corner cases which may be difficult to implement using the DSL. | ||
|
||
<br> | ||
|
||
## DSL2 syntax | ||
|
||
Nextflow (version 20.07.1 or later) provides a revised syntax to the original DSL, | ||
known as DSL2. The DSL2 syntax introduces several improvements such as | ||
**modularity** (separating components to provide flexibility and enable | ||
reuse), and improved data flow manipulation. This further simplifies the writing | ||
of complex data analysis pipelines, and enhances workflow **readability**, and | ||
**reusability**. | ||
|
||
This feature is enabled by the following directive at the beginning of a workflow script: | ||
|
||
`nextflow.enable.dsl=2` | ||
|
||
<br> | ||
|
||
|
151 changes: 151 additions & 0 deletions
modules/nextflow/_posts/2000-01-04-processes-channels.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,151 @@ | ||
--- | ||
title: Processes, channels and workflows | ||
--- | ||
|
||
## Processes, channels and workflows | ||
|
||
|
||
Nextflow workflows have three main parts: **processes**, **channels**, and **workflows**. | ||
|
||
- Processes describe a **task** to be run. A process script can be written in | ||
any scripting language that can be executed by the Linux platform (Bash, Perl, | ||
Ruby, Python, etc.). Processes spawn a task for each complete input set. Each | ||
task is **executed independently** and cannot interact with another task. The | ||
only way data can be passed between process tasks is via | ||
**asynchronous queues**, called channels. | ||
Processes define **inputs** and **outputs** | ||
for a task. | ||
|
||
- Channels are then used to manipulate the **flow of data** from one process to the | ||
next. | ||
|
||
- The interaction between processes, and ultimately the pipeline execution | ||
flow itself, is then explicitly defined in a **workflow** section. | ||
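
As a rough analogy for the bullets above, the Python sketch below connects two independent stages with a first-in, first-out queue that plays the role of a channel. It is only an illustration of the asynchronous-queue idea, not Nextflow code; the `producer` and `consumer` functions, the `DONE` sentinel, and the sample names are all hypothetical.

```python
import queue
import threading

DONE = object()  # Sentinel marking the end of the stream.


def producer(channel, items):
    """Upstream stage: emit each item into the channel, then signal completion."""
    for item in items:
        channel.put(item)
    channel.put(DONE)


def consumer(channel, results):
    """Downstream stage: take items as they arrive and transform them."""
    while True:
        item = channel.get()  # Blocks until an item is available.
        if item is DONE:
            break
        results.append(item.upper())


channel = queue.Queue()
results = []

upstream = threading.Thread(
    target=producer, args=(channel, ["sample_a", "sample_b", "sample_c"])
)
downstream = threading.Thread(target=consumer, args=(channel, results))

upstream.start()
downstream.start()
upstream.join()
downstream.join()

print(results)
```

The two stages never call each other directly; the queue is their only point of contact, which mirrors how processes exchange data only through channels.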
|
||
<br> | ||
|
||
## Processes and channels | ||
|
||
In the example below, we have a channel containing three elements, e.g., 3 data | ||
files. We have a process that takes the channel as input. Since the channel has | ||
three elements, three independent instances (tasks) of that process are run in | ||
parallel. Each task generates an output, which is passed to another channel and | ||
used as input for the next process. | ||
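
The scenario described above can be imitated in plain Python with a thread pool: three input elements give three independent tasks, and their outputs become the input of a second step. This is a conceptual sketch rather than Nextflow code; `count_bases`, `label`, and the sample sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bases(seq):
    """First step: one task per input element."""
    return len(seq)


def label(n_bases):
    """Second step: consumes the outputs of the first step."""
    return f"{n_bases} bases"


input_channel = ["ACGT", "GGCATT", "TA"]  # Three elements, so three tasks.

with ThreadPoolExecutor(max_workers=3) as pool:
    # Each element is handled by its own independent task.
    first_outputs = list(pool.map(count_bases, input_channel))
    # The collected outputs act as the input of the next step.
    second_outputs = list(pool.map(label, first_outputs))

print(second_outputs)
```

Note that `pool.map` returns results in input order. A workflow engine running tasks in parallel may instead emit outputs in the order tasks finish, so downstream steps should not rely on ordering unless it is guaranteed.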
|
||
<br> | ||
|
||
<center><img src="/img/nextflow-Figure4.png" alt="Figure 4. " width="75%"/></center> | ||
|
||
<br> | ||
|
||
## Workflow execution | ||
|
||
While a process defines what command or script must be executed, the **executor** | ||
determines how that script is run in the target system. | ||
|
||
If not otherwise specified, processes are executed on the **local** computer. The | ||
local executor is very useful for pipeline development, testing, and small-scale | ||
workflows, but large-scale computational pipelines often require a **High-Performance | ||
Computing (HPC) cluster** or a **Cloud platform**. | ||
|
||
<br> | ||
|
||
## Process execution block | ||
|
||
``` | ||
nextflow.enable.dsl=2 | ||
process <NAME> { | ||
    [ directives ] | ||
    input: | ||
    < process inputs > | ||
    output: | ||
    < process outputs > | ||
    when: | ||
    < condition > | ||
    [script|shell|exec]: | ||
    < user script to be executed > | ||
} | ||
``` | ||
<br> | ||
|
||
### Example of running a shell command | ||
|
||
``` | ||
#!/usr/bin/env nextflow | ||
nextflow.enable.dsl = 2 | ||
process INDEX { | ||
    cpus 4 | ||
    input: | ||
    path transcriptome | ||
    output: | ||
    path 'salmon_index' | ||
    script: | ||
    """ | ||
    salmon index --threads $task.cpus -t $transcriptome -i salmon_index | ||
    """ | ||
} | ||
``` | ||
<br> | ||
|
||
### Example of running a Python script | ||
``` | ||
//process_python.nf | ||
nextflow.enable.dsl=2 | ||
process PYSTUFF { | ||
    script: | ||
    """ | ||
    #!/usr/bin/env python | ||
    import gzip | ||
    reads = 0 | ||
    bases = 0 | ||
    with gzip.open('${projectDir}/data/yeast/reads/ref1_1.fq.gz', 'rb') as read: | ||
        for id in read: | ||
            seq = next(read) | ||
            reads += 1 | ||
            bases += len(seq.strip()) | ||
            next(read) | ||
            next(read) | ||
    print("reads", reads) | ||
    print("bases", bases) | ||
    """ | ||
} | ||
workflow { | ||
    PYSTUFF() | ||
} | ||
``` | ||
<br> | ||
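
The counting logic inside `PYSTUFF` can also be tried outside Nextflow. The sketch below wraps the same loop in a standalone function and runs it on a tiny gzipped FASTQ file written to a temporary directory. The function name `count_reads_and_bases`, the file name `example.fq.gz`, and the two sample records are hypothetical; the code assumes each read occupies exactly four lines (header, sequence, separator, quality).

```python
import gzip
import os
import tempfile


def count_reads_and_bases(fastq_gz_path):
    """Count reads and bases in a gzipped FASTQ file with four-line records."""
    reads = 0
    bases = 0
    with gzip.open(fastq_gz_path, "rt") as handle:
        for _header in handle:   # Line 1: read identifier.
            seq = next(handle)   # Line 2: sequence.
            next(handle)         # Line 3: '+' separator.
            next(handle)         # Line 4: quality string.
            reads += 1
            bases += len(seq.strip())
    return reads, bases


# Two made-up reads, used only to exercise the function.
records = "@r1\nACGT\n+\nIIII\n@r2\nGGCATT\n+\nIIIIII\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.fq.gz")
    with gzip.open(path, "wt") as out:
        out.write(records)
    reads, bases = count_reads_and_bases(path)

print("reads", reads)
print("bases", bases)
```

Testing the script body on a small file like this is a convenient way to check the logic before embedding it in a process.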
|
||
## Running a Nextflow script | ||
|
||
`nextflow run <script_name> <options/parameters>` | ||
|
||
`nextflow run index_transcriptome.nf` | ||
|
||
## Summary | ||
|
||
|
||
<br> | ||
|
||
<center><img src="/img/nextflow-Figure6.png" alt="Figure 6. " width="75%"/></center> | ||
|
||
<br> | ||
|
||
|