Merge pull request #69 from genepi/nf-test
Nf test
AmstlerStephan authored Jun 20, 2024
2 parents 2628ab6 + e80c842 commit 84c8f96
Showing 119 changed files with 72,809 additions and 92 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/ci-tests.yml
@@ -0,0 +1,35 @@
name: CI Tests

on: [push, pull_request]

jobs:

  test:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4, 5]

    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up JDK 11
        uses: actions/setup-java@v2
        with:
          java-version: '11'
          distribution: 'adopt'

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v1
        with:
          version: "latest-edge"

      - name: Install nf-test
        run: |
          wget -qO- get.nf-test.com | bash -s 0.9.0-rc2
          sudo mv nf-test /usr/local/bin/
      - name: Run Tests (Shard ${{ matrix.shard }}/${{ strategy.job-total }})
        run: nf-test test --ci --shard ${{ matrix.shard }}/${{ strategy.job-total }}
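
The matrix above splits the test suite across five jobs, each running one shard. As a rough illustration of the idea, the Python sketch below partitions a list of test files round-robin by shard index. The helper `select_shard`, the test file names, and the round-robin rule are assumptions made for illustration; nf-test's own `--shard` logic may assign tests differently.

```python
def select_shard(tests, shard, total):
    """Return the tests assigned to 1-based shard `shard` out of `total` shards."""
    if not 1 <= shard <= total:
        raise ValueError(f"shard must be in 1..{total}, got {shard}")
    # Sort first so every job computes the same split independently.
    ordered = sorted(tests)
    # Round-robin assignment (an assumption, not necessarily nf-test's rule).
    return [t for i, t in enumerate(ordered) if i % total == shard - 1]


# Hypothetical test files; the real suite lives in the repository.
tests = [f"tests/example_{n}.nf.test" for n in range(1, 8)]
shards = [select_shard(tests, s, 5) for s in range(1, 6)]
```

Because every job derives its subset from the same sorted list, the shards are disjoint and together cover the whole suite, which is the property a sharded CI run relies on.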
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
work
.nextflow
.nextflow.*
tests/output
.nf-test/
nf-test
.nf-test.log
57 changes: 57 additions & 0 deletions CITATION.cff
@@ -0,0 +1,57 @@
cff-version: "1.2.0"
message: "If you use this software, please cite it as below."
title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR"
authors:
  - family-names: "Amstler"
    given-names: "Stephan"
  - family-names: "Streiter"
    given-names: "Gertraud"
  - family-names: "Pfurtscheller"
    given-names: "Cathrin"
  - family-names: "Forer"
    given-names: "Lukas"
  - family-names: "Di Maio"
    given-names: "Silvia"
  - family-names: "Weissensteiner"
    given-names: "Hansi"
  - family-names: "Paulweber"
    given-names: "Bernhard"
  - family-names: "Schoenherr"
    given-names: "Sebastian"
  - family-names: "Kronenberg"
    given-names: "Florian"
  - family-names: "Coassin"
    given-names: "Stefan"
doi: "10.1101/2024.03.01.582741"
date-released: "2024-03-05"
license: "Apache-2.0"
repository-code: "https://github.com/genepi/umi-pipeline-nf"
preferred-citation:
  type: "article"
  authors:
    - family-names: "Amstler"
      given-names: "Stephan"
    - family-names: "Streiter"
      given-names: "Gertraud"
    - family-names: "Pfurtscheller"
      given-names: "Cathrin"
    - family-names: "Forer"
      given-names: "Lukas"
    - family-names: "Di Maio"
      given-names: "Silvia"
    - family-names: "Weissensteiner"
      given-names: "Hansi"
    - family-names: "Paulweber"
      given-names: "Bernhard"
    - family-names: "Schoenherr"
      given-names: "Sebastian"
    - family-names: "Kronenberg"
      given-names: "Florian"
    - family-names: "Coassin"
      given-names: "Stefan"
  doi: "10.1101/2024.03.01.582741"
  journal: "bioRxiv"
  day: 5
  month: 3
  title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR"
  year: 2024
16 changes: 11 additions & 5 deletions README.md
@@ -6,7 +6,7 @@ Umi-pipeline-nf

**Umi-pipeline-nf** creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
The pipeline can be run on the whole fastq_pass folder of your nanopore run and, by default, outputs the aligned consensus sequences of each UMI cluster in a BAM file. The optional variant calling creates a VCF file for all variants found in the consensus sequences.
umi-pipeline-nf is inspired by a snakemake-based analysis pipeline ([ONT UMI analysis pipeline](https://github.com/nanoporetech/pipeline-umi-amplicon); originally developed by [Karst et al, Nat Biotechnol 18:165–169, 2021](https://www.nature.com/articles/s41592-020-01041-y)). We migrated the pipeline in [Nextflow](https://www.nextflow.io), included several optimizations and [additional functionalities](#main-adaptations).
Umi-pipeline-nf originates from a Snakemake-based analysis pipeline ([pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon); originally developed by [Karst et al., Nat Methods 18:165–169, 2021](https://www.nature.com/articles/s41592-020-01041-y)). We migrated the pipeline to [Nextflow](https://www.nextflow.io) and included several optimizations and [additional functionalities](#main-adaptations).

![Workflow](docs/images/umi-pipeline-nf_metro-map.svg)

@@ -43,20 +43,26 @@ umi-pipeline-nf is inspired by a snakemake-based analysis pipeline ([ONT UMI ana
2. Download the pipeline and test it on a [minimal dataset](data/info.txt) with a single command.

```bash
nextflow run genepi/umi-pipeline-nf -r v0.1.0 -profile test,docker
nextflow run genepi/umi-pipeline-nf -r v0.2.1 -profile test,docker
```

3. Start running your own analysis!
3.1 Download and adapt the config/custom.config with paths to your data (relative and absolute paths possible).

```bash
nextflow run genepi/umi-pipeline-nf -r v0.1.0 -c <custom.config> -profile docker
nextflow run genepi/umi-pipeline-nf -r v0.2.1 -c <custom.config> -profile custom,<docker,singularity>
```

## Citation

If you use the pipeline, please cite [our paper](https://www.biorxiv.org/content/10.1101/2024.03.01.582741v1):

Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schoenherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR. bioRxiv. 2024. doi: 10.1101/2024.03.01.582741.


### Credits

The pipeline was written by ([@StephanAmstler](https://github.com/AmstlerStephan)).
Nextflow template pipeline: [EcSeq](https://github.com/ecSeq).
Snakemake-based ONT pipeline: [nanoporetech/pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon).
Original workflow: [SorenKarst/longread_umi](https://github.com/SorenKarst/longread_umi).
Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: [nanoporetech/pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon).
UMI-corrected nanopore sequencing analysis first shown by: [SorenKarst/longread_umi](https://github.com/SorenKarst/longread_umi).
5 changes: 5 additions & 0 deletions bin/extract_umis.py
@@ -1,3 +1,8 @@
"""
This is a modified version of the code present in:
https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/extract_umis.py
"""

import argparse
import logging
import os
4 changes: 4 additions & 0 deletions bin/filter_reads.py
@@ -1,3 +1,7 @@
"""
This is a modified version of the code present in:
https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/filter_reads.py
"""
import argparse
import logging
import os
5 changes: 5 additions & 0 deletions bin/parse_clusters.py
@@ -1,3 +1,8 @@
"""
This is a modified version of the code present in:
https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/parse_clusters.py
"""

import argparse
import logging
import os
5 changes: 5 additions & 0 deletions bin/reformat_consensus.py
@@ -1,3 +1,8 @@
"""
This is a modified version of the code present in:
https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/reformat_consensus.py
"""

import argparse
import logging
import sys
6 changes: 3 additions & 3 deletions config/base.config
@@ -4,10 +4,10 @@
// PROCESS RESOURCES
process {
withName: "POLISH_CLUSTER" {
memory = { 10.GB * task.attempt }
cpus = 2
memory = { 2.GB * task.attempt }
cpus = 1
}

errorStrategy = 'retry'
maxRetries = 3
maxRetries = 5
}
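
With `errorStrategy = 'retry'`, the dynamic `memory` closure for `POLISH_CLUSTER` requests more memory on each attempt. The Python sketch below tabulates the request per attempt for the values before and after this change. It assumes that the first run counts as attempt 1 and that up to `maxRetries` resubmissions may follow; `memory_schedule` is a hypothetical helper for illustration, not part of the pipeline.

```python
def memory_schedule(base_gb, max_retries):
    """Memory (GB) requested on each attempt when memory = base_gb * task.attempt."""
    # Attempts are numbered from 1; assume max_retries resubmissions after the first run.
    return {attempt: base_gb * attempt for attempt in range(1, max_retries + 2)}


new_schedule = memory_schedule(2, 5)   # 2.GB * task.attempt, maxRetries = 5
old_schedule = memory_schedule(10, 3)  # 10.GB * task.attempt, maxRetries = 3
```

Under these assumptions the new settings start much smaller and allow more retries, so a task that fits in 2 GB no longer reserves 10 GB up front, while a demanding task can still grow step by step.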
44 changes: 31 additions & 13 deletions config/custom.config
@@ -7,27 +7,45 @@

params {

help = false
version = false
debug = false
help = false
version = false
debug = false

// required parameters

input = "PATH/TO/fastq_pass/"
output = "PATH/TO/OUTPUT_DIR"
reference = "PATH/TO/REF.fasta"
reference_fai = "PATH/TO/REF.fasta.fai"
bed = "PATH/TO/BED.bed"
input = "PATH/TO/fastq_pass/"
output = "PATH/TO/OUTPUT_DIR"
reference = "PATH/TO/REF.fasta"
reference_fai = "PATH/TO/REF.fasta.fai"
bed = "PATH/TO/BED.bed"

// adaptable parameters

output_format = "fastq"
filter_strategy_clusters = "quality"
//READ FILTERING
min_read_length = 0
min_qscore = 10

call_variants = true
variant_caller = "freebayes"
// SUBSAMPLING
subsampling = false
subsampling_seed = 11
subsampling_readnumber = 100000

// VARIANT_CALLING
call_variants = false
variant_caller = "freebayes"

medaka_model = "r1041_e82_400bps_hac_g615"
// ADVANCED
min_reads_per_barcode = 1000
umi_errors = 2
max_dist_umi = 2
min_reads_per_cluster = 20
max_reads_per_cluster = 60
min_consensus_quality = 40
masking_strategy = "softmask"
filter_strategy_clusters = "quality"
min_overlap = 0.95
balance_strands = true
medaka_model = "r1041_e82_400bps_hac_g615"
}

// NEXTFLOW REPORTING
33 changes: 13 additions & 20 deletions config/test.config
@@ -8,28 +8,21 @@

params {

help = false
version = false
debug = true
help = false
version = false
debug = false

input = "$baseDir/data/fastq_pass/"
output = "umi-pipeline-nf_test-run"
reference = "$baseDir/data/ref/lpa-ref2645.fasta"
reference_fai = "$baseDir/data/ref/lpa-ref2645.fasta.fai"
bed = "$baseDir/data/ref/lpa-ref2645.bed"
input = "$baseDir/tests/input/pipeline/fastq_pass/"
output = "test_umi-pipeline-nf"
reference = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.fasta"
reference_fai = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.fasta.fai"
bed = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.bed"

subsampling = false

min_reads_per_cluster = 10
max_reads_per_cluster = 20

write_reports = true
output_format = "fastq"
filter_strategy_clusters = "quality"
call_variants = true
variant_caller = "freebayes"

medaka_model = "r1041_e82_400bps_hac_g615"
min_reads_per_cluster = 10
max_reads_per_cluster = 20
min_reads_per_barcode = 0
call_variants = true
variant_caller = "freebayes"
}

// NEXTFLOW REPORTING
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
22 changes: 0 additions & 22 deletions data/info.txt

This file was deleted.

6 changes: 1 addition & 5 deletions env/Dockerfile
@@ -18,8 +18,4 @@ RUN conda update -y conda && \
conda clean --all

WORKDIR "/opt"
RUN wget https://github.com/seppinho/mutserve/releases/download/v2.0.0-rc15/mutserve.zip && \
unzip mutserve.zip
ENV PATH="/opt/mutserve:${PATH}"


RUN wget https://github.com/seppinho/mutserve/releases/download/v2.0.0-rc13.lpa/mutserve_LPA_adapted.jar
2 changes: 1 addition & 1 deletion env/environment.yml
@@ -11,7 +11,7 @@ dependencies:
- seqtk=1.3
- lofreq=2.1.5
- freebayes=1.3.2
- vcflib
- vcflib=1.0.0
- bedtools=2.30.0
- vsearch=2.21.2
- openjdk=11.0.9
7 changes: 4 additions & 3 deletions lib/processes/cluster.nf
@@ -3,16 +3,17 @@ vsearch_dir="vsearch_clusters"

process CLUSTER {
publishDir "${params.output}/${sample}/clustering/${type}", pattern: "${consensus_fasta}", mode: 'copy'
publishDir "${params.output}/${sample}/clustering/${type}", pattern: "cluster*", mode: 'copy'

input:
tuple val( sample ), val( target ), path( detected_umis_fastq )
val ( type )
output:
tuple val( "${sample}" ), val( "${target}" ), path( "${consensus_fasta}" ), emit:consensus_fasta
tuple val( "${sample}" ), val( "${target}" ), path( "cluster*" ), emit:cluster_fastas
tuple val( "${sample}" ), val( "${target}" ), path( "${consensus_fasta}" ), optional: true, emit:consensus_fasta
tuple val( "${sample}" ), val( "${target}" ), path( "cluster*" ), optional: true, emit:cluster_fastas

script:
def id = "${type}" == "raw" ? 0.8 : 0.99
def id = "${type}" == "raw" ? 0.90 : 0.99
"""
vsearch \
--clusterout_id \
File renamed without changes.
@@ -16,6 +16,8 @@ process LOFREQ {
--ref ${reference} \
--out ${type}.vcf \
--call-indels \
--min-cov 5 \
--no-default-filter \
${bam}
"""
}
@@ -14,7 +14,7 @@ process MUTSERVE {

script:
"""
mutserve call \
java -jar /opt/mutserve_LPA_adapted.jar call \
--output ${type}.vcf \
--write-raw \
--reference ${reference} \
2 changes: 1 addition & 1 deletion lib/processes/reformat_filter_cluster.nf
@@ -1,6 +1,6 @@
process REFORMAT_FILTER_CLUSTER {
tag "${sample}"
// publishDir "${params.output}/${sample}/clustering/${type}/smolecule", pattern: "smolecule*", mode: 'copy'
publishDir "${params.output}/${sample}/clustering/${type}/smolecule", pattern: "smolecule*", mode: 'copy'
publishDir "${params.output}/${sample}/stats/${type}", pattern: "*tsv", mode: 'copy'

input:
6 changes: 3 additions & 3 deletions lib/workflows/umi-pipeline.nf
@@ -51,9 +51,9 @@ include {REFORMAT_FILTER_CLUSTER} from '../processes/reformat_filter_cluster.nf'
include {POLISH_CLUSTER} from '../processes/polish_cluster.nf'
include {FILTER_CONSENSUS_FASTQ} from '../processes/filter_consensus_fastq.nf'
include {REFORMAT_CONSENSUS_CLUSTER} from '../processes/reformat_consensus_cluster.nf'
include {LOFREQ as LOFREQ_CONSENSUS; LOFREQ as LOFREQ_FINAL_CONSENSUS} from '../processes/variant_calling/lofreq.nf'
include {MUTSERVE as MUTSERVE_CONSENSUS; MUTSERVE as MUTSERVE_FINAL_CONSENSUS} from '../processes/variant_calling/mutserve.nf'
include {FREEBAYES as FREEBAYES_CONSENSUS; FREEBAYES as FREEBAYES_FINAL_CONSENSUS} from '../processes/variant_calling/freebayes.nf'
include {LOFREQ as LOFREQ_CONSENSUS; LOFREQ as LOFREQ_FINAL_CONSENSUS} from '../processes/lofreq.nf'
include {MUTSERVE as MUTSERVE_CONSENSUS; MUTSERVE as MUTSERVE_FINAL_CONSENSUS} from '../processes/mutserve.nf'
include {FREEBAYES as FREEBAYES_CONSENSUS; FREEBAYES as FREEBAYES_FINAL_CONSENSUS} from '../processes/freebayes.nf'


// SUB-WORKFLOWS