Nf test #69

Merged
merged 77 commits into from
Jun 20, 2024
Commits
2f7e7cc
Update README.md
AmstlerStephan Mar 6, 2024
aaf42f1
adapt README
AmstlerStephan Mar 6, 2024
58d329c
add credits to original python scripts
AmstlerStephan Mar 6, 2024
f0853ed
Add citation file
AmstlerStephan Mar 6, 2024
ac658b7
add name of ONT repository
AmstlerStephan Mar 6, 2024
23677b8
Merge pull request #63 from genepi/add_credits
AmstlerStephan Mar 6, 2024
831d2e9
adapt citation
AmstlerStephan Mar 6, 2024
d49a5f1
remove repository citation
AmstlerStephan Mar 6, 2024
d97c06b
adapt citation
AmstlerStephan Mar 6, 2024
1436b84
adapt citation
AmstlerStephan Mar 6, 2024
9f36b15
Adapt citation file
AmstlerStephan Mar 6, 2024
845a98f
adapt citation
AmstlerStephan Mar 6, 2024
d3191d2
adapt config files
AmstlerStephan Mar 6, 2024
b5d5cbb
adapt test.config
AmstlerStephan Mar 6, 2024
c12b48f
reduce initial polish cluster resources
AmstlerStephan Mar 6, 2024
25a239c
prepare release
AmstlerStephan Mar 6, 2024
30f4728
Merge pull request #64 from genepi/Adapt-Dockerfile
AmstlerStephan Mar 6, 2024
24f0910
convert to 0.2.0
AmstlerStephan Mar 6, 2024
61556f8
Update test.config
AmstlerStephan Mar 6, 2024
aa9c8ca
Update custom.config
AmstlerStephan Mar 6, 2024
dfd14e6
Update nextflow.config
AmstlerStephan Mar 6, 2024
efc554e
Update README.md
AmstlerStephan Mar 6, 2024
f9599a8
Update README.md
AmstlerStephan Mar 6, 2024
fbf6e1f
init nf-test
AmstlerStephan Mar 7, 2024
3bb374f
add gitignore
AmstlerStephan Mar 7, 2024
64b6260
adapt config files to use nf-test
AmstlerStephan Mar 7, 2024
025ffa2
move test data
AmstlerStephan Mar 7, 2024
d6d0d43
adapt test.config
AmstlerStephan Mar 7, 2024
bb6b685
move testing data
AmstlerStephan Mar 7, 2024
4eabc2f
add data and tests for the cluster filtering and reformating
AmstlerStephan Mar 7, 2024
6a6ebb3
add further tests for reformat filter cluster
AmstlerStephan Mar 7, 2024
3b62f2b
add snapshot to test main workflow
AmstlerStephan Mar 7, 2024
a000151
no snapshots for workflows
AmstlerStephan Mar 7, 2024
3bf22c6
move testing data
AmstlerStephan Mar 7, 2024
9549bf7
Merge branch 'nf-test' of https://github.com/genepi/umi-pipeline-nf i…
AmstlerStephan Mar 7, 2024
9a3ffbd
adapt freebayes variant calling flags
AmstlerStephan Mar 8, 2024
30f9bd1
Update README.md
AmstlerStephan Mar 8, 2024
d34e858
Merge branch 'main' of https://github.com/genepi/umi-pipeline-nf into…
AmstlerStephan Mar 8, 2024
ace9534
adjust freebayes process
AmstlerStephan Mar 8, 2024
0b0f4b4
prepare release 0.2.1
AmstlerStephan Mar 8, 2024
8b471b3
adapt module structure
AmstlerStephan Mar 11, 2024
74543c6
adapt module paths in main workflow
AmstlerStephan Mar 11, 2024
adc32dc
add test data for variant calling
AmstlerStephan Mar 11, 2024
248918b
Add variant calling test data
AmstlerStephan Mar 11, 2024
5980ae9
Add test cases
AmstlerStephan Mar 11, 2024
1bd50f4
Add freebayes test cases
AmstlerStephan Mar 11, 2024
21bf952
Add tests for all variant callers
AmstlerStephan Mar 11, 2024
2fbbb8e
a
AmstlerStephan Mar 11, 2024
4dbee61
adapt lofreq parameters to have no minimal coverage
AmstlerStephan Mar 11, 2024
2f9aaf2
remove typo
AmstlerStephan Mar 11, 2024
170299f
Add fastq test files for merge input
AmstlerStephan Mar 12, 2024
1171cb7
Add tests and snapshot for merge_fastq
AmstlerStephan Mar 12, 2024
f7064c0
add test data for mapping
AmstlerStephan Mar 12, 2024
4ca53b9
add tests for consensus and final consensus reads
AmstlerStephan Mar 12, 2024
efe9a8e
Create test cases for split_reads
AmstlerStephan Mar 12, 2024
b4c843b
prepare tests for split reads process
AmstlerStephan Mar 15, 2024
36b8944
prepare tests for umi detection
AmstlerStephan Mar 15, 2024
b98af47
Add tests for clustering
AmstlerStephan Mar 15, 2024
6acd93d
remove params snippet
AmstlerStephan Mar 15, 2024
b19fc54
add tests for cluster polishing
AmstlerStephan Mar 15, 2024
0c2439f
add tests for cluster reformatting using
AmstlerStephan Mar 15, 2024
1521903
Add CI tests for nf-test
AmstlerStephan Jun 17, 2024
1cdef7b
Update ci-tests.yml
AmstlerStephan Jun 17, 2024
3077e99
Adapt docker file path
AmstlerStephan Jun 17, 2024
d2bf203
Merge branch 'nf-test' of https://github.com/genepi/umi-pipeline-nf i…
AmstlerStephan Jun 17, 2024
78c6dad
updated testing paramaters for faster execution
AmstlerStephan Jun 18, 2024
7f5d96e
Updated test cases
AmstlerStephan Jun 18, 2024
5b770cc
Updated local nf-test to v0.9.0 and updated snapshots
AmstlerStephan Jun 18, 2024
dccab02
Merge pull request #67 from genepi/main
AmstlerStephan Jun 18, 2024
b60ab80
specify minimap2 version
AmstlerStephan Jun 18, 2024
09ab258
test using existing release container
AmstlerStephan Jun 18, 2024
42f2b4b
remove docker initiation
AmstlerStephan Jun 18, 2024
8598c76
Temporarily remove snapshots of miniminap2
AmstlerStephan Jun 18, 2024
10e118d
adapt testing for merging fastq files
AmstlerStephan Jun 19, 2024
7fa1fd9
Adapt java version and update snapshots
AmstlerStephan Jun 20, 2024
fe58e66
java is installed in docker, obsolete to install it additionally
AmstlerStephan Jun 20, 2024
e80c842
Resolving merge_fastq tests
AmstlerStephan Jun 20, 2024
35 changes: 35 additions & 0 deletions .github/workflows/ci-tests.yml
@@ -0,0 +1,35 @@
name: CI Tests

on: [push, pull_request]

jobs:

  test:

    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3, 4, 5]

    steps:
      - name: Checkout
        uses: actions/checkout@v2

      - name: Set up JDK 11
        uses: actions/setup-java@v2
        with:
          java-version: '11'
          distribution: 'adopt'

      - name: Setup Nextflow
        uses: nf-core/setup-nextflow@v1
        with:
          version: "latest-edge"

      - name: Install nf-test
        run: |
          wget -qO- get.nf-test.com | bash -s 0.9.0-rc2
          sudo mv nf-test /usr/local/bin/

      - name: Run Tests (Shard ${{ matrix.shard }}/${{ strategy.job-total }})
        run: nf-test test --ci --shard ${{ matrix.shard }}/${{ strategy.job-total }}
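Reviewer note: the workflow runs five jobs, and each job passes its own `--shard <index>/<total>` value to nf-test so that it executes only a slice of the suite. The sketch below illustrates the general idea with a round-robin split. It is not nf-test's actual partitioning algorithm, which this PR does not show; the function name `assign_shard` and the test file names are hypothetical.

```python
from typing import List


def assign_shard(tests: List[str], shard: int, total: int) -> List[str]:
    """Return the tests handled by a 1-based shard out of `total` shards."""
    if not 1 <= shard <= total:
        raise ValueError(f"shard must be in 1..{total}, got {shard}")
    # Sort first so every CI job sees the same order, then deal round-robin.
    ordered = sorted(tests)
    return [t for i, t in enumerate(ordered) if i % total == shard - 1]


if __name__ == "__main__":
    # Hypothetical test files; the real suite lives in the repository.
    suite = [f"tests/process_{n:02d}.nf.test" for n in range(1, 13)]
    for shard in range(1, 6):
        print(f"shard {shard}/5:", assign_shard(suite, shard, 5))
```

Whatever the real strategy is, the property that matters for CI is the one this sketch has: every test lands in exactly one shard, and the assignment is deterministic across jobs.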
7 changes: 7 additions & 0 deletions .gitignore
@@ -0,0 +1,7 @@
work
.nextflow
.nextflow.*
tests/output
.nf-test/
nf-test
.nf-test.log
57 changes: 57 additions & 0 deletions CITATION.cff
@@ -0,0 +1,57 @@
cff-version: "1.2.0"
message: "If you use this software, please cite it as below."
title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR"
authors:
  - family-names: "Amstler"
    given-names: "Stephan"
  - family-names: "Streiter"
    given-names: "Gertraud"
  - family-names: "Pfurtscheller"
    given-names: "Cathrin"
  - family-names: "Forer"
    given-names: "Lukas"
  - family-names: "Di Maio"
    given-names: "Silvia"
  - family-names: "Weissensteiner"
    given-names: "Hansi"
  - family-names: "Paulweber"
    given-names: "Bernhard"
  - family-names: "Schoenherr"
    given-names: "Sebastian"
  - family-names: "Kronenberg"
    given-names: "Florian"
  - family-names: "Coassin"
    given-names: "Stefan"
doi: "10.1101/2024.03.01.582741"
date-released: "2024-03-05"
license: "Apache-2.0"
repository-code: "https://github.com/genepi/umi-pipeline-nf"
preferred-citation:
  type: "article"
  authors:
    - family-names: "Amstler"
      given-names: "Stephan"
    - family-names: "Streiter"
      given-names: "Gertraud"
    - family-names: "Pfurtscheller"
      given-names: "Cathrin"
    - family-names: "Forer"
      given-names: "Lukas"
    - family-names: "Di Maio"
      given-names: "Silvia"
    - family-names: "Weissensteiner"
      given-names: "Hansi"
    - family-names: "Paulweber"
      given-names: "Bernhard"
    - family-names: "Schoenherr"
      given-names: "Sebastian"
    - family-names: "Kronenberg"
      given-names: "Florian"
    - family-names: "Coassin"
      given-names: "Stefan"
  doi: "10.1101/2024.03.01.582741"
  journal: "bioRxiv"
  day: 5
  month: 3
  title: "Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR"
  year: 2024
16 changes: 11 additions & 5 deletions README.md
@@ -6,7 +6,7 @@ Umi-pipeline-nf

 **Umi-pipeline-nf** creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
 The pipeline can be run for the whole fastq_pass folder of your nanopore run and, per default, outputs the aligned consensus sequences of each UMI cluster in bam file. The optional variant calling creates a vcf file for all variants that are found in the consensus sequences.
-umi-pipeline-nf is inspired by a snakemake-based analysis pipeline ([ONT UMI analysis pipeline](https://github.com/nanoporetech/pipeline-umi-amplicon); originally developed by [Karst et al, Nat Biotechnol 18:165–169, 2021](https://www.nature.com/articles/s41592-020-01041-y)). We migrated the pipeline in [Nextflow](https://www.nextflow.io), included several optimizations and [additional functionalities](#main-adaptations).
+Umi-pipeline-nf orignates from a snakemake-based analysis pipeline ([pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon); originally developed by [Karst et al, Nat Biotechnol 18:165–169, 2021](https://www.nature.com/articles/s41592-020-01041-y)). We migrated the pipeline to [Nextflow](https://www.nextflow.io) and included several optimizations and [additional functionalities](#main-adaptations).

 ![Workflow](docs/images/umi-pipeline-nf_metro-map.svg)

@@ -43,20 +43,26 @@ umi-pipeline-nf is inspired by a snakemake-based analysis pipeline ([ONT UMI ana
 2. Download the pipeline and test it on a [minimal dataset](data/info.txt) with a single command.

 ```bash
-nextflow run genepi/umi-pipeline-nf -r v0.1.0 -profile test,docker
+nextflow run genepi/umi-pipeline-nf -r v0.2.1 -profile test,docker
 ```

 3. Start running your own analysis!
 3.1 Download and adapt the config/custom.config with paths to your data (relative and absolute paths possible).

 ```bash
-nextflow run genepi/umi-pipeline-nf -r v0.1.0 -c <custom.config> -profile docker
+nextflow run genepi/umi-pipeline-nf -r v0.2.1 -c <custom.config> -profile custom,<docker,singularity>
 ```

+## Citation
+
+If you use the pipeline please cite [our Paper](https://www.biorxiv.org/content/10.1101/2024.03.01.582741v1):
+
+Amstler S, Streiter G, Pfurtscheller C, Forer L, Di Maio S, Weissensteiner H, Paulweber B, Schoenherr S, Kronenberg F, Coassin S. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR. bioRxiv. 2024. doi: 10.1101/2024.03.01.582741.
+

 ### Credits

 The pipeline was written by ([@StephanAmstler](https://github.com/AmstlerStephan)).
 Nextflow template pipeline: [EcSeq](https://github.com/ecSeq).
-Snakemake-based ONT pipeline: [nanoporetech/pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon).
-Original workflow: [SorenKarst/longread_umi](https://github.com/SorenKarst/longread_umi).
+Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: [nanoporetech/pipeline-umi-amplicon](https://github.com/nanoporetech/pipeline-umi-amplicon).
+UMI-corrected nanopore sequencing analysis first shown by: [SorenKarst/longread_umi](https://github.com/SorenKarst/longread_umi).
5 changes: 5 additions & 0 deletions bin/extract_umis.py
@@ -1,3 +1,8 @@
+"""
+This is a modified version of the code present in:
+https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/extract_umis.py
+"""
+
 import argparse
 import logging
 import os
4 changes: 4 additions & 0 deletions bin/filter_reads.py
@@ -1,3 +1,7 @@
+"""
+This is a modified version of the code present in:
+https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/filter_reads.py
+"""
 import argparse
 import logging
 import os
5 changes: 5 additions & 0 deletions bin/parse_clusters.py
@@ -1,3 +1,8 @@
+"""
+This is a modified version of the code present in:
+https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/parse_clusters.py
+"""
+
 import argparse
 import logging
 import os
5 changes: 5 additions & 0 deletions bin/reformat_consensus.py
@@ -1,3 +1,8 @@
+"""
+This is a modified version of the code present in:
+https://github.com/nanoporetech/pipeline-umi-amplicon/blob/master/lib/umi_amplicon_tools/reformat_consensus.py
+"""
+
 import argparse
 import logging
 import sys
6 changes: 3 additions & 3 deletions config/base.config
@@ -4,10 +4,10 @@
 // PROCESS RESOURCES
 process {
     withName: "POLISH_CLUSTER" {
-        memory = { 10.GB * task.attempt }
-        cpus = 2
+        memory = { 2.GB * task.attempt }
+        cpus = 1
     }

     errorStrategy = 'retry'
-    maxRetries = 3
+    maxRetries = 5
 }
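Reviewer note: with `errorStrategy = 'retry'`, the `memory` closure is re-evaluated on every attempt, so the request grows linearly with `task.attempt`. The sketch below tabulates the request per attempt for the old and new settings. It assumes that `task.attempt` starts at 1 and that `maxRetries` counts re-submissions after the first run; the helper name `memory_per_attempt` is hypothetical.

```python
from typing import List


def memory_per_attempt(base_gb: int, max_retries: int) -> List[int]:
    """Memory (GB) requested on each attempt when memory = base * task.attempt."""
    # Assumption: one initial run (attempt 1) plus up to `max_retries` re-submissions.
    return [base_gb * attempt for attempt in range(1, max_retries + 2)]


if __name__ == "__main__":
    print("old (10 GB, maxRetries 3):", memory_per_attempt(10, 3))
    print("new ( 2 GB, maxRetries 5):", memory_per_attempt(2, 5))
```

Under these assumptions the new configuration starts much smaller and still tops out below the old starting point multiplied by two, which fits the commit message "reduce initial polish cluster resources".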
44 changes: 31 additions & 13 deletions config/custom.config
@@ -7,27 +7,45 @@

 params {

-    help = false
-    version = false
-    debug = false
+    help = false
+    version = false
+    debug = false

     // required parameters

-    input = "PATH/TO/fastq_pass/"
-    output = "PATH/TO/OUTPUT_DIR"
-    reference = "PATH/TO/REF.fasta"
-    reference_fai = "PATH/TO/REF.fasta.fai"
-    bed = "PATH/TO/BED.bed"
+    input = "PATH/TO/fastq_pass/"
+    output = "PATH/TO/OUTPUT_DIR"
+    reference = "PATH/TO/REF.fasta"
+    reference_fai = "PATH/TO/REF.fasta.fai"
+    bed = "PATH/TO/BED.bed"

     // adaptable parameters

-    output_format = "fastq"
-    filter_strategy_clusters = "quality"
+    //READ FILTERING
+    min_read_length = 0
+    min_qscore = 10

-    call_variants = true
-    variant_caller = "freebayes"
+    // SUBSAMPLING
+    subsampling = false
+    subsampling_seed = 11
+    subsampling_readnumber = 100000
+
+    // VARIANT_CALLING
+    call_variants = false
+    variant_caller = "freebayes"

-    medaka_model = "r1041_e82_400bps_hac_g615"
+    // ADVANCED
+    min_reads_per_barcode = 1000
+    umi_errors = 2
+    max_dist_umi = 2
+    min_reads_per_cluster = 20
+    max_reads_per_cluster = 60
+    min_consensus_quality = 40
+    masking_strategy = "softmask"
+    filter_strategy_clusters = "quality"
+    min_overlap = 0.95
+    balance_strands = true
+    medaka_model = "r1041_e82_400bps_hac_g615"
 }

 // NEXTFLOW REPORTING
33 changes: 13 additions & 20 deletions config/test.config
@@ -8,28 +8,21 @@

 params {

-    help = false
-    version = false
-    debug = true
+    help = false
+    version = false
+    debug = false

-    input = "$baseDir/data/fastq_pass/"
-    output = "umi-pipeline-nf_test-run"
-    reference = "$baseDir/data/ref/lpa-ref2645.fasta"
-    reference_fai = "$baseDir/data/ref/lpa-ref2645.fasta.fai"
-    bed = "$baseDir/data/ref/lpa-ref2645.bed"
+    input = "$baseDir/tests/input/pipeline/fastq_pass/"
+    output = "test_umi-pipeline-nf"
+    reference = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.fasta"
+    reference_fai = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.fasta.fai"
+    bed = "$baseDir/tests/input/pipeline/ref/lpa-ref2645.bed"

-    subsampling = false
-
-    min_reads_per_cluster = 10
-    max_reads_per_cluster = 20
-
-    write_reports = true
-    output_format = "fastq"
-    filter_strategy_clusters = "quality"
-    call_variants = true
-    variant_caller = "freebayes"
-
-    medaka_model = "r1041_e82_400bps_hac_g615"
+    min_reads_per_cluster = 10
+    max_reads_per_cluster = 20
+    min_reads_per_barcode = 0
+    call_variants = true
+    variant_caller = "freebayes"
 }

 // NEXTFLOW REPORTING
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
22 changes: 0 additions & 22 deletions data/info.txt

This file was deleted.

6 changes: 1 addition & 5 deletions env/Dockerfile
@@ -18,8 +18,4 @@ RUN conda update -y conda && \
     conda clean --all

 WORKDIR "/opt"
-RUN wget https://github.com/seppinho/mutserve/releases/download/v2.0.0-rc15/mutserve.zip && \
-    unzip mutserve.zip
-ENV PATH="/opt/mutserve:${PATH}"
-
-
+RUN wget https://github.com/seppinho/mutserve/releases/download/v2.0.0-rc13.lpa/mutserve_LPA_adapted.jar
2 changes: 1 addition & 1 deletion env/environment.yml
@@ -11,7 +11,7 @@ dependencies:
   - seqtk=1.3
   - lofreq=2.1.5
   - freebayes=1.3.2
-  - vcflib
+  - vcflib=1.0.0
   - bedtools=2.30.0
   - vsearch=2.21.2
   - openjdk=11.0.9
7 changes: 4 additions & 3 deletions lib/processes/cluster.nf
@@ -3,16 +3,17 @@ vsearch_dir="vsearch_clusters"

 process CLUSTER {
     publishDir "${params.output}/${sample}/clustering/${type}", pattern: "${consensus_fasta}", mode: 'copy'
+    publishDir "${params.output}/${sample}/clustering/${type}", pattern: "cluster*", mode: 'copy'

     input:
         tuple val( sample ), val( target ), path( detected_umis_fastq )
         val ( type )
     output:
-        tuple val( "${sample}" ), val( "${target}" ), path( "${consensus_fasta}" ), emit:consensus_fasta
-        tuple val( "${sample}" ), val( "${target}" ), path( "cluster*" ), emit:cluster_fastas
+        tuple val( "${sample}" ), val( "${target}" ), path( "${consensus_fasta}" ), optional: true, emit:consensus_fasta
+        tuple val( "${sample}" ), val( "${target}" ), path( "cluster*" ), optional: true, emit:cluster_fastas

     script:
-        def id = "${type}" == "raw" ? 0.8 : 0.99
+        def id = "${type}" == "raw" ? 0.90 : 0.99
     """
         vsearch \
         --clusterout_id \
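Reviewer note: the only behavioural change in the `script:` block is the threshold chosen for `raw` input, which rises from 0.8 to 0.90. The sketch below mirrors that ternary in Python. The hunk is truncated before `id` is used, so treating it as the clustering identity passed to vsearch is an assumption, as are the function name and the non-raw type label `"consensus"`.

```python
def cluster_identity(cluster_type: str, raw_identity: float = 0.90) -> float:
    """Mirror of `"${type}" == "raw" ? 0.90 : 0.99` from the CLUSTER process."""
    # Raw reads get the looser threshold; any other type gets the strict 0.99.
    return raw_identity if cluster_type == "raw" else 0.99


if __name__ == "__main__":
    # "consensus" is a hypothetical label for any non-raw input type.
    for cluster_type in ("raw", "consensus"):
        print(cluster_type, "before:", cluster_identity(cluster_type, raw_identity=0.8),
              "after:", cluster_identity(cluster_type))
```

A stricter raw threshold can leave some inputs without any cluster, which may be why the same hunk marks both outputs `optional: true`.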
File renamed without changes.
@@ -16,6 +16,8 @@ process LOFREQ {
         --ref ${reference} \
         --out ${type}.vcf \
         --call-indels \
+        --min-cov 5 \
+        --no-default-filter \
         ${bam}
     """
 }
@@ -14,7 +14,7 @@ process MUTSERVE {

     script:
     """
-        mutserve call \
+        java -jar /opt/mutserve_LPA_adapted.jar call \
         --output ${type}.vcf \
         --write-raw \
         --reference ${reference} \
2 changes: 1 addition & 1 deletion lib/processes/reformat_filter_cluster.nf
@@ -1,6 +1,6 @@
 process REFORMAT_FILTER_CLUSTER {
     tag "${sample}"
-    // publishDir "${params.output}/${sample}/clustering/${type}/smolecule", pattern: "smolecule*", mode: 'copy'
+    publishDir "${params.output}/${sample}/clustering/${type}/smolecule", pattern: "smolecule*", mode: 'copy'
     publishDir "${params.output}/${sample}/stats/${type}", pattern: "*tsv", mode: 'copy'

     input:
6 changes: 3 additions & 3 deletions lib/workflows/umi-pipeline.nf
@@ -51,9 +51,9 @@ include {REFORMAT_FILTER_CLUSTER} from '../processes/reformat_filter_cluster.nf'
 include {POLISH_CLUSTER} from '../processes/polish_cluster.nf'
 include {FILTER_CONSENSUS_FASTQ} from '../processes/filter_consensus_fastq.nf'
 include {REFORMAT_CONSENSUS_CLUSTER} from '../processes/reformat_consensus_cluster.nf'
-include {LOFREQ as LOFREQ_CONSENSUS; LOFREQ as LOFREQ_FINAL_CONSENSUS} from '../processes/variant_calling/lofreq.nf'
-include {MUTSERVE as MUTSERVE_CONSENSUS; MUTSERVE as MUTSERVE_FINAL_CONSENSUS} from '../processes/variant_calling/mutserve.nf'
-include {FREEBAYES as FREEBAYES_CONSENSUS; FREEBAYES as FREEBAYES_FINAL_CONSENSUS} from '../processes/variant_calling/freebayes.nf'
+include {LOFREQ as LOFREQ_CONSENSUS; LOFREQ as LOFREQ_FINAL_CONSENSUS} from '../processes/lofreq.nf'
+include {MUTSERVE as MUTSERVE_CONSENSUS; MUTSERVE as MUTSERVE_FINAL_CONSENSUS} from '../processes/mutserve.nf'
+include {FREEBAYES as FREEBAYES_CONSENSUS; FREEBAYES as FREEBAYES_FINAL_CONSENSUS} from '../processes/freebayes.nf'


 // SUB-WORKFLOWS