Umi-pipeline-nf creates highly accurate single-molecule consensus sequences for unique molecular identifier (UMI)-tagged amplicons from nanopore sequencing data.
The pipeline can be run on the whole fastq_pass folder of your nanopore run and, by default, outputs the aligned consensus sequences of each UMI cluster in a BAM file. The optional variant calling step creates a VCF file containing all variants found in the consensus sequences.
Umi-pipeline-nf originates from a Snakemake-based analysis pipeline (pipeline-umi-amplicon; originally developed by Karst et al., Nat Methods 18:165–169, 2021). We migrated the pipeline to Nextflow and included several optimizations and additional functionality.
- Input FASTQ files are merged and filtered.
- Reads are aligned against a reference genome and filtered to keep only full-length on-target reads.
- The flanking UMI sequences of all reads are extracted.
- The extracted UMIs are used to cluster the reads.
- Per cluster, highly accurate consensus sequences are created.
- The consensus sequences are aligned against the reference sequence.
- An optional variant calling step can be performed.
- UMI-extraction, clustering, consensus sequence creation, and mapping are repeated.
- An optional variant calling step can be performed.
See the output documentation for a detailed overview of the pipeline and its output files.
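The clustering and consensus steps listed above can be illustrated with a toy sketch. This is not the pipeline's implementation, which relies on dedicated tools for clustering and consensus polishing; it only shows the idea of grouping reads by similar UMIs and collapsing each group into a single sequence. The function names, the greedy Hamming-distance clustering, the column-wise majority vote, and the `min_reads` threshold are all assumptions made for illustration, and the sequences are assumed to have equal length.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def cluster_by_umi(reads, max_mismatches=1):
    """Greedily assign each (umi, sequence) read to the first cluster whose
    seed UMI is within max_mismatches; otherwise open a new cluster."""
    clusters = []  # list of (seed_umi, [sequences])
    for umi, seq in reads:
        for seed, members in clusters:
            if hamming(seed, umi) <= max_mismatches:
                members.append(seq)
                break
        else:
            clusters.append((umi, [seq]))
    return clusters


def consensus(seqs):
    """Column-wise majority vote over equal-length sequences (toy stand-in
    for a real consensus/polishing tool)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


# Hypothetical toy reads: (UMI, amplicon sequence)
reads = [
    ("AAAATTTT", "ACGTACGT"),
    ("AAAATTTT", "ACGTACGA"),  # sequencing error in the last base
    ("AAAATTTA", "ACGTACGT"),  # one error in the UMI itself
    ("CCCCGGGG", "ACGTTCGT"),
    ("CCCCGGGG", "ACGTTCGT"),
]

clusters = cluster_by_umi(reads, max_mismatches=1)
min_reads = 2  # hypothetical minimum cluster size
results = {
    seed: consensus(members)
    for seed, members in clusters
    if len(members) >= min_reads
}
for seed, seq in results.items():
    print(seed, seq)
```

In this toy input, the read with the erroneous UMI still joins the first cluster, and the single base error in one of its reads is outvoted by the other two reads.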
- It comes with a Docker/Singularity container, which makes installation simple, the pipeline easy to use on clusters, and results highly reproducible.
- The pipeline is optimized for parallelization.
- An additional UMI cluster splitting step removes admixed UMI clusters.
- The read filtering strategy per UMI cluster was adapted to preserve the highest quality reads.
- Three commonly used variant callers (freebayes, lofreq, and mutserve) are supported by the pipeline.
- The raw reads can be optionally subsampled.
- The raw reads can be filtered by read length and quality.
See the usage documentation for all of the available parameters of the pipeline.
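As a rough illustration of what filtering raw reads by length and quality and then subsampling them can look like, here is a minimal, self-contained sketch. It is not taken from the pipeline: the function names and the `min_length`, `min_qual`, `n`, and `seed` arguments are hypothetical and are not the pipeline's parameters, and averaging error probabilities to obtain a mean read quality is an assumption about how such a filter might be defined.

```python
import math
import random


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, computed by averaging error probabilities."""
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def parse_fastq(text: str):
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def filter_reads(records, min_length: int, min_qual: float):
    """Keep non-empty reads that pass both the length and quality threshold."""
    return [
        (name, seq, qual)
        for name, seq, qual in records
        if seq and len(seq) >= min_length and mean_quality(qual) >= min_qual
    ]


def subsample(records, n: int, seed=None):
    """Randomly draw at most n reads; return all reads if fewer are available."""
    records = list(records)
    if len(records) <= n:
        return records
    return random.Random(seed).sample(records, n)


# Hypothetical toy input
fastq_text = """@r1
ACGTACGTAC
+
IIIIIIIIII
@r2
ACGT
+
IIII
@r3
ACGTACGTAC
+
++++++++++
@r4
ACGTACGTAC
+
5555555555
"""

kept = filter_reads(parse_fastq(fastq_text), min_length=8, min_qual=15)
print([name for name, _, _ in kept])
```

Here `r2` is dropped for being too short and `r3` for its low quality (Phred 10), while `r1` and `r4` pass both thresholds.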
- Install nextflow.
- Download the pipeline and test it on a minimal dataset with a single command.
nextflow run genepi/umi-pipeline-nf -r v0.2.1 -profile test,docker
- Start running your own analysis!
3.1 Download and adapt the config/custom.config with paths to your data (relative and absolute paths possible).
nextflow run genepi/umi-pipeline-nf -r v0.2.1 -c <custom.config> -profile custom,<docker,singularity>
If you use the pipeline, please cite our paper:
Amstler, S., Streiter, G., Pfurtscheller, C. et al. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex lipoprotein(a) KIV-2 VNTR. Genome Med 16, 117 (2024). https://doi.org/10.1186/s13073-024-01391-8
The pipeline was written by @StephanAmstler.
Nextflow template pipeline: EcSeq.
Snakemake-based ONT pipeline for UMI nanopore sequencing analysis: nanoporetech/pipeline-umi-amplicon.
UMI-corrected nanopore sequencing analysis first shown by: SorenKarst/longread_umi.