Nextflow Conversion of ASVTableTask.pm

MarkerGeneAnalysis16sDADA2

flowchart TD
    p0((Channel.fromList))
    p1[markerGeneAnalysis:downloadFiles]
    p2[markerGeneAnalysis:filterFastqs]
    p3[markerGeneAnalysis:buildErrors]
    p4[markerGeneAnalysis:fastqToAsv]
    p5[markerGeneAnalysis:mergeAsvsAndAssignToOtus]
    p6(( ))
    p7(( ))
    p8(( ))
    p0 -->|ids| p1
    p1 --> p2
    p2 --> p3
    p3 --> p4
    p4 --> p5
    p5 --> p8
    p5 --> p7
    p5 --> p6

Decription of nextflow configuration parameters:

param	value type	description
studyIdFile	path	Path to SRAID tsv file
platform	string	Platform the sequencing was done on. Used in filterFastqs step. Ex: "illumina"
isPaired	boolean	Is the data paired?
trimLeft	int	Amount to trim from left. Used in filterFastqs step.
trimLeftR	int	Amount to trim from reverse strand. Used in filterFastqs step.
truncLen	int	Amount to truncate from forward strand. Used in filterFastqs step.
truncLenR	int	Amount to truncate from reverse stand. Used in filterFastqs step.
maxLen	int	Maximum length. Used in filterFastqs step.
mergeTechReps	boolean	Would you like to merge TechReps in fastqToAsv step.
trainingSet	path	Path to training set fasta.
speciesAssignment	path	Path to species assignment fasta.
outputDir	path	Path where you would like results to be stored
nValue	scientific notation	Ex: 1e+02 Used in buildErrors Step. Setting value above 1e+02 can cause issues with memory

In order to run the script, you will need to supply to the nextflow.config file:

An apikey to be used to collect the fastq files from NCBI
The location of your training set file (a short example file has been supplied in /data)
The location of your species assignment file (a short example file has been supplied in /data)

The parameter names for these have already been specified, you only need to enter them into the empty quotes.

The output files from this tool will be formatted as (SRRID supplied in sraStudyIDFile parameter)_(outputName parameter).(nothing, "bootstraps", or "full")

Get Started

Install Nextflow

curl https://get.nextflow.io | bash
Run the script

nextflow run VEuPathDB/MarkerGeneAnalysis16sDADA2 -with-trace -c <config_file> -r main

Running this locally can commonly result in an OOME. This results from the derepFastq function in buildErrorsN.R having n as too great a value.

drpsF[[i]] <- derepFastq(inputFilesF[[i]], n = 1e+06, verbose = verbose)

The default n is 1 million reads. If an OOME occurs, a lower number can be entered.

Ex: drpsF[[i]] <- derepFastq(inputFilesF[[i]],n = 1e+03, verbose = verbose)

This is specified in the nextflow.config file parameter "nValue".

Documentation for derepFastq and other dada2 functionality can be found here.

Name		Name	Last commit message	Last commit date
Latest commit History 53 Commits
bin		bin
data		data
modules		modules
Dockerfile		Dockerfile
Jenkinsfile		Jenkinsfile
LICENSE		LICENSE
README.md		README.md
main.nf		main.nf
nextflow.config		nextflow.config

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Nextflow Conversion of ASVTableTask.pm

Get Started

About

Releases

Packages

Languages

License

microbiomeDB/marker-gene-analysis-16s-DADA2-nextflow

Folders and files

Latest commit

History

Repository files navigation

Nextflow Conversion of ASVTableTask.pm

Get Started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages