Merge pull request #127 from olabiyi/DEV_Metagenomics_Illumina_NF_conversion

Metagenomics Illumina Nextflow workflow: Edited README files and accession parameter
asaravia-butler authored Oct 23, 2024
2 parents cfdd9a7 + da5d9db commit f34870c
Showing 5 changed files with 32 additions and 14 deletions.
@@ -118,7 +118,7 @@ nextflow run main.nf --help
#### 4a. Approach 1: Run slurm jobs in singularity containers with OSD accession as input
```bash
nextflow run main.nf -resume -profile slurm,singularity --GLDS_accession OSD-574
nextflow run main.nf -resume -profile slurm,singularity --accession OSD-574
```
<br>
@@ -149,9 +149,9 @@ nextflow run main.nf -resume -profile conda --csv_file SE_file.csv --conda.qc <p
*Required only if you would like to pull and process data directly from OSDR*
* `--GLDS_accession` – A Genelab / OSD accession number e.g. OSD-574.
* `--accession` – A Genelab / OSD accession number e.g. OSD-574.
*Required only if --GLDS_accession is not passed as an argument*
*Required only if --accession is not passed as an argument*
* `--csv_file` – A single-end or paired-end input csv file containing assay metadata for each sample, including sample_id, forward, reverse, and/or paired. Please see the sample [SE_file.csv](workflow_code/SE_file.csv) and [PE_file.csv](workflow_code/PE_file.csv) in this repository for examples on how to format this file.
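For reference, below is a minimal sketch of what a paired-end `--csv_file` input could look like, assuming only the column names listed above (sample_id, forward, reverse, paired); the sample IDs, fastq paths, and `paired` values are hypothetical placeholders, and the [PE_file.csv](workflow_code/PE_file.csv) linked above remains the authoritative template.

```bash
# Hypothetical paired-end input CSV; column names are taken from the parameter description above,
# while the sample IDs, file paths, and 'paired' values are placeholders.
cat > PE_file.csv <<'EOF'
sample_id,forward,reverse,paired
Sample-1,/path/to/raw_reads/Sample-1_R1.fastq.gz,/path/to/raw_reads/Sample-1_R2.fastq.gz,true
Sample-2,/path/to/raw_reads/Sample-2_R1.fastq.gz,/path/to/raw_reads/Sample-2_R2.fastq.gz,true
EOF

# Launch the workflow with the CSV instead of an OSD accession
nextflow run main.nf -resume -profile slurm,singularity --csv_file PE_file.csv
```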
@@ -204,4 +204,21 @@ To generate a README file, a protocols file, a md5sums table and a file associat
nextflow -C post_processing.config run post_processing.nf -resume -profile slurm,singularity
```
The outputs of the run will be in a directory called `Post_Processing` by default.
The outputs of the run will be in a directory called `Post_Processing` by default and they are as follows:
- Post_processing/FastQC_Outputs/filtered_multiqc_GLmetagenomics_report.zip (Filtered sequence multiqc report with paths purged)
- Post_processing/FastQC_Outputs/raw_multiqc_GLmetagenomics_report.zip (Raw sequence multiqc report with paths purged)
- Post_processing/<GLDS_accession>_-associated-file-names.tsv (File association table for curation)
- Post_processing/<GLDS_accession>_metagenomics-validation.log (Automatic verification and validation log file)
- Post_processing/processed_md5sum_GLmetagenomics.tsv (md5sums for the files to be released on OSDR)
- Post_processing/processing_info_GLmetagenomics.zip (Zip file containing all files used to run the workflow and required logs with paths purged)
- Post_processing/protocol.txt (File describing the methods used by the workflow)
- Post_processing/README_GLmetagenomics.txt (README file listing and describing the outputs of the workflow)
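Once post-processing finishes, a quick way to confirm the files above were produced is to list the output directory and peek inside the zipped processing info; a minimal sketch, assuming the default directory name used in the listing above.

```bash
# List the post-processing outputs (uses the default directory name from the listing above)
ls -1 Post_processing/

# Preview the contents of the zipped processing info without extracting it
unzip -l Post_processing/processing_info_GLmetagenomics.zip
```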
@@ -23,7 +23,7 @@ if (params.help) {
println(" > nextflow run main.nf -resume -profile slurm,conda --csv_file SE_file.csv")
println()
println("Example 3: Run jobs locally in conda environments, supply a GLDS accession, and specify the path to an existing conda environment.")
println(" > nextflow run main.nf -resume -profile conda --GLDS_accession OSD-574 --conda.qc <path/to/existing/conda/environment>")
println(" > nextflow run main.nf -resume -profile conda --accession OSD-574 --conda.qc <path/to/existing/conda/environment>")
println()
println("Required arguments:")
println("""-profile [STRING] Specifies the profile to be used to run the workflow. Options are [slurm, singularity, docker, and conda].
@@ -86,7 +86,7 @@ if (params.help) {
println(" --read_based_dir [PATH] Read-based analysis outputs directory. Default: ../Read-based_Processing/.")
println()
println("Genelab specific arguements:")
println(" --GLDS_accession [STRING] A Genelab accession number if the --csv_file parameter is not set. If this parameter is set, it will ignore the --csv_file parameter.")
println(" --accession [STRING] A Genelab accession number if the --csv_file parameter is not set. If this parameter is set, it will ignore the --csv_file parameter.")
println(" --RawFilePattern [STRING] If we do not want to download all files (which we often won't), we can specify a pattern here to subset the total files.")
println(" For example, if we know we want to download just the fastq.gz files, we can say 'fastq.gz'. We can also provide multiple patterns")
println(" as a comma-separated list. For example, If we want to download the fastq.gz files that also have 'NxtaFlex', 'metagenomics', and 'raw' in")
@@ -145,7 +145,7 @@ log.info """
You have set the following parameters:
Profile: ${workflow.profile}
Input csv file : ${params.csv_file}
GLDS Accession : ${params.GLDS_accession}
GLDS or OSD Accession : ${params.accession}
GLDS Raw File Pattern: ${params.RawFilePattern}
Workflow : ${params.workflow}
Nextflow Directory publishing mode: ${params.publishDir_mode}
@@ -317,9 +317,9 @@ workflow {
// Software Version Capturing - runsheet
software_versions_ch = Channel.empty()
// Parse file input
if(params.GLDS_accession){
if(params.accession){

GET_RUNSHEET(params.GLDS_accession)
GET_RUNSHEET(params.accession)
GET_RUNSHEET.out.input_file
.splitCsv(header:true)
.set{file_ch}
@@ -113,7 +113,7 @@ params {
checkm = null // "/path/to/envs/checkm"
}

GLDS_accession = false // GLDS or OSD accession number for the data to be processed
accession = false // GLDS or OSD accession number for the data to be processed
// Pattern of files on OSDR for the GLDS_accession you want to process.
RawFilePattern = null // "_metaG", "_HRremoved"
errorStrategy = "terminate"
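Because `accession`, `RawFilePattern`, and `errorStrategy` are defined in the workflow's `params` block, they can also be overridden at launch time instead of editing this config file; a minimal sketch, reusing the example pattern values from the comment above.

```bash
# Override the config defaults on the command line; the pattern values below are the
# examples given in the config comment ("_metaG", "_HRremoved"), joined as a comma-separated list.
nextflow run main.nf -resume -profile slurm,singularity \
    --accession OSD-574 \
    --RawFilePattern '_metaG,_HRremoved'
```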
@@ -26,7 +26,6 @@ echo $HOSTNAME

## Activate the conda environment containing the tools you need to run your job ##
## You can see a list of all available environments by running the command: conda env list ##
## If you need a conda envrionment installed request it using JIRA ##

source activate /path/to/envs/nextflow ## Replace /path/to/envs/nextflow with the path to the conda environment with nextflow installed ##

@@ -40,8 +39,10 @@ echo ""

## The command(s) that you want to run in this slurm job ##
export NXF_SINGULARITY_CACHEDIR=singularity/
#nextflow run main.nf -profile slurm,singularity -resume --csv_file PE_file.csv ## Replace command with the command(s) you want to run ##
nextflow run main.nf -profile slurm,singularity --GLDS_accession OSD-574 -resume
export TOWER_ACCESS_TOKEN=<ACCESS_TOKEN>
export TOWER_WORKSPACE_ID=<WORKSPACE_ID>
#nextflow run main.nf -profile slurm,singularity -resume --csv_file PE_file.csv -with-tower ## Replace command with the command(s) you want to run ##
nextflow run main.nf -profile slurm,singularity --accession OSD-574 -resume -with-tower


## Add a time-stamp at the end of the job then calculate how long the job took to run in seconds, minutes, and hours ##
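To use this script, replace the `<ACCESS_TOKEN>` and `<WORKSPACE_ID>` placeholders with real Tower credentials (or drop the `-with-tower` pieces), then submit it to the scheduler; a minimal sketch, where the script filename is a hypothetical placeholder.

```bash
# Submit the job script to slurm; "run_workflow.slurm" is a placeholder for whatever name this script is saved under
sbatch run_workflow.slurm

# Check on the job while it runs
squeue -u $USER
```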
2 changes: 1 addition & 1 deletion Metagenomics/Illumina/Workflow_Documentation/README.md
@@ -7,7 +7,7 @@
|Pipeline Version|Current Workflow Version (for respective pipeline version)|Nextflow Version|
|:---------------|:---------------------------------------------------------|:---------------|
|*[GL-DPPD-7107-A.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107-A.md)|[NF_MGIllumina-A_1.0.0](NF_MGIllumina-A)|23.10.1|
|[GL-DPPD-7107.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107.md)|[SW_MGIllumina_2.0.4](SW_MGIllumina)|N/A (Snakemake vXXXX)|
|[GL-DPPD-7107.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107.md)|[SW_MGIllumina_2.0.4](SW_MGIllumina)|N/A (Snakemake v7.26.0)|


*Current GeneLab Pipeline/Workflow Implementation
