Merge pull request #127 from olabiyi/DEV_Metagenomics_Illumina_NF_conversion

Metagenomics Illumina Nextflow workflow: Edited README files and accession parameter
asaravia-butler authored Oct 23, 2024
2 parents cfdd9a7 + da5d9db commit f34870c
Showing 5 changed files with 32 additions and 14 deletions.
@@ -118,7 +118,7 @@ nextflow run main.nf --help
#### 4a. Approach 1: Run slurm jobs in singularity containers with OSD accession as input
```bash
nextflow run main.nf -resume -profile slurm,singularity --GLDS_accession OSD-574
nextflow run main.nf -resume -profile slurm,singularity --accession OSD-574
```
<br>
@@ -149,9 +149,9 @@ nextflow run main.nf -resume -profile conda --csv_file SE_file.csv --conda.qc <p
*Required only if you would like to pull and process data directly from OSDR*
* `--GLDS_accession` – A Genelab / OSD accession number e.g. OSD-574.
* `--accession` – A Genelab / OSD accession number e.g. OSD-574.
*Required only if --GLDS_accession is not passed as an argument*
*Required only if --accession is not passed as an argument*
* `--csv_file` – A single-end or paired-end input csv file containing assay metadata for each sample, including sample_id, forward, reverse, and/or paired. Please see the sample [SE_file.csv](workflow_code/SE_file.csv) and [PE_file.csv](workflow_code/PE_file.csv) in this repository for examples on how to format this file.
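For reference, below is a minimal sketch of what a paired-end `--csv_file` input could look like, assuming only the column names listed above (sample_id, forward, reverse, paired); the sample IDs, fastq paths, and `paired` values are hypothetical placeholders, and the [PE_file.csv](workflow_code/PE_file.csv) linked above remains the authoritative template.

```bash
# Hypothetical paired-end input CSV; column names are taken from the parameter description above,
# while the sample IDs, file paths, and 'paired' values are placeholders.
cat > PE_file.csv <<'EOF'
sample_id,forward,reverse,paired
Sample-1,/path/to/raw_reads/Sample-1_R1.fastq.gz,/path/to/raw_reads/Sample-1_R2.fastq.gz,true
Sample-2,/path/to/raw_reads/Sample-2_R1.fastq.gz,/path/to/raw_reads/Sample-2_R2.fastq.gz,true
EOF

# Launch the workflow with the CSV instead of an OSD accession
nextflow run main.nf -resume -profile slurm,singularity --csv_file PE_file.csv
```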
@@ -204,4 +204,21 @@ To generate a README file, a protocols file, a md5sums table and a file associat
nextflow -C post_processing.config run post_processing.nf -resume -profile slurm,singularity
```
The outputs of the run will be in a directory called `Post_Processing` by default.
The outputs of the run will be in a directory called `Post_Processing` by default and they are as follows:
- Post_processing/FastQC_Outputs/filtered_multiqc_GLmetagenomics_report.zip (Filtered sequence multiqc report with paths purged)
- Post_processing/FastQC_Outputs/raw_multiqc_GLmetagenomics_report.zip (Raw sequence multiqc report with paths purged)
- Post_processing/<GLDS_accession>_-associated-file-names.tsv (File association table for curation)
- Post_processing/<GLDS_accession>_metagenomics-validation.log (Automatic verification and validation log file)
- Post_processing/processed_md5sum_GLmetagenomics.tsv (md5sums for the files to be released on OSDR)
- Post_processing/processing_info_GLmetagenomics.zip (Zip file containing all files used to run the workflow and required logs with paths purged)
- Post_processing/protocol.txt (File describing the methods used by the workflow)
- Post_processing/README_GLmetagenomics.txt (README file listing and describing the outputs of the workflow)
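Once post-processing finishes, a quick way to confirm the files above were produced is to list the output directory and peek inside the zipped processing info; a minimal sketch, assuming the default directory name used in the listing above.

```bash
# List the post-processing outputs (uses the default directory name from the listing above)
ls -1 Post_processing/

# Preview the contents of the zipped processing info without extracting it
unzip -l Post_processing/processing_info_GLmetagenomics.zip
```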
@@ -23,7 +23,7 @@ if (params.help) {
println(" > nextflow run main.nf -resume -profile slurm,conda --csv_file SE_file.csv")
println()
println("Example 3: Run jobs locally in conda environments, supply a GLDS accession, and specify the path to an existing conda environment.")
println(" > nextflow run main.nf -resume -profile conda --GLDS_accession OSD-574 --conda.qc <path/to/existing/conda/environment>")
println(" > nextflow run main.nf -resume -profile conda --accession OSD-574 --conda.qc <path/to/existing/conda/environment>")
println()
println("Required arguments:")
println("""-profile [STRING] Specifies the profile to be used to run the workflow. Options are [slurm, singularity, docker, and conda].
@@ -86,7 +86,7 @@ if (params.help) {
println(" --read_based_dir [PATH] Read-based analysis outputs directory. Default: ../Read-based_Processing/.")
println()
println("Genelab specific arguements:")
println(" --GLDS_accession [STRING] A Genelab accession number if the --csv_file parameter is not set. If this parameter is set, it will ignore the --csv_file parameter.")
println(" --accession [STRING] A Genelab accession number if the --csv_file parameter is not set. If this parameter is set, it will ignore the --csv_file parameter.")
println(" --RawFilePattern [STRING] If we do not want to download all files (which we often won't), we can specify a pattern here to subset the total files.")
println(" For example, if we know we want to download just the fastq.gz files, we can say 'fastq.gz'. We can also provide multiple patterns")
println(" as a comma-separated list. For example, If we want to download the fastq.gz files that also have 'NxtaFlex', 'metagenomics', and 'raw' in")
@@ -145,7 +145,7 @@ log.info """
You have set the following parameters:
Profile: ${workflow.profile}
Input csv file : ${params.csv_file}
GLDS Accession : ${params.GLDS_accession}
GLDS or OSD Accession : ${params.accession}
GLDS Raw File Pattern: ${params.RawFilePattern}
Workflow : ${params.workflow}
Nextflow Directory publishing mode: ${params.publishDir_mode}
@@ -317,9 +317,9 @@ workflow {
// Software Version Capturing - runsheet
software_versions_ch = Channel.empty()
// Parse file input
if(params.GLDS_accession){
if(params.accession){

GET_RUNSHEET(params.GLDS_accession)
GET_RUNSHEET(params.accession)
GET_RUNSHEET.out.input_file
.splitCsv(header:true)
.set{file_ch}
@@ -113,7 +113,7 @@ params {
checkm = null // "/path/to/envs/checkm"
}

GLDS_accession = false // GLDS or OSD accession number for the data to be processed
accession = false // GLDS or OSD accession number for the data to be processed
// Pattern of files on OSDR for the GLDS_accession you want to process.
RawFilePattern = null // "_metaG", "_HRremoved"
errorStrategy = "terminate"
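Because `accession`, `RawFilePattern`, and `errorStrategy` are defined in the workflow's `params` block, they can also be overridden at launch time instead of editing this config file; a minimal sketch, reusing the example pattern values from the comment above.

```bash
# Override the config defaults on the command line; the pattern values below are the
# examples given in the config comment ("_metaG", "_HRremoved"), joined as a comma-separated list.
nextflow run main.nf -resume -profile slurm,singularity \
    --accession OSD-574 \
    --RawFilePattern '_metaG,_HRremoved'
```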
@@ -26,7 +26,6 @@ echo $HOSTNAME

## Activate the conda environment containing the tools you need to run your job ##
## You can see a list of all available environments by running the command: conda env list ##
## If you need a conda envrionment installed request it using JIRA ##

source activate /path/to/envs/nextflow ## Replace /path/to/envs/nextflow with the path to the conda environment with nextflow installed ##

@@ -40,8 +39,10 @@ echo ""

## The command(s) that you want to run in this slurm job ##
export NXF_SINGULARITY_CACHEDIR=singularity/
#nextflow run main.nf -profile slurm,singularity -resume --csv_file PE_file.csv ## Replace command with the command(s) you want to run ##
nextflow run main.nf -profile slurm,singularity --GLDS_accession OSD-574 -resume
export TOWER_ACCESS_TOKEN=<ACCESS_TOKEN>
export TOWER_WORKSPACE_ID=<WORKSPACE_ID>
#nextflow run main.nf -profile slurm,singularity -resume --csv_file PE_file.csv -with-tower ## Replace command with the command(s) you want to run ##
nextflow run main.nf -profile slurm,singularity --accession OSD-574 -resume -with-tower


## Add a time-stamp at the end of the job then calculate how long the job took to run in seconds, minutes, and hours ##
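To use this script, replace the `<ACCESS_TOKEN>` and `<WORKSPACE_ID>` placeholders with real Tower credentials (or drop the `-with-tower` pieces), then submit it to the scheduler; a minimal sketch, where the script filename is a hypothetical placeholder.

```bash
# Submit the job script to slurm; "run_workflow.slurm" is a placeholder for whatever name this script is saved under
sbatch run_workflow.slurm

# Check on the job while it runs
squeue -u $USER
```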
2 changes: 1 addition & 1 deletion Metagenomics/Illumina/Workflow_Documentation/README.md
@@ -7,7 +7,7 @@
|Pipeline Version|Current Workflow Version (for respective pipeline version)|Nextflow Version|
|:---------------|:---------------------------------------------------------|:---------------|
|*[GL-DPPD-7107-A.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107-A.md)|[NF_MGIllumina-A_1.0.0](NF_MGIllumina-A)|23.10.1|
|[GL-DPPD-7107.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107.md)|[SW_MGIllumina_2.0.4](SW_MGIllumina)|N/A (Snakemake vXXXX)|
|[GL-DPPD-7107.md](../Pipeline_GL-DPPD-7107_Versions/GL-DPPD-7107.md)|[SW_MGIllumina_2.0.4](SW_MGIllumina)|N/A (Snakemake v7.26.0)|


*Current GeneLab Pipeline/Workflow Implementation
