Module/minimap2/1.0 #262

hayashaalan · 2023-03-21T00:04:17Z

Pull Request Checklists

Important: When opening a pull request, keep only the applicable checklist and delete all other sections.

Checklist for New Module

Required

If applicable

I added more granular output subdirectories.
I added rules to the reference_files workflow to generate any new reference files.
I added subdirectories with large intermediate files to the list of scratch_subdirectories in the default.yaml configuration file.
I updated the list of available wildcards for the input files in the default.yaml configuration file.

Checklist for Updated Module

Important! If you are updating the module version, ensure the previous version of the module is restored from master.
If you want to restore a deleted file or directory from the remote master, you can use git checkout origin/master path/to/file,
then a git commit will ensure that file is tracked on your branch again.
Example:

mv modules/strelka/1.1 modules/strelka/1.2
git checkout origin/master modules/strelka/1.1

Kdreval

Thanks @hayashaalan, great work! My biggest confusion is whether we need to include the utils module here: it's not clear if you run this with or without that module. I assumed without, since it is commented out. Am I wrong?

modules/minimap2/1.0/config/default.yaml

Kdreval · 2023-03-22T17:11:35Z

modules/minimap2/1.0/config/default.yaml

+        # TODO: Update the list of available wildcards, if applicable
+        inputs:
+            # Available wildcards: {seq_type} {genome_build} {sample_id}
+            sample_fastq: "__UPDATE__"


Are any indexes required for this tool to work? Maybe we should add them here as inputs as well?

Also, minimap2 can work with paired-end data. Can you think of a way to adapt this so that it can use either paired or unpaired fastq files?

No index files are required to run this tool.
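
One way the paired/unpaired request could be approached is with an input function that returns either one or two fastq paths per sample. The sketch below is only an illustration under stated assumptions: the `SAMPLE_FASTQS` table, the file paths, and the helper names `get_fastqs` and `reads_argument` are hypothetical and are not part of this module. It relies only on the fact that minimap2 accepts either one reads file or two (mates) on its command line.

```python
from types import SimpleNamespace
from typing import Dict, List

# Hypothetical sample table (an assumption for illustration):
# one path means unpaired reads, two paths mean paired-end mates.
SAMPLE_FASTQS: Dict[str, List[str]] = {
    "sampleA": ["data/sampleA.R1.fastq.gz", "data/sampleA.R2.fastq.gz"],
    "sampleB": ["data/sampleB.fastq.gz"],
}


def get_fastqs(wildcards) -> List[str]:
    """Input-function style helper: return the 1 or 2 fastq files for a sample."""
    fastqs = SAMPLE_FASTQS[wildcards.sample_id]
    if len(fastqs) not in (1, 2):
        raise ValueError(
            f"{wildcards.sample_id}: expected 1 or 2 fastq files, got {len(fastqs)}"
        )
    return list(fastqs)


def reads_argument(wildcards) -> str:
    """Join the fastq paths into the reads portion of an aligner command line."""
    return " ".join(get_fastqs(wildcards))


# SimpleNamespace stands in for the wildcards object a workflow would pass in.
paired = SimpleNamespace(sample_id="sampleA")
unpaired = SimpleNamespace(sample_id="sampleB")
print(reads_argument(paired))
print(reads_argument(unpaired))
```

Because the function returns a list, the same rule could serve both layouts without needing separate paired and unpaired rules.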

Kdreval · 2023-03-22T17:15:42Z

modules/minimap2/1.0/config/default.yaml

+            samtools: "{MODSDIR}/envs/samtools-1.9.yaml"
+
+        threads:
+            minimap2: 4


I wonder how long this tool takes to run, and whether we should consider giving it more resources here.

I gave it more resources (10) and now it takes ~24 hours.

Kostia's point is fair - it makes sense to set the default here MUCH higher. I think you should make 24 the default.

modules/minimap2/1.0/envs/minimap2-2.24.yaml

modules/minimap2/1.0/minimap2.smk

Kdreval · 2023-03-22T17:30:11Z

modules/minimap2/1.0/minimap2.smk

+    input:
+        bam = CFG["dirs"]["sort_bam"] + "{seq_type}--{genome_build}/{sample_id}.sort.bam",
+        bai = CFG["dirs"]["sort_bam"] + "{seq_type}--{genome_build}/{sample_id}.sort.bam.bai",
+        sorted_bam = str(rules._minimap2_symlink_bam.input.bam)


Usually the output of a rule is requested as the input of the following rule, not its input as in this case.

This is to delete the unsorted bam once the sorted bam has been created by the utils module.

To be clear - it's not the sorted bam that's being deleted but the un-sorted bam, right?

Duplicate marking isn't necessary for PromethION, but I think it's still an important step for short-read WGS to include in this module. Can you add some rules to complete duplicate marking for short reads? You can use wildcard constraints and an input function to determine which output to symlink depending on the seq_type. (Also, this could be used for capture data in theory, right?)
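
The suggestion above (an input function that picks which output to symlink based on `seq_type`) might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the `seq_type` values, the `mark_dups/` and `sort_bam/` directory names, the `.sort.mdups.bam` suffix, and the function name `get_final_bam` are hypothetical and do not come from this PR. Only the `{seq_type}--{genome_build}/{sample_id}` path layout follows the rule shown in the diff.

```python
from types import SimpleNamespace

# Assumed groupings of seq_type values; adjust to whatever the project actually uses.
SHORT_READ_SEQ_TYPES = {"genome", "capture"}
LONG_READ_SEQ_TYPES = {"promethION"}


def get_final_bam(wildcards) -> str:
    """Choose which BAM to symlink: duplicate-marked for short reads, sorted-only otherwise."""
    prefix = f"{wildcards.seq_type}--{wildcards.genome_build}/{wildcards.sample_id}"
    if wildcards.seq_type in SHORT_READ_SEQ_TYPES:
        # Short-read data goes through duplicate marking first (hypothetical path).
        return f"mark_dups/{prefix}.sort.mdups.bam"
    if wildcards.seq_type in LONG_READ_SEQ_TYPES:
        # Long-read data skips duplicate marking.
        return f"sort_bam/{prefix}.sort.bam"
    raise ValueError(f"Unsupported seq_type: {wildcards.seq_type}")


# SimpleNamespace stands in for the wildcards object a workflow would pass in.
short_read = SimpleNamespace(seq_type="genome", genome_build="hg38", sample_id="S1")
long_read = SimpleNamespace(seq_type="promethION", genome_build="hg38", sample_id="S2")
print(get_final_bam(short_read))
print(get_final_bam(long_read))
```

Raising on an unknown `seq_type` keeps a misconfigured sample from silently skipping duplicate marking.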

modules/minimap2/1.0/minimap2.smk

modules/minimap2/CHANGELOG.md

modules/utils/2.1/config/default.yaml

lkhilton · 2023-03-22T18:44:57Z

modules/minimap2/1.0/config/default.yaml

+        # TODO: Update the list of available wildcards, if applicable
+        inputs:
+            # Available wildcards: {seq_type} {genome_build} {sample_id}
+            sample_fastq: "__UPDATE__"


Also, minimap2 can work with paired-end data. Can you think of a way to adapt this so that it can use either paired or unpaired fastq files?

modules/minimap2/1.0/config/default.yaml

modules/minimap2/1.0/minimap2.smk

lkhilton · 2023-08-22T22:33:55Z

modules/minimap2/1.0/config/default.yaml

        inputs:
            # Available wildcards: {seq_type} {genome_build} {sample_id}
-            sample_fastq: "__UPDATE__"
-            reference_build: "__UPDATE__"
+            sample_fastq:


I think this needs some documentation about the {number} wildcard and how to correctly specify paired and unpaired fastq files.

lkhilton · 2023-08-22T22:36:14Z

modules/minimap2/1.0/config/default.yaml

+            samtools: "{MODSDIR}/envs/samtools-1.9.yaml"
+
+        threads:
+            minimap2: 4


Kostia's point is fair - it makes sense to set the default here MUCH higher. I think you should make 24 the default.

lkhilton · 2023-08-22T22:36:43Z

modules/minimap2/1.0/config/default.yaml

+
+        threads:
+            minimap2: 4
+            samtools: 1


Ditto for samtools - make the default higher please!

lkhilton · 2023-08-22T22:37:05Z

modules/minimap2/1.0/config/default.yaml

+            samtools: 1
+
+        resources:
+            minimap2: 


Is this sufficient memory allocation for a PromethION whole genome?

lkhilton · 2023-08-22T22:37:47Z

modules/minimap2/1.0/envs/minimap2-2.24.yaml

@@ -0,0 +1 @@
+/projects/rmorin/projects/gambl-repos/gambl-hshaalan/src/lcr-modules/envs/minimap2/minimap2-2.24.yaml


Can you please update the symlink to point to the lcr-modules/envs/minimap2 directory to parallel the samtools symlink below?

lkhilton · 2023-08-22T22:40:15Z

modules/minimap2/1.0/minimap2.smk

+        sam = str(rules._minimap2_run.output.sam)
+    output:
+        bam = CFG["dirs"]["minimap2"] + "{seq_type}--{genome_build}/{sample_id}_out.bam",
+        complete = touch(CFG["dirs"]["minimap2"] + "{seq_type}--{genome_build}/{sample_id}_out.bam.complete")


Kostia's point is that writing the .complete file alone isn't enough. For it to have the desired effect (forcing this rule to re-run if it's interrupted/the bam file is incomplete) it has to be an input to a subsequent rule. Can you add the complete file from this rule as an input to the next rule in the module?
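
The reasoning behind this request can be illustrated with a deliberately simplified toy model. This is not how Snakemake's scheduler is implemented, and real behaviour also depends on timestamps and other checks. The function name `rule_must_rerun` and the file names are hypothetical. The model assumes only that a rule is re-run when a file requested downstream is missing.

```python
from typing import Iterable, Set


def rule_must_rerun(requested_outputs: Iterable[str], existing_files: Set[str]) -> bool:
    """Toy model: the producing rule re-runs if any requested output is missing."""
    return any(path not in existing_files for path in requested_outputs)


# State after an interrupted job: a truncated BAM is on disk,
# but the .complete flag was never touched.
on_disk = {"sample_out.bam"}

# Downstream rule asks only for the BAM: the truncated file is accepted as-is.
only_bam = rule_must_rerun(["sample_out.bam"], on_disk)

# Downstream rule also asks for the flag: the missing flag forces a re-run.
with_flag = rule_must_rerun(["sample_out.bam", "sample_out.bam.complete"], on_disk)

print(only_bam, with_flag)
```

In this model the flag only has an effect once something downstream asks for it, which is the point of making it an input to the next rule.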

lkhilton · 2023-08-22T22:43:19Z

modules/minimap2/1.0/minimap2.smk

+    input:
+        bam = CFG["dirs"]["sort_bam"] + "{seq_type}--{genome_build}/{sample_id}.sort.bam",
+        bai = CFG["dirs"]["sort_bam"] + "{seq_type}--{genome_build}/{sample_id}.sort.bam.bai",
+        sorted_bam = str(rules._minimap2_symlink_bam.input.bam)


To be clear - it's not the sorted bam that's being deleted but the un-sorted bam, right?

Duplicate marking isn't necessary for PromethION, but I think it's still an important step for short-read WGS to include in this module. Can you add some rules to complete duplicate marking for short reads? You can use wildcard constraints and an input function to determine which output to symlink depending on the seq_type. (Also, this could be used for capture data in theory, right?)

Kdreval · 2024-03-16T05:30:55Z

@hayashaalan just checking if you have any updates for this?

Kdreval · 2024-07-26T16:44:27Z

@hayashaalan do you have any updates for this module?

hayashaalan added 2 commits March 14, 2023 17:24

Add initial draft of minimap2 version 1.0

8a550aa

minimap2 module

f0c6e6b

Kdreval reviewed Mar 22, 2023

lkhilton reviewed Mar 22, 2023

hayashaalan added 2 commits May 3, 2023 10:50

minimap2 changes

b8141c7

add key to config

9158dae

lkhilton requested changes Aug 22, 2023

minimap2 updates

24e6c0a

update config

8234f11

hayashaalan commented Mar 21, 2023 • edited by Jacky-Yiu
