Module/minimap2/1.0 #262

Open
wants to merge 6 commits into
base: master
Changes from 2 commits
16 changes: 16 additions & 0 deletions envs/minimap2/minimap2-2.24.yaml
@@ -0,0 +1,16 @@
+name: minimap2
+channels:
+  - conda-forge
+  - bioconda
+  - defaults
+dependencies:
+  - _libgcc_mutex=0.1
+  - _openmp_mutex=4.5
+  - k8=0.2.5
+  - libgcc-ng=12.2.0
+  - libgomp=12.2.0
+  - libstdcxx-ng=12.2.0
+  - libzlib=1.2.13
+  - minimap2=2.24
+  - zlib=1.2.13
+prefix: /home/hshaalan/miniconda3/envs/minimap2
15 changes: 11 additions & 4 deletions modules/minimap2/1.0/config/default.yaml
@@ -2,16 +2,23 @@ lcr-modules:

     minimap2:

-        # TODO: Update the list of available wildcards, if applicable
         inputs:
             # Available wildcards: {seq_type} {genome_build} {sample_id}
-            sample_fastq: "__UPDATE__"
-            reference_build: "__UPDATE__"
+            sample_fastq:

Review comment (Member): I think this needs some documentation about the {number} wildcard and how to correctly specify paired and unpaired fastq files.

+                promethION: "__UPDATE__"
+                genome: "__UPDATE__"

         scratch_subdirectories: []

         options:
-            minimap2: '-ax map-ont -L --MD -Y -R "@RG\tID:{sample_id}\tLB:{sample_id}\tPL:ONT\tSM:{sample_id}"'
+            minimap2:
+                # -ax map-ont: aligns long noisy reads (ONT) to a reference genome and outputs it in SAM format
+                # -L: writes CIGAR with >65535 operators at CG tag. This makes it compatible with older tools
+                # --MD: outputs the MD tag
+                # -Y: use soft clipping for supplementary alignments
+                # -R: SAM read group line in specified format
+                promethION: '-ax map-ont -L --MD -Y -R "@RG\tID:{sample_id}\tLB:{sample_id}\tPL:ONT\tSM:{sample_id}"'
+                genome: '-ax sr -L --MD -Y -R "@RG\tID:{sample_id}\tLB:{sample_id}\tPL:Illumina\tSM:{sample_id}"'
             samtools: '-bhS'

         conda_envs:
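
On the review comment above: a minimal sketch of what the requested documentation might illustrate. The paths and the `R{number}` naming are hypothetical and are not taken from this pull request. The only grounded parts are the two `sample_fastq` keys and the `number` values used in `minimap2.smk` (`unpaired` for promethION, `1` and `2` otherwise). Plain `str.format` stands in for however the module ends up filling the placeholders.

```python
# Hypothetical config values: the paths and the "R{number}" naming are
# assumptions for illustration, not values from the pull request.
sample_fastq = {
    "promethION": "data/ont/{sample_id}.fastq.gz",
    "genome": "data/illumina/{sample_id}.R{number}.fastq.gz",
}


def resolve_fastq(seq_type, sample_id, number):
    """Fill the placeholders of the pattern configured for one seq_type."""
    key = "promethION" if seq_type == "promethION" else "genome"
    # str.format ignores keyword arguments the pattern does not use,
    # so the unpaired pattern can simply omit {number}.
    return sample_fastq[key].format(sample_id=sample_id, number=number)


# Unpaired long reads: one file per sample
unpaired = resolve_fastq("promethION", "S1", "unpaired")
# Paired short reads: one file per mate
paired = [resolve_fastq("genome", "S1", n) for n in ("1", "2")]

print(unpaired)
print(paired)
```

Under these assumptions, a paired entry needs a `{number}` placeholder so that the two mates resolve to different files, while an unpaired entry can leave it out.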
16 changes: 0 additions & 16 deletions modules/minimap2/1.0/envs/minimap2-2.24.yaml

This file was deleted.

1 change: 1 addition & 0 deletions modules/minimap2/1.0/envs/minimap2-2.24.yaml
42 changes: 33 additions & 9 deletions modules/minimap2/1.0/minimap2.smk
@@ -37,12 +37,9 @@ if version.parse(current_version) < version.parse(min_oncopipe_version):
 CFG = op.setup_module(
     name = "minimap2",
     version = "1.0",
-    # TODO: If applicable, add more granular output subdirectories
     subdirectories = ["inputs", "minimap2", "sort_bam", "outputs"],
 )

-#include: "../../utils/2.1/utils.smk"
-
 # Define rules to be run locally when using a compute cluster
 localrules:
     _minimap2_input_fastq,
@@ -52,30 +49,55 @@ localrules:

 sample_ids_minimap2 = list(CFG['samples']['sample_id'])

+
 ##### RULES #####


+def _input_fastq(wildcards):
+    CFG = config["lcr-modules"]["minimap2"]
+    if wildcards.seq_type == "promethION":
+        fastqs = CFG["inputs"]["sample_fastq"]["promethION"]
+    else:
+        fastqs = CFG["inputs"]["sample_fastq"]["genome"]
+    return(fastqs)
+
+
 # Symlinks the input files into the module results directory (under '00-inputs/')
 rule _minimap2_input_fastq:
     input:
-        fastq = CFG["inputs"]["sample_fastq"]
+        fastq = _input_fastq
     output:
-        fastq = CFG["dirs"]["inputs"] + "fastq/{seq_type}/{sample_id}.fastq.gz"
+        fastq = CFG["dirs"]["inputs"] + "fastq/{seq_type}/{sample_id}.fastq_{number}.gz"
     run:
         op.absolute_symlink(input.fastq, output.fastq)


+def _get_fastq(wildcards):
+    CFG = config["lcr-modules"]["minimap2"]
+    if wildcards.seq_type == "promethION":
+        fastq = expand(str(rules._minimap2_input_fastq.output.fastq), zip,
+            seq_type = wildcards.seq_type,
+            sample_id = wildcards.sample_id,
+            number = "unpaired")
+    else:
+        fastq = expand(str(rules._minimap2_input_fastq.output.fastq), zip,
+            seq_type = wildcards.seq_type,
+            sample_id = wildcards.sample_id,
+            number = ["1", "2"])
+    return(fastq)
+
+
 rule _minimap2_run:
     input:
-        fastq = str(rules._minimap2_input_fastq.output.fastq),
+        fastq = _get_fastq,
         fasta = reference_files("genomes/{genome_build}/genome_fasta/genome.fa")
     output:
         sam = pipe(CFG["dirs"]["minimap2"] + "{seq_type}--{genome_build}/{sample_id}_out.sam")
     log:
         stdout = CFG["logs"]["minimap2"] + "{seq_type}--{genome_build}/{sample_id}/minimap2.stdout.log",
         stderr = CFG["logs"]["minimap2"] + "{seq_type}--{genome_build}/{sample_id}/minimap2.stderr.log"
     params:
-        opts = CFG["options"]["minimap2"]
+        opts = op.switch_on_wildcard("seq_type", CFG["options"]["minimap2"])
     conda:
         CFG["conda_envs"]["minimap2"]
     threads:
@@ -118,6 +140,7 @@ rule _minimap2_samtools:
         """)


+# Create symlink in subdirectory where BAM files will be sorted by the `utils` module
 rule _minimap2_symlink_bam:
     input:
         bam = str(rules._minimap2_samtools.output.bam)
@@ -129,7 +152,8 @@ rule _minimap2_symlink_bam:
         op.absolute_symlink(input.bam, output.bam)


-# Symlinks the final output files into the module results directory (under '99-outputs/')
+# This rule will trigger the `utils` rule for sorting the BAM file
+# By this point, the sorted BAM file exists, so this rule deletes the original BAM file
 rule _minimap2_output_bam:
     input:
         bam = CFG["dirs"]["sort_bam"] + "{seq_type}--{genome_build}/{sample_id}.sort.bam",
@@ -156,7 +180,7 @@ rule _minimap2_all:
             ],
             zip, # Run expand() with zip(), not product()
             seq_type=CFG["samples"]["seq_type"],
-            genome_build=CFG["inputs"]["reference_build"],
+            genome_build=CFG["samples"]["genome_build"],
             sample_id=CFG["samples"]["sample_id"])

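
A point that may be worth checking in `_get_fastq`: it passes `zip` together with a single `seq_type`, a single `sample_id`, and two `number` values. The sketch below is a plain-Python illustration of how `zip` and `product` differ on inputs of that shape. `combine` is a hypothetical stand-in written for this note, not Snakemake's `expand`, and whether `expand(..., zip, ...)` behaves the same way is an assumption that should be verified against the Snakemake version in use.

```python
from itertools import product


def combine(pattern, combinator, **values):
    """Format `pattern` once per combination of the given wildcard values.

    Hypothetical helper for illustration only; it is not Snakemake's expand().
    """
    names = list(values)
    # Wrap single strings in a list so every wildcard contributes a list of candidates
    columns = [[v] if isinstance(v, str) else list(v) for v in values.values()]
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in combinator(*columns)
    ]


pattern = "fastq/{seq_type}/{sample_id}.fastq_{number}.gz"
args = dict(seq_type="genome", sample_id="S1", number=["1", "2"])

# Built-in zip stops at the shortest input, so only the first mate is produced
zipped = combine(pattern, zip, **args)
# product pairs the single seq_type and sample_id with every number
crossed = combine(pattern, product, **args)

print(zipped)
print(crossed)
```

If `expand` follows the same rule, the paired branch would request only the `_1` file. Dropping `zip` there, or repeating `seq_type` and `sample_id` once per mate, would be two ways to request both.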
5 changes: 4 additions & 1 deletion modules/minimap2/CHANGELOG.md
@@ -11,4 +11,7 @@ This release was authored by Haya Shaalan.

 <!-- TODO: Explain each important module design decision below. -->

-- No module design decisions explained here yet.
+- This module can take paired and unpaired fastq files.
+- Performs short and long read alignment based on {seq_type}. Parameters are switched and can be configured through the config.
+- Uses the utils module and writes the outputs to 99-outputs.
+- Final output is a bam file with naming format: bam/{seq_type}--{genome_build}/{sample_id}.bam.
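
The design notes above can be sketched in plain Python. This is an illustration under stated assumptions, not the module's implementation: `pick_options` is a hypothetical helper that mimics a per-`seq_type` lookup and is not `op.switch_on_wildcard`, and the sample ID `S1` and genome build `hg38` are made-up values. The option strings and the output naming format are copied from this pull request.

```python
# Option strings copied from the default config in this pull request.
# Raw strings keep "\t" as a literal backslash-t, as in the single-quoted YAML.
options = {
    "promethION": r'-ax map-ont -L --MD -Y -R "@RG\tID:{sample_id}\tLB:{sample_id}\tPL:ONT\tSM:{sample_id}"',
    "genome": r'-ax sr -L --MD -Y -R "@RG\tID:{sample_id}\tLB:{sample_id}\tPL:Illumina\tSM:{sample_id}"',
}

# Output naming format from the CHANGELOG entry
bam_pattern = "bam/{seq_type}--{genome_build}/{sample_id}.bam"


def pick_options(seq_type, sample_id):
    """Hypothetical helper: select the option string for a seq_type and fill in the sample ID."""
    return options[seq_type].format(sample_id=sample_id)


ont_opts = pick_options("promethION", "S1")
sr_opts = pick_options("genome", "S1")
bam_path = bam_pattern.format(seq_type="genome", genome_build="hg38", sample_id="S1")

print(ont_opts)
print(sr_opts)
print(bam_path)
```

Keeping one option string per `seq_type` in the config means a new sequencing type could be supported by adding a key, without touching the rule itself.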
2 changes: 1 addition & 1 deletion modules/utils/2.1/config/default.yaml
@@ -13,7 +13,7 @@ lcr-modules:
         ## Lines commented out with `#?` can optionally be user-configured
         ## Lines commented out with `##` act as regular comments

-        paired_modules: ["bwa_mem", "star", minimap2]
+        paired_modules: ["bwa_mem", "star"]
         ## See main README.md for how to set `samples` in the Snakefile
         #! samples: null
