Merge pull request nf-core#440 from genomic-medicine-sweden/add-retroseq-to-pipeline

Add mobile element calling to raredisease
ramprasadn authored Jan 15, 2024
2 parents 2dbd6ea + 967b7de commit 4158741
Showing 18 changed files with 513 additions and 14 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -19,6 +19,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- ngsbits samplegender to check sex [#453](https://github.com/nf-core/raredisease/pull/453)
- New workflow for generating cgh files from SV vcfs for interpretation in the CytoSure interpretation software. Turned off by default [#456](https://github.com/nf-core/raredisease/pull/456/)
- Fastp to do adapter trimming. It can be skipped using `--skip_fastp` [#457](https://github.com/nf-core/raredisease/pull/457)
- New workflow for calling insertion of mobile elements [#440](https://github.com/nf-core/raredisease/pull/440)
- GATK CNVCaller uses segments instead of intervals, filters out "reference" segments between the calls, and fixes a bug with how `ch_readcount_intervals` was handled [#472](https://github.com/nf-core/raredisease/pull/472)
- bwa aligner [#474](https://github.com/nf-core/raredisease/pull/474)
- Add FOUND_IN tag, which mentions the variant caller that found the mutation, in the INFO column of the vcf files [#471](https://github.com/nf-core/raredisease/pull/471)
4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -100,6 +100,10 @@

> Okonechnikov K, Conesa A, García-Alcalde F. Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics. 2016;32(2):292-294. doi:10.1093/bioinformatics/btv566
- [RetroSeq](https://academic.oup.com/bioinformatics/article/29/3/389/257479)

> Thomas M. Keane, Kim Wong, David J. Adams. RetroSeq: transposable element discovery from next-generation sequencing data. Bioinformatics. 2013 Feb 1;29(3):389-90. doi:10.1093/bioinformatics/bts697
- [rhocall](https://github.com/dnil/rhocall)

- [Sentieon DNAscope](https://www.biorxiv.org/content/10.1101/2022.05.20.492556v1.abstract)
6 changes: 5 additions & 1 deletion README.md
@@ -94,7 +94,11 @@ On release, automated continuous integration tests run the pipeline on a full-si
- [Expansion Hunter](https://github.com/Illumina/ExpansionHunter)
- [Stranger](https://github.com/Clinical-Genomics/stranger)

**9. Rank variants - SV and SNV:**
**9. Variant calling - mobile elements:**

- [RetroSeq](https://github.com/tk2/RetroSeq)

**10. Rank variants - SV and SNV:**

- [GENMOD](https://github.com/Clinical-Genomics/genmod)

26 changes: 26 additions & 0 deletions assets/mobile_element_references_schema.json
@@ -0,0 +1,26 @@
{
    "$schema": "http://json-schema.org/draft-07/schema",
    "$id": "https://raw.githubusercontent.com/nf-core/raredisease/master/assets/mobile_element_references_schema.json",
    "title": "Schema for mobile_element_references",
    "description": "Schema for the file provided with params.mobile_element_references",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "type": {
                "type": "string",
                "exists": true,
                "pattern": "^\\S+$",
                "errorMessage": "Mobile element type must be provided and cannot contain spaces"
            },
            "path": {
                "type": "string",
                "format": "file-path",
                "exists": true,
                "pattern": "^\\S+\\.bed$",
                "errorMessage": "Bed file, cannot contain spaces and must have extension '.bed'"
            }
        },
        "required": ["type", "path"]
    }
}
73 changes: 73 additions & 0 deletions conf/modules/call_mobile_elements.config
@@ -0,0 +1,73 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Config file for defining DSL2 per module options and publishing paths
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Available keys to override module options:
ext.args = Additional arguments appended to command in module.
ext.args2 = Second set of arguments appended to command in module (multi-tool modules).
ext.args3 = Third set of arguments appended to command in module (multi-tool modules).
ext.prefix = File name prefix for output files.
ext.when = Conditional clause
----------------------------------------------------------------------------------------
*/

process {

    withName: '.*CALL_MOBILE_ELEMENTS:.*' {
        publishDir = [
            enabled: false
        ]
    }

    withName: '.*CALL_MOBILE_ELEMENTS:ME_SPLIT_ALIGNMENT' {
        ext.args = { [
            '--output-fmt bam',
            '--fetch-pairs'
        ].join(' ') }
        ext.args2 = { "${meta.interval}" }
        ext.prefix = { "${meta.id}_${meta.interval}" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:RETROSEQ_DISCOVER' {
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_discover" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:RETROSEQ_CALL' {
        ext.args = { '--soft' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_call" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_REHEADER_ME' {
        ext.args2 = { '--output-type v' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_reheader" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_SORT_ME' {
        ext.args = { '--output-type z' }
        ext.prefix = { "${meta.id}_${meta.interval}_retroseq_sort" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:BCFTOOLS_CONCAT_ME' {
        ext.args = { '--output-type z --allow-overlaps' }
        ext.prefix = { "${meta.id}_mobile_elements" }
    }

    withName: '.*CALL_MOBILE_ELEMENTS:SVDB_MERGE_ME' {
        ext.args = { '--bnd_distance 150 --overlap 0.5' }
        ext.prefix = { "${meta.id}_mobile_elements" }
        publishDir = [
            path: { "${params.outdir}/call_mobile_elements" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

    withName: '.*CALL_MOBILE_ELEMENTS:TABIX_ME' {
        publishDir = [
            path: { "${params.outdir}/call_mobile_elements" },
            mode: params.publish_dir_mode,
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
        ]
    }

}
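
The per-module options in this file can usually be adjusted from a user-supplied config rather than by editing the pipeline. The fragment below is a minimal sketch of that, assuming a custom file (called `custom.config` here, a hypothetical name) is passed to Nextflow with `-c`. The selector and flags are copied from the config above; the value 300 is purely illustrative and not a recommendation.

```nextflow
// custom.config (hypothetical file name), loaded with: nextflow run ... -c custom.config
process {

    // Same selector as in conf/modules/call_mobile_elements.config
    withName: '.*CALL_MOBILE_ELEMENTS:SVDB_MERGE_ME' {
        // Flags as in the pipeline config; 300 is an illustrative value (the config above uses 150)
        ext.args = { '--bnd_distance 300 --overlap 0.5' }
    }

}
```

Only `ext.args` is set here, so the `ext.prefix` and `publishDir` settings defined by the pipeline for this selector should remain in effect.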
2 changes: 2 additions & 0 deletions conf/test.config
@@ -36,11 +36,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
2 changes: 2 additions & 0 deletions conf/test_one_sample.config
@@ -36,11 +36,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_mt = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
2 changes: 2 additions & 0 deletions conf/test_sentieon.config
@@ -31,11 +31,13 @@ params {

// Genome references
fasta = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta"
fai = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reference.fasta.fai"
genome = 'GRCh37'
gnomad_af = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/gnomad_reformated.tab.gz"
intervals_wgs = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/target_wgs.interval_list"
intervals_y = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/targetY.interval_list"
known_dbsnp = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/dbsnp_-138-.vcf.gz"
mobile_element_references = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/mobile_element_references.tsv"
ml_model = "https://s3.amazonaws.com/sentieon-release/other/SentieonDNAscopeModel1.0.model"
reduced_penetrance = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/reduced_penetrance.tsv"
score_config_snv = "https://raw.githubusercontent.com/nf-core/test-datasets/raredisease/reference/rank_model_snv.ini"
25 changes: 13 additions & 12 deletions docs/usage.md
@@ -153,18 +153,19 @@ The mandatory and optional parameters for each category are tabulated below.

| Mandatory | Optional |
| ------------------------------ | ------------------------------ |
| aligner<sup>1</sup> | fasta_fai<sup>3</sup> |
| fasta | bwamem2<sup>3</sup> |
| platform | bwa<sup>3</sup> |
| mito_name/mt_fasta<sup>2</sup> | known_dbsnp<sup>4</sup> |
| | known_dbsnp_tbi<sup>4</sup> |
| | min_trimmed_length<sup>5</sup> |

<sup>1</sup>Default value is bwamem2, but if you have a valid license for Sentieon, you have the option to use Sentieon as well.<br />
<sup>2</sup>If mito_name is provided, mt_fasta can be generated by the pipeline.<br />
<sup>3</sup>fasta_fai, bwa, and bwamem2, if not provided by the user, will be generated by the pipeline when necessary.<br />
<sup>4</sup>Used only by Sentieon.<br />
<sup>5</sup>Default value is 40. Used only by fastp.<br />
| aligner<sup>1</sup> | fasta_fai<sup>4</sup> |
| fasta<sup>2</sup> | bwamem2<sup>4</sup> |
| platform | bwa<sup>4</sup> |
| mito_name/mt_fasta<sup>3</sup> | known_dbsnp<sup>5</sup> |
| | known_dbsnp_tbi<sup>5</sup> |
| | min_trimmed_length<sup>6</sup> |

<sup>1</sup>Default value is bwamem2. Other alternatives are bwa and sentieon (requires a valid Sentieon license).<br />
<sup>2</sup>Analysis set reference genome in fasta format. The first 25 contigs must be chromosomes 1-22, X, Y, and the mitochondrial genome.<br />
<sup>3</sup>If mito_name is provided, mt_fasta can be generated by the pipeline.<br />
<sup>4</sup>fasta_fai, bwa and bwamem2, if not provided by the user, will be generated by the pipeline when necessary.<br />
<sup>5</sup>Used only by Sentieon.<br />
<sup>6</sup>Default value is 40. Used only by fastp.<br />
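
As a rough illustration of the alignment parameters tabulated above, a params block in a user config might look like the sketch below. The file path, the `platform` value, and the `mito_name` value are placeholders assumed for this example; they are not taken from the pipeline documentation shown here.

```nextflow
params {
    aligner            = 'bwa'                       // default is 'bwamem2'; 'sentieon' requires a valid license
    fasta              = '/path/to/reference.fasta'  // placeholder path; first 25 contigs: 1-22, X, Y, mitochondria
    platform           = 'illumina'                  // assumed value for this sketch
    mito_name          = 'MT'                        // assumed contig name; depends on the reference used
    min_trimmed_length = 40                          // default value, used only by fastp
}
```

The optional `fasta_fai`, `bwa`, and `bwamem2` entries are left out on purpose, since the footnotes state the pipeline generates them when necessary.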

##### 2. QC stats from the alignment files

2 changes: 1 addition & 1 deletion main.nf
@@ -16,7 +16,6 @@ nextflow.enable.dsl = 2
GENOME PARAMETER VALUES
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

params.fasta = WorkflowMain.getGenomeAttribute(params, 'fasta')
params.fai = WorkflowMain.getGenomeAttribute(params, 'fai')
params.bwa = WorkflowMain.getGenomeAttribute(params, 'bwa')
@@ -33,6 +32,7 @@ params.intervals_wgs = WorkflowMain.getGenomeAttribute(params,
params.intervals_y = WorkflowMain.getGenomeAttribute(params, 'intervals_y')
params.known_dbsnp = WorkflowMain.getGenomeAttribute(params, 'known_dbsnp')
params.known_dbsnp_tbi = WorkflowMain.getGenomeAttribute(params, 'known_dbsnp_tbi')
params.mobile_element_references = WorkflowMain.getGenomeAttribute(params, 'mobile_element_references')
params.ml_model = WorkflowMain.getGenomeAttribute(params, 'ml_model')
params.mt_fasta = WorkflowMain.getGenomeAttribute(params, 'mt_fasta')
params.ngsbits_samplegender_method = WorkflowMain.getGenomeAttribute(params, 'ngsbits_samplegender_method')
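
`WorkflowMain.getGenomeAttribute` itself is not part of this diff. Assuming it follows the usual nf-core template behaviour of looking the attribute up under `params.genomes[params.genome]`, a site-specific genomes entry could supply the new reference as sketched below. The path is hypothetical.

```nextflow
params {
    genomes {
        'GRCh37' {
            // Hypothetical location of the file described by assets/mobile_element_references_schema.json
            mobile_element_references = '/refs/GRCh37/mobile_element_references.tsv'
        }
    }
}
```

Under the same assumption, a value passed directly with `--mobile_element_references` would normally take precedence over the genomes entry.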
54 changes: 54 additions & 0 deletions modules/local/retroseq/call/main.nf
@@ -0,0 +1,54 @@
process RETROSEQ_CALL {
    tag "$meta.id"
    label 'process_low'

    conda "bioconda::perl-retroseq=1.5=pl5321hdfd78af_1"
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
        'docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1' : 'docker.io/clinicalgenomics/retroseq:1.5_9d4f3b5-1' }"


    input:
    tuple val(meta), path(tab), path(bam), path(bai)
    tuple val(meta2), path(fasta)
    tuple val(meta3), path(fai)

    output:
    tuple val(meta), path("*.vcf"), emit: vcf
    path "versions.yml"           , emit: versions

    when:
    task.ext.when == null || task.ext.when

    script:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def VERSION = "1.5"

    """
    retroseq.pl \\
        -call \\
        $args \\
        -bam $bam \\
        -input $tab \\
        -ref $fasta \\
        -output ${prefix}.vcf
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        retroseq_call: $VERSION
    END_VERSIONS
    """

    stub:
    def args = task.ext.args ?: ''
    def prefix = task.ext.prefix ?: "${meta.id}"
    def VERSION = "1.5"
    """
    touch ${prefix}.vcf
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        retroseq_call: $VERSION
    END_VERSIONS
    """
}
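
The process carries the label `process_low`, so its resources come from whatever the pipeline's base config assigns to that label, which is not shown in this diff. The fragment below is a hedged sketch of overriding resources for this process alone from a user config; the numbers are illustrative assumptions, not measured requirements.

```nextflow
process {

    // Selector pattern as used in conf/modules/call_mobile_elements.config
    withName: '.*CALL_MOBILE_ELEMENTS:RETROSEQ_CALL' {
        cpus   = 2      // illustrative value
        memory = 8.GB   // illustrative value
        time   = 4.h    // illustrative value
    }

}
```

A `withName` selector is used rather than `withLabel: 'process_low'` so that other processes sharing the label are left untouched.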
69 changes: 69 additions & 0 deletions modules/local/retroseq/call/meta.yml
@@ -0,0 +1,69 @@
name: "retroseq_call"
description: RetroSeq is a tool for discovery and genotyping of transposable element variants (TEVs) from next-gen sequencing reads aligned to a reference genome in BAM format.
keywords:
  - retroseq
  - transposable elements
  - genomics
tools:
  - "retroseq":
      description: "RetroSeq: discovery and genotyping of TEVs from reads in BAM format."
      homepage: "https://github.com/tk2/RetroSeq"
      documentation: "https://github.com/tk2/RetroSeq"
      tool_dev_url: "https://github.com/tk2/RetroSeq"
      doi: "10.1093/bioinformatics/bts697"
      licence: "['GPL']"

input:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test', single_end:false ]`
  - tab:
      type: file
      description: Output file from running retroseq -discover
      pattern: "*.tab"
  - bam:
      type: file
      description: Sorted BAM file
      pattern: "*.bam"
  - bai:
      type: file
      description: Index of the sorted BAM file
      pattern: "*.bai"
  - meta2:
      type: map
      description: |
        Groovy Map containing reference information
        e.g. `[ id:'genome' ]`
  - fasta:
      type: file
      description: Reference genome in fasta format
      pattern: "*.fasta"
  - meta3:
      type: map
      description: |
        Groovy Map containing reference information
        e.g. `[ id:'genome' ]`
  - fai:
      type: file
      description: Reference FASTA index
      pattern: "*.fai"

output:
  - meta:
      type: map
      description: |
        Groovy Map containing sample information
        e.g. `[ id:'test', single_end:false ]`
  - versions:
      type: file
      description: File containing software versions
      pattern: "versions.yml"
  - vcf:
      type: file
      description: Output file containing TEVs and their location in the genome.
      pattern: "*.vcf"

authors:
  - "@peterpru"