Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with the VEP ANNOTATE snakemake wrapper #167

Open
moldach opened this issue Aug 31, 2020 · 1 comment
Open

Problem with the VEP ANNOTATE snakemake wrapper #167

moldach opened this issue Aug 31, 2020 · 1 comment
Labels
bug Something isn't working

Comments

@moldach
Copy link

moldach commented Aug 31, 2020

Snakemake version

Snakemake

5.23.0

Wrapper

Issue

I'm trying to adapt my regular VEP code to use the snakemake wrapper instead but am running into an issue.

I want to make sure that a) the wrapper works for me and b) it produces the same results as the following:

vep \
        -i {input.sample} \
        --species "caenorhabditis_elegans" \
        --format "vcf" \
        --everything \
        --offline \
        --force_overwrite \
        --fasta {input.ref} \
        --gff {input.annot} \
        --tab \
        --variant_class \
        --regulatory \
        --show_ref_allele \
        --numbers \
        --symbol \
        --protein \
        -o {params.sample}

In order to use VEP with wrappers there are 3 different rules.

I have got the following two working:

# VEP Download Plugins
rule download_vep_plugins:
    output:
        directory("resources/vep/plugins")
    params:
        release=100
    wrapper:
        "0.64.0/bio/vep/plugins"

# VEP Cache
rule get_vep_cache:
    output:
        directory("resources/vep/cache")
    params:
        species="caenorhabditis_elegans",
        build="WBcel235",
        release="100"
    log:
        "logs/vep/cache.log"
    wrapper:
        "0.64.0/bio/vep/cache"

The third rule is to actually run VEP for which I've written the following rule:

rule variant_annotation:
    input:
        calls= lambda wildcards: getVCFs(wildcards.sample),
        cache="resources/vep/cache",
        plugins="resources/vep/plugins",
    output:
        calls=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.annotated.vcf"),
        stats=os.path.join(dirs_dict["ANNOT_DIR"],config["ANNOT_TOOL"],"{sample}.html")
    params:
        plugins=["LoFtool"],
        extra="--everything"
    message: """--- Annotating Variants."""
    resources:
        mem = 30000,
        time = 120
    threads: 4
    wrapper:
        "0.64.0/bio/vep/annotate"

When submitting the job this is the error I receive:

Building DAG of jobs...
Using shell: /cvmfs/soft.computecanada.ca/nix/var/nix/profiles/16.09/bin/bash
Provided cores: 4
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       variant_annotation
        1

[Tue Aug 25 11:23:04 2020]
Job 0: --- Annotating Variants.

Activating conda environment: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f
Failed to open VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf: unknown file type
Possible precedence issue with control flow operator at /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 805.
Traceback (most recent call last):
  File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py", line 44, in <module>
    "(bcftools view {snakemake.input.calls} | "
  File "/home/moldach/bin/snakemake/lib/python3.8/site-packages/snakemake/shell.py", line 156, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  (bcftools view VARIANT_CALLING/varscan/470_sorted_dedupped_snp_varscan.vcf | vep --everything --fork 4 --format vcf --vcf --cache --cache_version 100 --species caenorhabditis_elegans --assembly WBcel235 --dir_cache resources/vep/cache --dir_plugins resources/vep/plugins --offline --plugin LoFtool --output_file STDOUT --stats_file ANNOTATION/VEP/470.html | bcftools view -Ov > ANNOTATION/VEP/470.annotated.vcf)' returned non-zero exit status 1.
[Tue Aug 25 11:25:02 2020]
Error in rule variant_annotation:
    jobid: 0
    output: ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
    conda-env: /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f

RuleException:
CalledProcessError in line 393 of /scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile:
Command 'source /home/moldach/miniconda3/bin/activate '/scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/conda/f16fdb5f'; set -euo pipefail;  python /scratch/moldach/MADDOG/VCF-FILES/biostars439754/.snakemake/scripts/tmpm4v6gdij.wrapper.py' returned non-zero exit status 1.
  File "/scratch/moldach/MADDOG/VCF-FILES/biostars439754/Snakefile", line 393, in __rule_variant_annotation
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2017/Core/python/3.8.0/lib/python3.8/concurrent/futures/thread.py", line 57, in run
Removing output files of failed job variant_annotation since they might be corrupted:
ANNOTATION/VEP/470.annotated.vcf, ANNOTATION/VEP/470.html
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

This issue was originally brought up on StackOverflow, then on the ensembl-vep repo but seems it should be posted here for a definitive answer.

@moldach moldach added the bug Something isn't working label Aug 31, 2020
@moldach
Copy link
Author

moldach commented Sep 16, 2020

bcftools view is giving the warning from the output of samtools mpileup/varscan pileup2snp:

def getDeduppedBamsIndex(sample):
  return(list(os.path.join(aligns_dict[sample],"{0}.sorted.dedupped.bam.bai".format(sample,pair)) for pair in ['']))

rule mpilup:
    input:
	bam=lambda wildcards: getDeduppedBams(wildcards.sample),
        reference_genome=os.path.join(dirs_dict["REF_DIR"],config["REF_GENOME"])
    output:
	os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.mpileup.gz"),
    log:
        os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"{sample}_{contig}_samtools_mpileup.log")
    params:
        extra=lambda wc: "-r {}".format(wc.contig)
    resources:
	mem = 1000,
        time = 30
    wrapper:
	"0.65.0/bio/samtools/mpileup"

rule mpileup_to_vcf:
    input:
	os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.mpileup.gz"),
    output:
	os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_{contig}.vcf")
    message:
	"Calling SNP with Varscan2"
    threads:
	2 # Keep threading value to one for unzipped mpileup input
          # Set it to two for zipped mipileup files
    log:
        os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"varscan_{sample}_{contig}.log")
    resources:
	mem = 1000,
        time = 30
    wrapper:
	"0.65.0/bio/varscan/mpileup2snp"

rule vcf_merge:
    input:
	os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_I.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_II.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_III.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_IV.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_V.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_X.vcf"),
        os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}_MtDNA.vcf")
    output:
	os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"],"{sample}.vcf")
    log: os.path.join(dirs_dict["LOG_DIR"],config["CALLING_TOOL"],"{sample}_vcf-merge.log")
    resources:
	mem = 1000,
        time = 10
    threads: 1
    message: """--- Merge VarScan by Chromosome."""
    shell: """
	awk 'FNR==1 && NR!=1 {{ while (/^<header>/) getline; }} 1 {{print}} ' {input} > {output}
        """

calling_dir = os.path.join(dirs_dict["CALLING_DIR"],config["CALLING_TOOL"])
callings_locations = [calling_dir] * len_samples
callings_dict = dict(zip(sample_names, callings_locations))

def getVCFs(sample):
  return(list(os.path.join(callings_dict[sample],"{0}.vcf".format(sample,pair)) for pair in ['']))

rule annotate_variants:
    input:
	calls=lambda wildcards: getVCFs(wildcards.sample),
        cache="resources/vep/cache",
        plugins="resources/vep/plugins",
    output:
	calls="{sample}.annotated.vcf",
        stats="{sample}.html"
    params:
	# Pass a list of plugins to use, see https://www.ensembl.org/info/docs/tools/vep/script/vep_plugins.html
        # Plugin args can be added as well, e.g. via an entry "MyPlugin,1,FOO", see docs.
        plugins=["LoFtool"],
        extra="--everything"  # optional: extra arguments
    log:
        "logs/vep/{sample}.log"
    threads: 4
    resources:
	time=30,
        mem=5000
    wrapper:
	"0.65.0/bio/vep/annotate"

If I run bcftools view on the output I get the error (same for any of the contigs as well so its not the rule merge_vcf) causing the issue:

$ bcftools view variant_calling/varscan/MTG324.vcf 
Failed to read from variant_calling/varscan/MTG324.vcf: unknown file type

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant