Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Docker Validation Error - tutorial full #332

Open
husamia opened this issue May 27, 2022 · 3 comments
Open

Docker Validation Error - tutorial full #332

husamia opened this issue May 27, 2022 · 3 comments
Assignees
Labels

Comments

@husamia
Copy link

husamia commented May 27, 2022

Describe the bug
There is an error running the workflow as it is.

To Reproduce
Built the docker image using Docker file provided with no errors. Downloaded the tutorial data and references needed per tutorial full. Then executed the tutorial workflow

root@7aa05bbe492b:/app/mavis# snakemake --jobs 1 --configfile=tests/full-tutorial.config.json

Expected behavior
I expected the results to be with no errors, however, I got error messages and the workflow didn't complete.

Input Data
Tutorial data per documentation

Configuration
Docker

Versions (please complete the following information):

  • OS:
  • Python
  • MAVIS Version 3.0.3
  • Blat/BWA Version

Additional context
I want to make sure I get the workflow running correctly with the tutorial data.

root@7aa05bbe492b:/app/mavis# snakemake --jobs 1 --configfile=tests/full-tutorial.config.json
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Singularity containers: ignored
Job stats:
job            count    min threads    max threads
-----------  -------  -------------  -------------
all                1              1              1
annotate         127              1              1
cluster            3              1              1
convert            5              1              1
copy_config        1              1              1
init_config        1              1              1
pairing            1              1              1
summary            1              1              1
validate         127              1              1
total            267              1              1

Select jobs to execute...

[Thu May 26 20:23:57 2022]
rule copy_config:
    output: output_dir_full/config.raw.json
    log: output_dir_full/logs/copy_config.snakemake.log.txt
    jobid: 8
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=4000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:23:57 2022]
Finished job 8.
1 of 267 steps (0.4%) done
Select jobs to execute...

[Thu May 26 20:23:57 2022]
rule init_config:
    input: output_dir_full/config.raw.json
    output: output_dir_full/config.json
    log: output_dir_full/logs/init_config.snakemake.log.txt
    jobid: 7
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

2022-05-26 20:23:59,479 [INFO] MAVIS: 3.0.0
2022-05-26 20:23:59,479 [INFO] hostname: 7aa05bbe492b
2022-05-26 20:23:59,479 [INFO] arguments
2022-05-26 20:23:59,479 [INFO]  command= 'setup'
2022-05-26 20:23:59,479 [INFO]  config= '/app/mavis/output_dir_full/config.raw.json'
2022-05-26 20:23:59,479 [INFO]  log= None
2022-05-26 20:23:59,479 [INFO]  log_level= 'INFO'
2022-05-26 20:23:59,479 [INFO]  outputfile= 'output_dir_full/config.json'
2022-05-26 20:24:00,826 [INFO] loading: ['/app/mavis/reference_inputs/ensembl69_hg19_annotations.v3.json']
2022-05-26 20:32:09,562 [WARNING] Skipped 19031 translations where the CDS length was not a multiple of 3
2022-05-26 20:32:09,562 [WARNING] Skipped 68 domains due to errors (coordinates defined outside the translated region)
2022-05-26 20:32:17,705 [INFO] writing: output_dir_full/config.json
2022-05-26 20:32:17,706 [INFO] run time (hh/mm/ss): 0:08:18
2022-05-26 20:32:17,706 [INFO] run time (s): 498
[Thu May 26 20:33:02 2022]
Finished job 7.
2 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:02 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/defuse.tab
    log: output_dir_full/logs/convert.snakemake.defuse.log.txt
    jobid: 139
    wildcards: alias=defuse
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:04 2022]
Finished job 139.
3 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:04 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/breakdancer.tab
    log: output_dir_full/logs/convert.snakemake.breakdancer.log.txt
    jobid: 6
    wildcards: alias=breakdancer
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:15 2022]
Finished job 6.
4 of 267 steps (1%) done
Select jobs to execute...

[Thu May 26 20:33:15 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/breakseq.tab
    log: output_dir_full/logs/convert.snakemake.breakseq.log.txt
    jobid: 9
    wildcards: alias=breakseq
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:18 2022]
Finished job 9.
5 of 267 steps (2%) done
Select jobs to execute...

[Thu May 26 20:33:18 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/manta.tab
    log: output_dir_full/logs/convert.snakemake.manta.log.txt
    jobid: 10
    wildcards: alias=manta
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:33:27 2022]
Finished job 10.
6 of 267 steps (2%) done
Select jobs to execute...

[Thu May 26 20:33:27 2022]
rule cluster:
    input: output_dir_full/converted_outputs/breakdancer.tab, output_dir_full/converted_outputs/breakseq.tab, output_dir_full/converted_outputs/manta.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-normal/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-normal.log.txt
    jobid: 5
    wildcards: library=L1522785992-normal
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:42:24 2022]
Finished job 5.
7 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:42:24 2022]
rule cluster:
    input: output_dir_full/converted_outputs/breakdancer.tab, output_dir_full/converted_outputs/breakseq.tab, output_dir_full/converted_outputs/manta.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-tumour/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-tumour.log.txt
    jobid: 142
    wildcards: library=L1522785992-tumour
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:51:24 2022]
Finished job 142.
8 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:51:24 2022]
rule convert:
    input: output_dir_full/config.json
    output: output_dir_full/converted_outputs/chimerascan.tab
    log: output_dir_full/logs/convert.snakemake.chimerascan.log.txt
    jobid: 138
    wildcards: alias=chimerascan
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 20:51:26 2022]
Finished job 138.
9 of 267 steps (3%) done
Select jobs to execute...

[Thu May 26 20:51:26 2022]
rule cluster:
    input: output_dir_full/converted_outputs/chimerascan.tab, output_dir_full/converted_outputs/defuse.tab, output_dir_full/config.json
    output: output_dir_full/L1522785992-trans/cluster
    log: output_dir_full/logs/snakemake.cluster.L1522785992-trans.log.txt
    jobid: 137
    wildcards: library=L1522785992-trans
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=16000, cpus=1, log_dir=output_dir_full/logs

[Thu May 26 21:00:06 2022]
Finished job 137.
10 of 267 steps (4%) done
Select jobs to execute...

[Thu May 26 21:00:06 2022]
rule validate:
    input: output_dir_full/L1522785992-normal/cluster
    output: output_dir_full/L1522785992-normal/validate/batch-39/validation-passed.tab
    log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt
    jobid: 86
    wildcards: library=L1522785992-normal, job_id=39
    resources: tmpdir=/tmp, time_limit=57600, mem_mb=18000, cpus=2, log_dir=output_dir_full/logs

[Thu May 26 21:00:43 2022]
Error in rule validate:
    jobid: 86
    output: output_dir_full/L1522785992-normal/validate/batch-39/validation-passed.tab
    log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt (check log file(s) for error message)
    shell:
        mavis validate --config output_dir_full/config.json --library L1522785992-normal --inputs output_dir_full/L1522785992-normal/cluster/batch-39.tab --output output_dir_full/L1522785992-normal/validate/batch-39 &> output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /app/mavis/.snakemake/log/2022-05-26T202356.690216.snakemake.log
@husamia husamia added the bug label May 27, 2022
@creisle
Copy link
Member

creisle commented May 27, 2022

@husamia Could you post the contents of the log file for the job that failed?
log: output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt (check log file(s) for error message)

Built the docker image using Docker file provided with no errors.

You should not need to build anything. To use the mavis docker image with snakemake you will need to use singularity flag --use-singularity

@husamia
Copy link
Author

husamia commented May 27, 2022

cat output_dir_full/logs/L1522785992-normal.validate.snakemake.batch-39.log.txt

`2022-05-26 21:00:07,691 [INFO] MAVIS: 3.0.0
2022-05-26 21:00:07,692 [INFO] hostname: 7aa05bbe492b
2022-05-26 21:00:07,692 [INFO] arguments
2022-05-26 21:00:07,692 [INFO]  command= 'validate'
2022-05-26 21:00:07,692 [INFO]  config= '/app/mavis/output_dir_full/config.json'
2022-05-26 21:00:07,692 [INFO]  inputs = ['output_dir_full/L1522785992-normal/cluster/batch-39.tab']
2022-05-26 21:00:07,692 [INFO]  library= 'L1522785992-normal'
2022-05-26 21:00:07,692 [INFO]  log= None
2022-05-26 21:00:07,692 [INFO]  log_level= 'INFO'
2022-05-26 21:00:07,692 [INFO]  output= 'output_dir_full/L1522785992-normal/validate/batch-39'
2022-05-26 21:00:07,722 [INFO] creating output directory: 'output_dir_full/L1522785992-normal/validate/batch-39'
2022-05-26 21:00:07,723 [INFO] loading: ['/app/mavis/reference_inputs/hg19.fa']
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 422, in load
    self.content = self.loader(*self.name, **self.opt)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 256, in load_reference_genome
    for chrom, seq in SeqIO.to_dict(SeqIO.parse(fh, 'fasta')).items():
  File "/usr/local/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 789, in to_dict
    raise ValueError("Duplicate key '%s'" % key)
ValueError: Duplicate key 'chr1'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mavis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 295, in main
    raise err
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 237, in main
    library=args.library,
  File "/usr/local/lib/python3.7/site-packages/mavis/validate/main.py", line 54, in main
    reference_genome = ReferenceFile.load_from_config(config, 'reference_genome', eager_load=True)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 431, in load_from_config
    return ReferenceFile(file_type, *config.get(f'reference.{file_type}', []), **kwargs)
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 384, in __init__
    self.load()
  File "/usr/local/lib/python3.7/site-packages/mavis/annotate/file_io.py", line 426, in load
    raise err.__class__(message)
ValueError: Error in loading files: /app/mavis/reference_inputs/hg19.fa. Duplicate key 'chr1'`

@mattdoug604
Copy link
Member

mattdoug604 commented Jun 8, 2022

Thanks for providing the logs @husamia. We'll try to look into this soon.

@mattdoug604 mattdoug604 removed their assignment Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants