distiller is failing at bin_zoom_library_pairs with new reference genome #137

Open
gibcus opened this issue Mar 29, 2019 · 9 comments

@gibcus

gibcus commented Mar 29, 2019

dis.out contents:

N E X T F L O W ~ version 19.01.0
Launching dekkerlab/distiller-nf [mad_edison] - revision: 5f5b40f [ghpcc]
[warm up] executor > lsf
[76/2f5f3c] Submitted process > local_truncate_chunk_fastqs (library:ICRF-12min-S2-R2__galGal6 run:lane1)
[99/257402] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:03)
[f0/613fcd] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:01)
[80/0ae835] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:05)
[c1/0329d9] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:04)
[a8/268854] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:08)
[a0/efa5f1] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:02)
[11/20e124] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:09)
[c8/7c0943] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:06)
[16/c87ea8] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:07)
[1d/040136] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:11)
[e5/a8a883] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:12)
[e6/7dc0ec] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:10)
[17/3f266e] Submitted process > merge_dedup_splitbam (library:ICRF-12min-S2-R2__galGal6)
[96/a57503] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[d8/9d8b0c] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:mapq_30)
[e0/acfb4b] Submitted process > merge_stats_libraries_into_groups (library_group:ICRF-12m-R2)
[0c/a48dde] Submitted process > merge_stats_libraries_into_groups (library_group:all)
[96/a57503] NOTE: Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1) -- Execution is retried (1)
[42/b06c9f] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[42/b06c9f] NOTE: Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1) -- Execution is retried (2)
[56/efd5ad] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
ERROR ~ Error executing process > 'bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)'

Caused by:
Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1)

Command executed:

bgzip -cd -@ 3 ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 --assembly galGal6 galGal6.reduced.chrom.sizes:10000 - ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

cooler zoomify --nproc 12 --out ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.mcool --resolutions 1000000,500000,250000,100000,50000,25000,10000 --balance ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

Command exit status:
1

Command output:
(empty)

Command error:
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Writing chunk 8: /tmp/tmp6loc8rar.multi.cool::8
INFO:cooler.create:Creating cooler at "/tmp/tmp6loc8rar.multi.cool::/8"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Merging into ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool
INFO:cooler.create:Creating cooler at "ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.reduce:nnzs: [0, 0, 0, 0, 0, 0, 0, 0, 0]
INFO:cooler.reduce:current: [0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "/miniconda3/bin/cooler", line 11, in <module>
    sys.exit(cli())
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/cooler/cli/cload.py", line 476, in pairs
    h5opts=h5opts,
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 670, in create_from_unordered
    **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 565, in create
    file_path, target, meta.columns, iterable, h5opts, lock)
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 204, in write_pixels
    for i, chunk in enumerate(iterable):
  File "/miniconda3/lib/python3.6/site-packages/cooler/reduce.py", line 162, in __iter__
    ignore_index=True)
  File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 259, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

Work dir:
/nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/56/efd5ad506debb426aa33922a1b1abe

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)


Sender: LSF System lsfadmin@c04b04
Subject: Job 2694432: <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> in cluster Exited

Job <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> was submitted from host by user in cluster at Thu Mar 28 19:49:37 2019.
Job was executed on host(s) <2*c04b04>, in queue , as user in cluster at Thu Mar 28 19:49:37 2019.
</home/jg14w> was used as the home directory.
</nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6> was used as the working directory.
Started at Thu Mar 28 19:49:37 2019.
Terminated at Fri Mar 29 03:55:45 2019.
Results reported at Fri Mar 29 03:55:45 2019.

Your job looked like:


LSBATCH: User input
~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config

Exited with exit code 1.

Resource usage summary:

CPU time :                                   380.92 sec.
Max Memory :                                 1287 MB
Average Memory :                             1045.12 MB
Total Requested Memory :                     16000.00 MB
Delta Memory :                               14713.00 MB
Max Swap :                                   -
Max Processes :                              3
Max Threads :                                91
Run time :                                   29168 sec.
Turnaround time :                            29168 sec.

The output (if any) is above this job summary.

PS:

Read file <dis.err> for stderr output of this job.

Contents of: /nl/umw_job_dekker/cshare/reference/sorted_chromsizes/galGal6.reduced.chrom.sizes:

chr1 197608386
chr2 149682049
chr3 110838418
chr4 91315245
chr5 59809098
chr6 36374701
chr7 36742308
chr8 30219446
chr9 24153086
chr10 21119840
chr11 20200042
chr12 20387278
chr13 19166714
chr14 16219308
chr15 13062184
chr16 2844601
chr17 10762512
chr18 11373140
chr19 10323212
chr20 13897287
chr21 6844979
chr22 5459462
chr23 6149580
chr24 6491222
chr25 3980610
chr26 6055710
chr27 8080432
chr28 5116882
chr30 1818525
chr31 6153034
chr32 725831
chr33 7821666
chrM 16784
chrW 6813114
chrZ 82529921

Pairs file:

/nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/17/3f266ec32d5602fe6f19069856e46b/ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz

## pairs format v1.0.0
#sorted: chr1-chr2-pos1-pos2
#shape: upper triangle
#genome_assembly: unknown
#chromsize: ref|NC_001323.1| 16775
#chromsize: ref|NC_006088.5| 197608386
#chromsize: ref|NC_006089.5| 149682049
#chromsize: ref|NC_006090.5| 110838418
#chromsize: ref|NC_006091.5| 91315245
#chromsize: ref|NC_006092.5| 59809098
#chromsize: ref|NC_006093.5| 36374701
#chromsize: ref|NC_006094.5| 36742308
#chromsize: ref|NC_006095.5| 30219446
#chromsize: ref|NC_006096.5| 24153086
#chromsize: ref|NC_006097.5| 21119840
#chromsize: ref|NC_006098.5| 20200042
#chromsize: ref|NC_006099.5| 20387278
#chromsize: ref|NC_006100.5| 19166714
#chromsize: ref|NC_006101.5| 16219308
#chromsize: ref|NC_006102.5| 13062184
#chromsize: ref|NC_006103.5| 2844601
#chromsize: ref|NC_006104.5| 10762512
#chromsize: ref|NC_006105.5| 11373140
#chromsize: ref|NC_006106.5| 10323212
#chromsize: ref|NC_006107.5| 13897287
#chromsize: ref|NC_006108.5| 6844979
#chromsize: ref|NC_006109.5| 5459462
#chromsize: ref|NC_006110.5| 6149580
#chromsize: ref|NC_006111.5| 6491222
#chromsize: ref|NC_006112.4| 3980610
#chromsize: ref|NC_006113.5| 6055710
#chromsize: ref|NC_006114.5| 8080432
#chromsize: ref|NC_006115.5| 5116882
#chromsize: ref|NC_006119.4| 725831
#chromsize: ref|NC_006126.5| 6813114
#chromsize: ref|NC_006127.5| 82529921
#chromsize: ref|NC_008465.4| 7821666
#chromsize: ref|NC_028739.2| 1818525
#chromsize: ref|NC_028740.2| 6153034
#samheader: @sq SN:ref|NC_006088.5| LN:197608386
#samheader: @sq SN:ref|NC_006089.5| LN:149682049
#samheader: @sq SN:ref|NC_006090.5| LN:110838418
#samheader: @sq SN:ref|NC_006091.5| LN:91315245
#samheader: @sq SN:ref|NC_006092.5| LN:59809098
#samheader: @sq SN:ref|NC_006093.5| LN:36374701
#samheader: @sq SN:ref|NC_006094.5| LN:36742308
#samheader: @sq SN:ref|NC_006095.5| LN:30219446
#samheader: @sq SN:ref|NC_006096.5| LN:24153086
#samheader: @sq SN:ref|NC_006097.5| LN:21119840
#samheader: @sq SN:ref|NC_006098.5| LN:20200042

@mimakaev

This is strange. Is there something after the header in the pairs file?

@sergpolly
Member

  1. It seems like your bwa index refers to chromosomes with names like ref|NC_001323.1|, ref|NC_006088.5|, etc.
  2. Your reduced.chromsizes file, however, refers to "human-readable" names: chr1, chr2, chr3, etc.

Can you trace back how you created the bwa index and reduced.chromsizes? Did you use the same FASTA as input?
The chromosome names in the index and in reduced.chromsizes can differ, but there must be some overlap!
Example:

  • bwa index created using the "normal" chromosomes + a lot of contigs;
  • reduced.chromsizes referring to the "normal" chromosomes only.
    In this example, mapping is done against normal chromosomes + contigs, which reduces mapping ambiguity (or sometimes increases it?), and your .pairs file will contain reads mapped to the contigs. The "heatmap" coolers, however, are built WITHOUT the contigs, using only the "normal" chromosomes.
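A quick way to catch this kind of mismatch before a long run is to compare the names in the chrom.sizes file with the `#chromsize:` lines of the .pairs header. The sketch below is only an illustration (the helper names are made up for this example) and assumes whitespace-separated fields and the header layout shown above. It also assumes that `cooler cload pairs` skips records on chromosomes missing from the chrom.sizes file; with no shared names at all, nothing would be binned, which would fit the all-zero `nnzs` and the `No objects to concatenate` error in the log.

```python
from typing import Dict, Iterable, Set, Tuple


def read_chromsizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a two-column chrom.sizes listing: <name> <length>."""
    sizes: Dict[str, int] = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            sizes[fields[0]] = int(fields[1])
    return sizes


def read_pairs_header_chroms(lines: Iterable[str]) -> Dict[str, int]:
    """Collect '#chromsize: <name> <length>' entries from a .pairs header."""
    sizes: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first data record
        if line.startswith("#chromsize:"):
            _, name, length = line.split()
            sizes[name] = int(length)
    return sizes


def check_overlap(binning: Dict[str, int],
                  pairs: Dict[str, int]) -> Tuple[Set[str], Set[str]]:
    """Return (names in both files, names present only in the pairs header)."""
    shared = set(binning) & set(pairs)
    only_in_pairs = set(pairs) - set(binning)
    return shared, only_in_pairs


# Two entries from each listing in this issue.
chromsizes = read_chromsizes(["chr1 197608386", "chr2 149682049"])
header = read_pairs_header_chroms([
    "## pairs format v1.0.0",
    "#chromsize: ref|NC_006088.5| 197608386",
    "#chromsize: ref|NC_006089.5| 149682049",
])
shared, only_in_pairs = check_overlap(chromsizes, header)
if not shared:
    print("No shared chromosome names: every pair would be left out of the cooler")
```

Contigs that appear only in the pairs header are expected in the "normal chromosomes + contigs" setup; an empty `shared` set is the real red flag.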

Another, unrelated problem in your distiller run is this: --resolutions 1000000,500000,250000,100000,50000,25000,10000
You're asking cooler to build 25 kb heatmaps from 10 kb ones. That is probably not going to "fly", even after you fix your reference genome:
resolutions in the "ladder" must be multiples of the highest one (the smallest bin size), because all lower-resolution "heatmaps" are built from the highest one by successive coarsening.
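The ladder rule can be checked mechanically: every requested resolution should divide evenly by the finest one. A minimal sketch (the function name is hypothetical), applied to the resolutions from this run:

```python
from typing import List


def invalid_resolutions(resolutions: List[int]) -> List[int]:
    """Return the resolutions that are not integer multiples of the finest one."""
    base = min(resolutions)  # finest resolution = smallest bin size
    return [r for r in resolutions if r % base != 0]


requested = [1000000, 500000, 250000, 100000, 50000, 25000, 10000]
print(invalid_resolutions(requested))  # -> [25000]
```

With a 10 kb base, 25 kb is the only rung that fails; with a 1 kb base, every rung in this list, including 25 kb, would be a valid multiple.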

@nvictus
Member

nvictus commented Mar 29, 2019

From the header, it looks like the chromosomes in your pairs file use ref|NC_xxx names instead of UCSC names (chr...). That must be how they were encoded in the FASTA file.

You can confirm by checking after the header, as Max suggested.
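To look past the header, something like the following sketch would do. It uses only the standard library: bgzip output is gzip-compatible, so Python's `gzip` module can read it. The function name and the default of five records are arbitrary choices for illustration.

```python
import gzip
from itertools import islice
from typing import List


def first_records(path: str, n: int = 5) -> List[str]:
    """Return the first n non-header lines of a (b)gzipped .pairs file."""
    with gzip.open(path, "rt") as fh:
        # Header lines start with '#'; everything else is a pair record.
        body = (line.rstrip("\n") for line in fh if not line.startswith("#"))
        return list(islice(body, n))
```

For example, from the work dir listed above: `first_records("ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz")`. If the chrom1/chrom2 columns show `ref|NC_...|` names, that confirms the mismatch.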

If that's the case, your options are:

  • Fix the names in the FASTA file (or find a better source), re-index it for bwa, and re-distill

Options that involve manual intervention (for a one-off case), or modifying the pipeline:

  • Pipe your pairs files through a script to rename the chromosomes line-by-line and feed that to cooler cload pairs. (Would have to be done manually).
  • Edit the names in galGal6.chrom.sizes to match the NCBI ID names. That will probably be annoying because your coolers will use those names instead. You can rename the chromosomes in the cool files afterwards, though (manually).
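For the second option, a line-by-line renamer could look like the minimal sketch below. Assumptions: records are tab-separated with chrom1 and chrom2 in columns 2 and 4 (the `-c1 2 -c2 4` flags in the `cooler cload pairs` call); `#chromsize:` lines are space-separated as shown above; and the mapping is a hypothetical, partial one, inferred only by matching lengths between the two listings in this issue. Length matching does not work for the mitochondrion here: `ref|NC_001323.1|` is 16775 bp in the pairs header while `chrM` is 16784 bp in the chrom.sizes file. Names missing from the mapping, and other header lines such as `#samheader:`, pass through unchanged.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical, partial mapping (NCBI name -> UCSC name), inferred only from
# identical lengths in this issue (e.g. chr1 and ref|NC_006088.5| are both
# 197608386 bp). Verify and complete it before real use.
NCBI_TO_UCSC: Dict[str, str] = {
    "ref|NC_006088.5|": "chr1",
    "ref|NC_006089.5|": "chr2",
    "ref|NC_006127.5|": "chrZ",
}

# 0-based indices of chrom1/chrom2 (1-based columns 2 and 4 in the cload call).
CHROM1_COL, CHROM2_COL = 1, 3


def rename_pairs(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield .pairs lines with chromosome names translated through `mapping`."""
    for line in lines:
        if line.startswith("#chromsize:"):
            _, name, length = line.split()
            yield f"#chromsize: {mapping.get(name, name)} {length}\n"
        elif line.startswith("#"):
            yield line  # other header lines are left untouched
        else:
            fields = line.rstrip("\n").split("\t")
            for col in (CHROM1_COL, CHROM2_COL):
                fields[col] = mapping.get(fields[col], fields[col])
            yield "\t".join(fields) + "\n"
```

To use it in the pipe, a small wrapper script could call `sys.stdout.writelines(rename_pairs(sys.stdin, NCBI_TO_UCSC))` and sit between `bgzip -cd` and `cooler cload pairs ... -`. Renaming can change the chromosome order, so the `#sorted:` header may no longer be accurate. The traceback above shows `cload pairs` going through `create_from_unordered`, so input order presumably does not matter.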

@gibcus
Author

gibcus commented Mar 29, 2019

Yup,
I used the NCBI FASTA to generate the reduced.chromsizes file.
I guess I'll generate a new one from UCSC and try @nvictus's suggestion: re-index and re-distill.
Alternatively, I'll remap the whole d@rn thing.

@gibcus
Author

gibcus commented Mar 29, 2019

"... Another unrelated problem in your distiller run is this: --resolutions 1000000,500000,250000,100000,50000,25000,10000
You're asking cooler to build 25kb heatmaps based on 10kb ones - that is probably not going to "fly" , even after you fix your reference genome:
resolutions in the "ladder" must be multiples of the highest-one (smallest bin size-one) - because all lower resolution "heatmaps" are build upon the highest one by consecutive coarsening."

Another rookie mistake...

@nvictus
Member

nvictus commented Mar 29, 2019

I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta.

EDIT: Just tested. Scratch the sensible order statement... maybe it was just a fluke the last couple genomes I tried it on.

@gibcus
Author

gibcus commented Mar 29, 2019

"I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta."

I considered the soft-masked galGal6.fa.gz, but I'll check twoBitToFa.

@mimakaev

mimakaev commented Mar 29, 2019 via email

@gibcus
Author

gibcus commented Mar 29, 2019

"Also, I generally start with 1kb resolution, not 10kb. It does not generate that much extra space, but may end up being useful for averages/pileups even in low-coverage datasets."

Indeed that was a space consideration, as the libraries did not have 1kb depth. I'll take your advice!
