distiller is failing at bin_zoom_library_pairs with new reference genome #137

gibcus · 2019-03-29T16:47:24Z

dis.out contents:

N E X T F L O W ~ version 19.01.0
Launching dekkerlab/distiller-nf [mad_edison] - revision: 5f5b40f [ghpcc]
[warm up] executor > lsf
[76/2f5f3c] Submitted process > local_truncate_chunk_fastqs (library:ICRF-12min-S2-R2__galGal6 run:lane1)
[99/257402] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:03)
[f0/613fcd] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:01)
[80/0ae835] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:05)
[c1/0329d9] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:04)
[a8/268854] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:08)
[a0/efa5f1] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:02)
[11/20e124] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:09)
[c8/7c0943] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:06)
[16/c87ea8] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:07)
[1d/040136] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:11)
[e5/a8a883] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:12)
[e6/7dc0ec] Submitted process > map_parse_sort_chunks (library:ICRF-12min-S2-R2__galGal6 run:lane1 chunk:10)
[17/3f266e] Submitted process > merge_dedup_splitbam (library:ICRF-12min-S2-R2__galGal6)
[96/a57503] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[d8/9d8b0c] Submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:mapq_30)
[e0/acfb4b] Submitted process > merge_stats_libraries_into_groups (library_group:ICRF-12m-R2)
[0c/a48dde] Submitted process > merge_stats_libraries_into_groups (library_group:all)
[96/a57503] NOTE: Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1) -- Execution is retried (1)
[42/b06c9f] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
[42/b06c9f] NOTE: Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1) -- Execution is retried (2)
[56/efd5ad] Re-submitted process > bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)
ERROR ~ Error executing process > 'bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter)'

Caused by:
Process bin_zoom_library_pairs (library:ICRF-12min-S2-R2__galGal6 filter:no_filter) terminated with an error exit status (1)

Command executed:

bgzip -cd -@ 3 ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz | cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 --assembly galGal6 galGal6.reduced.chrom.sizes:10000 - ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

cooler zoomify --nproc 12 --out ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.mcool --resolutions 1000000,500000,250000,100000,50000,25000,10000 --balance ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool

Command exit status:
1

Command output:
(empty)

Command error:
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Writing chunk 8: /tmp/tmp6loc8rar.multi.cool::8
INFO:cooler.create:Creating cooler at "/tmp/tmp6loc8rar.multi.cool::/8"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.create:Writing indexes
INFO:cooler.create:Writing info
INFO:cooler.create:Done
INFO:cooler.create:Merging into ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool
INFO:cooler.create:Creating cooler at "ICRF-12min-S2-R2__galGal6.galGal6.no_filter.10000.cool::/"
INFO:cooler.create:Writing chroms
INFO:cooler.create:Writing bins
INFO:cooler.create:Writing pixels
INFO:cooler.reduce:nnzs: [0, 0, 0, 0, 0, 0, 0, 0, 0]
INFO:cooler.reduce:current: [0, 0, 0, 0, 0, 0, 0, 0, 0]
Traceback (most recent call last):
  File "/miniconda3/bin/cooler", line 11, in <module>
    sys.exit(cli())
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 1137, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/miniconda3/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/cooler/cli/cload.py", line 476, in pairs
    h5opts=h5opts,
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 670, in create_from_unordered
    **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 565, in create
    file_path, target, meta.columns, iterable, h5opts, lock)
  File "/miniconda3/lib/python3.6/site-packages/cooler/create/_create.py", line 204, in write_pixels
    for i, chunk in enumerate(iterable):
  File "/miniconda3/lib/python3.6/site-packages/cooler/reduce.py", line 162, in __iter__
    ignore_index=True)
  File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 225, in concat
    copy=copy, sort=sort)
  File "/miniconda3/lib/python3.6/site-packages/pandas/core/reshape/concat.py", line 259, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate

Work dir:
/nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/56/efd5ad506debb426aa33922a1b1abe

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

-- Check '.nextflow.log' file for details
WARN: Killing pending tasks (1)

Sender: LSF System lsfadmin@c04b04
Subject: Job 2694432: <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> in cluster Exited

Job <~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config> was submitted from host by user in cluster at Thu Mar 28 19:49:37 2019.
Job was executed on host(s) <2*c04b04>, in queue , as user in cluster at Thu Mar 28 19:49:37 2019.
</home/jg14w> was used as the home directory.
</nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6> was used as the working directory.
Started at Thu Mar 28 19:49:37 2019.
Terminated at Fri Mar 29 03:55:45 2019.
Results reported at Fri Mar 29 03:55:45 2019.

Your job looked like:

LSBATCH: User input
~/SSH_plumbing/nextflow run dekkerlab/distiller-nf -r ghpcc -params-file ICRF-12m-R2_galGal6.yml -profile custom --container_cache_dir /nl/umw_job_dekker/cshare/containers --custom_config /nl/umw_job_dekker/users/jg14w/Mapping/cluster.config

Exited with exit code 1.

Resource usage summary:

CPU time :                                   380.92 sec.
Max Memory :                                 1287 MB
Average Memory :                             1045.12 MB
Total Requested Memory :                     16000.00 MB
Delta Memory :                               14713.00 MB
Max Swap :                                   -
Max Processes :                              3
Max Threads :                                91
Run time :                                   29168 sec.
Turnaround time :                            29168 sec.

The output (if any) is above this job summary.

PS:

Read file <dis.err> for stderr output of this job.

Contents of: /nl/umw_job_dekker/cshare/reference/sorted_chromsizes/galGal6.reduced.chrom.sizes:

chr1 197608386
chr2 149682049
chr3 110838418
chr4 91315245
chr5 59809098
chr6 36374701
chr7 36742308
chr8 30219446
chr9 24153086
chr10 21119840
chr11 20200042
chr12 20387278
chr13 19166714
chr14 16219308
chr15 13062184
chr16 2844601
chr17 10762512
chr18 11373140
chr19 10323212
chr20 13897287
chr21 6844979
chr22 5459462
chr23 6149580
chr24 6491222
chr25 3980610
chr26 6055710
chr27 8080432
chr28 5116882
chr30 1818525
chr31 6153034
chr32 725831
chr33 7821666
chrM 16784
chrW 6813114
chrZ 82529921

Pairs file:

/nl/umw_job_dekker/users/jg14w/Mapping/ICRF-12m-R2_galGal6/work/17/3f266ec32d5602fe6f19069856e46b/ICRF-12min-S2-R2__galGal6.galGal6.nodups.pairs.gz

## pairs format v1.0.0
#sorted: chr1-chr2-pos1-pos2
#shape: upper triangle
#genome_assembly: unknown
#chromsize: ref|NC_001323.1| 16775
#chromsize: ref|NC_006088.5| 197608386
#chromsize: ref|NC_006089.5| 149682049
#chromsize: ref|NC_006090.5| 110838418
#chromsize: ref|NC_006091.5| 91315245
#chromsize: ref|NC_006092.5| 59809098
#chromsize: ref|NC_006093.5| 36374701
#chromsize: ref|NC_006094.5| 36742308
#chromsize: ref|NC_006095.5| 30219446
#chromsize: ref|NC_006096.5| 24153086
#chromsize: ref|NC_006097.5| 21119840
#chromsize: ref|NC_006098.5| 20200042
#chromsize: ref|NC_006099.5| 20387278
#chromsize: ref|NC_006100.5| 19166714
#chromsize: ref|NC_006101.5| 16219308
#chromsize: ref|NC_006102.5| 13062184
#chromsize: ref|NC_006103.5| 2844601
#chromsize: ref|NC_006104.5| 10762512
#chromsize: ref|NC_006105.5| 11373140
#chromsize: ref|NC_006106.5| 10323212
#chromsize: ref|NC_006107.5| 13897287
#chromsize: ref|NC_006108.5| 6844979
#chromsize: ref|NC_006109.5| 5459462
#chromsize: ref|NC_006110.5| 6149580
#chromsize: ref|NC_006111.5| 6491222
#chromsize: ref|NC_006112.4| 3980610
#chromsize: ref|NC_006113.5| 6055710
#chromsize: ref|NC_006114.5| 8080432
#chromsize: ref|NC_006115.5| 5116882
#chromsize: ref|NC_006119.4| 725831
#chromsize: ref|NC_006126.5| 6813114
#chromsize: ref|NC_006127.5| 82529921
#chromsize: ref|NC_008465.4| 7821666
#chromsize: ref|NC_028739.2| 1818525
#chromsize: ref|NC_028740.2| 6153034
#samheader: @sq SN:ref|NC_006088.5| LN:197608386
#samheader: @sq SN:ref|NC_006089.5| LN:149682049
#samheader: @sq SN:ref|NC_006090.5| LN:110838418
#samheader: @sq SN:ref|NC_006091.5| LN:91315245
#samheader: @sq SN:ref|NC_006092.5| LN:59809098
#samheader: @sq SN:ref|NC_006093.5| LN:36374701
#samheader: @sq SN:ref|NC_006094.5| LN:36742308
#samheader: @sq SN:ref|NC_006095.5| LN:30219446
#samheader: @sq SN:ref|NC_006096.5| LN:24153086
#samheader: @sq SN:ref|NC_006097.5| LN:21119840
#samheader: @sq SN:ref|NC_006098.5| LN:20200042


mimakaev · 2019-03-29T16:59:47Z

This is strange. Is there something after the header in the pairs file?

sergpolly · 2019-03-29T17:04:05Z

It seems like your bwa index refers to chromosomes by names like ref|NC_001323.1|, ref|NC_006088.5|, etc.
Your reduced.chromsizes file, however, refers to "human readable" chr1, chr2, chr3, etc.

Can you trace back how you created the bwa index and reduced.chromsizes? Did you use the same fasta as input?
The chromosome names in the index and in reduced.chromsizes may differ, but the two sets must overlap!
example:

bwa index created using "normal" chroms + a lot of contigs;
reduced.chromsizes referring to "normal" chroms only
In this example, mapping is done against the normal chromosomes plus the contigs, which reduces mapping ambiguity (or sometimes increases it?), so your .pairs file would contain reads mapped to the contigs. The "heatmap" coolers, however, are built WITHOUT the contigs, using only the "normal" chromosomes.
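To check the overlap described above, a short Python sketch can compare the `#chromsize:` names in a .pairs header with the names in a chrom.sizes file. The function names are hypothetical (not part of distiller, pairtools, or cooler), and it assumes header lines shaped like the ones quoted in the issue (`#chromsize: <name> <length>`).

```python
from typing import Dict, Iterable, Set


def pairs_header_chroms(lines: Iterable[str]) -> Dict[str, int]:
    """Collect name -> length from the '#chromsize:' lines of a .pairs header."""
    chroms: Dict[str, int] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        if line.startswith("#chromsize:"):
            _, name, length = line.split()
            chroms[name] = int(length)
    return chroms


def chromsizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a two-column chrom.sizes file (name, length; whitespace-separated)."""
    sizes: Dict[str, int] = {}
    for line in lines:
        if line.strip():
            name, length = line.split()[:2]
            sizes[name] = int(length)
    return sizes


def shared_chroms(pairs_chroms: Dict[str, int], sizes: Dict[str, int]) -> Set[str]:
    """Names present in both inputs; an empty set means no pair can land in a bin."""
    return set(pairs_chroms) & set(sizes)
```

Pass `gzip.open(path, "rt")` for the .pairs.gz file and `open(...)` for the chrom.sizes file. With the two files shown in this issue (ref|NC_... names vs. chr... names), the intersection is empty, which fits the all-zero `nnzs` lines in the cooler log.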

Another, unrelated problem in your distiller run is this: --resolutions 1000000,500000,250000,100000,50000,25000,10000
You're asking cooler to build 25kb heatmaps from 10kb ones, which is probably not going to "fly", even after you fix your reference genome:
every resolution in the "ladder" must be a multiple of the highest one (the smallest bin size), because all lower-resolution "heatmaps" are built from the highest-resolution one by successive coarsening.
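The divisibility rule can be checked before launching a run. The sketch below uses a hypothetical helper (not part of distiller or cooler) and tests only the condition stated above: every resolution must be an integer multiple of the smallest one.

```python
from typing import List, Sequence


def invalid_resolutions(resolutions: Sequence[int]) -> List[int]:
    """Return the resolutions that are not integer multiples of the finest (smallest) one."""
    base = min(resolutions)
    return sorted(r for r in resolutions if r % base != 0)


ladder = [1000000, 500000, 250000, 100000, 50000, 25000, 10000]
print(invalid_resolutions(ladder))  # [25000]
```

Only 25000 fails here (25000 / 10000 = 2.5); dropping it, or lowering the base resolution to one that divides 25000, makes the ladder consistent.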

nvictus · 2019-03-29T17:04:11Z

From the header, it looks like the chromosomes in your pairs file use ref|NC_xxx names instead of UCSC names (chr...). That must have been how they were encoded in the FASTA file.

You can confirm by checking after the header, as Max suggested.
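To look past the header, a small Python sketch can tally the chromosome names actually used in the data lines. The function name and the `limit` parameter are hypothetical; it assumes tab-separated .pairs records with chrom1 and chrom2 in columns 2 and 4, the same columns passed to `cooler cload pairs` via `-c1 2 -c2 4` in the failing command.

```python
from collections import Counter
from typing import Iterable


def body_chrom_counts(lines: Iterable[str], limit: int = 100000) -> Counter:
    """Count chrom1/chrom2 names in the first `limit` data lines of a .pairs file."""
    counts: Counter = Counter()
    n = 0
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        counts[fields[1]] += 1  # chrom1 (1-based column 2)
        counts[fields[3]] += 1  # chrom2 (1-based column 4)
        n += 1
        if n >= limit:
            break
    return counts
```

Open the pairs file with `gzip.open(path, "rt")` and pass the handle; `.most_common(5)` on the result shows which names dominate.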

If that's the case, your options are:

Fix the names in the FASTA file (or find a better source), re-index it for bwa, and re-distill

Options that involve manual intervention (for a one-off case), or modifying the pipeline:

Pipe your pairs files through a script that renames the chromosomes line by line, and feed the output to cooler cload pairs. (This would have to be done manually.)
Edit the names in galGal6.chrom.sizes to match the NCBI ID names. That will probably be annoying because your coolers will use those names instead. You can rename the chromosomes in the cool files afterwards, though (manually).
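The first manual option could look roughly like the Python sketch below. `rename_pairs` and `load_mapping` are hypothetical helpers, and the two-column mapping file (NCBI name, UCSC name), which you would assemble yourself, for example from the NCBI assembly report, is an assumed format. The sketch rewrites only chrom1/chrom2 in data lines; it does not update `#chromsize:` header lines or re-sort records.

```python
from typing import Dict, Iterable, Iterator


def rename_pairs(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Rewrite chrom1/chrom2 (1-based columns 2 and 4) of .pairs data lines via `mapping`.

    Header lines (starting with '#') pass through unchanged; names missing
    from `mapping` are kept as-is.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        fields[1] = mapping.get(fields[1], fields[1])  # chrom1
        fields[3] = mapping.get(fields[3], fields[3])  # chrom2
        yield "\t".join(fields) + "\n"


def load_mapping(path: str) -> Dict[str, str]:
    """Read a hypothetical two-column file: <NCBI name> <UCSC name> per line."""
    with open(path) as fh:
        return dict(line.split()[:2] for line in fh if line.strip())
```

Wrapped in a small script that calls `sys.stdout.writelines(rename_pairs(sys.stdin, load_mapping(...)))`, it would sit between `bgzip -cd ... nodups.pairs.gz` and `cooler cload pairs ... -` in the command shown in the log.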

gibcus · 2019-03-29T17:13:51Z

Yup,
used NCBI fasta to generate reduced.chromsizes file.
I'll generate a new one from UCSC, I guess, and try @nvictus's suggestion to re-index and re-distill.
Alternatively, I'll remap the whole d@rn thing.

gibcus · 2019-03-29T17:21:32Z

"... Another, unrelated problem in your distiller run is this: --resolutions 1000000,500000,250000,100000,50000,25000,10000
You're asking cooler to build 25kb heatmaps from 10kb ones, which is probably not going to "fly", even after you fix your reference genome:
every resolution in the "ladder" must be a multiple of the highest one (the smallest bin size), because all lower-resolution "heatmaps" are built from the highest-resolution one by successive coarsening."

Another rookie mistake...

nvictus · 2019-03-29T17:21:39Z

I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta.

EDIT: Just tested. Scratch the sensible order statement... maybe it was just a fluke with the last couple of genomes I tried it on.
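Since twoBitInfo's order can't be relied on, a reduced chrom.sizes can be rebuilt from the output of `twoBitInfo galGal6.2bit galGal6.chrom.sizes` (one name and length per line) and sorted explicitly. This Python sketch is hypothetical; it assumes UCSC's convention that unplaced and unlocalized contigs carry an underscore in their names (e.g. chrUn_...), drops those, and orders numbered chromosomes numerically (chr2 before chr10), followed by lettered ones (chrM, chrW, chrZ) alphabetically.

```python
import re
from typing import Iterable, List, Tuple


def _natural_key(name: str) -> Tuple[int, int, str]:
    """Sort key: numbered chromosomes first, in numeric order; others by name."""
    m = re.fullmatch(r"chr(\d+)", name)
    return (0, int(m.group(1)), "") if m else (1, 0, name)


def reduced_chromsizes(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Keep names without '_' (assumed primary chromosomes) and sort them naturally."""
    chroms = []
    for line in lines:
        if not line.strip():
            continue
        name, size = line.split()[:2]
        if "_" not in name:
            chroms.append((name, int(size)))
    return sorted(chroms, key=lambda c: _natural_key(c[0]))
```

Writing the result back out as `name<TAB>length` lines gives a file whose names match a bwa index built from the same UCSC FASTA.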

gibcus · 2019-03-29T17:24:03Z

I recommend downloading the 2bit file from UCSC goldenpath. The twoBitInfo command will dump the chromosomes in a sensible order (not sorted by size), and twoBitToFa will generate the fasta.

I considered the soft-masked galGal6.fa.gz, but I'll check twoBitToFa.

mimakaev · 2019-03-29T17:24:54Z

Also, I generally start with 1kb resolution, not 10kb. It does not generate that much extra space, but may end up being useful for averages/pileups even in low-coverage datasets.
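One way to see why a 1kb base adds little space: the bin table grows only linearly as the bin size shrinks (about tenfold from 10kb to 1kb), and the pixel table holds only nonzero bin pairs, so it can never exceed the number of contacts, however many bins there are. The helper below is a hypothetical Python illustration (not a cooler function) that counts bins per chromosome with ceiling division.

```python
from typing import Dict


def n_bins(chromsizes: Dict[str, int], binsize: int) -> int:
    """Number of genomic bins: ceil(length / binsize) summed over chromosomes."""
    return sum(-(-length // binsize) for length in chromsizes.values())


chr1 = {"chr1": 197608386}  # galGal6 chr1 length from the chrom.sizes above
print(n_bins(chr1, 10000))  # 19761
print(n_bins(chr1, 1000))   # 197609
```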


gibcus · 2019-03-29T17:26:54Z

Also, I generally start with 1kb resolution, not 10kb. It does not generate that much extra space, but may end up being useful for averages/pileups even in low-coverage datasets.

Indeed that was a space consideration, as the libraries did not have 1kb depth. I'll take your advice!
