Describe the bug
When I run the test data through hic-pipeline, the resulting stats are very different from tests/data/stats.txt provided in the repo. When I then ran hic-pipeline and Juicer separately on in-house data, the stats also differed substantially. I would like to know which step causes the difference and which result I should use.
OS/Platform
OS/Platform: Ubuntu 20.04.4 LTS
Singularity version: v3.11.4
Pipeline version: v1.15.1
Caper version: v2.2.3
Caper configuration file
backend=local
# Local directory for localized files and Cromwell's intermediate files.
# If not defined then Caper will make .caper_tmp/ on CWD or `local-out-dir`.
# /tmp is not recommended since Caper store localized data files here.
local-loc-dir=
cromwell=/home/myname/.caper/cromwell_jar/cromwell-82.jar
womtool=/home/myname/.caper/womtool_jar/womtool-82.jar
Input JSON file
caper run /home/myname/Tools/hic-pipeline/hic.wdl --singularity \
-i /home/myname/Tools/hic-pipeline/tests/functional/json/test_hic.json \
-m /home/myname/Tools/hic-pipeline/tests/testPipeline/testrun_metadata.json
Statistic from hic-pipeline
tests/data/stats.txt info in repo
Intra-fragment Reads: 6,969(57.59%)
Hi-C Contacts: 5,132(42.41%)
Ligation Motif Present: 3 (0.02%)
3' Bias (Long Range): 65% - 35%
Pair Type %(L-I-O-R): 25% - 23% - 27% - 25%
Inter-chromosomal: 6 (0.05%)
Intra-chromosomal: 5,126 (42.36%)
Short Range (<20Kb): 4,537 (37.49%)
Long Range (>20Kb): 589 (4.87%)
However, when I ran hic-pipeline on the test data myself, the statistics were as follows:
Read type: Paired End
Sequenced Read Pairs: 332888
No chimera found: 11303 (3.40%)
One or both reads unmapped: 11303 (3.40%)
2 alignments: 321559 (96.60%)
2 alignments (A...B): 321558 (96.60%)
2 alignments (A1...A2B; A1B2...B1A2): 1 (0.00%)
3 or more alignments: 26 (0.01%)
Ligation Motif Present: 96 (0.03%)
Average insert size: 496.10
Total Unique: 310081 (96.43%, 93.15%)
Total Duplicates: 11478 (3.57%, 3.45%)
Library Complexity Estimate*: 4,396,440
Intra-fragment Reads: 150,908 (45.33% / 48.67%)
Below MAPQ Threshold: 44,764 (13.45% / 14.44%)
Hi-C Contacts: 114,409 (34.37% / 36.90%)
3' Bias (Long Range): 80% - 20%
Pair Type %(L-I-O-R): 25% - 25% - 25% - 25%
L-I-O-R Convergence: 10000000000
Inter-chromosomal: 193 (0.06% / 0.06%)
Intra-chromosomal: 114,216 (34.31% / 36.83%)
Short Range (<20Kb):
<500BP: 70,381 (21.14% / 22.70%)
500BP-5kB: 31,397 (9.43% / 10.13%)
5kB-20kB: 2,338 (0.70% / 0.75%)
Long Range (>20Kb): 10,100 (3.03% / 3.26%)
Hi-C Contacts: 5,132 (42.41%) in the repo file and Hi-C Contacts: 114,409 (34.37% / 36.90%) in my run differ a lot. Then I ran Juicer alone using the code below:
Inconsistency in in-house data
I also ran the test on 5G of in-house mm10 data. The result from hic-pipeline also differs from the result of running Juicer alone. The statistics are shown below:
hic-pipeline(stats_30.txt):
Read type: Paired End
Sequenced Read Pairs: 44334059
No chimera found: 221708 (0.50%)
One or both reads unmapped: 221708 (0.50%)
2 alignments: 36958351 (83.36%)
2 alignments (A...B): 10224763 (23.06%)
2 alignments (A1...A2B; A1B2...B1A2): 26733588 (60.30%)
3 or more alignments: 7154000 (16.14%)
Ligation Motif Present: 37372441 (84.30%)
Average insert size: 215.17
Total Unique: 30990053 (83.85%, 69.90%)
Total Duplicates: 5968298 (16.15%, 13.46%)
Library Complexity Estimate*: 101,748,185
Intra-fragment Reads: 16,460,941 (37.13% / 53.12%)
Below MAPQ Threshold: 5,811,532 (13.11% / 18.75%)
Hi-C Contacts: 8,717,580 (19.66% / 28.13%)
3' Bias (Long Range): N/A
Pair Type %(L-I-O-R): N/A
Inter-chromosomal: 8,717,580 (19.66% / 28.13%)
Intra-chromosomal: 0 (0.00% / 0.00%)
Short Range (<20Kb):
<500BP: 0 (0.00% / 0.00%)
500BP-5kB: 0 (0.00% / 0.00%)
5kB-20kB: 0 (0.00% / 0.00%)
Long Range (>20Kb): 0 (0.00% / 0.00%)
hic-pipeline identified Hi-C Contacts: 8,717,580 (19.66% / 28.13%), while Juicer reports Hi-C Contacts: 24,908,998 (56.18% / 79.61%). It also seems odd to me that hic-pipeline detected no intra-chromosomal contacts at all.
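To make the percentages above easier to compare, here is a minimal Python sketch that recomputes them from the counts quoted in this issue. The helper names `parse_count` and `pct` are hypothetical and not part of hic-pipeline or Juicer. The reading that the two percentages are relative to sequenced read pairs and to unique reads is an assumption that happens to reproduce the numbers, not something confirmed by either tool's documentation.

```python
import re


def parse_count(line):
    """Split a stats line into (label, leading integer), e.g.
    'Hi-C Contacts: 114,409 (34.37% / 36.90%)' -> ('Hi-C Contacts', 114409).
    Returns None as the count when no number follows the colon."""
    label, _, rest = line.partition(":")
    match = re.match(r"\s*([\d,]+)", rest)
    if match is None:
        return label.strip(), None
    return label.strip(), int(match.group(1).replace(",", ""))


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Counts copied from the stats quoted in this issue.
runs = {
    "test data (my run)": {"sequenced": 332888, "unique": 310081, "contacts": 114409},
    "in-house (hic-pipeline)": {"sequenced": 44334059, "unique": 30990053, "contacts": 8717580},
}

for name, run in runs.items():
    # Assumption: first percentage is over sequenced pairs, second over unique reads.
    print(name, pct(run["contacts"], run["sequenced"]), pct(run["contacts"], run["unique"]))

# In tests/data/stats.txt the two categories alone add up to 100%,
# so the implied denominator there is only 6,969 + 5,132 = 12,101 reads.
repo_intra_fragment, repo_contacts = 6969, 5132
print("repo stats.txt", pct(repo_contacts, repo_intra_fragment + repo_contacts))
```

If that reading is right, the reference file in the repo is based on about 12,101 reads, while my run reports 332,888 sequenced read pairs, so the two files may not describe the same input or the same stats format. That is only my inference from the arithmetic.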
For the test data, the output in inter_30.txt from running Juicer alone looks similar to the output of my hic-pipeline re-run.