bumphunter(): "Error in nulltabs[[i]] " depending on the number of bootstrap replicates #18
Comments
Hi, let me give you an update. I figured out that the "Error in nulltabs[[i]]" error depends not only on the number of replicates or the number of samples, but also on the number of cores that are registered. For instance, I reduced the number of cores from 14 to 4, and I was then able to run the 2000 bootstrap resamplings on the 2000 samples without any error at all. So I am wondering: is this a well-known issue with the bumphunter() function? What configuration would you recommend to make the most of the available cores and speed up computation on large sample sets? It took me one day to run the 2000 resamples on the 2000 samples, which is not bad, but time is precious, so if you have any recommendation for cutting the computation time further (given that I have a powerful machine that is being underused), I would appreciate it. Thanks! M.
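The workaround described above can be sketched as follows. This assumes the foreach backend in use is doParallel, as the log output in this thread indicates. The value 4 is simply the core count reported to work here, not a general recommendation.

```r
# Sketch only: register fewer workers before calling bumphunter().
# Assumes the doParallel package is installed.
library(doParallel)

registerDoParallel(cores = 4)   # 4 workers instead of 14

# getDoParWorkers() (from foreach) reports how many workers are registered.
n_workers <- getDoParWorkers()
print(n_workers)
```

Fewer workers means fewer copies of the data in flight at once, which may matter if memory is the limiting factor.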
Hi M. Is it always OK if the number of replicas is a multiple of the number of cores registered? Can you try, e.g., 30 resamples with 15 cores?
|
Hi Sam,
No, 30 resamples with 15 cores does not work, see below:
res <- bumphunter(M.combat, design, coeff=2, chr= manifestInfo$chr, pos=manifestInfo$pos, maxGap=300, B=30, cutoff=NULL, pickCutoff=TRUE, pickCutoffQ=0.995,nullMethod="bootstrap",verbose=TRUE)
[bumphunterEngine] Parallelizing using 15 workers/cores (backend: doParallelMC, version: 1.0.10).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Performing 30 bootstraps.
[bumphunterEngine] Computing marginal bootstrap p-values.
[bumphunterEngine] cutoff: 0.084
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 11875 bumps.
[bumphunterEngine] Computing regions for each bootstrap.
[bumphunterEngine] Estimating p-values and FWER.
Error in nulltabs[[i]] : subscript out of bounds
|
Hi M, I think you may be right about the memory issue. There is some discussion online that doParallel is built on the old multicore package, which runs parallel processes by forking. If a forked process exits with an error, doParallel just puts NULL in that process's output slots. I forked bumphunter and turned on the .verbose flag in the offending call to foreach. Can you try installing from my repo:
devtools::install_github("sampoll/bumphunter")
and running again, please? There should be some output that helps us figure out what's going on. (I hope.) My working theory is that one or more processes are unable to allocate memory to work in and are exiting with an error status.
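To illustrate how a NULL result could lead to this particular error, here is a minimal base R sketch. It does not use bumphunter or doParallel, and the names `chunk_results`, `nulltabs_demo`, and `B_demo` are hypothetical stand-ins. It only shows that combining per-worker lists with `c()` silently drops NULL entries, so the combined list ends up shorter than the number of bootstraps requested. Whether this is what happens inside bumphunter is an assumption consistent with the working theory above, not something confirmed in this thread.

```r
# Hypothetical per-worker results for B_demo = 6 bootstraps, 2 per worker.
# The second worker is assumed to have failed and returned NULL.
B_demo <- 6
chunk_results <- list(
  list("tab1", "tab2"),
  NULL,
  list("tab5", "tab6")
)

# c() drops NULL, so only 4 of the 6 expected entries survive.
nulltabs_demo <- do.call(c, chunk_results)
print(length(nulltabs_demo))

# Indexing up to B_demo then fails once i exceeds the list length.
msg <- tryCatch({
  for (i in seq_len(B_demo)) nulltabs_demo[[i]]
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

A check such as `length(nulltabs_demo) == B_demo` before the loop would turn the obscure indexing error into a clearer diagnostic.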
|
Hi Sam,
I followed your instructions and this is what I got:
res <- bumphunter(M.combat, design, coeff=2, chr= manifestInfo$chr, pos=manifestInfo$pos, maxGap=300, B=30, cutoff=NULL, pickCutoff=TRUE, pickCutoffQ=0.995,nullMethod="bootstrap",verbose=TRUE)
[bumphunterEngine] Parallelizing using 15 workers/cores (backend: doParallelMC, version: 1.0.10).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Performing 30 bootstraps.
Loading required package: rngtools
Loading required package: pkgmaker
Loading required package: registry
Attaching package: 'pkgmaker'
The following object is masked from 'package:S4Vectors':
new2
The following object is masked from 'package:base':
isNamespaceLoaded
Error in loadNamespace(j <- i[[1L]], c(lib.loc, .libPaths()), versionCheck = vI[[j]]) :
namespace 'RSQLite' 1.1-2 is already loaded, but >= 2.0 is required
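The last error is a package version conflict rather than a bumphunter failure: the RSQLite namespace already loaded in the session is older than the version now required. A small sketch of the comparison, using the version strings from the message above; the variable names are illustrative only.

```r
# Versions copied from the error message in this thread.
loaded_version   <- package_version("1.1-2")
required_version <- package_version("2.0")

# TRUE means the loaded copy is too old for the package being loaded.
needs_update <- loaded_version < required_version
print(needs_update)

# Report the RSQLite version loaded in the current session, if any.
if ("RSQLite" %in% loadedNamespaces()) {
  print(getNamespaceVersion("RSQLite"))
} else {
  message("RSQLite is not loaded in this session.")
}
```

Updating RSQLite and restarting R before loading any packages is the usual way to clear this kind of conflict, though that step was not confirmed in this thread.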
|
Dear Rafa et al.,
I am running bumphunter() on an Ubuntu server (100 GB of swap and 400 GB of RAM) on data from approx. 2000 EPIC arrays. I am getting the following message: "Error in nulltabs[[i]] : subscript out of bounds" (please see below for the traceback(), my sessionInfo(), the options for the function, and the output).
Interestingly, I noticed that I get the error when I increase the number of bootstrap replicates above 15. If I request 15 or fewer replicates, then everything runs (apparently) smoothly without any error. This is my command line:
This is not just a message: the “res” object is not created when it appears, so execution is halted.
My feeling is that this has something to do with the way memory is being managed (but maybe I am wrong). With the current dataset, which is pretty large, it only lets me do 15 replicates, which is a pretty low number (originally I attempted 2000 replicates, since in my experience in the epidemiology field people usually bootstrap with no fewer than 1000 replicates for large studies). Curiously, on that same server I analyzed only 46 samples from a different study, and at that time the function allowed me to do up to 500 replicates (more than 500 replicates also gave me the error message).
So can you find an explanation for this error, which shows up depending on the number of requested bootstrap replicates? What can I do to get the function to run the 2000 replicates that I originally wanted?
Many thanks for your help,
María.
Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] doRNG_1.6.6
[2] rngtools_1.2.4
[3] pkgmaker_0.22
[4] registry_0.3
[5] org.Hs.eg.db_3.4.1
[6] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
[7] Hmisc_4.0-2
[8] Formula_1.2-1
[9] survival_2.41-3
[10] lattice_0.20-35
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[12] GenomicFeatures_1.27.14
[13] AnnotationDbi_1.37.4
[14] doParallel_1.0.10
[15] IlluminaHumanMethylationEPICmanifest_0.3.0
[16] IlluminaHumanMethylationEPICanno.ilm10b3.hg19_0.6.0
[17] limma_3.31.20
[18] RColorBrewer_1.1-2
[19] sva_3.23.0
[20] genefilter_1.57.0
[21] mgcv_1.8-17
[22] nlme_3.1-131
[23] ggplot2_2.2.1
[24] minfi_1.22.1
[25] bumphunter_1.15.0
[26] locfit_1.5-9.1
[27] iterators_1.0.8
[28] foreach_1.4.3
[29] Biostrings_2.43.7
[30] XVector_0.15.2
[31] SummarizedExperiment_1.5.7
[32] DelayedArray_0.1.8
[33] matrixStats_0.52.1
[34] Biobase_2.35.1
[35] GenomicRanges_1.27.23
[36] GenomeInfoDb_1.11.10
[37] IRanges_2.9.19
[38] S4Vectors_0.13.15
[39] BiocGenerics_0.21.3
loaded via a namespace (and not attached):
[1] bitops_1.0-6 httr_1.2.1
[3] backports_1.0.5 tools_3.5.0
[5] nor1mix_1.2-2 R6_2.2.0
[7] rpart_4.1-11 DBI_0.6-1
[9] lazyeval_0.2.0 colorspace_1.3-2
[11] nnet_7.3-12 gridExtra_2.2.1
[13] base64_2.0 compiler_3.5.0
[15] preprocessCore_1.37.0 htmlTable_1.9
[17] rtracklayer_1.35.12 scales_0.4.1
[19] checkmate_1.8.2 quadprog_1.5-5
[21] stringr_1.2.0 digest_0.6.12
[23] Rsamtools_1.27.16 foreign_0.8-68
[25] illuminaio_0.17.0 siggenes_1.49.0
[27] GEOquery_2.41.0 htmltools_0.3.5
[29] base64enc_0.1-3 htmlwidgets_0.8
[31] rlang_0.1.2 RSQLite_1.1-2
[33] mclust_5.2.3 BiocParallel_1.9.6
[35] acepack_1.4.1 RCurl_1.95-4.8
[37] magrittr_1.5 GenomeInfoDbData_0.99.0
[39] Matrix_1.2-10 Rcpp_0.12.10
[41] munsell_0.4.3 stringi_1.1.5
[43] MASS_7.3-47 zlibbioc_1.21.0
[45] plyr_1.8.4 grid_3.5.0
[47] splines_3.5.0 multtest_2.31.0
[49] annotate_1.53.1 knitr_1.15.1
[51] beanplot_1.2 codetools_0.2-15
[53] biomaRt_2.31.7 XML_3.98-1.6
[55] latticeExtra_0.6-28 data.table_1.10.4
[57] gtable_0.2.0 openssl_0.9.6
[59] reshape_0.8.6 xtable_1.8-2
[61] tibble_1.3.3 GenomicAlignments_1.11.12
[63] memoise_1.0.0 cluster_2.0.6