
bumphunter(): "Error in nulltabs[[i]] " depending on the number of bootstrap replicates #18

Open
mtellezp opened this issue Aug 24, 2017 · 5 comments

Comments

@mtellezp

Dear Rafa et al.,

I am running bumphunter() on an Ubuntu server (100 GB of swap and 400 GB of RAM) on data from approx. 2000 EPIC arrays. I am getting the message "Error in nulltabs[[i]] : subscript out of bounds" (please see below the traceback(), my sessionInfo(), the options passed to the function, and its output).

Interestingly, I noticed that I get the error when I increase the number of bootstrap replicates above 15. If I request 15 or fewer replicates then everything runs (apparently) smoothly and does not give any error. This is my command line:

res <- bumphunter(M.combat, design, coeff=2, chr=manifestInfo$chr, pos=manifestInfo$pos, maxGap=300, B=18, cutoff=NULL, pickCutoff=TRUE, pickCutoffQ=0.995, nullMethod="bootstrap", verbose=TRUE)
[bumphunterEngine] Parallelizing using 14 workers/cores (backend: doParallelMC, version: 1.0.10).
[bumphunterEngine] Computing coefficients.
[bumphunterEngine] Performing 18 bootstraps.
[bumphunterEngine] Computing marginal bootstrap p-values.
[bumphunterEngine] cutoff: 0.085
[bumphunterEngine] Finding regions.
[bumphunterEngine] Found 11510 bumps.
[bumphunterEngine] Computing regions for each bootstrap.
[bumphunterEngine] Estimating p-values and FWER.
Error in nulltabs[[i]] : subscript out of bounds

It is not just an error message: the "res" object is not created when this message appears, so the execution is halted.

My feeling is that this has something to do with the way the memory is being managed (but maybe I am wrong). With the current dataset, which is pretty large, it only lets me do 15 replicates, which is a very low number (originally I wanted to do 2000 replicates; in my experience in the epidemiology field, people usually run bootstraps with no fewer than 1000 replicates for large studies). Curiously, on that same server I analyzed only 46 samples from a different study, and in that case the function let me do up to 500 replicates (more than 500 replicates would again give me the error message).

So, can you find an explanation for this error, which shows up depending on the number of requested bootstrap replicates? What can I do to get the function to run the 2000 replicates that I originally wanted to do?

Many thanks for your help,

María.

traceback()
4: bumphunterEngine(object, design = design, chr = chr, pos, cluster = cluster,
coef = coef, cutoff = cutoff, pickCutoff = pickCutoff, pickCutoffQ = pickCutoffQ,
maxGap = maxGap, nullMethod = nullMethod, smooth = smooth,
smoothFunction = smoothFunction, useWeights = useWeights,
B = B, permutations = NULL, verbose = verbose, ...)
3: .local(object, ...)
2: bumphunter(M.combat, design, coeff = 2, chr = manifestInfo$chr,
pos = manifestInfo$pos, maxGap = 300, B = 18, cutoff = NULL,
pickCutoff = TRUE, pickCutoffQ = 0.995, nullMethod = "bootstrap",
verbose = TRUE)
1: bumphunter(M.combat, design, coeff = 2, chr = manifestInfo$chr,
pos = manifestInfo$pos, maxGap = 300, B = 18, cutoff = NULL,
pickCutoff = TRUE, pickCutoffQ = 0.995, nullMethod = "bootstrap",
verbose = TRUE)

sessionInfo()
R Under development (unstable) (2017-05-08 r72665)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.2 LTS

Matrix products: default
BLAS: /usr/local/lib/R/lib/libRblas.so
LAPACK: /usr/local/lib/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=C LC_NUMERIC=C
[3] LC_TIME=es_ES.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=es_ES.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=es_ES.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] doRNG_1.6.6
[2] rngtools_1.2.4
[3] pkgmaker_0.22
[4] registry_0.3
[5] org.Hs.eg.db_3.4.1
[6] IlluminaHumanMethylationEPICanno.ilm10b2.hg19_0.6.0
[7] Hmisc_4.0-2
[8] Formula_1.2-1
[9] survival_2.41-3
[10] lattice_0.20-35
[11] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[12] GenomicFeatures_1.27.14
[13] AnnotationDbi_1.37.4
[14] doParallel_1.0.10
[15] IlluminaHumanMethylationEPICmanifest_0.3.0
[16] IlluminaHumanMethylationEPICanno.ilm10b3.hg19_0.6.0
[17] limma_3.31.20
[18] RColorBrewer_1.1-2
[19] sva_3.23.0
[20] genefilter_1.57.0
[21] mgcv_1.8-17
[22] nlme_3.1-131
[23] ggplot2_2.2.1
[24] minfi_1.22.1
[25] bumphunter_1.15.0
[26] locfit_1.5-9.1
[27] iterators_1.0.8
[28] foreach_1.4.3
[29] Biostrings_2.43.7
[30] XVector_0.15.2
[31] SummarizedExperiment_1.5.7
[32] DelayedArray_0.1.8
[33] matrixStats_0.52.1
[34] Biobase_2.35.1
[35] GenomicRanges_1.27.23
[36] GenomeInfoDb_1.11.10
[37] IRanges_2.9.19
[38] S4Vectors_0.13.15
[39] BiocGenerics_0.21.3

loaded via a namespace (and not attached):
[1] bitops_1.0-6 httr_1.2.1
[3] backports_1.0.5 tools_3.5.0
[5] nor1mix_1.2-2 R6_2.2.0
[7] rpart_4.1-11 DBI_0.6-1
[9] lazyeval_0.2.0 colorspace_1.3-2
[11] nnet_7.3-12 gridExtra_2.2.1
[13] base64_2.0 compiler_3.5.0
[15] preprocessCore_1.37.0 htmlTable_1.9
[17] rtracklayer_1.35.12 scales_0.4.1
[19] checkmate_1.8.2 quadprog_1.5-5
[21] stringr_1.2.0 digest_0.6.12
[23] Rsamtools_1.27.16 foreign_0.8-68
[25] illuminaio_0.17.0 siggenes_1.49.0
[27] GEOquery_2.41.0 htmltools_0.3.5
[29] base64enc_0.1-3 htmlwidgets_0.8
[31] rlang_0.1.2 RSQLite_1.1-2
[33] mclust_5.2.3 BiocParallel_1.9.6
[35] acepack_1.4.1 RCurl_1.95-4.8
[37] magrittr_1.5 GenomeInfoDbData_0.99.0
[39] Matrix_1.2-10 Rcpp_0.12.10
[41] munsell_0.4.3 stringi_1.1.5
[43] MASS_7.3-47 zlibbioc_1.21.0
[45] plyr_1.8.4 grid_3.5.0
[47] splines_3.5.0 multtest_2.31.0
[49] annotate_1.53.1 knitr_1.15.1
[51] beanplot_1.2 codetools_0.2-15
[53] biomaRt_2.31.7 XML_3.98-1.6
[55] latticeExtra_0.6-28 data.table_1.10.4
[57] gtable_0.2.0 openssl_0.9.6
[59] reshape_0.8.6 xtable_1.8-2
[61] tibble_1.3.3 GenomicAlignments_1.11.12
[63] memoise_1.0.0 cluster_2.0.6

@mtellezp
Author

Hi, let me give you an update.

I figured out that the "Error in nulltabs[[i]]" error shows up depending not only on the number of replicates or the number of samples, but also on the number of cores that are registered. For instance, I downgraded the number of cores from 14 to 4, and then I was able to run the 2000 bootstrap resamplings on the 2000 samples without any error at all.
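
In case it is useful to others, the change amounts to registering fewer workers before the same bumphunter() call (just a sketch of that registration step; the exact code may differ on your setup, and the call itself is the one I posted above, only with B=2000):

library(doParallel)
# Register 4 forked workers instead of 14; bumphunter() appears to pick up
# whatever foreach backend is currently registered (the log above shows doParallelMC).
registerDoParallel(cores = 4)
res <- bumphunter(M.combat, design, coeff=2, chr=manifestInfo$chr, pos=manifestInfo$pos,
                  maxGap=300, B=2000, cutoff=NULL, pickCutoff=TRUE, pickCutoffQ=0.995,
                  nullMethod="bootstrap", verbose=TRUE)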

So I am wondering, is this a known issue with the bumphunter() function? What configuration of the function would you recommend if one wishes to maximize the number of worker cores that can be used with large sample sizes, to speed up computation?

It took me one day to run the 2000 resamples on the 2000 samples, which is not bad, but time is precious, so if you have any recommendation to cut down the computation time even further (given that I have a powerful machine that is being underused) I would appreciate it.

Thanks !

M.

@sampoll
Contributor

sampoll commented Aug 31, 2017

Hi M.

Is it always OK if the number of replicates is a multiple of the number of cores registered? Can you try, e.g., 30 resamples with 15 cores?

  • Sam

@mtellezp
Author

mtellezp commented Sep 5, 2017 via email

@sampoll
Contributor

sampoll commented Sep 8, 2017

Hi M,

I think you may be right about the memory issue. There is some discussion online that doParallel is built on the old multicore package, which runs parallel processes by forking. But if a forked process exits with an error, doParallel just puts NULL in that process's output slots.
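
To make that failure mode concrete, here is a toy illustration (made-up sizes, not bumphunter's actual internals): if the combined result list comes back shorter than the B bootstraps requested, rather than merely containing NULLs, then indexing past its end raises exactly the reported error.

# Toy example: 18 bootstraps requested, but only 15 results came back
# because some forked workers died (e.g. killed by the OOM killer).
nulltabs <- replicate(15, matrix(0, 2, 2), simplify = FALSE)
B <- 18
for (i in seq_len(B)) {
  tab <- nulltabs[[i]]  # "Error in nulltabs[[i]] : subscript out of bounds" once i > 15
}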

I forked bumphunter and turned on the .verbose flag in the offending call to foreach. Can you try installing from my repo:

devtools::install_github("sampoll/bumphunter")

and running again, please? There should be some output that helps us figure out what's going on (I hope). My working theory is that one or more processes are unable to allocate memory to work in and exit with an error status.
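
For reference, .verbose = TRUE just makes foreach print details about how tasks are scheduled, evaluated, and combined; a generic example of the flag (not bumphunter's actual loop) looks like:

library(doParallel)
library(foreach)
registerDoParallel(cores = 4)
# .verbose = TRUE prints task scheduling, evaluation, and combining details,
# which should at least show where a task's result goes missing.
out <- foreach(i = 1:8, .verbose = TRUE) %dopar% sum(rnorm(1e6))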

  • Sam

@mtellezp
Author

mtellezp commented Sep 19, 2017 via email
