Reading a subset from the matrices is slow. #18

hypercompetent · 2018-11-15T18:15:23Z

Via Jeremy:

If reading more than ~5% of the samples or genes using read_tome_[sample/gene]_data(), it's often faster to read the entire matrix with read_tome_dgCMatrix().
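To make that rule of thumb concrete, here is a minimal sketch of a decision helper. `choose_read_strategy()` and its `threshold` argument are hypothetical and not part of scrattch.io. The 5% default is just the rough figure quoted above and would need re-measuring on your own data. The gene count of 50000 in the usage lines is an illustrative round number.

```r
# Hypothetical helper, not part of scrattch.io: pick a read strategy from the
# fraction of genes (or samples) requested. The 0.05 default is the rough
# "~5%" figure from this issue, not a measured constant.
choose_read_strategy <- function(n_requested, n_total, threshold = 0.05) {
  stopifnot(n_total > 0, n_requested >= 0, n_requested <= n_total)
  if (n_requested / n_total > threshold) "full" else "subset"
}

# Illustrative numbers only
small_request <- choose_read_strategy(1000, 50000)   # 2% requested
large_request <- choose_read_strategy(10000, 50000)  # 20% requested
```

With "full", the idea would be to read the whole matrix with read_tome_dgCMatrix() and subset it in memory. With "subset", use read_tome_[sample/gene]_data() directly.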

This may be because read_tome_vector() opens and closes the file on every call. I think this can be optimized either by reading everything in a single pass through read_tome_vector() or by keeping the connection open while iterating over read_tome_vector().
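As a rough illustration of the two patterns, here is a base R sketch that uses a plain binary file as a stand-in for a tome. It is an analogy only and does not show how read_tome_vector() is implemented. The file layout and the names `read_vector_reopen()` and `read_vectors_shared()` are assumptions made up for this example.

```r
# Stand-in for a tome: a binary file holding 100 vectors of 10 doubles each.
path    <- tempfile(fileext = ".bin")
n_vec   <- 100L
vec_len <- 10L
writeBin(as.numeric(seq_len(n_vec * vec_len)), path)

# Pattern 1: open and close the file for every vector that is read.
read_vector_reopen <- function(path, i, vec_len) {
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = (i - 1) * vec_len * 8)  # 8 bytes per double
  readBin(con, what = "numeric", n = vec_len)
}

# Pattern 2: open once, read every requested vector, then close.
read_vectors_shared <- function(path, idx, vec_len) {
  con <- file(path, "rb")
  on.exit(close(con))
  lapply(idx, function(i) {
    seek(con, where = (i - 1) * vec_len * 8)
    readBin(con, what = "numeric", n = vec_len)
  })
}

idx      <- c(3L, 50L, 97L)
reopened <- lapply(idx, function(i) read_vector_reopen(path, i, vec_len))
shared   <- read_vectors_shared(path, idx, vec_len)
identical(reopened, shared)
```

Both patterns return the same data. The second pays the open/close cost once instead of once per vector, which is the kind of saving suggested above.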

hypercompetent · 2018-11-16T00:59:25Z

Pull request #19 should help a lot with read speed for subsets of samples or genes. @jeremymiller - could you test this sometime using the dev branch and let me know if it feels any better in actual use? devtools::install_github("AllenInstitute/scrattch.io", ref = "dev")

jeremymiller · 2018-11-16T22:58:40Z

This is quite a bit faster than before (!!!). Thank you @hypercompetent. Reading 1000 genes (~2% of the genes) takes ~3 seconds, compared with ~30 seconds for all of the data.

library(scrattch.io)

tome <- "\\\\allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/tomes/facs/human_MTG_bioRxiv/MTG_all.tome"

# Time reading the first 1000 genes (~2% of the genes) with the subset reader
system.time({
  plotGenes <- read_tome_gene_names(tome)[1:1000]
  plotData  <- read_tome_gene_data(tome = tome, genes = plotGenes, regions = "exon", units = "counts")
  dim(plotData)
})

# Time reading the full exon matrix for comparison
system.time({
  allData <- read_tome_dgCMatrix(tome, "/data/exon")
  dim(allData)
})

hypercompetent added the Optimization label Nov 15, 2018

hypercompetent self-assigned this Nov 15, 2018
