Reading a subset from the matrices is slow. #18

Open
hypercompetent opened this issue Nov 15, 2018 · 2 comments

@hypercompetent
Member

Via Jeremy:

If reading more than ~5% of the samples or genes using read_tome_[sample/gene]_data(), it's often faster to read the entire matrix with read_tome_dgCMatrix().

This may be because read_tome_vector() opens and closes the connection on every call. I think this can be optimized either by reading everything in a single pass through read_tome_vector() or by keeping the connection open while iterating over read_tome_vector().
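
To illustrate the open/close overhead being described, here is a minimal base-R sketch. It is not the scrattch.io code: a plain binary file of doubles stands in for the tome file, and `read_col_reopen()` and `read_col_open()` are hypothetical helpers made up for this example. The first opens and closes the file on every read; the second reuses a connection that the caller opened once.

```r
# Hypothetical stand-in data: a flat binary file of doubles laid out as
# n_cols columns of col_len values each. This is NOT the tome format.
path    <- tempfile(fileext = ".bin")
n_cols  <- 200L
col_len <- 50L
writeBin(as.double(seq_len(n_cols * col_len)), path)

# Pattern 1: open and close the file inside every read.
read_col_reopen <- function(path, j, col_len) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  seek(con, where = (j - 1) * col_len * 8)  # 8 bytes per double
  readBin(con, what = "double", n = col_len)
}

# Pattern 2: the caller opens the file once and passes the connection in.
read_col_open <- function(con, j, col_len) {
  seek(con, where = (j - 1) * col_len * 8)
  readBin(con, what = "double", n = col_len)
}

cols <- c(3L, 57L, 120L)

# One open/close per requested column.
res_reopen <- lapply(cols, function(j) read_col_reopen(path, j, col_len))

# A single open/close shared by all requested columns.
con <- file(path, open = "rb")
res_open <- lapply(cols, function(j) read_col_open(con, j, col_len))
close(con)

# Both patterns return the same values; only the number of opens differs.
identical(res_reopen, res_open)
```

With the second pattern the cost of opening the file is paid once per subset rather than once per gene or sample, which is the kind of saving the two options above are aiming for. Whether this is actually the bottleneck in read_tome_vector() would need to be confirmed by profiling.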

@hypercompetent
Member Author

Pull request #19 should help a lot with read speed for subsets of samples or genes. @jeremymiller - could you test this sometime using the dev branch and let me know if it feels any better in actual use? devtools::install_github("AllenInstitute/scrattch.io", ref = "dev")

@jeremymiller
Contributor

This is quite a bit faster than before (!!!). Thank you @hypercompetent. Reading 1000 genes (~2% of the genes) takes ~3 seconds, compared with ~30 seconds for reading all of the data.

tome <- "\\\\allen/programs/celltypes/workgroups/rnaseqanalysis/shiny/tomes/facs/human_MTG_bioRxiv/MTG_all.tome"

# Time reading a subset: exon counts for the first 1000 genes
system.time({
  plotGenes <- read_tome_gene_names(tome)[1:1000]
  plotData  <- read_tome_gene_data(tome = tome, genes = plotGenes, regions = "exon", units = "counts")
  dim(plotData)
})

# Time reading the full exon matrix for comparison
system.time({
  allData <- read_tome_dgCMatrix(tome, "/data/exon")
  dim(allData)
})
