Support for Long Vectors? #91

Open
jonpeake opened this issue Nov 28, 2018 · 25 comments

@jonpeake

I'm working with data that has more than 2^31-1 elements, so I'm having trouble converting matrices to big.matrix format using as.big.matrix. I assume this is due to the inability to pass long vectors to .C or .Fortran, but I'm not savvy enough to find where this occurs in the source code. I've found a workaround: creating a big.matrix and then assigning data column-wise from my original matrix, but this seems inefficient. Is there a way to allow conversion of matrices with more than 2^31-1 elements, maybe using the dotCall64 package as a dependency? I've used that package when editing base R functions that call .C and .Fortran, and it seems to work well.

@privefl
Contributor

privefl commented Nov 28, 2018

I'm able to run

x <- matrix(1L, 2^16, 2^16)
X <- bigmemory::as.big.matrix(x)

@jonpeake
Author

> I'm able to run
>
> x <- matrix(1L, 2^16, 2^16)
> X <- bigmemory::as.big.matrix(x)

I'm trying to do it for a file-backed big.matrix. If I try it as a non-filebacked big.matrix, it just crashes my OS (running Ubuntu 18.10).

@kaneplusplus
Owner

Does it really crash the operating system or does it crash R? If the latter, is there an error message?

@jonpeake
Author

I actually ended up clean installing Ubuntu 18.04 because I was having bugs with 18.10. After re-installing, I am able to use non-filebacked as.big.matrix. However, this doesn't solve my original problem of trying to use the filebacked as.big.matrix (which is what I really need). I still get the long-vector error, stemming from the SetMatrixElements sub-function.

@privefl
Contributor

privefl commented Nov 30, 2018

What are the dimensions of your data?
So it does not work with filebacked but works with non-filebacked?

@jonpeake
Author

jonpeake commented Nov 30, 2018

My data are 50490x50490 of data type double. And correct, it works with non-filebacked, but does not work with filebacked.
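
For reference, the arithmetic behind these dimensions can be checked in base R. This is only a sanity check added for readers, with illustrative variable names: a 50490 x 50490 matrix has more elements than the largest R integer, which is what makes it a long vector.

```r
# Number of elements in a 50490 x 50490 matrix (computed as a double)
n_elem <- 50490^2
# Largest value an R integer can hold: 2^31 - 1
int_max <- .Machine$integer.max

n_elem > int_max     # TRUE: the data is a long vector
n_elem * 8 / 1e9     # about 20.4 GB when stored as type double
```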

@privefl
Contributor

privefl commented Dec 1, 2018

Please try:

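# devtools::install_github("privefl/bigstatsr")
# Create a file-backed matrix (FBM) from the in-memory matrix x
X <- bigstatsr::as_FBM(x, backingfile = "data/test")
# Derive the descriptor file path from the backing file path
(desc <- sub("\\.bk$", ".desc", X$backingfile))
# Write a bigmemory-compatible descriptor to that path
dput(X$bm.desc(), desc)

library(bigmemory)
# Attach the same backing file as a big.matrix
X.bm <- attach.big.matrix(desc)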

@jonpeake
Author

jonpeake commented Dec 1, 2018

I tried that and it worked, but I would also like to be able to perform the biganalytics and bigalgebra (cdeterman fork) functions, which also run into the problem of not supporting long vectors. How did you get around the long vector problem in bigstatsr? I'm wondering if your method can be easily ported to bigmemory (specifically the SetMatrixElements function, where the error is ultimately occurring; see code below for the error message)

>Y=X.bm*X.bm
Error in SetMatrixElements(x@address, as.double(j), as.double(i), as.double(value)) : 
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:519
Error during wrapup: long vectors not supported yet: ../../src/include/Rinlinedfuns.h:519

@privefl
Contributor

privefl commented Dec 2, 2018

I think the main difference is that I'm relying heavily on Rcpp to link R and C++. This keeps things really simple for me.
I guess Rcpp was not as mature as it is now when bigmemory was developed.

@jonpeake
Author

jonpeake commented Dec 3, 2018

@cdeterman @kaneplusplus Any chance you could look into using Rcpp for bigmemory to support long vectors?

@cdeterman
Contributor

If I recall, the main limitation with long vectors had to do with the BLAS backends. Maybe certain BLAS implementations support this, and the installation could be conditional on that? The C++ should be relatively straightforward to write, but I believe the BLAS backends are the main factor. @kaneplusplus, can you confirm anything in that regard?

@kaneplusplus
Owner

@cdeterman I think that's correct. At one point we linked bigmemory to the 64-bit API for the MKL in bigalgebra, but since each BLAS setup was slightly different, it was difficult to do in general and we weren't seeing a lot of interest at the time.

@cdeterman
Contributor

Perhaps that can be the approach, then. If we can set up some sort of configuration at compile time to detect what the BLAS backend is, we could have the C++ conditionally (e.g. using ifdef or something) support long vectors, and otherwise have R throw an error stating that the BLAS backend does not support them. Not sure exactly how at the moment, but perhaps some food for thought.

@jonpeake
Author

jonpeake commented Dec 4, 2018

Just came across a similar but different bug as well. When subsetting by a vector (i.e., I have a vector of indices at which I want to set a value or vector of values in a big.matrix), the SetIndivVectorElements.bm sub-function coerces the index vector with as.integer. Indices above 2^31-1 are therefore coerced to NA when the original big.matrix has more than 2^31-1 elements, since 2^31-1 is the maximum value of an integer type in R. I found a workaround by calling the bigmemory:::SetIndivVectorMatrixElements function directly and not calling as.integer in my function call, since for my purposes all of my indices are "integers" in the broader sense of the word. Another workaround could be to use the bit64 package if you still want to coerce to integers.
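
The coercion problem described here can be reproduced in base R without bigmemory. The following is a minimal sketch with made-up index values; it only illustrates how as.integer behaves and does not call any bigmemory internals.

```r
# Element indices stored as doubles; the last two exceed 2^31 - 1
idx <- c(1, 2^31 - 1, 2^31, 50490^2)

# as.integer() cannot represent values above 2^31 - 1 and returns NA
# (with a warning, suppressed here)
idx_int <- suppressWarnings(as.integer(idx))
is.na(idx_int)       # FALSE FALSE TRUE TRUE

# Doubles hold whole numbers exactly up to 2^53, so the original
# indices are still exact whole numbers
idx == floor(idx)    # TRUE TRUE TRUE TRUE
```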

@privefl
Contributor

privefl commented Dec 5, 2018

  • Developer side: the simplest fix would be to pass those indices as a NumericVector.

  • User side: you could use the 2-column matrix accessor instead. For example:

> library(bigmemory)

> X <- big.matrix(3, 3); X[] <- 1:9

> ktoij <- function(k, X) {
+   k <- k - 1
+   n <- nrow(X)
+   cbind(row = k %% n, col = k %/% n) + 1
+ }

> vec <- c(2, 4, 9)

> (ind <- ktoij(vec, X))
     row col
[1,]   2   1
[2,]   1   2
[3,]   3   3

> X[ind] <- 0

> X[]
     [,1] [,2] [,3]
[1,]    1    0    7
[2,]    0    5    8
[3,]    3    6    0
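
To show why the 2-column accessor sidesteps the integer limit, here is a hedged sketch of the same mapping applied to element numbers above 2^31-1. The name ktoij_n is hypothetical; it mirrors ktoij() but takes the row count directly so that it runs without a big.matrix. The dimensions are the 50490 x 50490 case reported earlier in the thread.

```r
# Same arithmetic as ktoij(), with the number of rows passed in directly
ktoij_n <- function(k, n) {
  k <- k - 1
  cbind(row = k %% n, col = k %/% n) + 1
}

n <- 50490
k <- c(1, n + 1, n^2)    # first element, top of column 2, last element
ind <- ktoij_n(k, n)
ind                      # rows 1, 1, 50490 and columns 1, 2, 50490
```

Although the last element number (n^2) is too large for an R integer, each row and column index stays well within integer range.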

@jonpeake
Author

jonpeake commented Dec 5, 2018

I tried initially to use the 2-column approach, but unfortunately the which function also runs into the dreaded 2^31-1 problem. It seems that although base R in general supports long vectors, a lot of the base functions have not been updated to reflect this support. I ended up with a workaround where I coerce my logical index matrix into a one-column big.matrix, then use the mwhich function provided in bigmemory to get the element-number-based index vector.
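
A base R alternative to the mwhich workaround, not the approach used above, is to apply which() block by block and add the block offset as a double. The helper name which_chunked and the default block size are assumptions made for this sketch, and it has only been exercised on small inputs.

```r
# Hypothetical helper: run which() on one block at a time and return
# positions as doubles, so positions above 2^31 - 1 stay representable.
which_chunked <- function(x, block = 1e6) {
  n <- length(x)
  if (n == 0) return(numeric(0))
  starts <- seq(1, n, by = block)
  hits <- lapply(starts, function(s) {
    e <- min(s + block - 1, n)
    (s - 1) + which(x[s:e])    # shift block-local positions by the offset
  })
  as.numeric(unlist(hits))
}

x <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
which_chunked(x, block = 2)    # same positions as which(x): 2 4 5
```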

@prateeksasan1

prateeksasan1 commented Sep 30, 2020

Hi,

I am facing the same problem. I am getting the following error.

Error in SetMatrixElements(x@address, as.double(j), as.double(i), as.double(value)) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

Is there a resolution to this?

Thanks

@kaneplusplus
Owner

Can you tell us which version of R you are using?

@prateeksasan1

R/4.0.2

It's on my university's server

@kaneplusplus
Owner

Thanks for the extra information. Can you check the values you are sending to the assignment where this is happening? The error looks like it's coming from R, rather than bigmemory's C code. Can you call as.double on the values? My hunch is that an easy fix would be to break the assignment into a few smaller assignments, but let's see if we can do a better job of tracking down the problem.

@elenabernabeu

elenabernabeu commented Nov 18, 2021

Hi,

Following on from others in this thread, we are also running into the long vector problem when using as.big.matrix directly on data. Specifically, we get the same error as @prateeksasan1:

Error in SetMatrixElements(x@address, as.double(j), as.double(i), as.double(value)) :
long vectors not supported yet: ../../src/include/Rinlinedfuns.h:535

Some background info:

  • We are working with R version 4.0.3, and data has around 10K rows and 400K columns (~30 gigs RDS file, ~65 gigs as tsv file) - matrix values are double type. No issues loading it into memory via readRDS. Get error message above when attempting as.big.matrix on data.
  • Working on uni server.
  • Have tried subsetting (creating two big matrices of 10Kx200K) as recommended and fusing using cbindBM() and this seems to have done the trick to get a big matrix directly when data is loaded into memory in R.
  • Goal is to run biglasso on data, which I've yet to test.

Wondering if you had any insights on this?

Thanks!

@scottgigante-immunai

+1 to the need to store big matrices in memory with more than 2**32 entries. Is this planned at any point?

@kaneplusplus
Owner

So, bigmemory supports more than 2**32 entries. It looks like as.big.matrix() doesn't because of R. My guess is that if you pre-create the big.matrix and copy your in-memory matrix to the big.matrix object in pieces, it will work fine.
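
The piecewise copy suggested here could look roughly like the sketch below. The helper copy_in_column_blocks and its max_elems parameter are hypothetical and not part of bigmemory; the demonstration uses ordinary matrices so that it runs in base R.

```r
# Hypothetical helper: copy src into dest a few columns at a time so that
# no single assignment passes more than max_elems values.
copy_in_column_blocks <- function(src, dest, max_elems = 2^30) {
  stopifnot(all(dim(src) == dim(dest)))
  block_cols <- max(1, floor(max_elems / nrow(src)))
  for (start in seq(1, ncol(src), by = block_cols)) {
    cols <- start:min(start + block_cols - 1, ncol(src))
    dest[, cols] <- src[, cols, drop = FALSE]
  }
  dest
}

# Demonstration with small matrices; max_elems = 8 forces three blocks
src  <- matrix(as.double(1:20), nrow = 4)
dest <- matrix(0, nrow = 4, ncol = 5)
out  <- copy_in_column_blocks(src, dest, max_elems = 8)
identical(out, src)    # TRUE
```

With bigmemory, dest would be a pre-created big.matrix, for example from bigmemory::filebacked.big.matrix(nrow(x), ncol(x), type = "double", backingfile = "x.bk", descriptorfile = "x.desc"), where the file names are placeholders. A big.matrix is modified in place, so the return value can be ignored in that case. Whether block-wise assignment avoids the long vector error in every setup has not been verified here.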

@scottgigante-immunai

Would you be able to build this workaround into as.big.matrix?

@kaneplusplus
Owner

bigmemory is mostly in maintenance mode these days. I haven't had a lot of time to devote to it. I would happily take a pull request.
