sparse / slow #226
Comments
I asked @jdey4 if he could post a GH issue, so I'm unsure how he's running things. It is true that MORF is not very well tested or benchmarked currently. https://github.com/neurodata/scikit-tree/blob/95e2597d5948c77bea565fc91b82e1a00b43cac8/sktree/tree/manifold/_morf_splitter.pyx#L273-L307 shows that the projection matrix is handled in a sparse format: each projection is a vector of feature indices plus a vector of weights. Only non-zero weights are stored.
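To make the indices-plus-weights idea concrete, here is a minimal pure-Python sketch of applying one sparse projection to one sample. It is not the code from `_morf_splitter.pyx`; the function name `project_sample` and the example values are made up for illustration, and it only assumes the layout described above (one vector of feature indices, one vector of non-zero weights).

```python
def project_sample(x, feat_indices, feat_weights):
    """Compute one projected feature as a weighted sum over the stored entries only.

    x            : one sample, as a sequence of feature values
    feat_indices : indices of the features with non-zero weight (hypothetical name)
    feat_weights : the matching non-zero weights (hypothetical name)
    """
    value = 0.0
    for j, w in zip(feat_indices, feat_weights):
        value += w * x[j]
    return value


# One sample with 8 features; the projection touches only features 1, 4 and 6.
x = [0.5, 2.0, -1.0, 0.0, 3.0, 1.5, 4.0, -2.0]
feat_indices = [1, 4, 6]
feat_weights = [1.0, -1.0, 1.0]

# The dense equivalent would store a weight for all 8 features, mostly zeros.
dense_weights = [0.0] * len(x)
for j, w in zip(feat_indices, feat_weights):
    dense_weights[j] = w
dense_value = sum(w * v for w, v in zip(dense_weights, x))

sparse_value = project_sample(x, feat_indices, feat_weights)
print(sparse_value, dense_value)
```

The cost of the sparse version scales with the number of non-zero weights rather than with the total number of features, which is what matters when the feature count is in the millions.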
Possibly something for Edward's team et al. to consider? @jovo It would be nice to have some measure of performance that we can run from n_samples = 100 to n_samples >> 100.
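One possible shape for such a measurement is sketched below using only the standard library. Everything in it is an assumption for illustration: `benchmark`, `make_data` and `dummy_fit` are hypothetical names, and `dummy_fit` is a placeholder that would be swapped for a call that fits the estimator under test. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory obtained by compiled code through raw `malloc` would not show up in `peak_bytes`.

```python
import time
import tracemalloc


def benchmark(fit_fn, make_data, sample_sizes):
    """Time fit_fn on datasets of increasing size and record peak traced memory."""
    results = []
    for n in sample_sizes:
        X, y = make_data(n)  # data is built before tracing starts
        tracemalloc.start()
        start = time.perf_counter()
        fit_fn(X, y)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({"n_samples": n, "seconds": elapsed, "peak_bytes": peak})
    return results


def make_data(n_samples, n_features=20):
    """Hypothetical data generator: deterministic toy features and binary labels."""
    X = [[float((i * j) % 7) for j in range(n_features)] for i in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y


def dummy_fit(X, y):
    """Placeholder for a real estimator's fit call: computes per-column means."""
    return [sum(col) / len(col) for col in zip(*X)]


results = benchmark(dummy_fit, make_data, [100, 1000, 10000])
for row in results:
    print(row)
```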
Hi @adam2392, I used the following code snippet to train SPORF: X has a shape of (2368, 3498706), and the above code runs fine and takes about 50 mins to train. But if I increase max_features to 100, it exhausts my RAM (64 GB, Apple M1 Max).
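For scale, a back-of-envelope estimate of what a dense copy of X alone occupies at that shape is sketched below. The dtype actually held in memory during the run is not stated in the thread, so both float32 and float64 are shown as assumptions; `dense_bytes` is a helper name made up for this example.

```python
def dense_bytes(shape, itemsize):
    """Bytes needed to hold a dense 2-D array of the given shape and element size."""
    n_rows, n_cols = shape
    return n_rows * n_cols * itemsize


shape = (2368, 3498706)  # X.shape reported above

# Element sizes are assumptions: 4 bytes for float32, 8 bytes for float64.
for name, itemsize in [("float32", 4), ("float64", 8)]:
    gib = dense_bytes(shape, itemsize) / 2**30
    print(f"{name}: {gib:.1f} GiB")
```

This works out to roughly 31 GiB in float32 and 62 GiB in float64, so on a 64 GB machine any additional dense copy or large temporary buffer could plausibly be enough to run out of memory.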
Ah I see. That's interesting. I wouldn't expect that to happen. How many trees are you training simultaneously?
For now I am using 100 trees, but I would love to use 1000 trees.
Sorry, I meant to ask how many jobs you are running in parallel. I.e., if you're training 100 trees in parallel, I am less surprised that you're running out of RAM.
Ah, I was using the default parameters, for which n_jobs=None.
Ah I see... that is then training 1 tree at a time. Can you tell me:
I tried MORF on brain MRI data with X.shape=(2206, 3498706). The server that I used has 754 GB of memory. I used only 1 worker and the code broke. When I try to fit MORF with 100 features, it works.
@adam2392 here is my code snippet. It works for max_patch_dims=(3,3,3).
@adam2392 When @jdey4 runs MORF, it is very slow. When we build the projection matrix in oblique trees, is it stored as a sparse matrix? If not, can we make it one? I believe that if the matrix is sparse in content but not stored in a sparse format, then we can save a lot of RAM and time by switching to a sparse representation.
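To put rough numbers on the potential saving, the sketch below compares the storage of a dense projection matrix against an indices-plus-weights layout. All sizes are assumptions chosen for illustration: 100 projections, 27 non-zero weights per projection (the number of voxels in a 3x3x3 patch), 4-byte weights and 8-byte indices. The function names are hypothetical.

```python
def dense_projection_bytes(n_projections, n_features, weight_bytes=4):
    """Storage for a dense (n_projections x n_features) weight matrix."""
    return n_projections * n_features * weight_bytes


def sparse_projection_bytes(n_projections, nnz_per_projection,
                            index_bytes=8, weight_bytes=4):
    """Storage when each projection keeps only its non-zero (index, weight) pairs."""
    return n_projections * nnz_per_projection * (index_bytes + weight_bytes)


n_features = 3498706        # feature count reported in this thread
n_projections = 100         # assumed number of candidate projections per split
nnz_per_projection = 3 * 3 * 3  # assumed: one 3x3x3 patch per projection

dense = dense_projection_bytes(n_projections, n_features)
sparse = sparse_projection_bytes(n_projections, nnz_per_projection)
print(f"dense:  {dense / 2**20:.1f} MiB")
print(f"sparse: {sparse / 2**10:.1f} KiB")
```

Under these assumptions the dense matrix is on the order of a gigabyte per split, while the sparse layout is tens of kilobytes.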