Using the threading library, we could avoid wasting any time at all: train the 2- or 3-tree forest in one thread and the full forest in another. The GIL doesn't block C++ code that has released it, so, assuming vigra releases the GIL during training, the two could run concurrently.
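A minimal sketch of the two-thread idea, under stated assumptions: `train_forest` is a hypothetical stand-in (it just sleeps, which releases the GIL) rather than the real vigra call, and `train_with_estimate` is an illustrative name, not part of `classify`. The probe thread times a small forest while the full forest trains alongside it.

```python
import threading
import time


def train_forest(n_trees, seconds_per_tree=0.01):
    """Hypothetical stand-in for training a forest.

    Sleeping releases the GIL, which mimics a C++ routine that releases it.
    """
    time.sleep(n_trees * seconds_per_tree)
    return n_trees


def timed_probe(n_probe_trees, result):
    """Train a small forest and record the measured seconds per tree."""
    start = time.perf_counter()
    train_forest(n_probe_trees)
    elapsed = time.perf_counter() - start
    result["seconds_per_tree"] = elapsed / n_probe_trees


def train_with_estimate(n_trees, n_probe_trees=2):
    """Train the full forest while a probe thread estimates the total time."""
    probe_result = {}
    full_result = {}

    def train_full():
        full_result["forest"] = train_forest(n_trees)

    probe = threading.Thread(target=timed_probe, args=(n_probe_trees, probe_result))
    full = threading.Thread(target=train_full)
    probe.start()
    full.start()

    # The probe finishes first; its timing gives the estimate for the full run.
    probe.join()
    estimated_total = probe_result["seconds_per_tree"] * n_trees

    full.join()
    return full_result["forest"], estimated_total
```

One caveat: if the two threads compete for the same cores, the probe's per-tree time may overestimate what a lone run would take.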
Taking this idea further, we could use vigra simply as a fast tree classifier and move all the forest code to the Python side. This would enable multithreading (vigra's forest training is currently single-threaded), which could mean an 8- to 16-fold speedup. We could also build a feature-selection stage into the classifier.
We could then also swap out the vigra trees for another implementation if we found one, which would be great since vigra is a pain to install and thus not on the cluster.
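A rough sketch of what a Python-side forest with swappable trees might look like. Everything here is hypothetical: `PythonForest` and `MajorityStump` are illustrative names, and the stump is a pure-Python stand-in for a single tree, not a vigra call. Threads would only give a real speedup if the actual tree learner releases the GIL while it trains.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def majority(labels, fallback):
    """Most common label, or the fallback if the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else fallback


class MajorityStump:
    """Hypothetical stand-in for one tree: thresholds a single random feature."""

    def fit(self, X, y, rng):
        self.feature = rng.randrange(len(X[0]))
        values = [row[self.feature] for row in X]
        self.threshold = sum(values) / len(values)
        fallback = majority(y, None)
        left = [lab for v, lab in zip(values, y) if v <= self.threshold]
        right = [lab for v, lab in zip(values, y) if v > self.threshold]
        self.left_label = majority(left, fallback)
        self.right_label = majority(right, fallback)
        return self

    def predict_one(self, row):
        if row[self.feature] <= self.threshold:
            return self.left_label
        return self.right_label


class PythonForest:
    """Bagging and voting in Python; the tree learner is pluggable."""

    def __init__(self, n_trees=25, tree_factory=MajorityStump, max_workers=4, seed=0):
        self.n_trees = n_trees
        self.tree_factory = tree_factory
        self.max_workers = max_workers
        self.seed = seed
        self.trees = []

    def _fit_one(self, X, y, seed):
        # Each tree sees its own bootstrap sample of the training data.
        rng = random.Random(seed)
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        return self.tree_factory().fit([X[i] for i in idx], [y[i] for i in idx], rng)

    def fit(self, X, y):
        seeds = [self.seed + i for i in range(self.n_trees)]
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            self.trees = list(pool.map(lambda s: self._fit_one(X, y, s), seeds))
        return self

    def predict(self, X):
        # Majority vote over the individual trees.
        return [majority([t.predict_one(row) for t in self.trees], None) for row in X]
```

Swapping in a different tree implementation would then mean passing another `tree_factory` whose objects provide the same `fit` and `predict_one` methods.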
classify.RandomForest uses the VIGRA library, which is written in C++, so there is no straightforward way to show a progress bar. Instead, we could train a 1-tree random forest before beginning the full training to see how long it takes to train a tree.
On one test dataset, indications are that this would provide a good estimate of total time (i.e., initial overhead and other factors won't mess things up), since we get the following times for training 1, 2, and 3 trees:
1 tree: 118.812502146 seconds
2 trees: 236.317131042 seconds
3 trees: 351.319090128 seconds
The total time for n trees is very close to n times the 1-tree training time, so the progress bar could be based on elapsed time.
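As a quick check of the linearity, and as a sketch of how a time-based progress value could be computed, the snippet below fits a straight line to the three timings above. The function names `fit_line` and `progress_fraction` are illustrative, not existing code.

```python
def fit_line(n_trees, seconds):
    """Least-squares fit of seconds = overhead + per_tree * n_trees."""
    n = len(n_trees)
    mean_x = sum(n_trees) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in n_trees)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(n_trees, seconds))
    per_tree = sxy / sxx
    overhead = mean_y - per_tree * mean_x
    return per_tree, overhead


def progress_fraction(elapsed, total_trees, per_tree, overhead=0.0):
    """Fraction of the estimated total training time that has elapsed, in [0, 1]."""
    estimated_total = overhead + per_tree * total_trees
    return min(max(elapsed / estimated_total, 0.0), 1.0)


# Timings reported above for 1, 2, and 3 trees.
per_tree, overhead = fit_line(
    [1, 2, 3], [118.812502146, 236.317131042, 351.319090128]
)
```

For these timings the fit gives about 116 seconds per tree and about 3 seconds of fixed overhead, which supports using elapsed time as the progress measure on this dataset.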