The comment `# I would strongly suggest not running past n=4` immediately followed by `n = 5` is a bit funny.
Re the benchmark with Python: do you

- have some idea why DecisionTree.jl is a factor of 3-4 slower?
- know whether the results are identical (tree obtained, classification accuracy)? This is not necessarily the case, as DT.jl may be using a different algorithm.
Also, I think it would be interesting to test on pure decision trees (not on forests).
My best guess is that Python uses sparse data representations for the training data, and that its trees are represented as flat arrays (via Cython) rather than as large sets of nested node and leaf objects. It might be worth looking deeper, e.g. at memory allocations in Julia.
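To make the representation question above concrete, here is a minimal, hypothetical sketch (in Python, not taken from DecisionTree.jl or scikit-learn) contrasting the two layouts: a nested-node tree that allocates one object per node, versus a flat, array-based tree in the style attributed to scikit-learn, which stores the same structure in parallel arrays and does no allocation at prediction time. All names here are illustrative, not real library APIs.

```python
class Node:
    """One heap allocation per node -- the layout suspected to be slower."""
    def __init__(self, feature=-1, threshold=0.0, left=None, right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value  # class label at a leaf, else None


def predict_nodes(node, x):
    """Walk the nested-node tree, chasing object references."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


def flatten(root):
    """Convert a nested-node tree into parallel arrays (flat layout)."""
    feature, threshold, left, right, value = [], [], [], [], []

    def visit(node):
        i = len(feature)
        feature.append(node.feature)
        threshold.append(node.threshold)
        left.append(-1)
        right.append(-1)
        value.append(node.value)
        if node.value is None:
            left[i] = visit(node.left)
            right[i] = visit(node.right)
        return i

    visit(root)
    return feature, threshold, left, right, value


def predict_flat(tree, x):
    """Walk the flat tree using integer indices into parallel arrays."""
    feature, threshold, left, right, value = tree
    i = 0
    while value[i] is None:
        i = left[i] if x[feature[i]] <= threshold[i] else right[i]
    return value[i]


# Tiny example tree: split on feature 0 at threshold 0.5.
root = Node(0, 0.5, Node(value="A"), Node(value="B"))
flat = flatten(root)
```

Both predictors traverse the same logical tree; the flat layout is more cache-friendly and is a plausible direction for a faster Julia implementation.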
Both DT.jl and Python's scikit-learn implement the same model (CART), and they seem to produce the same decision tree and prediction accuracy on the Titanic data. So I think the benchmark is a fair comparison.
I've now added a pure decision tree benchmark. It seems that Julia is slower by an order of magnitude, and this also seems to get worse with more data.
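Since the gap reportedly grows with data size, one way to check the scaling is to time training at several sizes and compare the ratios. Below is a minimal, hypothetical harness (not the benchmark from this thread); `dummy_fit` is a placeholder standing in for a real tree trainer such as scikit-learn's `DecisionTreeClassifier.fit` or DT.jl's `build_tree`.

```python
import random
import time


def time_one(fit, X, y):
    """Wall-clock time of a single call to fit(X, y)."""
    t0 = time.perf_counter()
    fit(X, y)
    return time.perf_counter() - t0


def bench(fit, sizes, n_features=10, repeats=3):
    """Best-of-`repeats` training time for each data size in `sizes`."""
    results = {}
    for n in sizes:
        X = [[random.random() for _ in range(n_features)] for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        results[n] = min(time_one(fit, X, y) for _ in range(repeats))
    return results


# Placeholder "trainer": sorts samples by the first feature, roughly the
# dominant O(n log n) step inside a real CART split search.
def dummy_fit(X, y):
    return sorted(range(len(X)), key=lambda i: X[i][0])


timings = bench(dummy_fit, [1000, 2000, 4000])
```

Plotting `timings[n] / n` against `n` for both implementations would show whether the Julia slowdown is a constant factor or a worse asymptotic complexity.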
Wow, an order of magnitude! Well, that would be a nice side project to work on: build a decent DecisionTreeFast.jl; there's really no reason for it to be much slower than the Python one...
I think their code is not based on scikit-learn but on something else whose name I don't recall ("ml toolkit" or some similar name). It'd be interesting to see whether it compares favourably to that one or not (probably not).