-
Notifications
You must be signed in to change notification settings - Fork 98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use generalized dot product in Mahalanobis distance. #242
Comments
This has been proposed earlier, but it is somewhat controversial. The 3-arg |
x-ref #173 |
I see you reasoning, but in the link you've provided I see only a small isolated benchmark. It can be true, that a 2 argument dot function might be faster on large inputs (which I suppose can be filed as a performance bug in the Julia itself). The problem, however, is the fact that 2-argument function allocates and creates a lot of unnecessary pressure on the garbage collector in real applications. While it can be faster in an isolated single test, I doubt it will be faster in a real application where you need to perform this operation hundreds or thousands of times and allocate gigabytes of memory (that is what happened to us). Ineffective RAM usage slows down this kind of operation a lot as it requires to run GC more often. I don't have a computer right now to test how it performs in case when you need to perform dot many many times, but I can benchmark it later, |
The issue with temporal storage in a generic context is that you never know whether the storage has the right eltype. For instance, you could use a real
It's not a performance bug. It is because the 2-arg |
But nothing really stops the 3-arg dot to also use BLAS given appropriate eltypes or even call 2-arg version in some specific situations (e.g. as you suggested for |
The promise of 3-arg dot in Base is not to allocate an intermediate vector, so you can't compute first |
It would be better to replace
dot(z, Q * z)
(here https://github.com/JuliaStats/Distances.jl/blob/master/src/mahalanobis.jl#L83 and in other places) withdot(z, Q, z)
without storing the intermediate result of Q*z. The problem here is thatQ * z
allocates and sometimes this allocation might slowdown the overall execution by a lot.This is the small benchmark:
Another nice addition would be to provide a
tmp
storage for inplacez = a - b
operation (which also allocates). That would help if some executes the mahalanobis distance in a tight for loop.In our application, simply by changing the
dot
and providing tmp storage we reduced allocations from1.107791 seconds (20.21 M allocations: 1.141 GiB, 18.80% gc time)
to0.532219 seconds (10.12 M allocations: 393.585 MiB, 17.05% gc time)
.This is the sketch of our implementation (without checks):
The text was updated successfully, but these errors were encountered: