[WIP] Fix AD issues with various kernels #154

sharanry · 2020-08-16T09:14:39Z

sharanry · 2020-08-16T09:41:25Z

I think the reason for failure of Zygote with MahalanobisKernel is mutating arrays at
https://github.com/JuliaStats/Distances.jl/blob/44036b573ec85287022f4368c4e1e279698bd031/src/mahalanobis.jl#L75.

sharanry · 2020-08-16T11:19:52Z

I think the reason for failure of Zygote with MahalanobisKernel is mutating arrays at
https://github.com/JuliaStats/Distances.jl/blob/44036b573ec85287022f4368c4e1e279698bd031/src/mahalanobis.jl#L75.

@devmotion Do you suggest I override their pairwise implementation for now or open an issue/PR to Distances.jl?

devmotion · 2020-08-16T11:27:48Z

I guess that should/could be resolved by adding a custom ChainRules-based adjoint for pairwise(::Mahalanobis, ...), similar to https://github.com/FluxML/Zygote.jl/blob/956575ee2c732dee25324b59ba43fbb471a52d9a/src/lib/distances.jl#L19-L25. It seems it's not yet decided if Distances would accept ChainRules PRs (see JuliaStats/Distances.jl#172), so one could either make a PR to Zygote or add some piracy to KernelFunctions for now, I guess.

sharanry · 2020-08-19T12:39:04Z

Sorry for the delay. I am having a hard time defining adjoints which aren't very computationally expensive for the Maha kernel.

devmotion · 2020-08-19T15:31:31Z

I guess you shouldn't need anything in particular for the Mahalanobis kernel but rather (just) a custom adjoint for the distance computations in Distances? In this case the Matrix cookbook, and in particular equations 72 and 81 are helpful. They show you that d((x-y)'*Q*(x-y))/dx = (Q + Q') * (x - y), d((x-y)'*Q*(x-y))/dy = - (Q + Q') * (x - y), and d((x-y)'*Q*(x-y))/dQ = (x - y) * (x - y)' (you should recheck that I didn't make any stupid mistakes 😄). These expressions aren't too bad, I guess, but in general the Mahalanobis distance isn't computationally cheap so it's not too surprising that the adjoints aren't either.
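For anyone who wants to recheck the three expressions above numerically, here is a minimal sketch that compares them against central finite differences. It uses only LinearAlgebra; `sqmaha` and `fd_grad` are hypothetical helper names introduced for this illustration (they are not part of Distances or KernelFunctions), and `Q` is deliberately non-symmetric so that the `Q + Q'` term is actually exercised.

```julia
using LinearAlgebra

# Squared Mahalanobis form (x - y)' * Q * (x - y); Q need not be symmetric here.
sqmaha(Q, x, y) = (x - y)' * Q * (x - y)

# Hypothetical helper: central finite-difference gradient of a scalar
# function f with respect to an array z (vector or matrix).
function fd_grad(f, z; h=1e-6)
    g = zero(z)
    for i in eachindex(z)
        zp = copy(z); zp[i] += h
        zm = copy(z); zm[i] -= h
        g[i] = (f(zp) - f(zm)) / (2h)
    end
    return g
end

Q = [0.9 0.3; 0.6 0.5]
x = [0.2, 0.7]
y = [0.5, 0.1]
d = x - y

grad_x = (Q + Q') * d     # claimed derivative with respect to x
grad_y = -(Q + Q') * d    # claimed derivative with respect to y
grad_Q = d * d'           # claimed derivative with respect to Q

err_x = norm(grad_x - fd_grad(z -> sqmaha(Q, z, y), x))
err_y = norm(grad_y - fd_grad(z -> sqmaha(Q, x, z), y))
err_Q = norm(grad_Q - fd_grad(z -> sqmaha(z, x, y), Q))
```

All three errors should be at the level of floating-point round-off, since the function is quadratic and central differences are exact for it up to rounding.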

BTW I just noticed that the docstring of the kernel is incorrect since the distance computation (KernelFunctions.jl/src/basekernels/maha.jl, line 21 in f144045)

metric(κ::MahalanobisKernel) = SqMahalanobis(κ.P)

uses P but not the inverse of P (Distances doesn't use the inverse either, according to their README).

sharanry · 2020-08-20T07:31:42Z

@devmotion Thanks for pointing out the typo in the docstring!
The problem I am facing is for pairwise computation. Defining an efficient adjoint seems quite tricky based on the implementation here. https://github.com/JuliaStats/Distances.jl/blob/44036b573ec85287022f4368c4e1e279698bd031/src/mahalanobis.jl#L62

devmotion · 2020-08-21T08:07:15Z

src/zygote_adjoints.jl

+        a_b = map(
+            x -> (first(last(x)) - last(last(x)))*first(x), 
+            zip(
+                Δ,
+                Iterators.product(eachslice(a, dims=dims), eachslice(b, dims=dims))
+            )
+        )
+        δa = reduce(hcat, sum(map(x -> B_B_t*x, a_b), dims=1))
+        δB = sum(map(x -> x*transpose(x), a_b))


I would assume it should be possible to vectorize this code? What's the mathematical formula that you use here?

They are the same equations you mentioned earlier:
d((x-y)'*Q*(x-y))/dx = (Q + Q') * (x - y), d((x-y)'*Q*(x-y))/dy = - (Q + Q') * (x - y), and d((x-y)'*Q*(x-y))/dQ = (x - y) * (x - y)'.
But this is done for all pairwise combinations together using map. It then sums these contributions to get δB and the other adjoints.
Please note that the current implementation is not correct; I am still debugging it (it only partially matches the intended result). If you happen to find any obvious mistakes, please let me know. I am having trouble reducing the results of the individual pairwise pullbacks to the final pullback. The way I am summing them is probably wrong.

julia> using Distances, Random;

julia> rng = MersenneTwister(123);

julia> M1, M2 = rand(rng, 2,3), rand(rng, 2,3);

julia> dist = SqMahalanobis(rand(rng, 2,2))
SqMahalanobis{Float64}([0.8654121434083455 0.2856979003853177; 0.617491887982287 0.46384720826189474])

julia> pairwise(dist, M1, M2; dims=2)
3×3 Array{Float64,2}:
  0.371673   0.856348  0.742803
  0.0233992  0.274278  0.276694
 -0.036568   0.118487  0.0748149

julia> map(x -> evaluate(dist, first(x), last(x)), Iterators.product(eachslice(M1, dims=2), eachslice(M2, dims=2)))
3×3 Array{Float64,2}:
 0.541253   0.912421  0.673273
 0.0886328  0.285181  0.192394
 0.0868399  0.166227  0.0616321

@devmotion isn't this wrong, or have I done something silly? The two results are equal in the Euclidean case. I feel this is the root of the problem.

It should work if dist.qmat is positive definite: JuliaStats/Distances.jl#174
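One plausible explanation for the mismatch, sketched below under an explicit assumption: if the optimized pairwise path computes the expanded form x'Qx + y'Qy - 2x'Qy instead of evaluating (x-y)'Q(x-y) directly, the two agree only when x'Qy = y'Qx, which holds for symmetric Q (and therefore for any symmetric positive definite Q). The functions `direct` and `expanded` are hypothetical names for this illustration and are not taken from Distances.

```julia
using LinearAlgebra

# Direct evaluation of the quadratic form.
direct(Q, x, y) = (x - y)' * Q * (x - y)

# Expanded form; ASSUMED here to mirror what an optimized pairwise routine computes.
expanded(Q, x, y) = x' * Q * x + y' * Q * y - 2 * (x' * Q * y)

x = [0.2, 0.7]
y = [0.5, 0.1]

Q_nonsym = [0.9 0.3; 0.6 0.5]          # non-symmetric, like rand(rng, 2, 2)
Q_sym = (Q_nonsym + Q_nonsym') / 2     # symmetric part of the same matrix

# The gap equals |x' * (Q - Q') * y|, which vanishes only for symmetric Q.
gap_nonsym = abs(direct(Q_nonsym, x, y) - expanded(Q_nonsym, x, y))
gap_sym = abs(direct(Q_sym, x, y) - expanded(Q_sym, x, y))
```

With these inputs the non-symmetric gap is clearly non-zero while the symmetric one is at round-off level, which is consistent with the two REPL results above differing for a randomly drawn qmat.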

This still does not solve the differences in the computed adjoints for the covariance matrix Q. My current implementation matches the second adjoint.

julia> using Distances, LinearAlgebra, FiniteDifferences, Random

julia> FiniteDifferences.to_vec(dist::SqMahalanobis{Float64}) = vec(dist.qmat), x -> SqMahalanobis(reshape(x, size(dist.qmat)...))

julia> rng = MersenneTwister(123);

julia> M1, M2 = rand(rng,3,1), rand(rng,3,1)
([0.7684476751965699; 0.940515000715187; 0.6739586945680673], [0.3954531123351086; 0.3132439558075186; 0.6625548164736534])

julia> Q = Matrix(Cholesky(rand(rng, 3, 3), 'U', 0))
3×3 Array{Float64,2}:
 0.343422   0.0638007  0.507151
 0.0638007  0.0386393  0.19528
 0.507151   0.19528    1.21186

julia> isposdef(Q)
true

julia> dist = SqMahalanobis(Q);

julia> fdm=FiniteDifferences.Central(5, 1);

julia> j′vp(fdm, pairwise, ones(1,1), dist, M1, M2)[1].qmat #A
3×3 Array{Float64,2}:
 0.139125  0.365187  -0.238366
 0.102751  0.393469  -0.404876
 0.246873  0.419183   0.000130048

julia> j′vp(fdm, evaluate, 1, dist, M1[:, 1], M2[:, 1])[1].qmat #B
3×3 Array{Float64,2}:
 0.139125    0.233969    0.00425358
 0.233969    0.393469    0.00715332
 0.00425358  0.00715332  0.000130048

IMO it is best if the (Sq)Mahalanobis distance is actually parameterized by the decomposition of Q, i.e., the upper or lower triangular factor, which is not constrained.

Yes, that would be the most natural way to ensure that it is always positive semi-definite (if the diagonal is non-negative) and optimization is performed in the correct space. So I guess users would want to use this parameterization even if it is not enforced by KernelFunctions and not directly supported by SqMahalanobis by using something like

function mykernel(L)
    idxs = diagind(L)
    @inbounds for i in idxs
        L[i] = softplus(L[i])
    end
    return MahalanobisKernel(Array(L * L'))
end

Of course, it would be nice if (Sq)Mahalanobis would support specifying e.g. a Cholesky decomposition or PDMat directly (it could even be used for simplifying the computations since x'*Q*x = (L'*x)'*(L'*x) in this case), but can't we work around this by checking gradients of the mykernel setup instead of computing Q -> MahalanobisKernel(Q) directly? That's at least how we do it in DistributionsAD, e.g. in https://github.com/TuringLang/DistributionsAD.jl/blob/a96b159ab25aab67d1a2076726e8b9c392eb6fc7/test/ad/distributions.jl#L18-L34.
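As a rough illustration of this parameterization, the sketch below builds Q = L * L' from an unconstrained matrix without mutating its input (which may matter for reverse-mode AD) and checks the identity x'*Q*x = (L'*x)'*(L'*x). The names `softplus` and `posdef_from_unconstrained` are hypothetical helpers defined here for the example, and the result is left as a plain matrix rather than wrapped in MahalanobisKernel so that the block only needs LinearAlgebra.

```julia
using LinearAlgebra

# Hypothetical local definition; maps any real number to a positive one.
softplus(t) = log1p(exp(t))

# Non-mutating variant: map an unconstrained square matrix W to a lower
# triangular factor L with positive diagonal, and to Q = L * L'.
function posdef_from_unconstrained(W)
    L = tril(W, -1) + Diagonal(softplus.(diag(W)))
    return L, Matrix(L * L')
end

W = [0.3 -1.2 0.4; 0.8 -0.5 2.0; -0.7 0.1 0.0]
L, Q = posdef_from_unconstrained(W)
x = [0.4, -0.3, 0.9]

quad_direct = x' * Q * x          # quadratic form using Q
quad_factor = sum(abs2, L' * x)   # same value using only the factor
min_eig = minimum(eigvals(Symmetric(Q)))
```

Because the diagonal of L is strictly positive, Q is positive definite for any W, so gradient checks can be run with respect to W instead of Q.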

> but can't we work around this by checking gradients of the mykernel setup instead of computing Q -> MahalanobisKernel(Q) directly?

Yeah that should work. Will try that out.

Regarding the issue with pairwise implementation which messes up FiniteDifferences results, do you suggest I override the implementation for the time being?

If you test the suggested parameterization the implementation of pairwise shouldn't matter (since we do not test the intermediate step which might be affected by it).

True. Could we also change our side of the parametrization, i.e., the way it is stored in the struct? We could continue to allow initialization using a full matrix. This should allow for seamless AD regardless of how the user decides to initialize them.

I'm not sure if we want to do that, I think this deserves some discussion first (and then a separate PR possibly). Ideally, Distances would just support arbitrary matrices and contain optimized implementations for specific array types. We just forward P to SqMahalanobis, so ideally we wouldn't perform any transformations or computations. I'm also a bit worried that focusing on a specific parameterization might make it difficult for users who would like to use a different one (but still no dense matrix) or might lead to confusing behaviour.

sharanry · 2020-08-24T12:37:51Z

test/basekernels/maha.jl

+    @test_broken j′vp(fdm, x -> MahalanobisKernel(Array(x[1]'*x[1]))(x[2], x[3]), 1, [U, v1, v2]) ≈
+    Zygote.pullback(x -> MahalanobisKernel(Array(x[1]'*x[1]))(x[2], x[3]), [U, v1, v2])[2](1)
+    @test all(j′vp(fdm, x -> SqMahalanobis(Array(x[1]'*x[1]))(x[2], x[3]), 1, [U, v1, v2])[1][1] .≈ 
+    Zygote.pullback(x -> SqMahalanobis(Array(x[1]'*x[1]))(x[2], x[3]), [U, v1, v2])[2](1)[1][1])


@devmotion I tried doing what you suggested. The tests still fail. This error probably propagates and causes even the first test to fail.

julia> j′vp(fdm, x -> SqMahalanobis(Array(x[1]'*x[1]))(x[2], x[3]), 1, [U, v1, v2])[1][1]
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 0.228808   0.00318764   -0.107503
  ⋅        -0.000391803   0.0132135
  ⋅          ⋅            0.0438772

julia> Zygote.pullback(x -> SqMahalanobis(Array(x[1]'*x[1]))(x[2], x[3]), [U, v1, v2])[2](1)[1][1]
3×3 Array{Float64,2}:
  0.228808    0.00318764   -0.107503
 -0.0281234  -0.000391803   0.0132135
 -0.0933875  -0.00130103    0.0438772

To me your output indicates that it basically works apart from the fact that Zygote incorrectly returns a dense matrix instead of an upper triangular matrix. Since U was upper triangular, only the values above and on the diagonal should be returned.

FiniteDifferences is pretty good at matching the types. Zygote isn't. Do you suggest we manually check that the upper triangular part matches for now?

Edit: I don't think we are addressing the major issue here. Our goal is to make the overall adjoint correct for kernelmatrix. So maybe defining a custom Zygote adjoint for UpperTriangular which outputs an UpperTriangular might solve the problem.

Were the call to UpperTriangular inside the function, then the adjoint that you would get from Zygote would also be UpperTriangular. Maybe just do that?
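A minimal sketch of the manual comparison discussed above, using the two matrices printed in the REPL output of this thread: wrapping the dense Zygote result in UpperTriangular discards the entries below the diagonal, after which it can be compared with the FiniteDifferences result. The variable names are introduced for this example only.

```julia
using LinearAlgebra

# FiniteDifferences result (upper triangular), copied from the output above.
g_fd = UpperTriangular([0.228808 0.00318764   -0.107503;
                        0.0      -0.000391803  0.0132135;
                        0.0      0.0           0.0438772])

# Zygote result (dense), copied from the output above.
g_zygote = [ 0.228808   0.00318764   -0.107503;
            -0.0281234 -0.000391803   0.0132135;
            -0.0933875 -0.00130103    0.0438772]

# Compare only the entries on and above the diagonal.
matches_upper = UpperTriangular(g_zygote) ≈ g_fd

# The full dense matrices do not agree, because of the lower triangle.
matches_dense = g_zygote ≈ Matrix(g_fd)
```

Here `matches_upper` is true and `matches_dense` is false, which is consistent with the reading that only the structural zeros differ.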

devmotion · 2020-08-24T12:43:33Z

test/basekernels/maha.jl

+    fdm = FiniteDifferences.Central(5, 1);
+
+
+    FiniteDifferences.to_vec(dist::SqMahalanobis{Float64}) = vec(dist.qmat), x -> SqMahalanobis(reshape(x, size(dist.qmat)...))


Is this needed? If possible, we should avoid this type piracy.

Yes, j′vp only works when there is a to_vec function defined for each argument.

I'm wondering, since according to the docs to_vec is only needed for the inputs xs... but not for the evaluated function f in j′vp(fdm, f, xs...).

From what I understand, it is also needed for objects like SqMahalanobis if they have parameters like qmat.

That's correct, but actually for some reason we've not made FiniteDifferences handle functions-with-data properly yet, so you'll have to build the SqMaha object inside of the function that you're differentiating.

theogf

Somehow the solution to have to define kernelmatrix again for the NeuralNetworkKernel seems very hacky, isn't there another solution?

willtebbutt

This is looking good. Just some style things.

devmotion · 2020-08-31T07:22:08Z

src/zygote_adjoints.jl

+        )
+        δa = reduce(hcat, sum(map(x -> B_Bᵀ*x, a_b), dims=2))
+        δB = sum(map(x -> x*transpose(x), a_b))
+        return (qmat=δB,), δa, -δa


There is some discrepancy between the simple case above and this pullback. Intuitively, from the simple case above I would assume that δB = sum_{i, j} (a_i - b_j) * (a_i - b_j)^T * Δ_{i,j}. However, here you compute δB = sum_{i, j} (a_i - b_j) * (a_i - b_j)^T * Δ_{i,j}^2. Probably one of them is incorrect (table 7 in https://notendur.hi.is/jonasson/greinar/blas-rmd.pdf indicates that the pairwise one is incorrect). Can we add the derivation of the adjoints according to https://www.juliadiff.org/ChainRulesCore.jl/dev/arrays.html as docstrings or comments, or maybe even have a separate PR for the Mahalanobis fixes?

Thanks for pointing this out. I think a separate PR for mahalanobis fixes makes more sense.
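For reference, here is a sketch of a vectorized pairwise pullback in which each term is weighted by Δ_{i,j} once, checked against central finite differences. This is not the code from the PR: the inputs are called X and Y (columns x_i, y_j) to avoid clashing with the δB name used above for the qmat adjoint, and `pairwise_sqmaha`, `pairwise_sqmaha_pullback` and `fd_grad` are hypothetical helpers written for this illustration using only LinearAlgebra.

```julia
using LinearAlgebra

# D[i, j] = (x_i - y_j)' * Q * (x_i - y_j) for columns x_i of X and y_j of Y.
pairwise_sqmaha(Q, X, Y) =
    [(x - y)' * Q * (x - y) for x in eachcol(X), y in eachcol(Y)]

# Vectorized pullback for a cotangent Δ with the same size as D.
function pairwise_sqmaha_pullback(Q, X, Y, Δ)
    r = Diagonal(vec(sum(Δ; dims=2)))   # row sums of Δ
    c = Diagonal(vec(sum(Δ; dims=1)))   # column sums of Δ
    # sum_{i,j} Δ[i,j] * (x_i - y_j) * (x_i - y_j)', expanded into four products
    δQ = X * r * X' - X * Δ * Y' - Y * Δ' * X' + Y * c * Y'
    δX = (Q + Q') * (X * r - Y * Δ')
    δY = (Q + Q') * (Y * c - X * Δ)
    return δQ, δX, δY
end

# Hypothetical helper: central finite-difference gradient of a scalar function.
function fd_grad(f, z; h=1e-6)
    g = zero(z)
    for i in eachindex(z)
        zp = copy(z); zp[i] += h
        zm = copy(z); zm[i] -= h
        g[i] = (f(zp) - f(zm)) / (2h)
    end
    return g
end

Q = [0.9 0.3; 0.6 0.5]
X = [0.2 0.7 0.1; 0.4 0.9 0.3]
Y = [0.5 0.1 0.8; 0.6 0.2 0.4]
Δ = [1.0 2.0 0.5; -1.0 0.3 0.7; 0.2 0.4 1.5]

# Scalar whose gradients equal the pullback applied to Δ.
loss(Q, X, Y) = sum(Δ .* pairwise_sqmaha(Q, X, Y))
δQ, δX, δY = pairwise_sqmaha_pullback(Q, X, Y, Δ)

err_Q = norm(δQ - fd_grad(q -> loss(q, X, Y), Q))
err_X = norm(δX - fd_grad(x -> loss(Q, x, Y), X))
err_Y = norm(δY - fd_grad(y -> loss(Q, X, y), Y))
```

Note that δY is not simply -δX in the pairwise case, because the sums run over different indices; the two coincide up to sign only for a single pair of vectors.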

devmotion · 2020-08-31T16:43:54Z

> Somehow the solution to have to define kernelmatrix again for the NeuralNetworkKernel seems very hacky, isn't there another solution?

I guess one could define a "PreMetric" that evaluates dot(x, y) / sqrt((1 + sum(abs2, x)) * (1 + sum(abs2, y))) (or asin(dot(x, y) / sqrt((1 + sum(abs2, x)) * (1 + sum(abs2, y))))), similar to DotProduct, and make NeuralNetworkKernel a SimpleKernel. But even in this case one might want to implement (a) specialized version(s) of pairwise, so I'm not sure how much one would gain.
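To make the trade-off concrete, here is a small sketch of the asin expression quoted above together with one possible specialized pairwise version for column-wise inputs. Both `nn_sim` and `nn_pairwise` are hypothetical names for this illustration and are not the KernelFunctions implementation; the pairwise variant avoids mutation, which is the property that tends to matter for Zygote.

```julia
using LinearAlgebra

# Scalar form of the expression discussed above.
nn_sim(x, y) = asin(dot(x, y) / sqrt((1 + sum(abs2, x)) * (1 + sum(abs2, y))))

# Hypothetical specialized pairwise version: one matrix product plus broadcasts.
function nn_pairwise(X, Y)
    nx = 1 .+ vec(sum(abs2, X; dims=1))   # 1 + squared norm of each column of X
    ny = 1 .+ vec(sum(abs2, Y; dims=1))   # 1 + squared norm of each column of Y
    return asin.((X' * Y) ./ sqrt.(nx .* ny'))
end

X = [0.2 0.7 0.1; 0.4 0.9 0.3]
Y = [0.5 0.1; 0.6 0.2]

K_loop = [nn_sim(x, y) for x in eachcol(X), y in eachcol(Y)]
K_fast = nn_pairwise(X, Y)
```

The two results should agree elementwise; the specialized version simply replaces the per-pair dot products with a single `X' * Y`.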

sharanry · 2020-09-07T09:05:38Z

Can we merge this and tackle each of the remaining AD issues in separate PRs? It is getting increasingly tricky to address multiple issues at once.

Currently this PR does the following:

- Defines kernelmatrix function for NeuralNetworkKernel.
- Defines Zygote adjoints for Mahalanobis distance metric.
- Zygote tests pass for Exponential, FBM, NN and Gabor kernels.

devmotion · 2020-09-07T09:20:50Z

IMO this PR contains already too many changes, we should just focus on one AD problem at a time.

> Defines Zygote adjoints for Mahalanobis distance metric.

I thought the idea was not to include these adjoints since they were missing a clean derivation/documentation and were incorrect? Or are you talking about the non-pairwise adjoints only?

sharanry · 2020-09-07T09:24:14Z

> I thought the idea was not to include these adjoints since they were missing a clean derivation/documentation and were incorrect? Or are you talking about the non-pairwise adjoints only?

I meant only the non-pairwise adjoint. I will be removing the pairwise adjoints for now.

sharanry · 2020-09-07T13:21:22Z

Any objections to merging this?

willtebbutt

I have no objections other than these tiny style-related things. This is a great PR.

sharanry added 3 commits August 16, 2020 14:11

Zygote passes for Exponential and FBM kernel (a6211d0)
Zygote passes NN kernel (8704f18)
Zygote passes Gabor kernel (8f44c51)

devmotion reviewed Aug 16, 2020

Address code review (14db1f4)
Fix mutating arrays problem for maha kernel (90c1dff)

devmotion reviewed Aug 20, 2020 (src/basekernels/maha.jl)

sharanry added 2 commits August 21, 2020 12:54

Add adjoint for maha distance metric (dcf1f6b)
Fix zygote adjoint (16e8af6)

devmotion reviewed Aug 21, 2020 (src/zygote_adjoints.jl)

devmotion reviewed Aug 21, 2020 (src/zygote_adjoints.jl)

sharanry added 3 commits August 21, 2020 13:28

Fix adjoint typo (ede5879)
Fix buggy version of pairwise adjoint (e8b76ec)
Fix typo (e236aaf)

devmotion reviewed Aug 21, 2020

Forgot to add adjoint macro (d50c73f)

devmotion mentioned this pull request Aug 21, 2020: Check that the matrix of (Sq)Mahalanobis is positive-definite? JuliaStats/Distances.jl#174 (Closed)

sharanry added 2 commits August 22, 2020 13:38

Add pairwise sqmahalanobis adjoint and test of sqmahalanobis (090cc8a)
Maha kernel tests (45c14d6)

sharanry commented Aug 24, 2020

devmotion reviewed Aug 24, 2020 (test/basekernels/maha.jl)

theogf reviewed Aug 24, 2020

devmotion reviewed Aug 24, 2020 (src/zygote_adjoints.jl)

sharanry added 4 commits August 26, 2020 13:41

Fix zygote adjoint for mahalanobis (b920c19)
Fix docs for matern (2630adc)
Merge branch 'master' into sharan/fix-AD-issues (31730a8)
Make maha tests more readable (e81cb01)

willtebbutt reviewed Aug 30, 2020

willtebbutt mentioned this pull request Aug 30, 2020: test utils revamp #159 (Merged, 2 tasks)

Address style issues (4c2f233)

devmotion reviewed Aug 31, 2020

Fix bugs in tests and adjoints (0023292)
Fix maha tests (acdec1a)
Remove pairwise maha adjoints for now. (f467162)

devmotion reviewed Sep 7, 2020 (src/zygote_adjoints.jl)

devmotion reviewed Sep 7, 2020 (test/basekernels/maha.jl)

sharanry added 2 commits September 7, 2020 15:38

Fix style issues (651ae02)
Update maha.jl (6b114d2)

willtebbutt approved these changes Sep 7, 2020 (test/zygote_adjoints.jl)

Fix style in zygote_adjoints.jl (8655911)

sharanry merged commit 5c24f1c into master Sep 8, 2020

sharanry deleted the sharan/fix-AD-issues branch September 8, 2020 09:13
