Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove _get_ν #452

Merged
merged 14 commits into from
May 17, 2022
Merged

Remove _get_ν #452

merged 14 commits into from
May 17, 2022

Conversation

devmotion
Copy link
Member

@devmotion devmotion commented Apr 13, 2022

Fix the bug introduced in #425 (see #450).

Removes the workaround.

src/basekernels/matern.jl Outdated Show resolved Hide resolved
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@theogf
Copy link
Member

theogf commented Apr 13, 2022

Is it possible to test this?

@codecov
Copy link

codecov bot commented Apr 13, 2022

Codecov Report

Merging #452 (4c45fe8) into master (35de8d2) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #452      +/-   ##
==========================================
- Coverage   93.18%   93.16%   -0.02%     
==========================================
  Files          52       52              
  Lines        1261     1259       -2     
==========================================
- Hits         1175     1173       -2     
  Misses         86       86              
Impacted Files Coverage Δ
src/TestUtils.jl 94.11% <100.00%> (ø)
src/basekernels/matern.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 35de8d2...4c45fe8. Read the comment docs.

@devmotion
Copy link
Member Author

The rule was wrong, so it seems it's not covered by the tests? Why was this workaround introduced?

@devmotion
Copy link
Member Author

It seems in #425 the commented out test_ADs(x->MaternKernel(nu=first(x)),[ν]) was replaced with test_ADs(() -> MaternKernel(; nu=ν)). Was this intended? And could this explain why the workaround is not tested?

@st--
Copy link
Member

st-- commented Apr 13, 2022

It seems in #425 the commented out test_ADs(x->MaternKernel(nu=first(x)),[ν]) was replaced with test_ADs(() -> MaternKernel(; nu=ν)). Was this intended? And could this explain why the workaround is not tested?

Yes, that was intended, as discussed on that PR (#425 (comment)), where @theogf wrote

I am proposing that we just ban differentiation through \nu and stop testing for it.

And while the change

-test_ADs(x->MaternKernel(nu=first(x)),[ν])
+test_ADs(() -> MaternKernel(; nu=ν))

worked for Forward- and ReverseDiff, the additional workaround in #425 was required for Zygote to also pass the AD test (w.r.t. the input features, not the kernel hyperparameters!). So the workaround is tested by virtue of the Zygote AD test on MaternKernel now also passing.

@devmotion
Copy link
Member Author

So the workaround is tested by virtue of the Zygote AD test on MaternKernel now also passing.

The main problem is that tests passed successfully for 08437cc even though the pullback is clearly incorrect. So I was wondering if the pullback was evaluated at all.

@@ -37,8 +37,18 @@ MaternKernel(; nu::Real=1.5, ν::Real=nu, metric=Euclidean()) = MaternKernel(ν,

@functor MaternKernel

# workaround for Zygote
# unclear why it's needed but it is fine since it's stated officially that we don't support differentiation with respect to ν
Copy link
Member

@st-- st-- Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this workaround, the Zygote AD test fails. You can see this in 30385ac and the corresponding build

@st--
Copy link
Member

st-- commented Apr 13, 2022

The main problem is that tests passed successfully for 08437cc even though the pullback is clearly incorrect.

What was incorrect about the pullback?

So I was wondering if the pullback was evaluated at all.

It was, I only added it because without Zygote errored, see #452 (comment)

@devmotion
Copy link
Member Author

What was incorrect about the pullback?

It returned the wrong number of outputs (was missing NoTangent() derivative for the function itself).

Without this workaround, the Zygote AD test fails. You can see this in 30385ac and the corresponding build

The problem is just that https://github.com/FluxML/Zygote.jl/blob/95e10218f0187a839ac0539fefcbf39c727bf068/src/lib/array.jl#L30 does not support NotImplemented. This should have been fixed in Zygote, introducing workarounds in KernelFunctions is the wrong approach.

@st--
Copy link
Member

st-- commented Apr 13, 2022

This should have been fixed in Zygote, introducing workarounds in KernelFunctions is the wrong approach.

Yeah, I completely agree, but unfortunately it hasn't been fixed in Zygote, and I think "workaround + AD is working" is still preferable over "no workaround + AD is broken".

@devmotion
Copy link
Member Author

Instead of making a PR to KernelFunctions introducing a workaround you could have made one for Zygote that fixes the issue 🤔

@st--
Copy link
Member

st-- commented Apr 13, 2022

Instead of making a PR to KernelFunctions introducing a workaround you could have made one for Zygote that fixes the issue 🤔

I managed to figure out how to fix the issue with a workaround in KernelFunctions. That doesn't mean I would immediately know how to identify & fix the underlying issue upstream. If I did know how to fix it there, I would have, as I have in various other cases. You seem to have an amazing overview as well as deep understanding of a lot of different packages. The ability and time for that is unfortunately not something that everyone has available in the same amount. I'm assuming you're not intending to, but you give the impression of being rather disparaging of other people who're not thinking the same way you are.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@devmotion
Copy link
Member Author

IMO usually it is very helpful to open at least Github issues in cases where it is not immediately clear how to fix the issue. Workarounds can be acceptable if they are fixed upstream eventually, but I assume it is unlikely that they are fixed if nobody makes the Zygote developers aware of it.

Generally, I would have given you this advice in your PR but as desired I don't review your PRs anymore.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@st--
Copy link
Member

st-- commented Apr 13, 2022

@devmotion I've pared it down to a minimal reproducible example: FluxML/Zygote.jl#1204 but I can't comment on anything else, if you've got further thoughts on what the issue is / what's needed to fix it, please comment there.

@devmotion
Copy link
Member Author

Test errors seem unrelated? I didn't change anything related to them, I think.

@devmotion devmotion changed the title Fix derivative of _get_ν Remove _get_ν Apr 14, 2022
@willtebbutt
Copy link
Member

Test errors seem unrelated? I didn't change anything related to them, I think.

Agreed. Maybe we need to relax a tolerance somewhere?

@st--
Copy link
Member

st-- commented Apr 14, 2022

Test errors seem unrelated? I didn't change anything related to them, I think.

Agreed. Maybe we need to relax a tolerance somewhere?

Yes, I think it's due to the changes in #447; the tests are stochastic (if you just re-run the failed tests, they might pass the next time - I've seen that in #454 as well which has zero code changes). In #447 I reduced the size of the kernel matrix evaluations; the isapprox() call (set with atol=1e-9, rtol=1e-9) is on the arrays as a whole, which does not look at individual elements, but instead checks whether the norm of the difference is within the tolerances, relative to the max of the norm of the individual matrices (see discussion). For the random inputs we're using in the tests, the norm of the kernel matrix is linear in the dimensions. So by reducing the size from ~1000 to ~10 we're reducing the norm by a factor ~100, so yes that suggests to me we should increase the rtol correspondingly.

Alternatively, we could change it to element-wise isapprox comparison? then it's independent of the matrix norm.

@devmotion
Copy link
Member Author

Shouldn't the scale be irrelevant if we use rtol? I think it's problematic that we don't use rtol in most of the tests but only in

@test kernelmatrix_diag(k, x0) kernelmatrix_diag(k, x0, x0) atol = atol rtol = rtol
@test kernelmatrix(k, x0) kernelmatrix(k, x0, x0) atol = atol rtol = rtol
. Probably because otherwise tests failed - which IMO, however, would not indicate a problem with rtol per se but that the default already was already too low even before the recent changes. I completely agree with https://discourse.julialang.org/t/unit-test-equality-for-arrays/8138/12?u=devmotion that in general you want rtol since it is completely unclear what a meaningful atol is in general. Specifying a small atol as well mainly helps with comparisons that involve exact 0s which would fail otherwise.

@st--
Copy link
Member

st-- commented Apr 14, 2022

I think it's problematic that we don't use rtol in most of the tests

Yes, I agree. Probably an oversight in the original test code of not realizing that by only setting atol you end up setting rtol=0...

@devmotion
Copy link
Member Author

Apparently #455 was not sufficient.

@devmotion devmotion requested a review from theogf April 19, 2022 18:19
@devmotion
Copy link
Member Author

Bump.

1 similar comment
@devmotion
Copy link
Member Author

Bump.

@theogf
Copy link
Member

theogf commented May 8, 2022

Could we keep the global atol and rtol variables?

@devmotion
Copy link
Member Author

I don't mind. It just seemed unnecessary to have global variables that are used only in a single place, and clearer (ie easier to spot) if the values are just used as default keyword arguments in the function definition.

@devmotion
Copy link
Member Author

@theogf Did you see my last comment? Would you like it to be reverted?

@theogf
Copy link
Member

theogf commented May 17, 2022

I think keeping the global const is a bit more elegant and easily findable since it's in the same file. But again I do not mind, I'll just accept the PR and do what you want 😅

@devmotion devmotion merged commit e0a7af6 into master May 17, 2022
@devmotion devmotion deleted the dw/fix_matern branch May 17, 2022 10:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants