-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: implementation of bounded LBFGS algorithm #1290
base: main
Are you sure you want to change the base?
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
1 similar comment
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here with What to do if you already signed the CLAIndividual signers
Corporate signers
ℹ️ Googlers: Go here for more info. |
@googlebot I signed it! |
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
1 similar comment
CLAs look good, thanks! ℹ️ Googlers: Go here for more info. |
Hi Miguel, thanks very much for this PR. This is a really nice contribution, and it's great to see such thorough and well-documented code. I made a quick pass with a few minor style contents, but overall this looks to be in quite good shape. I do have a high-level concern around the use of ragged Tensors. As you note, they can be pretty finicky, they make it tricky to work with batches, and they raise compatibility issues with, e.g., our JAX backend. It's not a dealbreaker, but if you're willing, I'd like to try to help think through whether another approach could work. I haven't fully worked through the code yet---to help me understand, can you explain what the ragged Tensors are being used for, and why they need to be ragged? Before checking this in, we'll need to add some tests, along with corresponding entries in the BUILD file (you can use the entries for
It sounds like you already have some tests, so if you could adapt them to add to this PR that'd be a great start. Is there anything in particular you're not sure about? I'm happy to try to help figure out the best way to set things up.
|
w_b), | ||
state.ddf) | ||
# NOTE: See lbfgsb.f (l. 1649) | ||
# TODO: How to get machine epsilon? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype_util.eps(ddf.dtype)
should do it
# The number of correction pairs that have been collected so far. | ||
#num_elements = ps.minimum( | ||
# state.num_iterations, # TODO(b/162733947): Change loop state -> closure. | ||
# ps.shape(state.position_deltas)[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the commented code can be removed here
grad_tolerance, f_relative_tolerance, x_tolerance, | ||
stopping_condition, max_iterations, free_mask, cauchy_point): | ||
"""Performs the line search in given direction, backtracking in direction to the cauchy point, | ||
and clamping actively contrained variables to the cauchy point.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: our linter requires docstrings to be either in the specific short form
"""One-line docstring of at most 80 chars, ending with punctuation."""
or the long form, including descriptions of args:
"""One-line summary of at most 80 chars, ending with punctuation.
One or more paragraphs of additional details.
Args:
arg1: ...
arg2: ...
Returns:
return_value:
"""
Assuming that you don't want to write long docstrings for these internal methods, can you please try to fit the one-line docstring format? E.g., here you could maybe just say
"""Performs the line search in the given direction."""
and move the points about backtracking and clamping to comments for the relevant code.
(also applies elsewhere to other methods)
state.breakpoint_min_old) | ||
|
||
# Find b | ||
breakpoint_min_idx, breakpoint_min = \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is in the TFP style guide specifically, because it's more generally covered by Google Python style, but we prefer parentheses over backslashes for writing multiline expressions, e.g.:
breakpoint_min_idx, breakpoint_min = (
_cauchy_get_breakpoint_min(
state.breakpoints,
free_vars_idx))
though in this particular case you can probably just put the call on the same line.
(likely applies elsewhere)
Hello, Thank you for the kind words. I am again a bit swamped, so I will return to this when I can, but I have read all of your comments and have no objections. Re. ragged tensors: they were more in use (and a greater source of problems) previously, as I was keeping track of Right now, as far as I can tell and remember, ragged tensors are only in use in lines 576-590, to transpose differently between batches, and since these are immediately I have also found that the current approach evaluates the cost function far far more often than the Fortran implementation (about 10 times as much in my use case). Depending on how costly this is, it can be a real bottleneck (this turned out to be the case for me). I am working on implementing the direct primal method to circumvent the double use of |
Hi! Thank you! |
Hello @srvasude, I've uploaded the more recent changes I've made to the code. It remains to write unit tests, and modifying the code to accept more batch dimensions. The modifications of commit 41b2267 should help a lot with this latter point, because the queue no longer has the batch dimensions "in the middle", and are instead leading. This means a lot of the stuff that's using Commit 64d30a4 is meant to be a starting point for writing the unit tests. However, it would maybe be best to implement multiple batching dimensions before writing the tests. Miguel |
This commit implements the bounded version of the Limited-memory Broyden–Fletcher–Goldfarb–Shanno optimization algorithm, per [1], in
lbfgsb.py
. It is lacking unit tests (which have been run outside of the committed changes), because I don't understand how to properly implement them alongside, e.g., the LBFGS unit tests. The following unit tests were ran:[5., 5.]
in the Rosenbrock surface, when given bounds[(5, 20), (-10, 5)]
, which matches the original Fortran implementation.I now need help making sure the code is up to the style guide, per issue #1273 .
[1] Richard H. Byrd, Peihuang Lu, Jorge Nocedal, & Ciyou Zhu (1995).
A Limited Memory Algorithm for Bound Constrained Optimization
SIAM Journal on Scientific Computing, 16(5), 1190–1208.
https://doi.org/10.1137/0916069