Allow optimization based acquisition functions #6

Open · kiudee opened this issue Jan 7, 2020 · 5 comments
Labels: enhancement (New feature or request), Priority: Low

Comments

@kiudee (Owner) commented Jan 7, 2020

Computing acquisition functions only on a finite set of sampled points is problematic in high-dimensional spaces, where the distance of the best sampled point to the true optimum (of the acquisition function) will be large on average.
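
For illustration, a minimal sketch of what such an optimization-based step could look like, using multi-start L-BFGS-B (the acq callable and bounds are hypothetical placeholders, not existing bayes-skopt API):

import numpy as np
from scipy.optimize import minimize


def optimize_acquisition(acq, bounds, n_restarts=10, random_state=None):
    """Minimize acq over a box via multi-start L-BFGS-B instead of taking
    the best point of a fixed random sample."""
    rng = np.random.default_rng(random_state)
    bounds = np.asarray(bounds)  # shape (n_dims, 2): [low, high] per dimension
    best_x, best_val = None, np.inf
    for _ in range(n_restarts):
        # Random restart drawn uniformly from the box:
        x0 = rng.uniform(bounds[:, 0], bounds[:, 1])
        res = minimize(acq, x0, method="L-BFGS-B", bounds=bounds)
        if res.fun < best_val:
            best_x, best_val = res.x, res.fun
    return best_x, best_val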

@michaelosthege

Hi @kiudee,
The package looks quite good! I'm particularly interested in Thompson sampling; however, my problem is high-dimensional, so I need to use the approximation from Hernández-Lobato et al. (2014).
Did you have a look into it already?
Cheers

@kiudee (Owner, Author) commented Mar 14, 2020

Hi @michaelosthege, thank you for the reference.
One of my near-future to-dos is to support parallel evaluation of configurations in a way that is not simply implemented using a constant-liar strategy.
Thompson sampling already seemed like the most natural acquisition function for parallelization.

From a cursory glance at the paper, it looks like it should be straightforward to implement.
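
For concreteness, a minimal sketch of that idea on a finite candidate set, assuming gp is a fitted scikit-learn-style GaussianProcessRegressor (the function name and signature are illustrative only). Each worker minimizes its own posterior draw, so no constant-liar bookkeeping is needed:

import numpy as np


def thompson_proposals(gp, X_cand, n_workers=4, random_state=None):
    """Return one proposal per worker: the minimizer of an independent
    posterior sample evaluated on the candidate set X_cand."""
    # samples has shape (n_candidates, n_workers): one posterior draw per worker
    samples = gp.sample_y(X_cand, n_samples=n_workers, random_state=random_state)
    return np.asarray(X_cand)[np.argmin(samples, axis=0)]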

@michaelosthege

The paper was accompanied by a MATLAB implementation for which I also found a Python port.

Unfortunately, due to the code formatting and the absence of useful docstrings/comments, I found it hard to understand. I have trouble figuring out which values I'll have to pass and which constraints this approximation may put on my GP model.

@kiudee (Owner, Author) commented Mar 19, 2020

In your case I would look at quadrature Fourier features (QFF) to implement the Thompson sampling part:
https://papers.nips.cc/paper/8115-efficient-high-dimensional-bayesian-optimization-with-additivity-and-quadrature-fourier-features
I found an implementation here:
https://github.com/Mojusko/QFF

Use it in conjunction with predictive variance reduction search (PVRS):
https://bayesopt.github.io/papers/2017/13.pdf
which works similarly to PES but is faster and simpler to implement.

edit: In bayes-skopt, PVRS is implemented as follows:

import numpy as np
from scipy.linalg import cho_solve, cholesky

from bask.acquisition import FullGPAcquisition


class PVRS(FullGPAcquisition):
    """Implements the predictive variance reduction search algorithm.

    The algorithm draws a set of Thompson samples (samples from the optimum
    distribution) and proposes the point which reduces the predictive variance
    of these samples the most.

    References
    ----------
    [1] Nguyen, Vu, et al. "Predictive variance reduction search." Workshop on
        Bayesian optimization at neural information processing systems (NIPSW).
        2017.
    """

    def __call__(self, X, gp, *args, n_thompson=10, random_state=None, **kwargs):
        n = len(X)
        # Minimizers of posterior draws serve as samples from the optimum
        # distribution:
        thompson_sample = gp.sample_y(
            X, sample_mean=True, n_samples=n_thompson, random_state=random_state
        )
        thompson_points = np.array(X)[np.argmin(thompson_sample, axis=0)]
        covs = np.empty(n)
        for i in range(n):
            # Augment the training set with candidate X[i] and factorize the
            # resulting kernel matrix:
            X_train_aug = np.concatenate([gp.X_train_, [X[i]]])
            K = gp.kernel_(X_train_aug)
            if np.iterable(gp.alpha):
                K[np.diag_indices_from(K)] += np.concatenate([gp.alpha, [0.0]])
            L = cholesky(K, lower=True)
            # Variance at the Thompson points explained by the augmented
            # training set; the candidate maximizing this reduces their
            # predictive variance the most:
            K_trans = gp.kernel_(thompson_points, X_train_aug)
            v = cho_solve((L, True), K_trans.T)
            cov = K_trans.dot(v)
            covs[i] = np.diag(cov).sum()
        return covs

Note, however, that this samples from a random set of points instead of using Fourier features.
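
A hypothetical usage sketch for the class above, assuming a fitted bayes-skopt GP (the BayesGPR import path and the toy data are my assumptions):

import numpy as np
from bask import BayesGPR  # bayes-skopt's GP; treat the import path as an assumption

rng = np.random.default_rng(0)
X_train = rng.uniform(size=(20, 3))
y_train = np.sin(X_train).sum(axis=1)

gp = BayesGPR(random_state=0)
gp.fit(X_train, y_train)

X_cand = rng.uniform(size=(100, 3))
scores = PVRS()(X_cand, gp, n_thompson=10, random_state=0)
# Larger scores mean the candidate explains more variance at the Thompson
# points, so the next point to evaluate is the maximizer:
x_next = X_cand[np.argmax(scores)]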

@michaelosthege commented Apr 14, 2020

hi @kiudee,
Thank you for those links. It took me a while to read them, as well as Bradford et al. (2018), which has a really good description of the RFF algorithm.

My impression is that QFF makes some assumptions about the independence of the problem dimensions that we don't feel comfortable with for our problem. (That might change, though, if we run into problems with computational complexity 😅)

My RFF implementation still has a bug resulting in too much posterior variance, but I think it's just a **2 or numpy.log somewhere.
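
In case it helps with debugging, here is a minimal sketch of the RFF feature map for a squared-exponential kernel (all names are illustrative, not taken from your code), with the two scale factors that typically cause inflated posterior variance called out, plus a sanity check against the exact kernel:

import numpy as np


def rff_features(X, m=500, lengthscale=1.0, signal_variance=1.0, random_state=None):
    """Map X of shape (n, d) to features Phi of shape (n, m) such that
    Phi @ Phi.T approximates
    signal_variance * exp(-||x - x'||^2 / (2 * lengthscale**2))."""
    rng = np.random.default_rng(random_state)
    d = X.shape[1]
    # Spectral density of the SE kernel: frequencies ~ N(0, 1/lengthscale^2)
    W = rng.normal(scale=1.0 / lengthscale, size=(d, m))
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)
    # Both scale factors matter: sqrt(2/m) normalizes the Monte Carlo average
    # and sqrt(signal_variance) carries the kernel amplitude.
    return np.sqrt(2.0 * signal_variance / m) * np.cos(X @ W + b)


# Sanity check against the exact kernel (lengthscale=1, signal_variance=1);
# a wrong exponent or scale factor shows up immediately here:
X = np.random.default_rng(1).normal(size=(5, 3))
Phi = rff_features(X, m=20000, random_state=1)
K_exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1) / 2.0)
assert np.allclose(Phi @ Phi.T, K_exact, atol=0.05)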
