What numerical optimizers can operate with only gradients, and no explicit value of the objective?


I have an optimization problem that involves minimizing a function whose gradient I know, but whose actual value at any point is unknown.

I'd like to optimize the function using BFGS, but all of the BFGS implementations I've found seem to require knowledge of the value of the objective, especially in the line search step. I've looked at both a Python (scipy) implementation and a C++ implementation of BFGS.

Obviously I can use gradient descent, but I'd prefer not to reinvent the wheel here.

Any ideas?

Some more detail: I want to minimize h, but I'm not given h directly. What I'm given is h = f(g), along with an explicit formula for g(x). f transforms the gradients of g in a somewhat tricky geometric way that is not too difficult to calculate, but impossible to integrate analytically. So it's pretty straightforward to calculate the gradient of h(x), but hard to get explicit values of h(x).
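To make the setting concrete, here is a toy sketch of the gradient-only fallback mentioned above (plain gradient descent). `grad_h` and its hidden quadratic objective are made-up stand-ins, not the actual problem:

```python
import numpy as np

# Toy stand-in for the setting above: we can evaluate the gradient of h,
# but never h itself.  The hidden objective here is h(x) = ||x - 3||^2,
# so grad_h(x) = 2*(x - 3); both names are illustrative.
def grad_h(x):
    return 2.0 * (x - 3.0)

def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=1000):
    """Plain gradient descent: uses only the gradient, never the value."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - lr * g
    return x

x_min = gradient_descent(grad_h, np.array([10.0, -4.0]))  # converges to (3, 3)
```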

There are 4 answers below.

BEST ANSWER

After spending some time thinking about this, I think the answer is to just adapt a quasi-Newton method like BFGS. The only place the function value enters into the BFGS computation is in the line search, via the first Wolfe condition (the sufficient-decrease, or Armijo, condition).

I think the solution is to instead use a line search method that doesn't check the first Wolfe condition (the Armijo rule).

I implemented this for BFGS in Python and C++: https://gist.github.com/rmcgibbo/4735287. On second thought, though, I think you could get the same result by supplying the BFGS routine with a surrogate function that is always decreasing (e.g. it keeps a counter of the number of times it has been called and always returns a smaller number than it did on the previous call). The decrease has to be large enough that the Armijo rule always passes (http://en.wikipedia.org/wiki/Wolfe_conditions).
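As a sketch of the first idea, here is a minimal gradient-only BFGS in Python. The line search enforces only the curvature (second Wolfe) condition, since the Armijo condition would need objective values. Everything here, names included, is an illustrative reconstruction under those assumptions, not the gist's actual code:

```python
import numpy as np

def bfgs_grad_only(grad, x0, max_iter=200, gtol=1e-6):
    """BFGS driven purely by gradient evaluations.

    The line search checks only the curvature (second Wolfe) condition,
    |grad(x + a*p) . p| <= 0.9 * |grad(x) . p|, because the Armijo
    (sufficient-decrease) condition would require objective values.
    """
    x = np.asarray(x0, dtype=float)
    n = x.size
    I = np.eye(n)
    H = I.copy()                    # inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < gtol:
            break
        p = -H @ g                  # descent direction (H is kept PD)
        alpha = 1.0
        for _ in range(30):         # backtrack until curvature holds
            if abs(grad(x + alpha * p) @ p) <= 0.9 * abs(g @ p):
                break
            alpha *= 0.5
        s = alpha * p
        x = x + s
        g_new = grad(x)
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:              # standard positive-curvature safeguard
            rho = 1.0 / sy
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        g = g_new
    return x

# Example: minimize a hidden quadratic through its gradient A @ (x - b).
A = np.diag([1.0, 10.0])
b = np.array([1.0, 2.0])
x_opt = bfgs_grad_only(lambda x: A @ (x - b), np.zeros(2))
```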

ANSWER

I believe you've reduced the problem to one of finding roots of the gradient. You could use one of the root finders in scipy; then you simply have to check whether that point is a minimum, maximum, or saddle/inflection point.
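A sketch of that approach, with a made-up `grad_h` standing in for the real gradient: find a root of the gradient with `scipy.optimize.root`, then finite-difference the gradient to get an approximate Hessian and classify the stationary point.

```python
import numpy as np
from scipy.optimize import root

# Illustrative gradient; the hidden h(x) = (x0 - 1)^2 + (x1 + 2)^2,
# so the gradient's root is at (1, -2).
def grad_h(x):
    return np.array([2.0 * (x[0] - 1.0), 2.0 * (x[1] + 2.0)])

sol = root(grad_h, np.zeros(2))     # solves grad_h(x) = 0
x_star = sol.x

# Classify the stationary point: central-difference the gradient to get
# an approximate Hessian, then inspect its eigenvalue signs.
eps = 1e-6
n = x_star.size
H = np.empty((n, n))
for i in range(n):
    e = np.zeros(n)
    e[i] = eps
    H[:, i] = (grad_h(x_star + e) - grad_h(x_star - e)) / (2.0 * eps)
eigs = np.linalg.eigvalsh(0.5 * (H + H.T))
is_min = bool(np.all(eigs > 0))     # True here: both eigenvalues are ~2
```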

ANSWER

In that case, try minimizing the square of h(x). This is because you are essentially searching for points where h(x) is close to zero, and squaring gives you a nonnegative, computable objective to run your parameter search on.

EDIT: sorry, what I meant was that h(x) here is the gradient, i.e. minimize the squared norm of the gradient, ||grad h(x)||^2, which you can evaluate even though h itself is unknown.
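A minimal sketch of this surrogate approach, assuming a made-up `grad_h` in place of the real gradient. Since the squared gradient norm is computable, any value-based optimizer applies; derivative-free Nelder-Mead is used here to avoid needing the surrogate's own gradient, which would involve the Hessian of h:

```python
import numpy as np
from scipy.optimize import minimize

# Illustrative gradient of an unknown h; hidden h(x) = ||x - c||^2.
c = np.array([2.0, -1.0])
def grad_h(x):
    return 2.0 * (x - c)

# The surrogate ||grad_h(x)||^2 IS computable, so any value-based
# optimizer applies.  Derivative-free Nelder-Mead avoids needing the
# surrogate's own gradient (which would involve the Hessian of h).
def surrogate(x):
    g = grad_h(x)
    return g @ g

res = minimize(surrogate, np.zeros(2), method='Nelder-Mead')
```

Note that this finds stationary points of h in general, not only minima, so a separate second-derivative check is still needed to confirm a minimum.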

ANSWER

Maybe talking about a simpler example will help. Take some scalar y = f(x). The gradient of y is df/dx. If you know the derivative everywhere, you can easily (!!) determine the value of f(x), either analytically or via numerical integration, but only up to an undetermined global constant. The old "integral(f(x)dx) = F(x) + C" trick. So, unless you can anchor your h function at at least one point, you can't recover its values. You can track down the location of the minimum (the x such that h(x) is minimal), but not the value of h(x).
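The anchoring idea above can be sketched with `scipy.integrate.quad`; the functions here are made up for illustration:

```python
import numpy as np
from scipy.integrate import quad

# Known derivative of an unknown scalar f.  Secretly f(x) = sin(x),
# so df/dx = cos(x); only the derivative is "given".
dfdx = np.cos

# Numerical integration recovers f only up to a constant C.  Anchoring
# f at a single point, here f(0) = 0, pins that constant down.
def f_reconstructed(x, x0=0.0, f0=0.0):
    val, _err = quad(dfdx, x0, x)
    return f0 + val

y = f_reconstructed(np.pi / 2)      # ~= sin(pi/2) = 1.0
```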