nr.minimizer
Class Brent
java.lang.Object
nr.minimizer.VecMinimizerImp
nr.minimizer.Brent
 All Implemented Interfaces:
 VecMinimizer
public class Brent
 extends VecMinimizerImp
Minimizes a function of several variables using Brent's modification of Powell's method.
Note: as of now, this algorithm does not work. Do not use it.
The method is described in Brent, RP (1973) Algorithms for Minimization
Without Derivatives.
Prentice Hall: New Jersey. The discussion is unfortunately not available on the web, though the
original Powell's method is discussed in
Numerical Recipes,
and they mention Brent's modification.
The original routine in FORTRAN is called PRAXIS, and is available from many sources
(search the web for "brent praxis"). One such source is
www.psc.edu/~burkardt/src/praxis.
The algorithm works well for smooth functions (ones where the second derivative exists).
These sorts of functions look like a paraboloid near the minimum. This discussion assumes
you understand that statement; if not, review your multivariate calculus and Numerical
Recipes chapter 10.
The basic idea is that we need to have a one-dimensional minimization algorithm and
a set of directions (vectors) that are linearly independent, so we can minimize
along each one in turn and search all of the domain of the function. To
"minimize along a direction u from point x"
means, if our function is f(x) with x a vector of size n,

construct the function of one variable g(b) =
f(x + b*u)

Using our one-dimensional minimization algorithm, find b'
that minimizes g(b)

Replace x with x + b'*u
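The three steps above can be sketched in Java. This is only an illustration: the class and method names (LineSearch, minimizeAlong, goldenMin), the golden-section search standing in for the package's actual scalar minimizer, and the bracketing interval [-10, 10] are all assumptions, not part of this class.

```java
import java.util.function.Function;

public class LineSearch {
    // Golden-section search on g over [a, c]; a stand-in for the scalar minimizer.
    static double goldenMin(Function<Double, Double> g, double a, double c, double tol) {
        double phi = (Math.sqrt(5.0) - 1.0) / 2.0;
        while (c - a > tol) {
            double b1 = c - phi * (c - a), b2 = a + phi * (c - a);
            if (g.apply(b1) < g.apply(b2)) c = b2; else a = b1;
        }
        return (a + c) / 2.0;
    }

    // "Minimize along direction u from point x": build g(b) = f(x + b*u),
    // find the b' that minimizes g, and replace x by x + b'*u.
    static double[] minimizeAlong(Function<double[], Double> f, double[] x, double[] u) {
        Function<Double, Double> g = b -> {
            double[] p = new double[x.length];
            for (int i = 0; i < x.length; i++) p[i] = x[i] + b * u[i];
            return f.apply(p);
        };
        double bPrime = goldenMin(g, -10.0, 10.0, 1e-8);   // bracket is an assumption
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] + bPrime * u[i];
        return out;
    }

    public static void main(String[] args) {
        // f(x, y) = (x-1)^2 + (y+2)^2, minimized along the x-axis from (0, 0).
        Function<double[], Double> f = p -> (p[0]-1)*(p[0]-1) + (p[1]+2)*(p[1]+2);
        double[] x = minimizeAlong(f, new double[]{0, 0}, new double[]{1, 0});
        System.out.println(x[0] + " " + x[1]);   // x[0] close to 1, x[1] stays 0
    }
}
```

Since u here is the x-axis, only the first component moves; minimizing along one direction leaves the others untouched, which is exactly why a single pass is not enough.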

The direction set basic procedure is:

Start with an initial guess for the minimum, x'

Start with a list of n linearly independent vectors,
u[j]

Minimize along each u[j] in turn, updating x'
each time.
Since the u's span all of n-space, one would hope that this would
bring us to the minimum. Unfortunately, each successive scalar minimization
ruins the minimization from before. We could repeat the
basic procedure over and over again until it
converges, but that could be very slow (see Numerical Recipes chapter 10 for
more discussion of this).
What we want is for the u's to not only be linearly independent, but to be
conjugate, which means that minimizing along one direction does not spoil
the other directions. Powell's algorithm
finds a mutually conjugate set of
directions for a paraboloid:

Set oldX' = x'

Do the basic procedure

Create a new direction, u' = x' - oldX'. This is
the average direction moved in the basic procedure.

Minimize along u'.

Replace one of the original u[j] with u'
Each time Powell's algorithm is run, the new u' is
conjugate to all the previous u's (for a paraboloid), so after n
runs through the algorithm, the directions are all conjugate and the last
basic procedure got us to the minimum.
The algorithm doesn't specify which original u[j] to replace; it
works no matter what you choose. The original algorithm just picked the first
available, but empirically it seems to converge faster if we pick the one that
made the best improvement in f(x) during the basic procedure.
See the discussion in
Numerical Recipes chapter 10.5, section "Discarding the Direction of Largest Decrease."
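One pass of Powell's algorithm with the "discard the direction of largest decrease" heuristic can be sketched as follows. The class and method names are hypothetical, and a simple golden-section search stands in for this class's scalar minimizer:

```java
import java.util.function.Function;

public class PowellPass {
    static final double PHI = (Math.sqrt(5.0) - 1.0) / 2.0;

    // Crude golden-section stand-in for the scalar minimizer; bracket assumed.
    static double goldenMin(Function<Double, Double> g) {
        double a = -10, c = 10;
        while (c - a > 1e-9) {
            double b1 = c - PHI * (c - a), b2 = a + PHI * (c - a);
            if (g.apply(b1) < g.apply(b2)) c = b2; else a = b1;
        }
        return (a + c) / 2;
    }

    // Minimize f along u from x; returns the updated point x + b'*u.
    static double[] lineMin(Function<double[], Double> f, double[] x, double[] u) {
        final double[] x0 = x;
        double b = goldenMin(bb -> {
            double[] p = new double[x0.length];
            for (int i = 0; i < p.length; i++) p[i] = x0[i] + bb * u[i];
            return f.apply(p);
        });
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] + b * u[i];
        return out;
    }

    // One pass: basic procedure along each u[j], then the average direction
    // u' = x' - oldX' is minimized along and replaces the direction that gave
    // the largest single decrease in f. Modifies u in place; returns new x'.
    static double[] powellPass(Function<double[], Double> f, double[] x, double[][] u) {
        int n = x.length;
        double[] oldX = x.clone();
        int best = 0;
        double bestDrop = -1;
        for (int j = 0; j < n; j++) {
            double fBefore = f.apply(x);
            x = lineMin(f, x, u[j]);
            double drop = fBefore - f.apply(x);
            if (drop > bestDrop) { bestDrop = drop; best = j; }
        }
        double[] uPrime = new double[n];
        for (int i = 0; i < n; i++) uPrime[i] = x[i] - oldX[i];
        x = lineMin(f, x, uPrime);
        u[best] = uPrime;   // discard the direction of largest decrease
        return x;
    }

    public static void main(String[] args) {
        // Narrow quadratic valley: f = (x-3)^2 + 10*(x+y-1)^2, minimum at (3, -2).
        Function<double[], Double> f = p -> Math.pow(p[0]-3, 2) + 10*Math.pow(p[0]+p[1]-1, 2);
        double[] x = {0, 0};
        double[][] u = {{1, 0}, {0, 1}};
        for (int pass = 0; pass < 3; pass++) x = powellPass(f, x, u);
        System.out.println(x[0] + " " + x[1]);   // close to (3, -2)
    }
}
```

For this paraboloid the pass-built directions become conjugate, so a few passes land on the minimum even though the valley is much narrower in one direction than the other.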
If the function is not a paraboloid, then we could repeat
Powell's algorithm over and over until it converges,
but replacing a vector by the difference of vectors tends to make the set linearly
dependent, so the u[j]s don't span all of space and only minimize
a subspace of the true domain, so we won't necessarily find the true minimum.
The answer is to reset the u[j]s to a set of orthogonal vectors after
doing Powell's algorithm n times. Brent found an amazing
result that, for a paraboloid, gives an orthogonal and conjugate set of directions, and
even if the function is far from paraboloid, still gives an orthogonal set, so
Powell's algorithm still works.
Resetting the directions is:

Create the n×n matrix U whose columns are the vectors
u[j]

Determine the approximate second derivative d[j] along each
direction u[j]. If, as above, g(b) =
f(x + b*u), then for any three values of b,
b_{1},b_{2}, and b_{3},
with values of g of g_{1},g_{2},
and g_{3}, the first derivatives are
f'_{12} = (g_{1} - g_{2})/(b_{1} - b_{2})
and
f'_{23} = (g_{2} - g_{3})/(b_{2} - b_{3})
and the second derivative is
d = 2 * (f'_{12} - f'_{23})/(b_{1} - b_{3}).
If we have done a scalar minimization in direction u, we should have
evaluated g(b) at a few points already, so finding d is easy.
(We can even have the scalar minimization routine do it automatically).
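The three-point second-derivative estimate can be written directly from the formulas above (the class name and the sample values are illustrative only):

```java
public class SecondDeriv {
    // Given three samples (b1,g1), (b2,g2), (b3,g3) of g(b) = f(x + b*u),
    // estimate the second derivative d by divided differences.
    static double estimate(double b1, double g1, double b2, double g2,
                           double b3, double g3) {
        double f12 = (g1 - g2) / (b1 - b2);   // slope between points 1 and 2
        double f23 = (g2 - g3) / (b2 - b3);   // slope between points 2 and 3
        return 2 * (f12 - f23) / (b1 - b3);
    }

    public static void main(String[] args) {
        // g(b) = 3*b^2 + b has constant second derivative 6.
        System.out.println(estimate(0, 0, 1, 4, 2, 14));  // 6.0
    }
}
```

For an exact parabola the estimate is exact regardless of where the three points lie, which is why the points already evaluated during the scalar minimization are good enough.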
 Determine the diagonal matrix D with
D[j][j] = 1/sqrt(d[j]).
If the function is anywhere close to a
minimum of a paraboloid, the second derivative is > 0
(if it is not, Brent just sets d[j] to a small empirical number).

Set U = U*D

Find the singular value decomposition of U and
set U to the left-hand orthogonal matrix
of that decomposition. For the algorithms in package nr, this is just
SingularValueDecomposition
decomp = new SingularValueDecomposition (U);

Set the second derivative estimates to
d[j] = 1/decomp.getSingularValues()[j]^2
The columns of U are now the desired new u[j], with no new function
evaluations and only one order n^{3} method, the singular value
decomposition.
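The reset step can be sketched as below. A hand-rolled one-sided Jacobi SVD stands in for the package's SingularValueDecomposition class, and all names here are hypothetical; the sketch assumes the scaled direction matrix is nonsingular:

```java
public class ResetDirections {
    // One-sided Jacobi: rotate pairs of columns of a until all columns are
    // mutually orthogonal. Then the column norms are the singular values and
    // the normalized columns are the left singular vectors. Modifies a into U.
    static double[] jacobiSvd(double[][] a) {
        int n = a.length;
        for (int sweep = 0; sweep < 30; sweep++) {
            boolean rotated = false;
            for (int i = 0; i < n; i++) for (int j = i + 1; j < n; j++) {
                double alpha = 0, beta = 0, gamma = 0;
                for (int k = 0; k < n; k++) {
                    alpha += a[k][i] * a[k][i];
                    beta  += a[k][j] * a[k][j];
                    gamma += a[k][i] * a[k][j];
                }
                if (Math.abs(gamma) < 1e-15 * Math.sqrt(alpha * beta)) continue;
                rotated = true;
                double zeta = (beta - alpha) / (2 * gamma);
                double t = (zeta >= 0 ? 1 : -1) / (Math.abs(zeta) + Math.sqrt(1 + zeta * zeta));
                double c = 1 / Math.sqrt(1 + t * t), s = c * t;
                for (int k = 0; k < n; k++) {       // rotate columns i and j
                    double ai = a[k][i], aj = a[k][j];
                    a[k][i] = c * ai - s * aj;
                    a[k][j] = s * ai + c * aj;
                }
            }
            if (!rotated) break;
        }
        double[] sigma = new double[n];
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) sigma[j] += a[k][j] * a[k][j];
            sigma[j] = Math.sqrt(sigma[j]);
            for (int k = 0; k < n; k++) a[k][j] /= sigma[j];
        }
        return sigma;
    }

    // The reset: scale each direction by 1/sqrt(d[j]) (U = U*D), take the SVD,
    // keep the left orthogonal matrix as the new directions, and set
    // d[j] = 1/sigma[j]^2.
    static void reset(double[][] u, double[] d) {
        int n = d.length;
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) u[k][j] /= Math.sqrt(d[j]);
        double[] sigma = jacobiSvd(u);   // u now holds the new orthogonal directions
        for (int j = 0; j < n; j++) d[j] = 1 / (sigma[j] * sigma[j]);
    }

    public static void main(String[] args) {
        double[][] u = {{1, 1}, {0, 1}};  // independent but not orthogonal columns
        double[] d = {1, 1};
        reset(u, d);
        // Columns of u are now orthonormal; this dot product is ~0.
        System.out.println(u[0][0]*u[0][1] + u[1][0]*u[1][1]);
    }
}
```

No new function evaluations are needed, matching the claim above: the only real work is the order n^3 decomposition.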
Brent does a few other tricks:

Laziness:
Brent points out that on the first run through Powell's
algorithm there are no other "new" directions to be conjugate to, so there's
no reason to replace any of the old directions. So he just minimizes along one
direction, then does Powell's algorithm n-1 times to make the directions
conjugate. There is no point in minimizing along successive non-conjugate directions,
since each minimization will likely ruin the last one.

Random jump:
If the function has very narrow valleys (which we can tell if the second
derivatives in different directions have vastly different orders of magnitude) then
rounding errors may keep the algorithm from finding its way down the valley.
Brent calls this an "ill-conditioned" problem. If it is ill-conditioned,
before starting Powell's algorithm, move x'
to a random point a little bit away ("little bit" is determined by the second
derivatives and the machine precision).

Twisty little passage: If the valley is curved, then
assuming a paraboloid won't turn the corners and convergence will be slow.
Before resetting the directions, Brent uses the current value of
x' and the previous 2 values at this point in the algorithm to
approximate a curve and uses the scalar minimizer to minimize along that
curve. If the algorithm has been run fewer than 3 times, just record
the point for future iterations.

Automatic scaling: For the algorithm to be
efficient, all the components of x should be approximately on the same
scale. Brent's original algorithm incorporates ways to try to automatically do
this, but the version implemented here does not. The user must scale the
variables prior to calling
VecMinimizerImp.minimize(Vec)
.
So Brent's algorithm is:

Minimize along u[1].

For n-1 times:

If the problem is ill-conditioned, do a Random jump.

Do Powell's algorithm.

If f(x') didn't improve much and the problem was not ill-conditioned,
consider it ill-conditioned and restart the algorithm.

If x' didn't change much, then exit with success. Note that if x' didn't change,
then f didn't change, so (by the previous step) we will always have
considered the problem ill-conditioned and done a Random jump before exiting.

Do a Twisty little passage step.

Reset the directions. Check the values of
d[j] to see if the problem is ill-conditioned.

Exit with failure. We did not converge with this execution of Brent's algorithm, but
hopefully it left us with an improved x', a set of conjugate and orthogonal directions
u, and the corresponding estimates of the second derivative d.
So the final algorithm is:

Set the direction vectors to the unit vectors in each direction
(U[i,j] = i=j ? 1 : 0).

Set the second derivative estimates to 0 (d[j] = 0).

Repeat Brent's algorithm until it exits with success.

If desired (if you are worried about twisty narrow valleys in which the algorithm might get
stuck without being at the minimum), take a Random jump and restart.
Brent restarts at least once.
The linear minimization is intentionally a quick but poor one.
The idea is that there is no point in perfectly minimizing the function along one particular
direction, since the algorithm
will have to take many steps in many directions anyway. The minimizer goes to the exact
minimum for a perfect parabola, and tries to find a point that is lower than the current one
for any other function.
The linear minimization algorithm takes the value at the current point, evaluates the function
one "stepsize" away, and uses these two points plus the estimate of the second derivative to
extrapolate a parabola and determine where the minimum of that parabola is. It then evaluates the
function at that point, and if the value is less than the previous ones, it updates the estimate
of the second derivative using the three points evaluated and returns the location of the minimum.
If the value at the predicted minimum is more than the other ones, then the current function
is not a good parabola. Try a few points closer to the original looking for a lower function value
(the algorithm uses just 2 tries). Return the lowest point found (possibly the original
point) and set the second derivative to zero.
If we start with no good estimate of the second derivative, evaluate the function another
"stepsize" away and use those three points to extrapolate a parabola, then proceed as above.
Determining the stepsize is the hard part. If there is a good second derivative estimate, use that
as an estimate of the width of the parabola and use that and the desired tolerance to get a step size.
If we do not have a second derivative estimate, use the size of the last change in
x' as the stepsize. If this is the first time, use the length of x' as
a crude guess.
This is not the exact step size determining algorithm that Brent used, but it seems simpler and
seems to work.
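One quick parabola step of that flavor might look like the following sketch. The names, the one-element dOut array used to return the updated curvature, the fallback curvature of 1, and the backtracking factor of 4 are all assumptions for illustration, not Brent's actual choices:

```java
import java.util.function.DoubleUnaryOperator;

public class QuickLineMin {
    // One quick, deliberately rough minimization of g, starting at b0 with
    // step h and prior second-derivative estimate d (d <= 0 means "unknown").
    // Returns the new b; leaves the updated d estimate in dOut[0].
    static double minimize(DoubleUnaryOperator g, double b0, double h, double d,
                           double[] dOut) {
        double g0 = g.applyAsDouble(b0);
        double g1 = g.applyAsDouble(b0 + h);
        if (d <= 0) {                          // no estimate: sample a third point
            double g2 = g.applyAsDouble(b0 + 2 * h);
            d = ((g2 - g1) / h - (g1 - g0) / h) / h;
            if (d <= 0) d = 1;                 // hypothetical fallback curvature
        }
        // Parabola with value g0, slope m at b0, and curvature d:
        // g(b) ~ g0 + m*(b - b0) + d/2*(b - b0)^2, so the vertex is at b0 - m/d.
        double m = (g1 - g0) / h - d * h / 2;
        double bNew = b0 - m / d;
        double gNew = g.applyAsDouble(bNew);
        if (gNew < g0 && gNew < g1) {
            // Good step: refresh d from the three points actually evaluated.
            double f1 = (g0 - g1) / (-h);
            double f2 = (g1 - gNew) / ((b0 + h) - bNew);
            dOut[0] = 2 * (f1 - f2) / (b0 - bNew);
            return bNew;
        }
        // Not parabola-like: try up to 2 points closer to b0, keep the lowest.
        double bestB = g1 < g0 ? b0 + h : b0, bestG = Math.min(g0, g1);
        for (int k = 0; k < 2; k++) {
            h /= 4;
            double gk = g.applyAsDouble(b0 + h);
            if (gk < bestG) { bestG = gk; bestB = b0 + h; }
        }
        dOut[0] = 0;                            // signal "no usable curvature"
        return bestB;
    }

    public static void main(String[] args) {
        double[] dOut = new double[1];
        // For an exact parabola the predicted vertex is the true minimum.
        double b = minimize(x -> (x - 2) * (x - 2), 0, 0.5, -1, dOut);
        System.out.println(b + " d=" + dOut[0]);  // 2.0 d=2.0
    }
}
```

Note that the good-step branch costs only one extra function evaluation beyond the samples already needed, which is the whole point of a quick-but-poor line search.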
The algorithm converges when each component of the change in x' is less than
the corresponding element of minDeltaX. minDeltaX[i] is
defined as epsilon * abs(x0[i]), where x0 is the value of x' at the start of the
algorithm, or just epsilon if that component is zero. Thus, the
initial guess for the minimum also sets the scale for the convergence.
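The convergence test described above is simple to state in code (the class and method names are hypothetical):

```java
public class Converged {
    // minDeltaX[i] = epsilon * |x0[i]| (or just epsilon when x0[i] == 0),
    // where x0 is the initial guess for the minimum. Convergence means every
    // component of the change in x' is below the corresponding minDeltaX[i].
    static boolean converged(double[] x0, double[] deltaX, double epsilon) {
        for (int i = 0; i < x0.length; i++) {
            double minDeltaX = x0[i] == 0 ? epsilon : epsilon * Math.abs(x0[i]);
            if (Math.abs(deltaX[i]) >= minDeltaX) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] x0 = {100.0, 0.0};   // initial guess sets the scale per component
        System.out.println(converged(x0, new double[]{1e-5, 1e-7}, 1e-6));  // true
        System.out.println(converged(x0, new double[]{1e-3, 0.0}, 1e-6));   // false
    }
}
```

Because the tolerance scales with the initial guess, starting from a wildly mis-scaled x' loosens or tightens the convergence test accordingly; this is another reason to scale the variables before calling minimize.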
Method Summary 
static void 
main(java.lang.String[] args)
test suite 
static java.lang.String 
name()

Methods inherited from class java.lang.Object 
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Brent
public Brent(ScalarFunction f)
 create a new instance of Brent.
 Parameters:
f
 the ScalarFunction
to minimize
name
public static java.lang.String name()
main
public static void main(java.lang.String[] args)
 test suite