nr.minimizer
Class Brent

java.lang.Object
  extended by nr.minimizer.VecMinimizerImp
      extended by nr.minimizer.Brent
All Implemented Interfaces:
VecMinimizer

public class Brent
extends VecMinimizerImp

Minimizes a function of several variables using Brent's modification of Powell's method. Note: as of now, this algorithm does not work. Do not use it. The method is described in Brent, RP (1973) Algorithms for Minimization Without Derivatives. Prentice Hall: New Jersey. That discussion is unfortunately not available on the web, though the original Powell's method is covered in Numerical Recipes, which mentions Brent's modification. The original FORTRAN routine is called PRAXIS and is available from many sources (search the web for "brent praxis"); one such source is www.psc.edu/~burkardt/src/praxis.
The algorithm works well for smooth functions (ones where the second derivative exists). These sorts of functions look like a paraboloid near the minimum. This discussion assumes you understand that statement; if not, review your multivariate calculus and Numerical Recipes chapter 10.
The basic idea is that we need a one-dimensional minimization algorithm and a set of directions (vectors) that are linearly independent, so we can minimize along each one in turn and search the entire domain of the function. "Minimizing along a direction u from point x" means the following, where our function is f(x) and x is a vector of size n (a sketch follows the list):

  1. Construct the function of one variable g(b) = f(x + b*u)
  2. Using our one-dimensional minimization algorithm, find b' that minimizes g(b)
  3. Replace x with x + b'*u
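The following is a minimal sketch of that operation in Java, not the actual internals of this class. The ScalarFunction interface shown here is a stand-in for the one in package nr (its real method name may differ), the class and method names are illustrative, and the one-dimensional minimizer is passed in as a plain java.util.function callback.

  import java.util.function.DoubleUnaryOperator;
  import java.util.function.ToDoubleFunction;

  // Stand-in for nr's ScalarFunction; the real interface may differ.
  interface ScalarFunction {
      double value(double[] x);
  }

  class LineSearchSketch {
      /**
       * Minimize f along direction u starting from x, updating x in place.
       * scalarMin is any one-dimensional minimizer: given g(b), return b'.
       */
      static void minimizeAlong(ScalarFunction f, double[] x, double[] u,
                                ToDoubleFunction<DoubleUnaryOperator> scalarMin) {
          // 1. g(b) = f(x + b*u), a function of the single variable b
          DoubleUnaryOperator g = b -> {
              double[] trial = new double[x.length];
              for (int i = 0; i < x.length; i++) trial[i] = x[i] + b * u[i];
              return f.value(trial);
          };
          // 2. find b' that (approximately) minimizes g(b)
          double bPrime = scalarMin.applyAsDouble(g);
          // 3. replace x with x + b'*u
          for (int i = 0; i < x.length; i++) x[i] += bPrime * u[i];
      }
  }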

The basic direction-set procedure is (a sketch follows the list):
  1. Start with an initial guess for the minimum, x'
  2. Start with a list of n linearly independent vectors, u[j]
  3. Minimize along each u[j] in turn, updating x' each time.
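A sketch of the basic procedure, reusing the hypothetical ScalarFunction and minimizeAlong from the previous sketch (u[j] is the j-th direction vector):

  import java.util.function.DoubleUnaryOperator;
  import java.util.function.ToDoubleFunction;

  class DirectionSetSketch {
      /** The basic procedure: minimize along each direction u[j] in turn, updating x. */
      static void basicProcedure(ScalarFunction f, double[] x, double[][] u,
                                 ToDoubleFunction<DoubleUnaryOperator> scalarMin) {
          for (double[] direction : u) {
              LineSearchSketch.minimizeAlong(f, x, direction, scalarMin);
          }
      }
  }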

Since the u[j]s span all of n-space, one would hope that this would bring us to the minimum. Unfortunately, each successive scalar minimization partly undoes the ones before it. We could repeat the basic procedure over and over until it converges, but that could be very slow (see Numerical Recipes chapter 10 for more discussion).
What we want is for the u[j]s to be not only linearly independent but also conjugate, meaning that minimizing along one direction does not spoil the minimizations along the others. Powell's algorithm finds a mutually conjugate set of directions for a paraboloid:
  1. Set oldX' = x'
  2. Do the basic procedure
  3. Create a new direction, u' = x' - oldX'. This is the average direction moved in the basic procedure.
  4. Minimize along u'.
  5. Replace one of the original u[j] with u'

Each time Powell's algorithm is run, the new u' is conjugate to all the previous u's (for a paraboloid), so after n runs through the algorithm the directions are all conjugate and the last basic procedure brings us to the minimum.
The algorithm doesn't specify which original u[j] to replace; it works no matter which you choose. The original algorithm just picked the first available, but empirically it seems to converge faster if we pick the one that gave the largest improvement in f(x) during the basic procedure; see the discussion in Numerical Recipes section 10.5, "Discarding the Direction of Largest Decrease." (A sketch of one pass follows.)
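A sketch of one pass of Powell's algorithm, again using the hypothetical helpers above. To keep the sketch short it re-evaluates f(x') after each line minimization to find the direction of largest decrease; a real implementation would get those values from the line search for free.

  import java.util.function.DoubleUnaryOperator;
  import java.util.function.ToDoubleFunction;

  class PowellSketch {
      /** One pass of Powell's algorithm: x and u are updated in place. */
      static void powellPass(ScalarFunction f, double[] x, double[][] u,
                             ToDoubleFunction<DoubleUnaryOperator> scalarMin) {
          int n = x.length;
          double[] oldX = x.clone();

          // Basic procedure, tracking which direction gave the largest decrease in f.
          int best = 0;
          double bestDrop = 0.0;
          double fPrev = f.value(x);
          for (int j = 0; j < n; j++) {
              LineSearchSketch.minimizeAlong(f, x, u[j], scalarMin);
              double fNow = f.value(x);
              if (fPrev - fNow > bestDrop) { bestDrop = fPrev - fNow; best = j; }
              fPrev = fNow;
          }

          // New direction u' = x' - oldX', the average direction moved.
          double[] uPrime = new double[n];
          for (int i = 0; i < n; i++) uPrime[i] = x[i] - oldX[i];

          // Minimize along u' and let it replace the direction of largest decrease.
          LineSearchSketch.minimizeAlong(f, x, uPrime, scalarMin);
          u[best] = uPrime;
      }
  }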
If the function is not a paraboloid, we could repeat Powell's algorithm over and over until it converges. Unfortunately, replacing a vector with a difference of vectors tends to make the set linearly dependent; the u[j]s then no longer span all of space, we minimize over only a subspace of the true domain, and we won't necessarily find the true minimum.
The answer is to reset the u[j]s to a set of orthogonal vectors after running Powell's algorithm n times. Brent found a remarkable result that, for a paraboloid, gives a set of directions that is both orthogonal and conjugate, and that still gives an orthogonal set even when the function is far from a paraboloid, so Powell's algorithm keeps working. Resetting the directions is:
  1. Create the nxn matrix U whose columns are the vectors u[j]
  2. Determine the approximate second derivative d[j] along each direction u[j]. If, as above, g(b) = f(x + b*u), then for any three values of b (b1, b2, and b3) with corresponding values of g (g1, g2, and g3), the first derivatives are f'12 = (g1-g2)/(b1-b2) and f'23 = (g2-g3)/(b2-b3), and the second derivative is d = 2*(f'12 - f'23)/(b1-b3). If we have done a scalar minimization in direction u, we have already evaluated g(b) at a few points, so finding d is easy. (We can even have the scalar minimization routine do it automatically.)
  3. Determine the diagonal matrix D with D[j][j] = 1/sqrt(d[j]). If the function is anywhere close to the minimum of a paraboloid, the second derivative is > 0 (if it is not, Brent just sets d[j] to a small empirical number).
  4. Set U = U*D
  5. Find the singular value decomposition of U and set U to the left-hand orthogonal matrix of that decomposition. For the algorithms in package nr, this is just SingularValueDecomposition decomp = new SingularValueDecomposition(U);
  6. Set the second derivative estimates to d[j] = 1/decomp.getSingularValues()[j]^2
The columns of U are now the desired new u[j], with no new function evaluations and only one order-n^3 operation, the singular value decomposition. (A sketch of this reset follows.)
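A sketch of the direction reset and of the three-point second derivative estimate. The SingularValueDecomposition constructor and getSingularValues() are the calls named above; the matrix argument type, the getLeftMatrix() accessor for the left-hand orthogonal matrix, and the class and method names of the sketch itself are assumptions and may differ from the real nr API.

  class ResetDirectionsSketch {
      /** Three-point second derivative estimate along a direction:
       *  d = 2*(f'12 - f'23)/(b1 - b3), with f'12 = (g1-g2)/(b1-b2), f'23 = (g2-g3)/(b2-b3). */
      static double secondDerivative(double b1, double g1, double b2, double g2,
                                     double b3, double g3) {
          double f12 = (g1 - g2) / (b1 - b2);
          double f23 = (g2 - g3) / (b2 - b3);
          return 2.0 * (f12 - f23) / (b1 - b3);
      }

      /**
       * Reset the directions: u[j] are the current directions (used as the columns
       * of U), d[j] the second derivative estimates along them. Returns the new
       * directions; d is updated in place.
       */
      static double[][] resetDirections(double[][] u, double[] d) {
          int n = d.length;
          double small = 1e-10;                       // empirical floor when d[j] <= 0
          double[][] U = new double[n][n];
          for (int j = 0; j < n; j++) {
              double scale = 1.0 / Math.sqrt(d[j] > 0 ? d[j] : small); // D[j][j]
              for (int i = 0; i < n; i++) U[i][j] = u[j][i] * scale;   // U = U*D
          }
          // nr's SVD class; import omitted (its package is not shown on this page)
          SingularValueDecomposition decomp = new SingularValueDecomposition(U);
          double[][] left = decomp.getLeftMatrix();   // assumed accessor name
          double[] w = decomp.getSingularValues();
          double[][] newU = new double[n][n];
          for (int j = 0; j < n; j++) {
              d[j] = 1.0 / (w[j] * w[j]);             // new second derivative estimates
              for (int i = 0; i < n; i++) newU[j][i] = left[i][j];  // column j of U
          }
          return newU;
      }
  }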
Brent adds a few other tricks, which appear in the steps below. Brent's algorithm is:
  1. Minimize along u[1].
  2. For n-1 times:
  3. Do a "twisty little passage" step.
  4. Reset the directions, and check the values of d[j] to see if the problem is ill-conditioned.
  5. Exit with failure. This execution of Brent's algorithm did not converge, but it hopefully left us with an improved x', a set of conjugate and orthogonal directions u[j], and the corresponding second derivative estimates d[j].

So the final algorithm is:
  1. Set the direction vectors to the unit vectors in each direction (U[i][j] = (i == j) ? 1 : 0).
  2. Set the second derivative estimates to 0 (d[j] = 0).
  3. Repeat Brent's algorithm until it exits with success.
  4. If desired (if you are worried about twisty narrow valleys in which the algorithm might get stuck without being at the minimum), take a random jump and restart. Brent restarts at least once. (A sketch of this outer loop follows.)
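A sketch of this outer loop. brentIteration() is a hypothetical stand-in for steps 1-5 of Brent's algorithm above (returning true when it exits with success), and the random-jump restart is left as a comment.

  class BrentDriverSketch {
      /** Outer loop: identity directions, zero curvature estimates, repeat until success. */
      static double[] minimize(ScalarFunction f, double[] guess) {
          int n = guess.length;
          double[] x = guess.clone();
          double[] d = new double[n];                 // second derivative estimates, all 0
          double[][] u = new double[n][n];
          for (int i = 0; i < n; i++) u[i][i] = 1.0;  // unit vectors in each direction
          while (!brentIteration(f, x, u, d)) {
              // optionally take a random jump here and restart if worried about
              // twisty narrow valleys (Brent restarts at least once)
          }
          return x;
      }

      /** Stand-in for steps 1-5 of Brent's algorithm; returns true on convergence. */
      static boolean brentIteration(ScalarFunction f, double[] x, double[][] u, double[] d) {
          throw new UnsupportedOperationException("sketch only");
      }
  }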

The linear minimization is intentionally quick but crude. The idea is that there is no point in minimizing the function perfectly along one particular direction, since the algorithm will have to take many steps in many directions anyway. The minimizer goes to the exact minimum for a perfect parabola, and for any other function it just tries to find a point that is lower than the current one.
The linear minimization algorithm takes the value at the current point, evaluates the function one "stepsize" away, and uses these two points plus the estimate of the second derivative to extrapolate a parabola and determine where the minimum of that parabola is. It then evaluates the function at that point; if the value is less than the previous ones, it updates the estimate of the second derivative using the three points evaluated and returns the location of the minimum.
If the value at the predicted minimum is greater than the other ones, the function is not well approximated by a parabola. The algorithm then tries a few points closer to the original, looking for a lower function value (it uses just 2 tries), returns the lowest point found (possibly the original point), and sets the second derivative estimate to zero.
If we start with no good estimate of the second derivative, we evaluate the function another "stepsize" away and use those three points to extrapolate a parabola, then proceed as above.
Determining the stepsize is the hard part. If there is a good second derivative estimate, use it as an estimate of the width of the parabola and combine it with the desired tolerance to get a step size. If we do not have a second derivative estimate, use the size of the last change in x' as the stepsize. If this is the first time through, use the length of x' as a crude guess.
This is not exactly the step-size algorithm Brent used, but it seems simpler and appears to work.
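A sketch of the parabolic step when a second derivative estimate d is available: fit g(b) ~ g(0) + s*b + (d/2)*b^2 through the current point and the point one stepsize h away, and return the b at the model's minimum. The three-point case (no d estimate) fits the parabola through three evaluated points in the same way. The method name and calling convention here are illustrative, not the class's actual private API.

  import java.util.function.DoubleUnaryOperator;

  class ParabolicStepSketch {
      /**
       * One quick parabolic step: g is the function along the current direction
       * (g(0) is the current point), h the stepsize, d the second derivative
       * estimate (must be > 0 here). Returns the b predicted to minimize g.
       * A real implementation would then evaluate g there and either accept it,
       * update d from the three points, or fall back as described above.
       */
      static double parabolicStep(DoubleUnaryOperator g, double h, double d) {
          double f0 = g.applyAsDouble(0.0);           // value at the current point
          double f1 = g.applyAsDouble(h);             // value one stepsize away
          // model g(b) ~ f0 + s*b + (d/2)*b^2 with g(h) = f1  =>  s = (f1-f0)/h - d*h/2
          double s = (f1 - f0) / h - d * h / 2.0;
          return -s / d;                              // minimum of the model parabola
      }
  }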
The algorithm converges when each component of the change in x' is less than the corresponding element of minDeltaX. minDeltaX[i] is defined as epsilon * abs(x'[i] at the start of the algorithm), or just epsilon if that is zero. Thus, the initial guess for the minimum also sets the scale for convergence.
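A sketch of that convergence test (class and method names are illustrative):

  class ConvergenceSketch {
      /** True when every component of the change in x' is smaller than
       *  minDeltaX[i] = epsilon * |x0[i]|, or epsilon where x0[i] is zero. */
      static boolean converged(double[] deltaX, double[] x0, double epsilon) {
          for (int i = 0; i < x0.length; i++) {
              double minDeltaX = x0[i] != 0.0 ? epsilon * Math.abs(x0[i]) : epsilon;
              if (Math.abs(deltaX[i]) >= minDeltaX) return false;
          }
          return true;
      }
  }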


Field Summary
 
Fields inherited from class nr.minimizer.VecMinimizerImp
MAXFUNCTIONEVAL
 
Constructor Summary
Brent(ScalarFunction f)
          create a new instance of Brent.
 
Method Summary
static void main(java.lang.String[] args)
          test suite
static java.lang.String name()
           
 
Methods inherited from class nr.minimizer.VecMinimizerImp
getEpsilon, minimize, numFuncEvals, setEpsilon
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Brent

public Brent(ScalarFunction f)
create a new instance of Brent.

Parameters:
f - the ScalarFunction to minimize
Method Detail

name

public static java.lang.String name()

main

public static void main(java.lang.String[] args)
test suite