nr.minimizer
Class Brent
java.lang.Object
nr.minimizer.VecMinimizerImp
nr.minimizer.Brent
 All Implemented Interfaces:
 VecMinimizer
public class Brent
 extends VecMinimizerImp
Minimizes a function of several variables using Brent's modification of Powell's method.
Note: as of now, this algorithm does not work. Do not use it.
The method is described in Brent, RP (1973) Algorithms for Minimization
Without Derivatives.
Prentice Hall: New Jersey. The discussion is unfortunately not available on the web, though the
original Powell's method is discussed in
Numerical Recipes,
and they mention Brent's modification.
The original routine in FORTRAN is called PRAXIS, and is available from many sources
(search the web for "brent praxis"). One such source is
www.psc.edu/~burkardt/src/praxis.
The algorithm works well for smooth functions (ones where the second derivative exists).
These sorts of functions look like a paraboloid near the minimum. This discussion assumes
you understand that statement; if not, review your multivariate calculus and Numerical
Recipes chapter 10.
The basic idea is that we need to have a one-dimensional minimization algorithm and
a set of directions (vectors) that are linearly independent, so we can minimize
along each one in turn and search all of the domain of the function. To
"minimize along a direction u from point x"
means, if our function is f(x) with x a vector of size n,

construct the function of one variable g(b) =
f(x + b*u)

Using our one-dimensional minimization algorithm, find b'
that minimizes g(b)

Replace x with x + b'*u
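The three steps above can be sketched in Java. This is only an illustration: the class and method names (LineSearch, minimizeAlong, goldenMin), the golden-section search standing in for the package's actual scalar minimizer, and the bracketing interval [-10, 10] are all assumptions, not part of this class.

```java
import java.util.function.Function;

public class LineSearch {
    // Golden-section search on g over [a, c]; a stand-in for the scalar minimizer.
    static double goldenMin(Function<Double, Double> g, double a, double c, double tol) {
        double phi = (Math.sqrt(5.0) - 1.0) / 2.0;
        while (c - a > tol) {
            double b1 = c - phi * (c - a), b2 = a + phi * (c - a);
            if (g.apply(b1) < g.apply(b2)) c = b2; else a = b1;
        }
        return (a + c) / 2.0;
    }

    // "Minimize along direction u from point x": build g(b) = f(x + b*u),
    // find the b' that minimizes g, and replace x by x + b'*u.
    static double[] minimizeAlong(Function<double[], Double> f, double[] x, double[] u) {
        Function<Double, Double> g = b -> {
            double[] p = new double[x.length];
            for (int i = 0; i < x.length; i++) p[i] = x[i] + b * u[i];
            return f.apply(p);
        };
        double bPrime = goldenMin(g, -10.0, 10.0, 1e-8);   // bracket is an assumption
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] + bPrime * u[i];
        return out;
    }

    public static void main(String[] args) {
        // f(x, y) = (x-1)^2 + (y+2)^2, minimized along the x-axis from (0, 0).
        Function<double[], Double> f = p -> (p[0]-1)*(p[0]-1) + (p[1]+2)*(p[1]+2);
        double[] x = minimizeAlong(f, new double[]{0, 0}, new double[]{1, 0});
        System.out.println(x[0] + " " + x[1]);   // x[0] close to 1, x[1] stays 0
    }
}
```

Since u here is the x-axis, only the first component moves; minimizing along one direction leaves the others untouched, which is exactly why a single pass is not enough.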

The direction set basic procedure is:

Start with an initial guess for the minimum, x'

Start with a list of n linearly independent vectors,
u[j]

Minimize along each u[j] in turn, updating x'
each time.
Since the u's span all of n-space, one would hope that this would
bring us to the minimum. Unfortunately, each successive scalar minimization
ruins the minimization from before. We could repeat the
basic procedure over and over again until it
converges, but that could be very slow (see Numerical Recipes chapter 10 for
more discussion of this).
What we want is for the u's to not only be linearly independent, but to be
conjugate, which means that minimizing along one direction does not spoil
the other directions. Powell's algorithm
finds a mutually conjugate set of
directions for a paraboloid:

Set oldX' = x'

Do the basic procedure

Create a new direction, u' = x' - oldX'. This is
the average direction moved in the basic procedure.

Minimize along u'.

Replace one of the original u[j] with u'
Each time Powell's algorithm is run, the new u' is
conjugate to all the previous u's (for a paraboloid), so after n
runs through the algorithm, the directions are all conjugate and the last
basic procedure got us to the minimum.
The algorithm doesn't specify which original u[j] to replace; it
works no matter what you choose. The original algorithm just picked the first
available, but empirically it seems to converge faster if we pick the one that
made the best improvement in f(x) during the basic procedure.
See the discussion in
Numerical Recipes chapter 10.5, section "Discarding the Direction of Largest Decrease."
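One pass of Powell's algorithm with the "discard the direction of largest decrease" heuristic can be sketched as follows. The class and method names are hypothetical, and a simple golden-section search stands in for this class's scalar minimizer:

```java
import java.util.function.Function;

public class PowellPass {
    static final double PHI = (Math.sqrt(5.0) - 1.0) / 2.0;

    // Crude golden-section stand-in for the scalar minimizer; bracket assumed.
    static double goldenMin(Function<Double, Double> g) {
        double a = -10, c = 10;
        while (c - a > 1e-9) {
            double b1 = c - PHI * (c - a), b2 = a + PHI * (c - a);
            if (g.apply(b1) < g.apply(b2)) c = b2; else a = b1;
        }
        return (a + c) / 2;
    }

    // Minimize f along u from x; returns the updated point x + b'*u.
    static double[] lineMin(Function<double[], Double> f, double[] x, double[] u) {
        final double[] x0 = x;
        double b = goldenMin(bb -> {
            double[] p = new double[x0.length];
            for (int i = 0; i < p.length; i++) p[i] = x0[i] + bb * u[i];
            return f.apply(p);
        });
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = x[i] + b * u[i];
        return out;
    }

    // One pass: basic procedure along each u[j], then the average direction
    // u' = x' - oldX' is minimized along and replaces the direction that gave
    // the largest single decrease in f. Modifies u in place; returns new x'.
    static double[] powellPass(Function<double[], Double> f, double[] x, double[][] u) {
        int n = x.length;
        double[] oldX = x.clone();
        int best = 0;
        double bestDrop = -1;
        for (int j = 0; j < n; j++) {
            double fBefore = f.apply(x);
            x = lineMin(f, x, u[j]);
            double drop = fBefore - f.apply(x);
            if (drop > bestDrop) { bestDrop = drop; best = j; }
        }
        double[] uPrime = new double[n];
        for (int i = 0; i < n; i++) uPrime[i] = x[i] - oldX[i];
        x = lineMin(f, x, uPrime);
        u[best] = uPrime;   // discard the direction of largest decrease
        return x;
    }

    public static void main(String[] args) {
        // Narrow quadratic valley: f = (x-3)^2 + 10*(x+y-1)^2, minimum at (3, -2).
        Function<double[], Double> f = p -> Math.pow(p[0]-3, 2) + 10*Math.pow(p[0]+p[1]-1, 2);
        double[] x = {0, 0};
        double[][] u = {{1, 0}, {0, 1}};
        for (int pass = 0; pass < 3; pass++) x = powellPass(f, x, u);
        System.out.println(x[0] + " " + x[1]);   // close to (3, -2)
    }
}
```

For this paraboloid the pass-built directions become conjugate, so a few passes land on the minimum even though the valley is much narrower in one direction than the other.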
If the function is not a paraboloid, then we could repeat
Powell's algorithm over and over until it converges,
but replacing a vector by the difference of vectors tends to make the set linearly
dependent, so the u[j]s don't span all of space and only minimize
a subspace of the true domain, so we won't necessarily find the true minimum.
The answer is to reset the u[j]s to a set of orthogonal vectors after
doing Powell's algorithm n times. Brent found an amazing
result that, for a paraboloid, gives an orthogonal and conjugate set of directions, and
even if the function is far from paraboloid, still gives an orthogonal set, so
Powell's algorithm still works.
Resetting the directions is:

Create the n×n matrix U whose columns are the vectors
u[j]

Determine the approximate second derivative d[j] along each
direction u[j]. If, as above, g(b) =
f(x + b*u), then for any three values of b,
b_{1},b_{2}, and b_{3},
with values of g of g_{1},g_{2},
and g_{3}, the first derivatives are
f'_{12} = (g_{1} - g_{2})/(b_{1} - b_{2})
and
f'_{23} = (g_{2} - g_{3})/(b_{2} - b_{3})
and the second derivative is
d = 2 * (f'_{12} - f'_{23})/(b_{1} - b_{3}).
If we have done a scalar minimization in direction u, we should have
evaluated g(b) at a few points already, so finding d is easy.
(We can even have the scalar minimization routine do it automatically).
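The three-point second-derivative estimate can be written directly from the formulas above (the class name and the sample values are illustrative only):

```java
public class SecondDeriv {
    // Given three samples (b1,g1), (b2,g2), (b3,g3) of g(b) = f(x + b*u),
    // estimate the second derivative d by divided differences.
    static double estimate(double b1, double g1, double b2, double g2,
                           double b3, double g3) {
        double f12 = (g1 - g2) / (b1 - b2);   // slope between points 1 and 2
        double f23 = (g2 - g3) / (b2 - b3);   // slope between points 2 and 3
        return 2 * (f12 - f23) / (b1 - b3);
    }

    public static void main(String[] args) {
        // g(b) = 3*b^2 + b has constant second derivative 6.
        System.out.println(estimate(0, 0, 1, 4, 2, 14));  // 6.0
    }
}
```

For an exact parabola the estimate is exact regardless of where the three points lie, which is why the points already evaluated during the scalar minimization are good enough.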
 Determine the diagonal matrix D with
D[j][j] = 1/sqrt(d[j]).
If the function is anywhere close to a
minimum of a paraboloid, the second derivative is > 0
(if it is not, Brent just sets d[j] to a small empirical number).

Set U = U*D

Find the singular value decomposition of U and
set U to the left-hand orthogonal matrix
of that decomposition. For the algorithms in package nr, this is just
SingularValueDecomposition
decomp = new SingularValueDecomposition (U);

Set the second derivative estimates to
d[j] = 1/decomp.getSingularValues()[j]^2
The columns of U are now the desired new u[j], with no new function
evaluations and only one order n^{3} method, the singular value
decomposition.
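The reset step can be sketched as below. A hand-rolled one-sided Jacobi SVD stands in for the package's SingularValueDecomposition class, and all names here are hypothetical; the sketch assumes the scaled direction matrix is nonsingular:

```java
public class ResetDirections {
    // One-sided Jacobi: rotate pairs of columns of a until all columns are
    // mutually orthogonal. Then the column norms are the singular values and
    // the normalized columns are the left singular vectors. Modifies a into U.
    static double[] jacobiSvd(double[][] a) {
        int n = a.length;
        for (int sweep = 0; sweep < 30; sweep++) {
            boolean rotated = false;
            for (int i = 0; i < n; i++) for (int j = i + 1; j < n; j++) {
                double alpha = 0, beta = 0, gamma = 0;
                for (int k = 0; k < n; k++) {
                    alpha += a[k][i] * a[k][i];
                    beta  += a[k][j] * a[k][j];
                    gamma += a[k][i] * a[k][j];
                }
                if (Math.abs(gamma) < 1e-15 * Math.sqrt(alpha * beta)) continue;
                rotated = true;
                double zeta = (beta - alpha) / (2 * gamma);
                double t = (zeta >= 0 ? 1 : -1) / (Math.abs(zeta) + Math.sqrt(1 + zeta * zeta));
                double c = 1 / Math.sqrt(1 + t * t), s = c * t;
                for (int k = 0; k < n; k++) {       // rotate columns i and j
                    double ai = a[k][i], aj = a[k][j];
                    a[k][i] = c * ai - s * aj;
                    a[k][j] = s * ai + c * aj;
                }
            }
            if (!rotated) break;
        }
        double[] sigma = new double[n];
        for (int j = 0; j < n; j++) {
            for (int k = 0; k < n; k++) sigma[j] += a[k][j] * a[k][j];
            sigma[j] = Math.sqrt(sigma[j]);
            for (int k = 0; k < n; k++) a[k][j] /= sigma[j];
        }
        return sigma;
    }

    // The reset: scale each direction by 1/sqrt(d[j]) (U = U*D), take the SVD,
    // keep the left orthogonal matrix as the new directions, and set
    // d[j] = 1/sigma[j]^2.
    static void reset(double[][] u, double[] d) {
        int n = d.length;
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++) u[k][j] /= Math.sqrt(d[j]);
        double[] sigma = jacobiSvd(u);   // u now holds the new orthogonal directions
        for (int j = 0; j < n; j++) d[j] = 1 / (sigma[j] * sigma[j]);
    }

    public static void main(String[] args) {
        double[][] u = {{1, 1}, {0, 1}};  // independent but not orthogonal columns
        double[] d = {1, 1};
        reset(u, d);
        // Columns of u are now orthonormal; this dot product is ~0.
        System.out.println(u[0][0]*u[0][1] + u[1][0]*u[1][1]);
    }
}
```

No new function evaluations are needed, matching the claim above: the only real work is the order n^3 decomposition.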
Brent does a few other tricks:

Laziness:
Brent points out that on the first run through Powell's
algorithm there are no other "new" directions to be conjugate to, so there's
no reason to replace any of the old directions. So he just minimizes along one
direction, then does Powell's algorithm n-1 times to make the directions
conjugate. There is no point in minimizing along successive non-conjugate directions,
since each minimization will likely ruin the last one.

Random jump:
If the function has very narrow valleys (which we can tell if the second
derivatives in different directions have vastly different orders of magnitude) then
rounding errors may keep the algorithm from finding its way down the valley.
Brent calls this an "ill-conditioned" problem. If it is ill-conditioned,
before starting Powell's algorithm, move x'
to a random point a little bit away ("little bit" is determined by the second
derivatives and the machine precision).

Twisty little passage: If the valley is curved, then
assuming a paraboloid won't turn the corners and convergence will be slow.
Before resetting the directions, Brent uses the current value of
x' and the previous 2 values at this point in the algorithm to
approximate a curve and uses the scalar minimizer to minimize along that
curve. If the algorithm has been run fewer than 3 times, just record
the point for future iterations.

Automatic scaling: For the algorithm to be
efficient, all the components of x should be approximately on the same
scale. Brent's original algorithm incorporates ways to try to automatically do
this, but the version implemented here does not. The user must scale the
variables prior to calling
VecMinimizerImp.minimize(Vec)
.
So Brent's algorithm is:

Minimize along u[1].

For n-1 times:

If the problem is ill-conditioned, do a Random jump.

Do Powell's algorithm.

If f(x') didn't improve much and the problem was not ill-conditioned,
consider it ill-conditioned and restart the algorithm.

If x' didn't change much, then exit with success. Note that if x' didn't change,
then f didn't change, so (by the previous step) we will always have
considered the problem ill-conditioned and done a Random jump before exiting.

Do a Twisty little passage step.

Reset the directions. Check the values of
d[j] to see if the problem is ill-conditioned.

Exit with failure. We did not converge with this execution of Brent's algorithm, but
hopefully it left us with an improved x', a set of conjugate and orthogonal directions
u, and the corresponding estimates of the second derivative d.
So the final algorithm is:

Set the direction vectors to the unit vectors in each direction
(U[i,j] = i=j ? 1 : 0).

Set the second derivative estimates to 0 (d[j] = 0).

Repeat Brent's algorithm until it exits with success.

If desired (if you are worried about twisty narrow valleys in which the algorithm might get
stuck without being at the minimum), take a Random jump and restart.
Brent restarts at least once.
The linear minimization is intentionally a quick but poor one.
The idea is that there is no point in perfectly minimizing the function along one particular
direction, since the algorithm
will have to take many steps in many directions anyway. The minimizer goes to the exact
minimum for a perfect parabola, and tries to find a point that is lower than the current one
for any other function.
The linear minimization algorithm takes the value at the current point, evaluates the function
one "stepsize" away, and uses these two points plus the estimate of the second derivative to
extrapolate a parabola and determine where the minimum of that parabola is. It then evaluates the
function at that point, and if the value is less than the previous ones, it updates the estimate
of the second derivative using the three points evaluated and returns the location of the minimum.
If the value at the predicted minimum is more than the other ones, then the current function
is not a good parabola. Try a few points closer to the original looking for a lower function value
(the algorithm uses just 2 tries). Return the lowest point found (possibly the original
point) and set the second derivative to zero.
If we start with no good estimate of the second derivative, evaluate the function another
"stepsize" away and use those three points to extrapolate a parabola, then proceed as above.
Determining the stepsize is the hard part. If there is a good second derivative estimate, use that
as an estimate of the width of the parabola and use that and the desired tolerance to get a step size.
If we do not have a second derivative estimate, use the size of the last change in
x' as the stepsize. If this is the first time, use the length of x' as
a crude guess.
This is not the exact step size determining algorithm that Brent used, but it seems simpler and
seems to work.
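One quick parabola step of that flavor might look like the following sketch. The names, the one-element dOut array used to return the updated curvature, the fallback curvature of 1, and the backtracking factor of 4 are all assumptions for illustration, not Brent's actual choices:

```java
import java.util.function.DoubleUnaryOperator;

public class QuickLineMin {
    // One quick, deliberately rough minimization of g, starting at b0 with
    // step h and prior second-derivative estimate d (d <= 0 means "unknown").
    // Returns the new b; leaves the updated d estimate in dOut[0].
    static double minimize(DoubleUnaryOperator g, double b0, double h, double d,
                           double[] dOut) {
        double g0 = g.applyAsDouble(b0);
        double g1 = g.applyAsDouble(b0 + h);
        if (d <= 0) {                          // no estimate: sample a third point
            double g2 = g.applyAsDouble(b0 + 2 * h);
            d = ((g2 - g1) / h - (g1 - g0) / h) / h;
            if (d <= 0) d = 1;                 // hypothetical fallback curvature
        }
        // Parabola with value g0, slope m at b0, and curvature d:
        // g(b) ~ g0 + m*(b - b0) + d/2*(b - b0)^2, so the vertex is at b0 - m/d.
        double m = (g1 - g0) / h - d * h / 2;
        double bNew = b0 - m / d;
        double gNew = g.applyAsDouble(bNew);
        if (gNew < g0 && gNew < g1) {
            // Good step: refresh d from the three points actually evaluated.
            double f1 = (g0 - g1) / (-h);
            double f2 = (g1 - gNew) / ((b0 + h) - bNew);
            dOut[0] = 2 * (f1 - f2) / (b0 - bNew);
            return bNew;
        }
        // Not parabola-like: try up to 2 points closer to b0, keep the lowest.
        double bestB = g1 < g0 ? b0 + h : b0, bestG = Math.min(g0, g1);
        for (int k = 0; k < 2; k++) {
            h /= 4;
            double gk = g.applyAsDouble(b0 + h);
            if (gk < bestG) { bestG = gk; bestB = b0 + h; }
        }
        dOut[0] = 0;                            // signal "no usable curvature"
        return bestB;
    }

    public static void main(String[] args) {
        double[] dOut = new double[1];
        // For an exact parabola the predicted vertex is the true minimum.
        double b = minimize(x -> (x - 2) * (x - 2), 0, 0.5, -1, dOut);
        System.out.println(b + " d=" + dOut[0]);  // 2.0 d=2.0
    }
}
```

Note that the good-step branch costs only one extra function evaluation beyond the samples already needed, which is the whole point of a quick-but-poor line search.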
The algorithm converges when each component of the change in x' is less than
the corresponding element of minDeltaX. minDeltaX[i] is
defined as epsilon * abs(x0[i]), where x0 is the value of x' at the start of the
algorithm, or just epsilon if that component is zero. Thus, the
initial guess for the minimum also sets the scale for the convergence.
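The convergence test described above is simple to state in code (the class and method names are hypothetical):

```java
public class Converged {
    // minDeltaX[i] = epsilon * |x0[i]| (or just epsilon when x0[i] == 0),
    // where x0 is the initial guess for the minimum. Convergence means every
    // component of the change in x' is below the corresponding minDeltaX[i].
    static boolean converged(double[] x0, double[] deltaX, double epsilon) {
        for (int i = 0; i < x0.length; i++) {
            double minDeltaX = x0[i] == 0 ? epsilon : epsilon * Math.abs(x0[i]);
            if (Math.abs(deltaX[i]) >= minDeltaX) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] x0 = {100.0, 0.0};   // initial guess sets the scale per component
        System.out.println(converged(x0, new double[]{1e-5, 1e-7}, 1e-6));  // true
        System.out.println(converged(x0, new double[]{1e-3, 0.0}, 1e-6));   // false
    }
}
```

Because the tolerance scales with the initial guess, starting from a wildly mis-scaled x' loosens or tightens the convergence test accordingly; this is another reason to scale the variables before calling minimize.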
Method Summary 
static void 
main(java.lang.String[] args)
test suite 
static java.lang.String 
name()

Methods inherited from class java.lang.Object 
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait 
Brent
public Brent(ScalarFunction f)
 create a new instance of Brent.
 Parameters:
f
 the ScalarFunction
to minimize
name
public static java.lang.String name()
main
public static void main(java.lang.String[] args)
 test suite