Newton's method in optimization


Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can be used to find local maxima and minima of functions by noticing that if a real number $x^*$ is a critical point of a function $f(x)$, then $x^*$ is a root of the derivative $f'(x)$, and therefore one can apply Newton's method to $f'(x)$. Thus, provided that $f(x)$ is a twice differentiable function and the initial guess $x_0$ is chosen close enough to $x^*$, the sequence $(x_n)$ defined by

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \qquad n \ge 0,$$

will converge towards $x^*$.
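
A minimal sketch of this one-dimensional iteration in Python; the test function, its derivatives, the starting point and the stopping tolerance below are illustrative choices rather than part of the article:

def newton_optimize_1d(df, d2f, x0, tol=1e-10, max_iter=100):
    """Apply Newton's method to f' to locate a critical point of f."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # f'(x_n) / f''(x_n)
        x -= step
        if abs(step) < tol:     # stop once the update is negligible
            break
    return x

# Example: f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
x_star = newton_optimize_1d(lambda x: 4*x**3 - 6*x,
                            lambda x: 12*x**2 - 6,
                            x0=1.0)
print(x_star)   # converges to the local minimum at sqrt(3/2) ≈ 1.2247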

This iterative scheme can be generalized to several dimensions by replacing the derivative of $f(x)$ with the gradient, $\nabla f(x)$, and the reciprocal of the second derivative with the inverse of the Hessian matrix, $H f(x)$. One obtains the iterative scheme

$$x_{n+1} = x_n - [H f(x_n)]^{-1} \nabla f(x_n), \qquad n \ge 0.$$
Usually Newton's method is modified to include a small step size $\gamma > 0$ instead of $\gamma = 1$:

$$x_{n+1} = x_n - \gamma\,[H f(x_n)]^{-1} \nabla f(x_n).$$
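A sketch of the damped multidimensional iteration using NumPy; the quadratic test function, the default value of $\gamma$, and the tolerance are assumptions made for the example, and the Newton step is obtained by a linear solve rather than by explicitly inverting the Hessian:

import numpy as np

def newton_optimize(grad, hess, x0, gamma=1.0, tol=1e-10, max_iter=100):
    # Damped Newton iteration: x_{n+1} = x_n - gamma * H^{-1} grad
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = np.linalg.solve(hess(x), grad(x))   # solve H p = grad instead of forming H^{-1}
        x = x - gamma * p
        if np.linalg.norm(p) < tol:
            break
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10*(y + 2)^2.
grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_optimize(grad, hess, x0=[0.0, 0.0]))   # ≈ [1.0, -2.0]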
The geometric interpretation of Newton's method is that at each iteration one approximates $f(x)$ by a quadratic function around $x_n$, and then takes a step towards the maximum/minimum of that quadratic function.
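
In one dimension this picture can be made explicit: the quadratic function in question is the second-order Taylor approximation of $f$ around $x_n$,

$$f(x_n + \Delta x) \approx f(x_n) + f'(x_n)\,\Delta x + \tfrac{1}{2} f''(x_n)\,\Delta x^2,$$

and setting its derivative with respect to $\Delta x$ to zero gives $f'(x_n) + f''(x_n)\,\Delta x = 0$, i.e. $\Delta x = -f'(x_n)/f''(x_n)$, which is exactly the step taken above.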

Newton's method converges much faster towards a local maximum or minimum than gradient descent. However, to use Newton's method one needs to know the Hessian of $f(x)$ and invert it (at least approximately) at each iteration, which can be a costly operation. There exist various quasi-Newton methods, which converge a bit more slowly but are somewhat less costly per iteration.
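
As an illustration of that trade-off, a quasi-Newton method such as BFGS can be run without supplying the Hessian at all; the sketch below assumes SciPy is available, and the Rosenbrock-type test function and starting point are chosen only for demonstration:

import numpy as np
from scipy.optimize import minimize

# Only the gradient is supplied; BFGS builds up its own approximation
# to the (inverse) Hessian from successive gradient evaluations.
f = lambda v: (1 - v[0])**2 + 100*(v[1] - v[0]**2)**2
grad = lambda v: np.array([-2*(1 - v[0]) - 400*v[0]*(v[1] - v[0]**2),
                           200*(v[1] - v[0]**2)])

result = minimize(f, x0=np.array([-1.0, 1.0]), jac=grad, method='BFGS')
print(result.x)   # ≈ [1.0, 1.0]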

See also gradient descent.