Newton's method in optimization


Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can be used to find local maxima and minima of functions by noticing that if a real number $x^*$ is a critical point of a function $f(x)$, then $x^*$ is a root of the derivative $f'(x)$, and therefore one can apply Newton's method to $f'(x)$. Thus, provided that $f(x)$ is a twice differentiable function and the initial guess $x_0$ is chosen close enough to $x^*$, the sequence $(x_n)$ defined by

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \qquad n \ge 0,$$

will converge towards $x^*$.
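
A minimal sketch of this one-dimensional iteration in Python; the test function, its derivatives, the starting point and the stopping tolerance below are illustrative choices rather than part of the article:

def newton_optimize_1d(df, d2f, x0, tol=1e-10, max_iter=100):
    """Apply Newton's method to f' to locate a critical point of f."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # f'(x_n) / f''(x_n)
        x -= step
        if abs(step) < tol:     # stop once the update is negligible
            break
    return x

# Example: f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
x_star = newton_optimize_1d(lambda x: 4*x**3 - 6*x,
                            lambda x: 12*x**2 - 6,
                            x0=1.0)
print(x_star)   # converges to the local minimum at sqrt(3/2) ≈ 1.2247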

This iterative scheme can be generalized to several dimensions by replacing the derivative of $f(x)$ with the gradient, $\nabla f(x)$, and the reciprocal of the second derivative with the inverse of the Hessian matrix, $H f(x)$. One obtains the iterative scheme

$$x_{n+1} = x_n - [H f(x_n)]^{-1} \nabla f(x_n), \qquad n \ge 0.$$
Usually Newton's method is modified to include a small step size $\gamma > 0$ instead of $\gamma = 1$:

$$x_{n+1} = x_n - \gamma\,[H f(x_n)]^{-1} \nabla f(x_n).$$
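A sketch of the damped multidimensional iteration using NumPy; the quadratic test function, the default value of $\gamma$, and the tolerance are assumptions made for the example, and the Newton step is obtained by a linear solve rather than by explicitly inverting the Hessian:

import numpy as np

def newton_optimize(grad, hess, x0, gamma=1.0, tol=1e-10, max_iter=100):
    # Damped Newton iteration: x_{n+1} = x_n - gamma * H^{-1} grad
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        p = np.linalg.solve(hess(x), grad(x))   # solve H p = grad instead of forming H^{-1}
        x = x - gamma * p
        if np.linalg.norm(p) < tol:
            break
    return x

# Example: minimize f(x, y) = (x - 1)^2 + 10*(y + 2)^2.
grad = lambda v: np.array([2*(v[0] - 1), 20*(v[1] + 2)])
hess = lambda v: np.array([[2.0, 0.0], [0.0, 20.0]])
print(newton_optimize(grad, hess, x0=[0.0, 0.0]))   # ≈ [1.0, -2.0]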
The geometric interpretation of Newton's method is that at each iteration one approximates $f(x)$ by a quadratic function around $x_n$, and then takes a step towards the maximum/minimum of that quadratic function.
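
In one dimension this picture can be made explicit: the quadratic function in question is the second-order Taylor approximation of $f$ around $x_n$,

$$f(x_n + \Delta x) \approx f(x_n) + f'(x_n)\,\Delta x + \tfrac{1}{2} f''(x_n)\,\Delta x^2,$$

and setting its derivative with respect to $\Delta x$ to zero gives $f'(x_n) + f''(x_n)\,\Delta x = 0$, i.e. $\Delta x = -f'(x_n)/f''(x_n)$, which is exactly the step taken above.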

Newton's method converges much faster towards a local maximum or minimum than gradient descent. However, to use Newton's method one needs to know the Hessian of $f(x)$ and invert it (at least approximately) at each iteration, which can be a costly operation. There exist various quasi-Newton methods, which converge a bit more slowly but are somewhat less costly per iteration.
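
As an illustration of that trade-off, a quasi-Newton method such as BFGS can be run without supplying the Hessian at all; the sketch below assumes SciPy is available, and the Rosenbrock-type test function and starting point are chosen only for demonstration:

import numpy as np
from scipy.optimize import minimize

# Only the gradient is supplied; BFGS builds up its own approximation
# to the (inverse) Hessian from successive gradient evaluations.
f = lambda v: (1 - v[0])**2 + 100*(v[1] - v[0]**2)**2
grad = lambda v: np.array([-2*(1 - v[0]) - 400*v[0]*(v[1] - v[0]**2),
                           200*(v[1] - v[0]**2)])

result = minimize(f, x0=np.array([-1.0, 1.0]), jac=grad, method='BFGS')
print(result.x)   # ≈ [1.0, 1.0]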

See also gradient descent.