Numerical methods for linear least squares

Numerical methods for linear least squares are the techniques of numerical analysis used to solve (linear) least squares problems.

Introduction

A general approach to the least squares problem $\min _{\boldsymbol {\beta }}{\big \|}\mathbf {y} -X{\boldsymbol {\beta }}{\big \|}^{2}$, where X is an m by n matrix, can be described as follows. Suppose that we can find an n by m matrix S such that XS is an orthogonal projection onto the image of X. Then a solution to the minimization problem is given by

{\boldsymbol {\beta }}=S\mathbf {y}

simply because

X{\boldsymbol {\beta }}=X(S\mathbf {y} )=(XS)\mathbf {y}

is exactly the sought-for orthogonal projection of $\mathbf {y}$ onto the image of X (see the picture below and note that, as explained in the next section, the image of X is just the subspace generated by the column vectors of X). A few popular ways to find such a matrix S are described below.
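
As a small numerical illustration of this approach, the following sketch uses NumPy with made-up data. It assumes S is taken to be the pseudoinverse returned by `numpy.linalg.pinv`, which is only one possible choice of S. It checks that XS is symmetric and idempotent (that is, an orthogonal projection) and compares Sy with a library least squares solution.

```python
import numpy as np

# Hypothetical example data: m = 5 observations, n = 2 parameters.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

S = np.linalg.pinv(X)      # one possible choice of S (n by m)
P = X @ S                  # candidate orthogonal projection (m by m)

# An orthogonal projection is symmetric and idempotent.
is_symmetric = np.allclose(P, P.T)
is_idempotent = np.allclose(P @ P, P)

beta = S @ y                                      # proposed solution
beta_ref = np.linalg.lstsq(X, y, rcond=None)[0]   # reference solution

# The residual of the projection is orthogonal to every column of X.
residual = y - X @ beta
print(is_symmetric, is_idempotent, np.allclose(beta, beta_ref))
```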

Inverting the matrix of the normal equations

The algebraic solution of the normal equations with a full-rank matrix X^TX can be written as

{\hat {\boldsymbol {\beta }}}=(\mathbf {X} ^{\rm {T}}\mathbf {X} )^{-1}\mathbf {X} ^{\rm {T}}\mathbf {y} =\mathbf {X} ^{+}\mathbf {y}

where X⁺ is the Moore–Penrose pseudoinverse of X. Although this equation is correct and can work in many applications, it is not computationally efficient to invert the normal-equations matrix (the Gramian matrix). An exception occurs in numerical smoothing and differentiation where an analytical expression is required.

If the matrix X^TX is well-conditioned and positive definite, implying that it has full rank, the normal equations can be solved directly by using the Cholesky decomposition R^TR, where R is an upper triangular matrix, giving:

R^{\rm {T}}R{\hat {\boldsymbol {\beta }}}=X^{\rm {T}}\mathbf {y} .

The solution is obtained in two stages, a forward substitution step, solving for z:

R^{\rm {T}}\mathbf {z} =X^{\rm {T}}\mathbf {y} ,

followed by a backward substitution, solving for ${\hat {\boldsymbol {\beta }}}$ :

R{\hat {\boldsymbol {\beta }}}=\mathbf {z} .

Both substitutions are facilitated by the triangular nature of R.
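
The two-stage solution can be sketched in Python with NumPy as follows. The function name `solve_normal_equations_cholesky` and the data are hypothetical, chosen only for illustration. Note that `numpy.linalg.cholesky` returns the lower triangular factor, which plays the role of R^T here.

```python
import numpy as np

def solve_normal_equations_cholesky(X, y):
    """Solve (X^T X) beta = X^T y via Cholesky; X must have full column rank."""
    G = X.T @ X                    # Gramian (normal-equations matrix)
    b = X.T @ y
    L = np.linalg.cholesky(G)      # lower triangular, G = L L^T
    R = L.T                        # upper triangular, G = R^T R
    n = G.shape[0]

    # Forward substitution: solve R^T z = X^T y (R^T = L is lower triangular).
    z = np.zeros(n)
    for i in range(n):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]

    # Backward substitution: solve R beta = z.
    beta = np.zeros(n)
    for i in range(n - 1, -1, -1):
        beta[i] = (z[i] - R[i, i + 1:] @ beta[i + 1:]) / R[i, i]
    return beta

# Hypothetical example data.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

beta = solve_normal_equations_cholesky(X, y)
print(beta)
```

The explicit loops are written out only to mirror the two substitution steps; in practice a library triangular solver would normally be used instead.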

Orthogonal decomposition methods

Orthogonal decomposition methods of solving the least squares problem are slower than the normal equations method but are more numerically stable because they avoid forming the product X^TX.
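
One common way to see why forming X^TX hurts stability is that, in the 2-norm, the condition number of X^TX is the square of the condition number of X. This is a standard fact that is not stated above. The sketch below checks it numerically with NumPy on a made-up polynomial design matrix.

```python
import numpy as np

# Hypothetical design matrix: cubic polynomial basis on 20 points in [0, 1].
t = np.linspace(0.0, 1.0, 20)
X = np.vander(t, 4, increasing=True)     # columns 1, t, t^2, t^3

cond_X = np.linalg.cond(X)               # 2-norm condition number of X
cond_gram = np.linalg.cond(X.T @ X)      # condition number of the Gramian

print(f"cond(X)     = {cond_X:.3e}")
print(f"cond(X^T X) = {cond_gram:.3e}")
print(f"cond(X)^2   = {cond_X**2:.3e}")
```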

The residuals are written in matrix notation as

\mathbf {r} =\mathbf {y} -X{\hat {\boldsymbol {\beta }}}.

The matrix X is subjected to an orthogonal decomposition, e.g., the QR decomposition as follows.

X=Q{\begin{pmatrix}R\\0\end{pmatrix}},

where Q is an m×m orthogonal matrix (Q^TQ=I) and R is an n×n upper triangular matrix with $r_{ii}>0$ .

The residual vector is left-multiplied by Q^T.

Q^{\rm {T}}\mathbf {r} =Q^{\rm {T}}\mathbf {y} -\left(Q^{\rm {T}}Q\right){\begin{pmatrix}R\\0\end{pmatrix}}{\hat {\boldsymbol {\beta }}}={\begin{bmatrix}\left(Q^{\rm {T}}\mathbf {y} \right)_{n}-R{\hat {\boldsymbol {\beta }}}\\\left(Q^{\rm {T}}\mathbf {y} \right)_{m-n}\end{bmatrix}}={\begin{bmatrix}\mathbf {u} \\\mathbf {v} \end{bmatrix}}

Because Q is orthogonal, the sum of squares of the residuals, s, may be written as:

s=\|\mathbf {r} \|^{2}=\mathbf {r} ^{\rm {T}}\mathbf {r} =\mathbf {r} ^{\rm {T}}QQ^{\rm {T}}\mathbf {r} =\mathbf {u} ^{\rm {T}}\mathbf {u} +\mathbf {v} ^{\rm {T}}\mathbf {v}

Since v doesn't depend on β, the minimum value of s is attained when the upper block, u, is zero. Therefore, the parameters are found by solving:

R{\hat {\boldsymbol {\beta }}}=\left(Q^{\rm {T}}\mathbf {y} \right)_{n}.

These equations are easily solved as R is upper triangular.
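
A minimal sketch of the QR approach in Python with NumPy is given below. The function name `lstsq_qr` and the data are hypothetical. The sketch assumes X has full column rank, and it relies on `numpy.linalg.qr` with `mode="complete"` to obtain the m×m matrix Q. NumPy does not guarantee positive diagonal entries in R, which does not affect the back substitution.

```python
import numpy as np

def lstsq_qr(X, y):
    """Least squares via a full QR decomposition (X is m by n, full column rank)."""
    m, n = X.shape
    Q, R_full = np.linalg.qr(X, mode="complete")   # Q: m by m, R_full: m by n
    R = R_full[:n, :]                # upper triangular n by n block
    qty = Q.T @ y
    top, v = qty[:n], qty[n:]        # (Q^T y)_n and (Q^T y)_{m-n}

    # Back substitution for R beta = (Q^T y)_n, which makes u zero.
    beta = np.zeros(n)
    for i in range(n - 1, -1, -1):
        beta[i] = (top[i] - R[i, i + 1:] @ beta[i + 1:]) / R[i, i]

    s_min = v @ v                    # minimum sum of squared residuals
    return beta, s_min

# Hypothetical example data.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

beta, s_min = lstsq_qr(X, y)
print(beta, s_min)
```

The returned `s_min` equals v^Tv, so it can be compared with the squared norm of the residual y − Xβ as a consistency check.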

An alternative decomposition of X is the singular value decomposition (SVD)^[1]

X=U\Sigma V^{\rm {T}},

where U is an m by m orthogonal matrix, V is an n by n orthogonal matrix, and $\Sigma$ is an m by n matrix whose elements outside the main diagonal are all equal to 0. The pseudoinverse $\Sigma ^{+}$ of $\Sigma$ is easily obtained by inverting its non-zero diagonal elements and transposing. Hence,

\mathbf {X} \mathbf {X} ^{+}=U\Sigma V^{\rm {T}}V\Sigma ^{+}U^{\rm {T}}=UPU^{\rm {T}},

where $P=\Sigma \Sigma ^{+}$ is the m by m diagonal matrix that has ones in the diagonal positions corresponding to the non-zero diagonal elements of $\Sigma$ and zeros elsewhere. Since $(\mathbf {X} \mathbf {X} ^{+})^{*}=\mathbf {X} \mathbf {X} ^{+}$ (a defining property of the pseudoinverse), the matrix $UPU^{\rm {T}}$ is an orthogonal projection onto the image (column space) of X. In accordance with the general approach described in the introduction above (find S such that XS is an orthogonal projection),

S=\mathbf {X} ^{+},

and thus,

\beta =V\Sigma ^{+}U^{\rm {T}}\mathbf {y}

is a solution of the least squares problem. This method is the most computationally intensive, but it is particularly useful if the normal equations matrix, X^TX, is very ill-conditioned (i.e. if its condition number multiplied by the machine's relative round-off error is appreciably large). In that case, including the smallest singular values in the inversion merely adds numerical noise to the solution. This can be cured with the truncated SVD approach, which explicitly sets to zero all singular values below a certain threshold and so ignores them, giving a more stable and accurate answer. The process is closely related to factor analysis.
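
The SVD solution, including the truncation step, can be sketched in Python with NumPy as follows. The function name `lstsq_svd`, the parameter `rel_tol` and its value are assumptions made for illustration, not recommended defaults. The sketch uses the thin SVD (`full_matrices=False`) rather than the full m by m matrix U, which gives the same β.

```python
import numpy as np

def lstsq_svd(X, y, rel_tol=1e-12):
    """Least squares via (truncated) SVD.

    Singular values not exceeding rel_tol times the largest one are
    treated as zero. rel_tol is an illustrative choice only.
    """
    U, sigma, Vt = np.linalg.svd(X, full_matrices=False)   # thin SVD
    keep = sigma > rel_tol * sigma[0]
    inv_sigma = np.zeros_like(sigma)
    inv_sigma[keep] = 1.0 / sigma[keep]      # diagonal of Sigma^+
    return Vt.T @ (inv_sigma * (U.T @ y))    # beta = V Sigma^+ U^T y

# Hypothetical example data.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

beta = lstsq_svd(X, y)

# Rank-deficient case: duplicating a column makes X^T X singular, so the
# Cholesky route fails, while the truncated SVD still returns a solution.
X_dup = np.column_stack([X, X[:, 1]])
beta_dup = lstsq_svd(X_dup, y)
print(beta, beta_dup)
```

In the rank-deficient example the discarded singular value is zero up to rounding, and the returned vector is the minimum-norm least squares solution, which shares the coefficient equally between the two identical columns.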

References

[1] Lawson, C. L.; Hanson, R. J. (1974). Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice-Hall. ISBN 0-13-822585-0.
