
Minimum mean square error


In statistics and signal processing, a minimum mean square error (MMSE) estimator is an estimator that minimizes the mean square error (MSE), a common measure of estimator quality.

The term MMSE specifically refers to estimation in a Bayesian setting with a quadratic cost function. Unlike the non-Bayesian approach, where the parameters of interest are assumed to be deterministic but unknown constants, the Bayesian estimator seeks to estimate a parameter that is itself a random variable. The Bayesian approach, based directly on Bayes' theorem, provides a framework for handling such problems by allowing prior knowledge to be incorporated into the estimator. Furthermore, Bayesian estimation provides yet another alternative to the minimum-variance unbiased estimator (MVUE). This is useful when the MVUE cannot be found.

In the alternative frequentist setting there does not exist a single estimator having minimal MSE. A somewhat similar concept can be obtained within the frequentist point of view if one requires unbiasedness, since an estimator may exist that minimizes the variance (and hence the MSE) among unbiased estimators. Such an estimator is then called the MVUE.

Definition

Let $x$ be an unknown random vector variable, and let $y$ be a known random vector variable (the measurement or observation). An estimator $\hat{x}(y)$ of $x$ is any function of the measurement $y$. The estimation error vector is given by $e = \hat{x} - x$ and its mean squared error (MSE) is given by the trace of the error covariance matrix

$\mathrm{MSE} = \operatorname{tr}\left\{ E\left\{ (\hat{x} - x)(\hat{x} - x)^{T} \right\} \right\} ,$

where the expectation is taken over both $x$ and $y$. When $x$ is a scalar variable, the MSE expression simplifies to $E\left\{ (\hat{x} - x)^{2} \right\}$. The MMSE estimator is then defined as the estimator achieving minimal MSE.
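
As a quick illustration, the MSE of any given estimator can be approximated by Monte Carlo simulation. The short sketch below is a hypothetical example, not part of the standard treatment; the scalar model and the estimator used are assumptions chosen only to exercise the definition.

```python
# Hypothetical sketch: Monte Carlo approximation of the MSE of an estimator.
# The model and the estimator below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

x = rng.normal(0.0, 1.0, size=n)        # unknown scalar x ~ N(0, 1)
y = x + rng.normal(0.0, 0.5, size=n)    # noisy measurement of x

x_hat = 0.8 * y                         # a candidate estimator of x from y
mse = np.mean((x_hat - x) ** 2)         # scalar case: MSE = E[(x_hat - x)^2]
print(mse)
```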

Properties

  • Under some weak regularity assumptions,[1] the MMSE estimator is uniquely defined, and is given by $\hat{x}_{\mathrm{MMSE}}(y) = E\{x \mid y\}$. In other words, the MMSE estimator is the conditional expectation of $x$ given the known observed value of the measurements.
  • If $x$ and $y$ are jointly Gaussian, then the MMSE estimator is linear, i.e., it has the form $Wy + b$ for a matrix $W$ and a constant $b$. As a consequence, to find the MMSE estimator, it is sufficient to find the linear MMSE estimator. Such a situation occurs in the example presented in the next section, and is illustrated in the sketch below.
  • The orthogonality principle: the estimation error of the MMSE estimator is orthogonal to every function of the measurements, i.e., $E\left\{ (\hat{x}_{\mathrm{MMSE}} - x)\, g(y)^{T} \right\} = 0$ for all functions $g(y)$ of the measurements.
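
The jointly Gaussian case can be checked numerically. In the hypothetical sketch below (all model parameters are assumptions chosen for illustration), the conditional mean $E\{x \mid y\}$, which is linear in $y$, attains a smaller empirical MSE than a plausible alternative estimator.

```python
# Hypothetical check: for jointly Gaussian x and y, the conditional mean
# E{x|y} is linear in y and minimizes the MSE. Parameters are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

x = rng.normal(0.0, 2.0, size=n)        # prior: x ~ N(0, 4)
y = x + rng.normal(0.0, 1.0, size=n)    # measurement: y = x + z, z ~ N(0, 1)

x_mmse = (4.0 / (4.0 + 1.0)) * y        # E{x|y} = C_XY C_Y^{-1} y = 0.8 y
x_naive = y                             # using the raw measurement instead

print(np.mean((x_mmse - x) ** 2))       # ~ 0.8  (= 4 - 16/5)
print(np.mean((x_naive - x) ** 2))      # ~ 1.0  (strictly larger)
```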

Linear MMSE estimator

In many cases, it is not possible to determine a closed form for the MMSE estimator. Also, such estimators are often computationally expensive to implement, since they may require multidimensional integration. In these cases, one possibility is to abandon the full optimality requirement and seek the technique minimizing the MSE within a particular class, such as the class of linear estimators. The linear MMSE estimator is the estimator achieving minimum MSE among all estimators of the form $Wy + b$, where the measurement $y$ is a random vector, $W$ is a matrix and $b$ is a vector. Such a linear estimator depends only on the first two moments of the probability density function. These estimators are sometimes referred to as Wiener filters.

Let us have a linear MMSE estimator given as $\hat{x} = Wy + b$. For the estimator to be unbiased, the mean error should be zero. This means,

$E\{\hat{x}\} = E\{x\} \quad \Longrightarrow \quad b = \bar{x} - W\bar{y} .$

Plugging the expression for $b$ in above, we get

$\hat{x} = W(y - \bar{y}) + \bar{x} ,$

where $\bar{x} = E\{x\}$ and $\bar{y} = E\{y\}$. Thus we can re-write the estimator as

$\hat{x} - \bar{x} = W(y - \bar{y})$

and the expression for estimation error becomes

$\hat{x} - x = W(y - \bar{y}) - (x - \bar{x}) .$

From the orthogonality principle, we can have $E\{(\hat{x} - x)(y - \bar{y})^{T}\} = 0$. Here the left hand side term is

$E\{(\hat{x} - x)(y - \bar{y})^{T}\} = E\left\{ \left( W(y - \bar{y}) - (x - \bar{x}) \right)(y - \bar{y})^{T} \right\} = W C_{Y} - C_{XY} .$

When equated to zero, we obtain the desired expression for $W$ as

$W = C_{XY} C_{Y}^{-1} .$

Here $C_{XY}$ is the cross-covariance matrix between $X$ and $Y$, and $C_{Y}$ is the covariance matrix of $Y$. Since $C_{XY} = C_{YX}^{T}$, the expression can also be re-written in terms of $C_{YX}$ as

$C_{Y} W^{T} = C_{YX} .$
Standard methods such as Gaussian elimination can be used to solve this matrix equation for $W$. Since $C_{Y}$ is a symmetric positive definite matrix, the system can be solved twice as fast with the Cholesky decomposition. Levinson recursion is a fast method when $C_{Y}$ is also a Toeplitz matrix. This can happen when $y$ is a wide sense stationary process.
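
For a concrete sense of this solution step, the sketch below (the covariances are made-up assumptions, with $C_{Y}$ chosen symmetric positive definite) solves $C_{Y} W^{T} = C_{YX}$ with a Cholesky factorization using SciPy.

```python
# Hypothetical sketch: solving C_Y W^T = C_YX for the linear MMSE matrix W.
# The covariances below are made-up; C_Y must be symmetric positive definite.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

C_Y = np.array([[2.0, 1.0, 0.5],
                [1.0, 2.0, 1.0],
                [0.5, 1.0, 2.0]])
C_YX = np.array([[1.0],
                 [0.5],
                 [0.2]])

# Cholesky factorization exploits symmetry and positive definiteness.
W = cho_solve(cho_factor(C_Y), C_YX).T
print(W)                     # row vector W = C_XY C_Y^{-1}

# If C_Y were also Toeplitz (e.g. y wide-sense stationary), the system could
# instead be solved with scipy.linalg.solve_toeplitz (Levinson recursion).
```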

The covariance of the MMSE estimation error will then be given by

$\begin{aligned} C_{e} &= E\{(\hat{x} - x)(\hat{x} - x)^{T}\} \\ &= E\left\{ (\hat{x} - x)\left( W(y - \bar{y}) - (x - \bar{x}) \right)^{T} \right\} \\ &= E\{(\hat{x} - x)(y - \bar{y})^{T}\} W^{T} - E\{(\hat{x} - x)(x - \bar{x})^{T}\} . \end{aligned}$

The first term in the third line is zero due to the orthogonality principle. Since $W = C_{XY} C_{Y}^{-1}$, we can re-write $C_{e}$ in terms of covariance matrices as

$C_{e} = C_{X} - C_{XY} C_{Y}^{-1} C_{YX} .$

Thus the minimum mean square error achievable by such a linear estimator is

$\mathrm{LMMSE} = \operatorname{tr}\{C_{e}\} .$
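
Continuing with the same made-up covariances as in the sketch above, the error covariance and the minimum achievable MSE can be computed directly; the prior covariance $C_{X}$ below is an additional assumption.

```python
# Hypothetical continuation: error covariance C_e = C_X - C_XY C_Y^{-1} C_YX
# and the minimum MSE tr(C_e), using made-up covariances and an assumed C_X.
import numpy as np

C_X  = np.array([[1.5]])
C_Y  = np.array([[2.0, 1.0, 0.5],
                 [1.0, 2.0, 1.0],
                 [0.5, 1.0, 2.0]])
C_YX = np.array([[1.0],
                 [0.5],
                 [0.2]])

W   = np.linalg.solve(C_Y, C_YX).T    # W = C_XY C_Y^{-1}
C_e = C_X - W @ C_YX                  # = C_X - C_XY C_Y^{-1} C_YX
print(C_e, np.trace(C_e))             # minimum achievable (linear) MSE
```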

Linear process

Furthermore, let us have an underlying linear process $y = Ax + z$, where $A$ is a known matrix and $z$ is a random noise vector with mean $E\{z\} = 0$ and cross-covariance $C_{XZ} = 0$. The required covariance matrices will be

$C_{Y} = A C_{X} A^{T} + C_{Z}$

and

$C_{XY} = C_{X} A^{T} .$

Thus the expression for the linear MMSE estimator further modifies to

$\hat{x} = C_{X} A^{T} \left( A C_{X} A^{T} + C_{Z} \right)^{-1} (y - \bar{y}) + \bar{x} ,$

which, by the matrix inversion lemma, can also be written as

$\hat{x} = \left( A^{T} C_{Z}^{-1} A + C_{X}^{-1} \right)^{-1} A^{T} C_{Z}^{-1} (y - \bar{y}) + \bar{x} .$

When $C_{X}^{-1} = 0$, corresponding to no prior information about $x$, and $\bar{x} = 0$, the expression for $\hat{x}$ is the same as that of the weighted least squares estimate with $C_{Z}^{-1}$ as the weight matrix.
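
The equivalence of the two forms above can be verified numerically. Everything in the sketch below (the matrix $A$, the covariances, the measurement) is an arbitrary assumption used only to exercise the formulas.

```python
# Hypothetical check that the two expressions for the linear MMSE estimator
# of x in y = A x + z coincide. All quantities below are arbitrary test data.
import numpy as np

rng  = np.random.default_rng(2)
A    = rng.normal(size=(5, 2))              # known model matrix (assumed)
C_X  = np.diag([2.0, 0.5])                  # prior covariance of x (assumed)
C_Z  = 0.1 * np.eye(5)                      # noise covariance (assumed)
xbar = np.array([1.0, -1.0])
ybar = A @ xbar                             # E{y} = A xbar, since E{z} = 0
y    = ybar + rng.normal(size=5)            # an arbitrary measurement

form1 = C_X @ A.T @ np.linalg.solve(A @ C_X @ A.T + C_Z, y - ybar) + xbar
form2 = np.linalg.solve(A.T @ np.linalg.inv(C_Z) @ A + np.linalg.inv(C_X),
                        A.T @ np.linalg.inv(C_Z) @ (y - ybar)) + xbar
print(np.allclose(form1, form2))            # True: the two forms agree
```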

Examples

Example 1

We shall take a linear prediction problem as an example. Let a linear combination of observed scalar random variables $x_{1}$, $x_{2}$ and $x_{3}$ be used to estimate another future scalar random variable $x_{4}$ such that $\hat{x}_{4} = \sum_{i=1}^{3} w_{i} x_{i}$. If the random variables $x = [x_{1}, x_{2}, x_{3}, x_{4}]^{T}$ are real Gaussian random variables with zero mean and covariance matrix given by

$\operatorname{cov}(X) = E[xx^{T}] = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 5 & 8 & 9 \\ 3 & 8 & 6 & 10 \\ 4 & 9 & 10 & 15 \end{bmatrix} ,$

then our task is to find the coefficients $w_{i}$ such that it will yield an optimal linear estimate $\hat{x}_{4}$.

In terms of the terminology developed in the previous section, for this problem we have the observation vector $y = [x_{1}, x_{2}, x_{3}]^{T}$, the estimator matrix $W = [w_{1}, w_{2}, w_{3}]$ as a row vector, and the estimated variable $x_{4}$ as a scalar quantity. The autocorrelation matrix $C_{Y}$ is defined as

$C_{Y} = \begin{bmatrix} E[x_{1}x_{1}] & E[x_{2}x_{1}] & E[x_{3}x_{1}] \\ E[x_{1}x_{2}] & E[x_{2}x_{2}] & E[x_{3}x_{2}] \\ E[x_{1}x_{3}] & E[x_{2}x_{3}] & E[x_{3}x_{3}] \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 8 \\ 3 & 8 & 6 \end{bmatrix} .$

The cross correlation matrix $C_{YX}$ is defined as

$C_{YX} = \begin{bmatrix} E[x_{4}x_{1}] \\ E[x_{4}x_{2}] \\ E[x_{4}x_{3}] \end{bmatrix} = \begin{bmatrix} 4 \\ 9 \\ 10 \end{bmatrix} .$

We now solve the equation $C_{Y} W^{T} = C_{YX}$ by inverting $C_{Y}$ and pre-multiplying to get

$C_{Y}^{-1} C_{YX} = \begin{bmatrix} 4.85 & -1.71 & -0.142 \\ -1.71 & 0.428 & 0.2857 \\ -0.142 & 0.2857 & -0.1429 \end{bmatrix} \begin{bmatrix} 4 \\ 9 \\ 10 \end{bmatrix} = \begin{bmatrix} 2.57 \\ -0.142 \\ 0.5714 \end{bmatrix} = W^{T} .$

So we have $w_{1} = 2.57$, $w_{2} = -0.142$ and $w_{3} = 0.5714$ as the optimal coefficients for $\hat{x}_{4}$. Computing the minimum mean square error then gives $\left\Vert e \right\Vert_{\min}^{2} = E[x_{4}x_{4}] - W C_{YX} = 15 - W C_{YX} = 0.2857$.[2] Note that it is not necessary to obtain an explicit matrix inverse of $C_{Y}$ to compute the value of $W$. The matrix equation can be solved by well known methods such as Gauss elimination method. A shorter, non-numerical example can be found in orthogonality principle.
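
The numbers in this example are easy to reproduce; the sketch below simply repeats the computation with NumPy, using no assumptions beyond the covariances given above.

```python
# Reproducing Example 1 numerically: solve C_Y W^T = C_YX and evaluate the
# minimum mean square error E[x4 x4] - W C_YX.
import numpy as np

C_Y  = np.array([[1.0, 2.0, 3.0],
                 [2.0, 5.0, 8.0],
                 [3.0, 8.0, 6.0]])
C_YX = np.array([4.0, 9.0, 10.0])

W = np.linalg.solve(C_Y, C_YX)      # [ 2.5714, -0.1429,  0.5714]
mmse = 15.0 - W @ C_YX              # E[x4 x4] - W C_YX = 0.2857
print(W, mmse)
```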

Example 2

Consider a vector $y$ formed by taking $N$ observations of a random scalar parameter $x$ disturbed by white Gaussian noise. We can describe the process by a linear equation $y = 1x + z$, where $1 = [1, 1, \ldots, 1]^{T}$. Depending on context it will be clear if $1$ represents a scalar or a vector. Let the a priori distribution of $x$ be uniform over an interval $[-x_{0}, x_{0}]$, and thus $x$ will have variance of $\sigma_{X}^{2} = x_{0}^{2}/3$. Let the noise vector $z$ be normally distributed as $N(0, \sigma^{2}I)$, where $I$ is an identity matrix. Also $x$ and $z$ are independent and $C_{XZ} = 0$. It is easy to see that

$E\{y\} = 0 , \qquad C_{Y} = E\{yy^{T}\} = \sigma_{X}^{2} 11^{T} + \sigma^{2} I , \qquad C_{XY} = E\{xy^{T}\} = \sigma_{X}^{2} 1^{T} .$

Thus, the linear MMSE estimator is given by

$\hat{x} = C_{XY} C_{Y}^{-1} y = \sigma_{X}^{2} 1^{T} \left( \sigma_{X}^{2} 11^{T} + \sigma^{2} I \right)^{-1} y = \frac{\sigma_{X}^{2}}{\sigma^{2}}\, 1^{T} \left( I - \frac{\frac{\sigma_{X}^{2}}{\sigma^{2}} 11^{T}}{1 + \frac{\sigma_{X}^{2}}{\sigma^{2}} 1^{T}1} \right) y .$

The last step is due to a special case of the matrix binomial inverse theorem (also known as the Woodbury matrix identity). The matrix thus obtained in the last step, $I - \frac{\frac{\sigma_{X}^{2}}{\sigma^{2}} 11^{T}}{1 + \frac{\sigma_{X}^{2}}{\sigma^{2}} 1^{T}1}$, will have $1 - \frac{\sigma_{X}^{2}/\sigma^{2}}{1 + N\sigma_{X}^{2}/\sigma^{2}}$ as diagonal terms and $-\frac{\sigma_{X}^{2}/\sigma^{2}}{1 + N\sigma_{X}^{2}/\sigma^{2}}$ as off-diagonal terms. Taking the product with respect to $1^{T}$, we get the required estimator

$\hat{x} = \frac{\sigma_{X}^{2}}{\sigma_{X}^{2} + \sigma^{2}/N}\, \bar{y} ,$

where for $y = [y_{1}, y_{2}, \ldots, y_{N}]^{T}$ we have

$\bar{y} = \frac{1^{T} y}{N} = \frac{1}{N} \sum_{i=1}^{N} y_{i} .$

For very large $N$, we see that the MMSE estimator of a scalar unknown random variable with uniform a priori distribution can be simply approximated by the arithmetic average of all the observed data

$\hat{x} = \frac{1}{N} \sum_{i=1}^{N} y_{i} .$

However, the estimator is suboptimal since it is constrained to be linear.
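
A small simulation (all parameter values below are illustrative assumptions) shows the behaviour of this estimator: it shrinks the sample average of the observations toward the prior mean of zero, and the shrinkage disappears as $N$ grows.

```python
# Hypothetical simulation of Example 2: a scalar x with a uniform prior on
# [-x0, x0] observed N times in white Gaussian noise. Parameters are assumed.
import numpy as np

rng = np.random.default_rng(3)
N, x0, sigma = 10, 1.0, 0.5
sigma_X2 = x0**2 / 3.0                     # variance of the uniform prior

x = rng.uniform(-x0, x0)                   # the unknown random parameter
y = x + rng.normal(0.0, sigma, size=N)     # observations y = 1*x + z

ybar = y.mean()
x_lmmse = sigma_X2 / (sigma_X2 + sigma**2 / N) * ybar   # linear MMSE estimate
print(x, x_lmmse, ybar)                    # estimate shrinks ybar toward 0
```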

Notes

  1. ^ Lehmann and Casella, Corollary 4.1.2.
  2. ^ Moon and Stirling.

Further reading

  • Johnson, D. (22 November 2004). Minimum Mean Squared Error Estimators. Connexions
  • Prediction and Improved Estimation in Linear Models, by J. Bibby, H. Toutenburg (Wiley, 1977). This book looks almost exclusively at minimum mean-square error estimation and inference.
  • Jaynes, E. T. Probability Theory: The Logic of Science. Cambridge University Press, 2003.
  • Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. Ch. 4. ISBN 0-387-98502-6.
  • Kay, S. M. (1993). Fundamentals of Statistical Signal Processing: Estimation Theory. Prentice Hall. pp. 344–350. ISBN 0-13-042268-1.
  • Moon, T.K. and W.C. Stirling. Mathematical Methods and Algorithms for Signal Processing. Prentice Hall. 2000.