Omitted-variable bias

In statistics, omitted-variable bias (OVB) is the bias that appears in the estimates of parameters in a regression analysis when the assumed specification is incorrect in that it omits an independent variable that should be in the model, whether or not that variable has been explicitly identified.

Omitted-variable bias in linear regression

Two conditions must hold true for omitted-variable bias to exist in linear regression:

the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
the omitted variable must be correlated with one or more of the included independent variables.
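A small simulation can make the two conditions concrete. The sketch below is a minimal illustration under stated assumptions: a single scalar regressor with no intercept, standard normal x, u and noise, and a hypothetical helper `short_regression_slope` that generates z = rho·x + noise (so rho controls the correlation between x and z), fits y on x alone, and returns the slope. Under these assumptions the slope should stay close to the true β = 1 unless both δ ≠ 0 and rho ≠ 0, in which case it drifts toward β + δ·rho.

```python
import random

def slope_through_origin(xs, ys):
    """OLS slope of y on x with no intercept: sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def short_regression_slope(delta, rho, n=20000, beta=1.0, seed=0):
    """Simulate y = beta*x + delta*z + u, then regress y on x alone.

    Hypothetical data-generating process: z = rho*x + e, with x, e, u
    independent standard normal draws.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rho * x + rng.gauss(0.0, 1.0)  # rho != 0 makes z correlated with x
        u = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(beta * x + delta * z + u)
    return slope_through_origin(xs, ys)  # z is omitted from this fit

# Try every combination of "z matters" (delta) and "z correlated with x" (rho).
for delta in (0.0, 2.0):
    for rho in (0.0, 0.5):
        b = short_regression_slope(delta, rho)
        print(f"delta={delta}, rho={rho}: estimate {b:.3f} (true beta 1.0)")
```

Only the case with delta = 2.0 and rho = 0.5 should show a clear departure from 1.0, landing near 2.0.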

As an example, consider a linear model of the form

y_{i}=x_{i}\beta +z_{i}\delta +u_{i},\qquad i=1,\dots ,n

where

x_i is a 1 × p row vector, and is part of the observed data;
β is a p × 1 column vector of unobservable parameters to be estimated;
z_i is a scalar and is part of the observed data;
δ is a scalar and is an unobservable parameter to be estimated;
the error terms u_i are unobservable random variables having expected value 0 (conditionally on x_i and z_i);
the dependent variables y_i are part of the observed data.

We let

X=\left[{\begin{array}{c}x_{1}\\\vdots \\x_{n}\end{array}}\right]\in \mathbb {R} ^{n\times p},

and

Y=\left[{\begin{array}{c}y_{1}\\\vdots \\y_{n}\end{array}}\right],\quad Z=\left[{\begin{array}{c}z_{1}\\\vdots \\z_{n}\end{array}}\right],\quad U=\left[{\begin{array}{c}u_{1}\\\vdots \\u_{n}\end{array}}\right]\in \mathbb {R} ^{n\times 1}.

Then, through the usual least-squares calculation, the estimated parameter vector {\hat {\beta }}, based only on the observed x values and omitting the observed z values, is given by:

{\hat {\beta }}=(X'X)^{-1}X'Y\,

(where the "prime" notation means the transpose of a matrix).
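The least-squares formula can be sketched in plain Python without a linear-algebra library. The helper names below (`transpose`, `matmul`, `solve`, `ols`) are illustrative and not taken from any particular package; the sketch solves the normal equations (X'X)b = X'Y by Gaussian elimination with partial pivoting rather than forming the inverse explicitly, which is the usual numerically safer choice.

```python
def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    """Multiply matrices a (m x k) and b (k x n), both lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def solve(a, b):
    """Solve a @ x = b for square a by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; columns of X are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(X, Y):
    """Return beta_hat = (X'X)^{-1} X'Y by solving (X'X) b = X'Y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xi * yi for xi, yi in zip(col, Y)) for col in Xt]
    return solve(XtX, XtY)

# Intercept column plus one regressor; Y lies exactly on y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
Y = [1.0, 3.0, 5.0, 7.0]
print(ols(X, Y))  # approximately [1.0, 2.0]
```

Because the four responses fit the line exactly, the recovered coefficients match the intercept 1 and slope 2 up to floating-point rounding.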

Substituting for Y based on the assumed linear model,

{\begin{aligned}{\hat {\beta }}&=(X'X)^{-1}X'(X\beta +Z\delta +U)\\&=(X'X)^{-1}X'X\beta +(X'X)^{-1}X'Z\delta +(X'X)^{-1}X'U\\&=\beta +(X'X)^{-1}X'Z\delta +(X'X)^{-1}X'U.\end{aligned}}

Taking expectations conditional on X and Z, the final term

(X'X)^{-1}X'U

vanishes, because (X'X)^{-1}X' depends only on X and the errors satisfy E[U | X, Z] = 0 by assumption. The remaining terms are nonrandom given X and Z, so

{\begin{aligned}E[{\hat {\beta }}\mid X,Z]&=\beta +(X'X)^{-1}X'Z\delta \\&=\beta +{\text{bias}}.\end{aligned}}
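The conditional-expectation result can be checked numerically. The sketch below rests on simplifying assumptions: a single regressor with no intercept (so (X'X)^{-1}X'Z reduces to Σ x_i z_i / Σ x_i²), x and z drawn once and then held fixed, and standard normal errors redrawn on every replication. The hypothetical functions `mean_short_estimate` and `predicted_mean` compare the Monte Carlo average of the x-only estimate with β + (X'X)^{-1}X'Zδ.

```python
import random

def predicted_mean(xs, zs, beta, delta):
    """beta + (X'X)^{-1} X'Z delta, specialised to one regressor without intercept."""
    sxx = sum(x * x for x in xs)
    sxz = sum(x * z for x, z in zip(xs, zs))
    return beta + delta * sxz / sxx

def mean_short_estimate(xs, zs, beta, delta, reps=5000, seed=1):
    """Monte Carlo average of the x-only OLS slope, holding x and z fixed.

    Only the errors u are redrawn on each replication, so the average
    approximates E[beta_hat | X, Z].
    """
    rng = random.Random(seed)
    sxx = sum(x * x for x in xs)
    total = 0.0
    for _ in range(reps):
        ys = [beta * x + delta * z + rng.gauss(0.0, 1.0) for x, z in zip(xs, zs)]
        total += sum(x * y for x, y in zip(xs, ys)) / sxx  # slope of y on x alone
    return total / reps

# Draw the regressors once; z is correlated with x by construction.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
zs = [0.8 * x + rng.gauss(0.0, 1.0) for x in xs]

print(mean_short_estimate(xs, zs, beta=1.0, delta=-1.5))
print(predicted_mean(xs, zs, beta=1.0, delta=-1.5))
```

The two printed numbers should agree up to simulation noise; the gap between either of them and β = 1 is the omitted-variable bias for this fixed design.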

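The second term above is the omitted-variable bias in this case. Because (X'X)^{-1}X'Z is the coefficient vector from a least-squares regression of z_i on x_i, the bias equals δ times the portion of z_i that is linearly "explained" by x_i. It vanishes when δ = 0 or when X'Z = 0 (Z orthogonal to every column of X), corresponding to the two conditions listed above.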
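In the special case of a single regressor (p = 1, with no separate intercept), X'X and X'Z are scalars, so the bias term can be written out explicitly. The display below labels the auxiliary coefficient {\hat {\gamma }}, a name introduced here only for illustration:

{\begin{aligned}{\hat {\gamma }}&=(X'X)^{-1}X'Z={\frac {\sum _{i=1}^{n}x_{i}z_{i}}{\sum _{i=1}^{n}x_{i}^{2}}},\\E[{\hat {\beta }}\mid X,Z]&=\beta +{\hat {\gamma }}\,\delta =\beta +\delta \,{\frac {\sum _{i=1}^{n}x_{i}z_{i}}{\sum _{i=1}^{n}x_{i}^{2}}}.\end{aligned}}

Since the denominator is positive, the bias has the sign of δ multiplied by the sign of Σ x_i z_i, and its size grows with the strength of the in-sample association between x and z.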

