Omitted-variable bias
In statistics, omitted-variable bias (OVB) is the bias that appears in the parameter estimates of a regression analysis when the assumed specification is incorrect because it omits an independent variable (possibly one that has not been identified or measured) that belongs in the model.
Omitted-variable bias in linear regression
Two conditions must hold true for omitted-variable bias to exist in linear regression:
- the omitted variable must be a determinant of the dependent variable (i.e., its true regression coefficient is not zero); and
- the omitted variable must be correlated with one or more of the included independent variables.
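The two conditions can be illustrated with a small simulation. The sketch below is not taken from the source: the true slope of 1.0, the values of `delta` and `rho`, and the helper name `short_regression_slope` are all illustrative assumptions. It regresses y on x alone and reports the estimated slope in three scenarios.

```python
import numpy as np

def short_regression_slope(delta, rho, n=200_000, seed=0):
    """Slope on x when y is regressed on x alone, omitting z.

    Assumed data-generating process (illustrative only):
        y = 1.0 * x + delta * z + u,  with corr(x, z) close to rho.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    # Construct z so that its correlation with x is approximately rho.
    z = rho * x + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    u = rng.standard_normal(n)
    y = 1.0 * x + delta * z + u
    # No-intercept least squares slope: sum(x * y) / sum(x * x).
    return float(x @ y / (x @ x))

both = short_regression_slope(delta=2.0, rho=0.5)          # both conditions hold
no_effect = short_regression_slope(delta=0.0, rho=0.5)     # z does not determine y
uncorrelated = short_regression_slope(delta=2.0, rho=0.0)  # z unrelated to x
print(both, no_effect, uncorrelated)
```

Under these assumptions, only the first scenario should give a slope noticeably different from the true value of 1.0 (close to 1.0 + 2.0 × 0.5 = 2.0); the other two should stay near 1.0, because one of the two conditions fails in each.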
As an example, consider a linear model of the form

$$ y_i = x_i \beta + z_i \delta + u_i, \qquad i = 1, \dots, n $$

where
- $x_i$ is a 1 × p row vector, and is part of the observed data;
- $\beta$ is a p × 1 column vector of unobservable parameters to be estimated;
- $z_i$ is a scalar and is part of the observed data;
- $\delta$ is a scalar and is an unobservable parameter to be estimated;
- the error terms $u_i$ are unobservable random variables having expected value 0 (conditionally on $x_i$ and $z_i$);
- the dependent variables $y_i$ are part of the observed data.
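To make the notation concrete, the following sketch draws data from a model of this form. The sample size, parameter values, and the way z is tied to the first regressor are hypothetical choices made only for illustration; they are not part of the source.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 500, 3                        # illustrative sample size and dimension

beta = np.array([1.0, -2.0, 0.5])    # hypothetical p-vector of coefficients
delta = 1.5                          # hypothetical scalar coefficient on z

X = rng.standard_normal((n, p))      # row i plays the role of x_i (1 x p)
# z is deliberately correlated with the first column of X.
Z = 0.8 * X[:, 0] + rng.standard_normal(n)
# Errors drawn independently of X and Z, so their conditional mean is zero.
U = rng.standard_normal(n)

# y_i = x_i beta + z_i delta + u_i, computed for all i at once.
Y = X @ beta + Z * delta + U
```

Here `X @ beta` evaluates every product of a row of X with beta in a single matrix-vector product, which is the stacked form used in the derivation.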
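We let

$$ X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \in \mathbb{R}^{n \times p} $$

and

$$ Y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \quad Z = \begin{bmatrix} z_1 \\ \vdots \\ z_n \end{bmatrix}, \quad U = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} \in \mathbb{R}^{n \times 1}. $$
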
Then, through the usual least squares calculation, the estimated parameter vector $\hat{\beta}$ based only on the observed x-values, omitting the observed z-values, is given by

$$ \hat{\beta} = (X'X)^{-1} X'Y $$

(where the "prime" notation means the transpose of a matrix).
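A minimal sketch of this calculation follows, assuming simulated data with hypothetical coefficients. The function name `ols_omitting_z` is introduced here for illustration. It solves the normal equations (X'X) b = X'Y rather than forming the inverse explicitly, and compares the result with NumPy's general least squares routine.

```python
import numpy as np

def ols_omitting_z(X, Y):
    """Least squares coefficients (X'X)^{-1} X'Y from a regression of Y on X only."""
    # Solving the linear system is numerically preferable to inverting X'X.
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
Z = 0.5 * X[:, 0] + rng.standard_normal(200)       # omitted regressor
Y = X @ np.array([1.0, -1.0]) + 1.5 * Z + rng.standard_normal(200)

beta_hat = ols_omitting_z(X, Y)
# Cross-check against NumPy's least squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_hat, beta_lstsq)
```

The two results should agree to numerical precision whenever X has full column rank, which is the case for which the inverse of X'X exists.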
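Substituting for Y based on the assumed linear model,

$$ \hat{\beta} = (X'X)^{-1} X'(X\beta + Z\delta + U) = \beta + (X'X)^{-1} X'Z\delta + (X'X)^{-1} X'U. $$
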
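Taking expectations, the final term

$$ (X'X)^{-1} X'U $$

falls out by the assumption that u has zero expectation conditional on the regressors. Simplifying the remaining terms:

$$ E[\hat{\beta} \mid X] = \beta + (X'X)^{-1} E[X'Z \mid X]\,\delta. $$
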
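The second term, $(X'X)^{-1} E[X'Z \mid X]\,\delta$, is the omitted-variable bias in this case. The factor $(X'X)^{-1} X'Z$ is the coefficient vector from a least squares regression of Z on X, so the bias equals the portion of $z_i$ that is "explained" by $x_i$, weighted by $\delta$.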
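The decomposition behind this result can be checked numerically. The sketch below uses simulated data with hypothetical parameter values, so the true β, δ, and U are known. It verifies that the estimate from the regression omitting z equals β plus the bias term plus a noise term that averages to zero.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
beta = np.array([1.0, -2.0])     # hypothetical true coefficients on x
delta = 3.0                      # hypothetical true coefficient on z

X = rng.standard_normal((n, 2))
# z depends on both columns of X, so it is correlated with the regressors.
Z = 0.6 * X[:, 0] - 0.3 * X[:, 1] + rng.standard_normal(n)
U = rng.standard_normal(n)
Y = X @ beta + Z * delta + U

XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ Y)    # regression of Y on X, omitting Z
gamma_hat = np.linalg.solve(XtX, X.T @ Z)   # regression of Z on X
noise = np.linalg.solve(XtX, X.T @ U)       # (X'X)^{-1} X'U, mean zero

bias_term = gamma_hat * delta               # (X'X)^{-1} X'Z delta
# The identity holds exactly in the sample, up to floating-point error.
assert np.allclose(beta_hat, beta + bias_term + noise)
print("estimate:", beta_hat)
print("bias term:", bias_term)
```

With these assumed values, `gamma_hat` should be close to (0.6, -0.3), so the bias term should be close to (1.8, -0.9) and the estimate close to (2.8, -2.9) instead of the true (1.0, -2.0).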