Statistical model validation

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by SolidPhase at 13:39, 17 February 2019. It may differ significantly from the current revision.

In statistics, model validation is the task of confirming that the outputs of a statistical model are acceptable with respect to the real data-generating process. In other words, model validation is the task of confirming that the outputs of a statistical model have enough fidelity to the outputs of the data-generating process that the objectives of the investigation can be satisfied.

Model validation can be based on two types of data: data that was used in the construction of the model and data that was not used in the construction. Validation based on the first type usually involves analyzing the goodness of fit of the model or analyzing whether the residuals seem to be random (i.e. residual diagnostics). Validation based on the second type usually involves analyzing whether the model's predictive performance deteriorates non-negligibly when applied to some new data.
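The contrast between the two types of data can be illustrated with a minimal sketch. It assumes a simple linear model and synthetic data; the helper names (`fit_line`, `mse`), the 150/50 split, and the use of mean squared error as the performance measure are illustrative choices, not part of any prescribed procedure. The model is fitted on one portion of the data, and its error there is compared with its error on data that played no part in the fit.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    # np.polyfit returns coefficients from highest degree to lowest.
    b, a = np.polyfit(x, y, deg=1)
    return a, b

def mse(x, y, a, b):
    """Mean squared error of the line a + b*x on the data (x, y)."""
    return float(np.mean((y - (a + b * x)) ** 2))

# Synthetic data (an assumption for illustration): y = 2 + 0.5*x + noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=200)

# First type of data: used to construct the model.
x_train, y_train = x[:150], y[:150]
# Second type of data: not used in the construction.
x_new, y_new = x[150:], y[150:]

a, b = fit_line(x_train, y_train)
mse_train = mse(x_train, y_train, a, b)
mse_new = mse(x_new, y_new, a, b)

# A ratio well above 1 would suggest that predictive performance
# deteriorates on new data; what counts as "non-negligible" depends
# on the objectives of the investigation.
ratio = mse_new / mse_train
print(f"in-sample MSE: {mse_train:.3f}")
print(f"new-data MSE:  {mse_new:.3f}")
print(f"ratio:         {ratio:.3f}")
```

Here the model matches the process that generated the data, so the two errors should be of similar size. A misspecified or overfitted model would typically show a noticeably larger error on the new data.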

For some classes of statistical models, specialized methods of performing validation are available. For example, if the statistical model was obtained via a regression, then specialized analyses for regression validation exist and are generally employed.
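As one hedged illustration of a residual-based check for a regression, the sketch below fits a straight line and a quadratic to synthetic data generated from a quadratic process, then measures how strongly neighbouring residuals (ordered by the input variable) are correlated. The lag-1 autocorrelation used here is only one possible diagnostic, chosen for brevity; the function names and the data are assumptions made for the example.

```python
import numpy as np

def residuals_of_fit(x, y, degree):
    """Residuals of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, deg=degree)
    return y - np.polyval(coeffs, x)

def lag1_autocorr(r):
    """Lag-1 autocorrelation of a residual sequence."""
    r = r - r.mean()
    return float(np.sum(r[:-1] * r[1:]) / np.sum(r ** 2))

# Synthetic data (an assumption): a quadratic process with noise.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(-3, 3, size=300))  # sorted so neighbours are close in x
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(0, 0.5, size=300)

r_line = residuals_of_fit(x, y, degree=1)  # misspecified model
r_quad = residuals_of_fit(x, y, degree=2)  # correctly specified model

ac_line = lag1_autocorr(r_line)
ac_quad = lag1_autocorr(r_quad)

# Residuals that look random should have autocorrelation near zero;
# a value near one indicates leftover structure the model missed.
print(f"straight line: lag-1 autocorrelation = {ac_line:.2f}")
print(f"quadratic:     lag-1 autocorrelation = {ac_quad:.2f}")
```

The straight-line fit leaves the curvature in its residuals, so adjacent residuals tend to share a sign and the autocorrelation is high. The quadratic fit leaves residuals that behave much more like noise.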

According to the Encyclopedia of Statistical Sciences (2006), three notable causes of potential difficulty arise when performing a validation: a lack of data, a lack of control over the input variables, and uncertainty about the underlying probability distributions and correlations.[1]

Notes

  1. Deaton, M. L. (2006), "Simulation models, validation of", in S. Kotz; et al. (eds.), Encyclopedia of Statistical Sciences, Wiley.
