
Robust regression


Classical least-squares regression relies on strong model assumptions (the Gauss–Markov assumptions) which are often not met in practice. Non-parametric models (i.e., models for which the data do not have to follow a known distribution) were developed to address this problem. However, non-parametric methods give much less precise results than their parametric counterparts. Robust statistics was therefore created as a compromise between parametric and non-parametric methods.

The aim of robust statistics is to create statistical methods that are resistant to departures from model assumptions, i.e. to outliers. An outlier can be defined as a value which is not explained adequately by a given model. For example, if x is modelled as a standard normal variable N(0, 1), a value such as x = 3 can be considered an outlier (a 0.01 outlier, as the probability of observing |x| ≥ 3 under that model is lower than 0.01).
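
A minimal Python sketch of this 0.01-outlier criterion, assuming the N(0, 1) model above and SciPy's normal survival function (the function name is_alpha_outlier is illustrative, not from the article):

from scipy.stats import norm

def is_alpha_outlier(x, alpha=0.01):
    # Flag x as an alpha-outlier: the two-sided probability of observing
    # a value at least as extreme under N(0, 1) falls below alpha.
    p = 2 * norm.sf(abs(x))  # P(|X| >= |x|) for X ~ N(0, 1)
    return p < alpha

print(is_alpha_outlier(0.5))  # False: well explained by the model
print(is_alpha_outlier(3.0))  # True: P(|X| >= 3) is about 0.0027 < 0.01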

Several questions arise: how many outliers can a given procedure tolerate before it breaks down (see the sketch below)? More precisely, how can we describe the influence of a growing proportion of outliers on the procedure's output? Which properties are desirable for robust statistical procedures?
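
A minimal Python sketch of the first question, assuming NumPy: the sample mean is ruined by a single gross outlier, while the sample median tolerates contamination of just under half the sample. This behaviour is what the notion of a breakdown point formalizes.

import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=100)

for n_outliers in (0, 1, 10, 49):
    data = clean.copy()
    data[:n_outliers] = 1e6  # replace a growing fraction with gross outliers
    print(f"{n_outliers:2d} outliers: mean={np.mean(data):12.2f}  "
          f"median={np.median(data):6.2f}")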

Description of Robustness


Desirable Properties

M-estimators

Construction of Robust Algorithms