Robust regression
Classical least-squares regression relies on strong model assumptions (the Gauss-Markov conditions), which are often not met in practice. Non-parametric models (i.e., models for which the data need not follow a known distribution) were developed to address this problem. However, non-parametric models give much less precise results than their parametric counterparts. Robust statistics was therefore created as a compromise between parametric and non-parametric methods.
The aim of robust statistics is to create statistical methods that are resistant to departures from model assumptions, i.e. outliers. An outlier can be defined as a value that is not adequately explained by a given model. For example, if $X \sim \mathcal{N}(0, 1)$ and the value $x = 3$ is observed, $x$ can be considered an outlier (a 0.01 outlier, as the probability of observing $|X| \geq 3$ under this assumption is lower than 0.01).
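To make this definition concrete, here is a minimal sketch of the check described above; the function name and the use of Python with scipy are my assumptions, as the text names no tooling:

```python
from scipy.stats import norm

def is_alpha_outlier(x, alpha=0.01):
    """Check whether x is an alpha-outlier under a standard normal model,
    i.e. whether the two-sided tail probability P(|X| >= |x|) is below alpha."""
    tail_prob = 2 * norm.sf(abs(x))  # P(|X| >= |x|) for X ~ N(0, 1)
    return tail_prob < alpha

print(is_alpha_outlier(3.0))  # True:  P(|X| >= 3)   ~ 0.0027 < 0.01
print(is_alpha_outlier(1.5))  # False: P(|X| >= 1.5) ~ 0.134
```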
Several questions arise: how many outliers can a given algorithm bear before it breaks down? More precisely, how can we describe the influence of a growing proportion of outliers on the algorithm? And what properties are desirable for robust statistical procedures?
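The following sketch (my illustration, not from the text) hints at the first question by contrasting the sample mean, which a single arbitrarily large outlier can move anywhere (breakdown point 0), with the sample median, which tolerates contamination of up to half the sample:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=100)

for n_outliers in (0, 1, 10, 40):
    contaminated = clean.copy()
    contaminated[:n_outliers] = 1e6  # replace some points with gross outliers
    print(f"{n_outliers:>2} outliers: mean = {contaminated.mean():>12.2f}, "
          f"median = {np.median(contaminated):>6.2f}")
```

Running this shows the mean jumping by orders of magnitude as soon as one outlier appears, while the median stays near 0 even with 40% contamination.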