Robust regression
Classical least-squares regression relies on strong model assumptions (the Gauss-Markov conditions), which are often not met in practice. Non-parametric models (i.e., models for which the data need not follow a known distribution) were developed to address this problem. However, non-parametric models give much less precise results than their parametric counterparts. Robust statistics was therefore created as a compromise between parametric and non-parametric methods.
The aim of robust statistics is to create statistical methods that are resistant to departures from model assumptions, i.e. outliers. An outlier can be defined as a value that is not adequately explained by a given model. For example, if $X \sim \mathcal{N}(0, 1)$ and the value $x = 3$ is observed, $x$ can be considered an outlier (a 0.01 outlier, as the probability of observing $|X| \geq 3$ under this assumption is lower than 0.01).
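To make this definition concrete, here is a minimal sketch of the check described above; the function name and the use of Python with scipy are my assumptions, as the text names no tooling:

```python
from scipy.stats import norm

def is_alpha_outlier(x, alpha=0.01):
    """Check whether x is an alpha-outlier under a standard normal model,
    i.e. whether the two-sided tail probability P(|X| >= |x|) is below alpha."""
    tail_prob = 2 * norm.sf(abs(x))  # P(|X| >= |x|) for X ~ N(0, 1)
    return tail_prob < alpha

print(is_alpha_outlier(3.0))  # True:  P(|X| >= 3)   ~ 0.0027 < 0.01
print(is_alpha_outlier(1.5))  # False: P(|X| >= 1.5) ~ 0.134
```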
Several questions arise: how many outliers can a given algorithm bear before it breaks down? More precisely, how can we describe the influence of a growing proportion of outliers on the algorithm? And what properties are desirable for robust statistical procedures?
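The following sketch (my illustration, not from the text) hints at the first question by contrasting the sample mean, which a single arbitrarily large outlier can move anywhere (breakdown point 0), with the sample median, which tolerates contamination of up to half the sample:

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.normal(loc=0.0, scale=1.0, size=100)

for n_outliers in (0, 1, 10, 40):
    contaminated = clean.copy()
    contaminated[:n_outliers] = 1e6  # replace some points with gross outliers
    print(f"{n_outliers:>2} outliers: mean = {contaminated.mean():>12.2f}, "
          f"median = {np.median(contaminated):>6.2f}")
```

Running this shows the mean jumping by orders of magnitude as soon as one outlier appears, while the median stays near 0 even with 40% contamination.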