Robust measures of scale
In statistics, a robust measure of scale is a robust statistic that quantifies the statistical dispersion in a set of quantitative data. Robust measures of scale are used to complement or replace conventional estimates of scale such as the sample variance or sample standard deviation. As with other robust statistics, a robust measure of scale is minimally affected by a small fraction of outliers, at the cost of lower statistical efficiency when outliers are not present.
IQR and MAD
The most familiar robust measures of scale are the interquartile range (IQR) and the median absolute deviation (MAD). The IQR is the difference between the 75th percentile and the 25th percentile of a sample. The MAD is the median of the absolute values of the differences between the data values and the overall median of the data set.
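As a minimal sketch of these two definitions, the following uses only Python's standard library. The function names `iqr` and `mad` are illustrative, and the "inclusive" quantile convention is an assumption of this sketch; other conventions give slightly different quartiles on small samples.

```python
import statistics

def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    # quantiles(n=4) returns the three quartile cut points (Q1, median, Q3).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return q3 - q1

def mad(data):
    """Median of the absolute deviations from the overall median."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
dirty = [1, 2, 3, 4, 5, 6, 7, 8, 900]   # one gross outlier

for name, sample in (("clean", clean), ("dirty", dirty)):
    print(name, iqr(sample), mad(sample), statistics.stdev(sample))
```

On this toy data the IQR and the MAD are the same for both samples, while the sample standard deviation grows from under 3 to roughly 300 once the outlier is introduced.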
Other robust measures of scale
Rousseeuw and Croux[1] propose alternatives to the MAD, pointing out two of its drawbacks:
- It is inefficient (37% efficiency) for Gaussian data.
- It computes a symmetric statistic about a location estimate, and therefore does not deal with skewness.
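They propose two alternatives based on pairwise differences, Sn and Qn, defined as:
Sn = 1.1926 · med_i ( med_j |x_i − x_j| )
Qn = 2.2219 · { |x_i − x_j| : i < j }_(k)
Here the subscript (k) denotes the k-th smallest value, with k = C(h, 2) and h = ⌊n/2⌋ + 1, so that Qn is roughly the first quartile of the pairwise absolute differences. The leading constants make both estimators consistent for the standard deviation of Gaussian data.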
These can be computed in O(n log n) time and O(n) space.
Neither of these requires location estimation, as they are based only on differences between samples. They are both more efficient than the MAD under a Gaussian distribution: Sn is 58% efficient, while Qn is 82% efficient.
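The following is a naive sketch of the two estimators, not the O(n log n) algorithms mentioned above: it enumerates all pairwise differences and therefore takes quadratic time. It assumes the definitions Sn = 1.1926 · med_i(med_j |x_i − x_j|) and Qn = 2.2219 times the k-th smallest of the |x_i − x_j| with i < j, where k = C(h, 2) and h = ⌊n/2⌋ + 1. The function names are illustrative, ordinary medians are used throughout, the inner median runs over all j including j = i, and finite-sample correction factors are omitted; all of these are simplifying assumptions.

```python
import itertools
import math
import statistics

def sn_naive(data):
    """Naive O(n^2) sketch of Sn: scaled median of per-point median distances."""
    # For each x_i, take the median distance to all points (including itself).
    inner = [statistics.median(abs(xi - xj) for xj in data) for xi in data]
    return 1.1926 * statistics.median(inner)

def qn_naive(data):
    """Naive O(n^2) sketch of Qn: scaled k-th smallest pairwise distance."""
    n = len(data)
    h = n // 2 + 1
    k = math.comb(h, 2)
    # All |x_i - x_j| with i < j, sorted ascending.
    diffs = sorted(abs(a - b) for a, b in itertools.combinations(data, 2))
    return 2.2219 * diffs[k - 1]   # k is 1-based

data = [1, 2, 3, 4, 5]
shifted = [x + 100 for x in data]

print(sn_naive(data), qn_naive(data))
print(sn_naive(shifted), qn_naive(shifted))
```

Because only differences between observations enter the computation, shifting every value by the same amount leaves both results unchanged, which illustrates why no location estimate is needed.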
The interdecile range is another robust measure of scale.
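The interdecile range is the difference between the 90th and 10th percentiles. A minimal sketch, again assuming the "inclusive" quantile convention and an illustrative function name:

```python
import statistics

def interdecile_range(data):
    """90th percentile minus 10th percentile."""
    # quantiles(n=10) returns nine decile cut points; use the first and last.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return deciles[-1] - deciles[0]

values = list(range(101))   # 0, 1, ..., 100
print(interdecile_range(values))
```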
The population analogue of a robust measure of scale
In some cases, robust estimators of scale are used to estimate the population variance or population standard deviation. For example, the IQR is sometimes divided by 1.349, so that it becomes a consistent estimator of the population standard deviation if the data follow a normal distribution.
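The factor 1.349 is the IQR of a standard normal distribution, that is, twice its 75th percentile. The sketch below recomputes the constant with `statistics.NormalDist` and applies it to a seeded simulated sample; the sample size, seed, true standard deviation of 2, and function name are arbitrary choices made for illustration.

```python
import random
import statistics

# IQR of the standard normal: Q(0.75) - Q(0.25) = 2 * Q(0.75) by symmetry.
NORMAL_IQR = 2 * statistics.NormalDist().inv_cdf(0.75)

def normalized_iqr(data):
    """IQR rescaled to estimate the standard deviation of normal data."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / NORMAL_IQR

rng = random.Random(0)                       # fixed seed for repeatability
sample = [rng.gauss(0.0, 2.0) for _ in range(10_000)]

print(round(NORMAL_IQR, 3))                  # close to 1.349
print(normalized_iqr(sample), statistics.stdev(sample))
```

For normal data both printed estimates should be close to the true standard deviation; the rescaled IQR targets the standard deviation, not the variance.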
In other situations, it makes more sense to think of a robust measure of scale as an estimator of its analogous population value. For example, the MAD of a sample from a standard Cauchy distribution is an estimator of the population MAD, which in this case is 1, whereas the population variance does not exist.
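The Cauchy example can be checked numerically. The standard Cauchy quantile function is Q(p) = tan(π(p − 1/2)); the distribution is symmetric about its median 0, so the population MAD equals Q(0.75) = tan(π/4) = 1. The sketch below draws a seeded sample by inverse-transform sampling and compares its MAD with that value; the seed, sample size, and function names are illustrative assumptions.

```python
import math
import random
import statistics

def cauchy_quantile(p):
    """Quantile function of the standard Cauchy distribution."""
    return math.tan(math.pi * (p - 0.5))

def sample_mad(data):
    """Median of the absolute deviations from the sample median."""
    med = statistics.median(data)
    return statistics.median(abs(x - med) for x in data)

# Population MAD: the median is 0 and the distribution is symmetric,
# so the MAD is the 75th percentile.
population_mad = cauchy_quantile(0.75)

rng = random.Random(12345)                   # fixed seed for repeatability
sample = [cauchy_quantile(rng.random()) for _ in range(20_001)]
estimate = sample_mad(sample)

print(population_mad, estimate)
```

The sample MAD settles near 1 as the sample grows, whereas the sample variance of Cauchy data does not converge to any finite value.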
References
- [1] Rousseeuw, Peter J.; Croux, Christophe (1993), "Alternatives to the Median Absolute Deviation", Journal of the American Statistical Association, 88 (424): 1273–1283, doi:10.2307/2291267