Clustered standard errors
Clustered standard errors are estimates of the standard error of a regression parameter for settings in which observations can be subdivided into smaller groups ("clusters") and in which sampling and/or treatment assignment is correlated within each group. They are widely used in applied econometrics, including in difference-in-differences designs[1] and experiments[2]. Just as Huber-White standard errors are consistent in the presence of heteroscedasticity, and Newey–West standard errors are consistent in the presence of accurately modeled autocorrelation, clustered (or "Liang-Zeger"[3]) standard errors are consistent in the presence of cluster-based sampling or treatment assignment. Clustered standard errors are often justified by possible correlation of the model residuals within each cluster; recent work suggests that this is not the precise justification for clustering[4], but the explanation may still be pedagogically useful.
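The contrast between heteroscedasticity-robust and clustered standard errors can be illustrated with a small simulation. The following is a minimal sketch, not a reference implementation: it assumes a single regressor with an intercept, applies no finite-sample correction, and uses simulated data in which treatment is assigned per cluster and each cluster shares a common shock. The function name `slope_standard_errors` and all data-generating values are hypothetical choices made for illustration.

```python
import math
import random
from collections import defaultdict


def slope_standard_errors(x, y, cluster):
    """OLS slope of y on x (with intercept) plus two standard errors.

    Returns (slope, heteroscedasticity-robust SE, cluster-robust SE).
    No finite-sample corrections are applied.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    xd = [xi - x_bar for xi in x]  # demeaned regressor
    sxx = sum(v * v for v in xd)
    slope = sum(v * yi for v, yi in zip(xd, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Contribution of each observation to the slope's estimating equation.
    scores = [v * e for v, e in zip(xd, resid)]

    # Huber-White (HC0): each observation is treated as its own cluster.
    se_robust = math.sqrt(sum(s * s for s in scores)) / sxx

    # Clustered: sum the scores within each cluster before squaring, so
    # that within-cluster correlation is retained in the variance estimate.
    totals = defaultdict(float)
    for s, g in zip(scores, cluster):
        totals[g] += s
    se_cluster = math.sqrt(sum(t * t for t in totals.values())) / sxx
    return slope, se_robust, se_cluster


# Simulated data (hypothetical): 50 clusters of 20 units each, treatment
# assigned at the cluster level, and a shared cluster-level shock.
rng = random.Random(0)
x, y, cluster = [], [], []
for g in range(50):
    treated = float(g % 2)
    shock = rng.gauss(0.0, 1.0)
    for _ in range(20):
        x.append(treated)
        y.append(1.0 + 0.5 * treated + shock + rng.gauss(0.0, 1.0))
        cluster.append(g)

slope, se_robust, se_cluster = slope_standard_errors(x, y, cluster)
print(f"slope={slope:.3f}  robust SE={se_robust:.3f}  clustered SE={se_cluster:.3f}")
```

In this design the clustered standard error should come out noticeably larger than the heteroscedasticity-robust one, because the latter treats the 20 units in each cluster as independent pieces of information. Passing a distinct cluster label for every observation makes the two estimates coincide.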
Intuitive Motivation
Mathematical Motivation
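One common way to motivate clustered standard errors mathematically is through the sandwich form of the variance of the ordinary least squares estimator. The sketch below assumes a linear model whose observations are partitioned into clusters indexed by c. The symbols X_c and \hat e_c (the rows of the design matrix and the fitted residuals belonging to cluster c) and \Omega (the conditional covariance matrix of the errors) are notation chosen here for illustration, and finite-sample corrections are omitted.

```latex
\begin{align}
Y &= X\beta + e, \qquad \hat\beta = (X'X)^{-1}X'Y \\
\hat\beta - \beta &= (X'X)^{-1}X'e \\
V(\hat\beta \mid X) &= (X'X)^{-1}\,X'\Omega X\,(X'X)^{-1},
  \qquad \Omega = E[ee' \mid X] \\
X'\Omega X &= \sum_{c} X_c'\,\Omega_c\,X_c
  \quad \text{(if $\Omega$ is block-diagonal by cluster)} \\
\hat V_{\text{cluster}}(\hat\beta) &= (X'X)^{-1}
  \Big(\sum_{c} X_c'\,\hat e_c \hat e_c'\,X_c\Big)(X'X)^{-1}
\end{align}
```

If every observation forms its own cluster, the middle term reduces to \sum_i \hat e_i^2 x_i x_i', which is the Huber-White estimator.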
- Bertrand, Marianne; Duflo, Esther; Mullainathan, Sendhil (2004-02-01). "How Much Should We Trust Differences-In-Differences Estimates?". The Quarterly Journal of Economics. 119 (1): 249–275. doi:10.1162/003355304772839588. ISSN 0033-5533.
- Tang, Yixin (2019-09-11). "Analyzing Switchback Experiments by Cluster Robust Standard Error to prevent false positive results". DoorDash Engineering Blog. Retrieved 2020-07-05.
- Liang, Kung-Yee; Zeger, Scott L. (1986-04-01). "Longitudinal data analysis using generalized linear models". Biometrika. 73 (1): 13–22. doi:10.1093/biomet/73.1.13. ISSN 0006-3444.
- Abadie, Alberto; Athey, Susan; Imbens, Guido; Wooldridge, Jeffrey (2017-10-24). "When Should You Adjust Standard Errors for Clustering?". arXiv:1710.02926 [econ, math, stat].