Clustered standard errors

Clustered standard errors are estimates of the standard error of a regression parameter in settings where observations can be subdivided into groups ("clusters") and where the sampling and/or treatment assignment is correlated within each group. They are widely used in applied econometric settings, including difference-in-differences designs^[1] and experiments^[2]. Just as Huber-White standard errors are consistent in the presence of heteroscedasticity and Newey–West standard errors are consistent in the presence of accurately modeled autocorrelation, clustered (or "Liang-Zeger"^[3]) standard errors are consistent in the presence of cluster-based sampling or treatment assignment. Clustered standard errors are often justified by possible correlation of the model residuals within each cluster; recent work suggests that this is not the precise justification for clustering^[4], but the explanation may still be pedagogically useful.
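As an illustration of the paragraph above, the following is a minimal sketch of how heteroscedasticity-robust (Huber-White) and cluster-robust (Liang-Zeger) standard errors can be computed for ordinary least squares using a "sandwich" variance formula. The function name `ols_with_cluster_se` and the simulated data are hypothetical and introduced only for this example. The sketch assumes no finite-sample correction, whereas statistical packages commonly apply one, so their reported values may differ slightly.

```python
import numpy as np


def ols_with_cluster_se(X, y, clusters):
    """Return OLS coefficients with Huber-White (HC0) and cluster-robust
    standard errors. Hypothetical helper; no small-sample correction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    clusters = np.asarray(clusters)

    bread = np.linalg.inv(X.T @ X)          # (X'X)^{-1}
    beta = bread @ X.T @ y                  # OLS coefficients
    resid = y - X @ beta

    # Per-observation scores x_i * e_i
    scores = X * resid[:, None]

    # Huber-White "meat": sum over observations of (x_i e_i)(x_i e_i)'
    meat_hc0 = scores.T @ scores

    # Cluster "meat": sum scores within each cluster first, then take
    # outer products, so within-cluster correlation is retained
    k = X.shape[1]
    meat_cl = np.zeros((k, k))
    for g in np.unique(clusters):
        s_g = scores[clusters == g].sum(axis=0)
        meat_cl += np.outer(s_g, s_g)

    se_hc0 = np.sqrt(np.diag(bread @ meat_hc0 @ bread))
    se_cl = np.sqrt(np.diag(bread @ meat_cl @ bread))
    return beta, se_hc0, se_cl


# Simulated example (assumed data-generating process): 50 clusters of 20
# observations, a regressor assigned at the cluster level, and an error
# term that contains a shock shared by all members of a cluster.
rng = np.random.default_rng(0)
n_clusters, cluster_size = 50, 20
cluster_ids = np.repeat(np.arange(n_clusters), cluster_size)
x = rng.normal(size=n_clusters)[cluster_ids]
shock = rng.normal(size=n_clusters)[cluster_ids]
noise = rng.normal(size=n_clusters * cluster_size)
y = 1.0 + 0.5 * x + shock + noise
X = np.column_stack([np.ones_like(x), x])

beta, se_hc0, se_cl = ols_with_cluster_se(X, y, cluster_ids)
print("coefficients:      ", beta)
print("Huber-White SE:    ", se_hc0)
print("cluster-robust SE: ", se_cl)
```

In this simulated setting the cluster-robust standard error of the slope should come out noticeably larger than the Huber-White one, because both the regressor and part of the error are shared within clusters. When every observation is treated as its own cluster, the two formulas coincide.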

Intuitive Motivation

Mathematical Motivation

[1] Bertrand, Marianne; Duflo, Esther; Mullainathan, Sendhil (2004-02-01). "How Much Should We Trust Differences-In-Differences Estimates?". The Quarterly Journal of Economics. 119 (1): 249–275. doi:10.1162/003355304772839588. ISSN 0033-5533.
[2] Tang, Yixin (2019-09-11). "Analyzing Switchback Experiments by Cluster Robust Standard Error to prevent false positive results". DoorDash Engineering Blog. Retrieved 2020-07-05.
[3] Liang, Kung-Yee; Zeger, Scott L. (1986-04-01). "Longitudinal data analysis using generalized linear models". Biometrika. 73 (1): 13–22. doi:10.1093/biomet/73.1.13. ISSN 0006-3444.
[4] Abadie, Alberto; Athey, Susan; Imbens, Guido; Wooldridge, Jeffrey (2017-10-24). "When Should You Adjust Standard Errors for Clustering?". arXiv:1710.02926 [econ, math, stat].
