Abstract
This contribution gives a brief summary of robust estimators of multivariate location and scatter. We assume that the original (uncontaminated) data follow an elliptical distribution with location vector μ and positive definite scatter matrix Σ. Robust methods aim to estimate μ and Σ even when the data have been contaminated by outliers. The well-known multivariate M-estimators can break down when the outlier fraction exceeds 1/(p+1), where p is the number of variables. We describe several robust estimators that can withstand a high fraction (up to 50%) of outliers, such as the minimum covariance determinant (MCD) estimator, the Stahel–Donoho estimator, S-estimators and MM-estimators. We also discuss faster methods that are only approximately equivariant under linear transformations, such as the orthogonalized Gnanadesikan–Kettenring estimator and the deterministic MCD algorithm.
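To make the idea behind the MCD concrete, the following is a minimal sketch for bivariate data (p = 2) in plain Python. It is not the chapter's algorithm: it uses random three-point starts followed by concentration steps, in the spirit of the fast MCD algorithms, and returns the raw mean and covariance of the best h-subset without any consistency factor or reweighting. The function names (`mean_cov`, `mcd_2d`), the simulated data, and the choice h = (n + p + 1) // 2 are assumptions made for illustration.

```python
import random

def mean_cov(pts):
    """Sample mean and covariance (sxx, sxy, syy) of a list of 2-D points."""
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in pts) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def det2(cov):
    """Determinant of a symmetric 2x2 matrix stored as (sxx, sxy, syy)."""
    sxx, sxy, syy = cov
    return sxx * syy - sxy * sxy

def sq_dist(pt, center, cov):
    """Squared Mahalanobis distance via the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det2(cov)

def mcd_2d(pts, h, n_starts=100, max_steps=100, seed=0):
    """Approximate the MCD: look for the h points with the smallest covariance determinant."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        # Start from p + 1 = 3 random points; skip (near-)collinear starts.
        center, cov = mean_cov(rng.sample(pts, 3))
        if det2(cov) <= 1e-12:
            continue
        chosen = None
        for _ in range(max_steps):
            # Concentration step: keep the h points closest to the current fit.
            order = sorted(range(len(pts)),
                           key=lambda i: sq_dist(pts[i], center, cov))
            new_chosen = frozenset(order[:h])
            if new_chosen == chosen:
                break  # the subset no longer changes
            chosen = new_chosen
            center, cov = mean_cov([pts[i] for i in chosen])
            if det2(cov) <= 1e-12:
                break  # degenerate subset; distances are undefined
        if best is None or det2(cov) < best[0]:
            best = (det2(cov), center, cov)
    if best is None:
        raise ValueError("no non-degenerate starting subset was found")
    return best[1], best[2]

# Simulated example: 70 clean points around the origin, 30 outliers near (10, 10).
data_rng = random.Random(1)
clean = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(70)]
outliers = [(data_rng.gauss(10, 0.5), data_rng.gauss(10, 0.5)) for _ in range(30)]
data = clean + outliers

h = (len(data) + 2 + 1) // 2  # (n + p + 1) // 2 with p = 2
classical_center, _ = mean_cov(data)
robust_center, robust_cov = mcd_2d(data, h)
print("classical mean:", classical_center)
print("MCD-type center:", robust_center)
```

With 30% contamination the classical mean is pulled toward the outlying cluster, whereas the center of the selected h-subset should stay close to the origin. For real analyses an established implementation is preferable to this sketch.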
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Rousseeuw, P., Hubert, M. (2013). High-Breakdown Estimators of Multivariate Location and Scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds) Robustness and Complex Data Structures. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35494-6_4
DOI: https://doi.org/10.1007/978-3-642-35494-6_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35493-9
Online ISBN: 978-3-642-35494-6
eBook Packages: Mathematics and Statistics (R0)