Abstract
Real-life data often contain observations that are inconsistent with the main bulk of the data. Since classical statistical procedures often react sensitively to such so-called outliers, outlier identification methods based on robust statistical estimators are recommended. One class of robust estimators is constructed according to the principle of subset selection: an outlier-free subset of the data is identified first, which is then used to discard or downweight deviating observations so that the parameters of interest can be estimated robustly. Such approaches also yield outlier identification methods. The general approach is presented, and three methods are discussed that were developed especially for cases where no special restrictions are placed on the data structure formed by the main bulk of the observations.
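The subset selection principle described in the abstract can be illustrated with a minimal sketch. It is not one of the three methods discussed in the chapter; it is a generic concentration-style iteration, restricted to bivariate data so that only the Python standard library is needed. The function names, the half-sample subset size, the median-based rescaling of the distances, and the chi-square cutoff with 2 degrees of freedom are all assumptions made for illustration.

```python
import math
import random
import statistics


def mean_and_cov(points):
    """Sample mean and covariance (sxx, sxy, syy) of bivariate points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def squared_distances(points, center, cov):
    """Squared Mahalanobis distances, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    out = []
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        out.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def identify_outliers(points, subset_fraction=0.5, alpha=0.025, max_iter=50):
    """Hypothetical helper: select a clean subset, then flag distant points."""
    n = len(points)
    h = max(3, int(subset_fraction * n))
    # Step 1: crude initial subset, the h points closest to the
    # coordinate-wise median (an assumed starting rule).
    med = (statistics.median(x for x, _ in points),
           statistics.median(y for _, y in points))
    subset = sorted(sorted(range(n), key=lambda i: math.dist(points[i], med))[:h])
    # Step 2: refit on the subset and re-select the h points with the
    # smallest distances until the subset no longer changes.
    for _ in range(max_iter):
        center, cov = mean_and_cov([points[i] for i in subset])
        d2 = squared_distances(points, center, cov)
        new_subset = sorted(sorted(range(n), key=lambda i: d2[i])[:h])
        if new_subset == subset:
            break
        subset = new_subset
    # Step 3: estimate location and scatter from the final subset only.
    center, cov = mean_and_cov([points[i] for i in subset])
    d2 = squared_distances(points, center, cov)
    # The subset covariance is too small, so rescale the distances to match
    # the chi-square median with 2 degrees of freedom (assumed correction).
    scale = (-2 * math.log(0.5)) / statistics.median(d2)
    # Upper alpha quantile of the chi-square distribution with 2 df.
    cutoff = -2 * math.log(alpha)
    flagged = [i for i in range(n) if d2[i] * scale > cutoff]
    return subset, flagged


# Usage: 95 standard normal points plus 5 shifted points (synthetic data).
random.seed(1)
clean = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(95)]
shifted = [(random.gauss(8, 0.3), random.gauss(8, 0.3)) for _ in range(5)]
subset, flagged = identify_outliers(clean + shifted)
print(len(subset), flagged)
```

In this sketch the shifted points never enter the selected subset, so they do not distort the estimates used to measure their own distances. That is the reason for selecting the subset before estimating.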
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Becker, C., Liebscher, S., Kirschstein, T. (2013). Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds) Robustness and Complex Data Structures. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35494-6_7
DOI: https://doi.org/10.1007/978-3-642-35494-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35493-9
Online ISBN: 978-3-642-35494-6
eBook Packages: Mathematics and Statistics (R0)