Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter

Becker, Claudia; Liebscher, Steffen; Kirschstein, Thomas

doi:10.1007/978-3-642-35494-6_7

Abstract

Real-life data often contain some observations that are not consistent with the main bulk of the data. Since classical statistical procedures often react sensitively to such so-called outliers, the use of outlier identification methods based on robust statistical estimators is recommended. One class of such robust estimators is constructed according to the principle of subset selection: an outlier-free subset of the data is identified first, and this subset is then used to discard or downweight deviating observations in order to robustly estimate the parameters of interest. Such approaches also yield outlier identification methods. The general approach is presented, and three methods are discussed that were developed especially for cases in which no special restrictions are imposed on the structure of the main bulk of the observations.
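As a rough illustration of the subset-selection principle, and not of any of the three methods discussed in the chapter, the Python sketch below uses concentration steps in the spirit of the minimum covariance determinant (MCD) estimator (Rousseeuw 1985). It starts from the h points closest to the coordinatewise median, repeatedly recomputes the mean and covariance of the current subset and keeps the h points with the smallest squared Mahalanobis distances, and finally flags (discards) every observation whose rescaled squared distance exceeds a chi-square quantile. The function names, the starting rule, the default h = floor((n + p + 1)/2), the median-based consistency factor and alpha = 0.975 are illustrative assumptions. The sketch is restricted to bivariate data (p = 2) so that the covariance matrix can be inverted in closed form and the chi-square quantile computed exactly as -2 ln(1 - alpha).

```python
import math
import statistics

# Hypothetical helper names chosen for this sketch; bivariate data only (p = 2).


def mean_cov(points):
    """Sample mean vector and 2 x 2 sample covariance matrix of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def sq_mahalanobis(point, center, cov):
    """Squared Mahalanobis distance, using the closed-form 2 x 2 inverse."""
    (a, b), (_, d) = cov
    det = a * d - b * b  # assumed > 0; a degenerate (collinear) subset is not handled
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det


def subset_outliers(points, h=None, alpha=0.975, max_iter=50):
    """Flag outliers from an h-subset found by MCD-style concentration steps."""
    n = len(points)
    if h is None:
        h = (n + 3) // 2  # (n + p + 1) // 2 with p = 2
    # Start: the h points closest in L1 distance to the coordinatewise median.
    med = (statistics.median(x for x, _ in points),
           statistics.median(y for _, y in points))

    def l1(i):
        return abs(points[i][0] - med[0]) + abs(points[i][1] - med[1])

    subset = sorted(sorted(range(n), key=l1)[:h])
    for _ in range(max_iter):
        center, cov = mean_cov([points[i] for i in subset])
        d2 = [sq_mahalanobis(p, center, cov) for p in points]
        # Concentration step: keep the h points closest under the current fit.
        new_subset = sorted(sorted(range(n), key=lambda i: d2[i])[:h])
        if new_subset == subset:
            break
        subset = new_subset
    # Rescale so the median squared distance matches the chi^2_2 median, 2 ln 2.
    factor = statistics.median(d2) / (2 * math.log(2))
    cov = tuple(tuple(factor * v for v in row) for row in cov)
    cutoff = -2 * math.log(1 - alpha)  # exact chi^2_2 quantile
    flagged = [i for i in range(n) if d2[i] / factor > cutoff]
    return center, cov, flagged


if __name__ == "__main__":
    grid = [(x, y) for x in range(5) for y in range(4)]  # 20 regular points
    data = grid + [(20, 20), (25, -15), (-18, 22)]       # plus 3 far outliers
    _, _, flagged = subset_outliers(data)
    print(flagged)  # → [20, 21, 22]
```

Here the outlier-free subset is found first and the final location and scatter estimates come only from it, which is what keeps the three distant points from masking themselves. Downweighting instead of discarding would replace the hard cutoff by weights that decrease with the distance; a practical implementation would also use several random starts, handle singular subsets and work in arbitrary dimension.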

References

Barnett, V., & Lewis, T. (1994). Outliers in statistical data (3rd ed.). New York: Wiley.
Becker, C., & Gather, U. (1999). The masking breakdown point of multivariate outlier identification rules. Journal of the American Statistical Association, 94, 947–955.
Becker, C., & Gather, U. (2001). The largest nonidentifiable outlier: a comparison of multivariate simultaneous outlier identification rules. Computational Statistics & Data Analysis, 36, 119–127.
Becker, C., & Paris Scholz, S. (2006). Deepest points and least deep points: robustness and outliers with MZE. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, & W. Gaul (Eds.), From data and information analysis to knowledge engineering (pp. 254–261). Heidelberg: Springer.
Bennett, M., & Willemain, T. (2001). Resistant estimation of multivariate location using minimum spanning trees. Journal of Statistical Computation and Simulation, 69, 19–40.
Choudhury, D. R., & Das, M. N. (1992). Use of combinatorics for unique detection of unknown number of outliers using group tests. Sankhya. Series B, 54, 92–99.
Dang, X., & Serfling, R. (2010). Nonparametric depth-based multivariate outlier identifiers, and masking robustness properties. Journal of Statistical Planning and Inference, 140, 782–801.
Davies, P. L., & Gather, U. (1993). The identification of multiple outliers. Journal of the American Statistical Association, 88, 782–801.
Delaunay, B. (1934). Sur la sphère vide. Izvestiâ Akademii Nauk SSSR. Otdelenie Tehničeskih Nauk, 7, 793–800.
Fieller, N. R. J. (1976). Some problems related to the rejection of outlying observations. Ph.D. Thesis, University of Hull, Hull.
Gather, U., & Becker, C. (1997). Outlier identification and robust methods. In G. S. Maddala & C. R. Rao (Eds.), Handbook of statistics 15: robust inference (pp. 123–143). Amsterdam: Elsevier.
Hampel, F. R., Rousseeuw, P. J., Ronchetti, E., & Stahel, W. (1986). Robust statistics. The approach based on influence functions. New York: Wiley.
Hawkins, D. M. (1973). Repeated testing for outliers. Statistica Neerlandica, 27, 1–10.
Hawkins, D. M. (1980). Identification of outliers. London: Chapman & Hall.
Hubert, M., Rousseeuw, P. J., & van Aelst, S. (2008). High-breakdown robust multivariate methods. Statistical Science, 23, 92–119.
Jungnickel, D. (2008). Graphs, networks and algorithms (3rd ed.). Heidelberg: Springer.
Kirschstein, T., Liebscher, S., & Becker, C. (2013). Robust estimation of location and scatter by pruning the minimum spanning tree. Submitted for publication.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43, 59–69.
Kohonen, T. (2001). Self-organizing maps (3rd ed.). Berlin: Springer.
Kruskal, J. (1956). On the shortest spanning subtree and the traveling salesman problem. Proceedings of the American Mathematical Society, 7, 48–50.
Liebscher, S., Kirschstein, T., & Becker, C. (2012). The flood algorithm—a multivariate, self-organizing-map-based, robust location and covariance estimator. Statistics and Computing, 22, 325–336. doi:10.1007/s11222-011-9250-3.
Liebscher, S., Kirschstein, T., & Becker, C. (2013). RDELA—a Delaunay-triangulation-based, location and covariance estimator with high breakdown point. Statistics and Computing. doi:10.1007/s11222-012-9337-5.
Lopuhaä, H. P., & Rousseeuw, P. J. (1991). Breakdown points of affine equivariant estimators of multivariate location and covariance matrices. The Annals of Statistics, 19, 229–248.
Mara, W. (2011). The Chernobyl disaster: legacy and impact on the future of nuclear energy. New York: Marshall Cavendish.
Murphy, R. B. (1951). On tests for outlying observations. Ph.D. Thesis, Princeton University, Ann Arbor.
Pearson, E. S., & Chandra Sekar, C. (1936). The efficiency of statistical tools and a criterion for the rejection of outlying observations. Biometrika, 28, 308–320.
Rosner, B. (1975). On the detection of many outliers. Technometrics, 17, 221–227.
Rousseeuw, P. J. (1985). Multivariate estimation with high breakdown point. In W. Grossman, G. Pflug, I. Vincze, & W. Wertz (Eds.), Mathematical statistics and applications (pp. 283–297). Dordrecht: Reidel.
Ultsch, A. (1993). Self-organizing neural networks for visualization and classification. In O. Opitz, B. Lausen, & R. Klar (Eds.), Information and classification: concepts (pp. 307–313). Berlin: Springer.

Author information

Authors and Affiliations

Martin-Luther-University Halle-Wittenberg, 06099, Halle, Germany
Claudia Becker, Steffen Liebscher & Thomas Kirschstein

Corresponding author

Correspondence to Claudia Becker.

Editor information

Editors and Affiliations

Faculty of Law and Economics, Martin-Luther-University Halle-Wittenberg, Große Steinstraße 73, Halle, 06099, Germany
Claudia Becker
Faculty of Statistics, TU Dortmund University, Vogelpothsweg 87, Dortmund, 44227, Germany
Roland Fried
Faculty of Statistics, TU Dortmund University, Vogelpothsweg 87, Dortmund, 44227, Germany
Sonja Kuhnt

About this chapter

Cite this chapter

Becker, C., Liebscher, S., Kirschstein, T. (2013). Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds) Robustness and Complex Data Structures. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35494-6_7

DOI: https://doi.org/10.1007/978-3-642-35494-6_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35493-9
Online ISBN: 978-3-642-35494-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)