
Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter

  • Chapter
Robustness and Complex Data Structures

Abstract

Real-life data often contain observations that are inconsistent with the main bulk of the data. Because classical statistical procedures are often sensitive to such outliers, outlier identification methods based on robust statistical estimators are recommended. One class of robust estimators follows the principle of subset selection: an outlier-free subset of the data is identified first, and this subset is then used to discard or downweight deviating observations so that the parameters of interest can be estimated robustly. Such approaches also yield outlier identification methods. The general approach is presented, and three methods are discussed that were developed especially for cases in which no special restrictions are placed on the data structure formed by the main bulk of the observations.
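The subset-selection principle described in the abstract can be illustrated with a minimal sketch. It is not one of the three methods discussed in the chapter. As a stand-in, it uses a simple concentration-type iteration: start from the points closest to the coordinatewise median, estimate mean and covariance on that subset, and repeatedly re-select the points with the smallest Mahalanobis distances. The function names, the subset size `h`, the quantile `q`, and the Wilson-Hilferty approximation to the chi-square cutoff are all assumptions made for this illustration.

```python
import numpy as np
from statistics import NormalDist


def chi2_quantile(q, df):
    """Wilson-Hilferty approximation to the chi-square quantile (illustrative)."""
    z = NormalDist().inv_cdf(q)
    return df * (1 - 2 / (9 * df) + z * np.sqrt(2 / (9 * df))) ** 3


def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of every row of X to `center`."""
    diff = X - center
    return np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)


def subset_based_outliers(X, h=None, n_iter=20, q=0.975):
    """Estimate location and scatter from a presumably clean subset, then flag outliers.

    Assumes X has shape (n, p) with p >= 2 and that the subset covariance
    is nonsingular. All defaults are hypothetical choices for this sketch.
    """
    n, p = X.shape
    if h is None:
        h = (n + p + 1) // 2  # roughly half of the data

    # Initial subset: the h points closest to the coordinatewise median.
    med = np.median(X, axis=0)
    subset = np.argsort(np.sum((X - med) ** 2, axis=1))[:h]

    for _ in range(n_iter):
        center = X[subset].mean(axis=0)
        cov = np.cov(X[subset], rowvar=False)
        d2 = mahalanobis_sq(X, center, cov)
        new_subset = np.argsort(d2)[:h]  # keep the h least deviating points
        if set(new_subset) == set(subset):
            break
        subset = new_subset

    # Final estimates from the selected subset. No consistency correction is
    # applied, so the scatter is underestimated and the rule flags liberally.
    center = X[subset].mean(axis=0)
    cov = np.cov(X[subset], rowvar=False)
    d2 = mahalanobis_sq(X, center, cov)
    return center, cov, d2 > chi2_quantile(q, p)


# Usage: 100 standard normal points plus 10 planted outliers near (10, 10).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(100, 2)),
               rng.normal(loc=10.0, scale=0.1, size=(10, 2))])
center, cov, flags = subset_based_outliers(X)
print("flagged observations:", int(flags.sum()))
```

Because the covariance is computed from the most central half of the data without rescaling, some regular observations are flagged as well. A practical procedure would add a consistency factor or a reweighting step before applying the cutoff.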



Corresponding author

Correspondence to Claudia Becker.


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Becker, C., Liebscher, S., Kirschstein, T. (2013). Multivariate Outlier Identification Based on Robust Estimators of Location and Scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds) Robustness and Complex Data Structures. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35494-6_7
