Outlier Detection

Synonyms

Anomaly detection; Fraud detection; Identification of outliers; Rejection of outliers

Definition

Outlier detection aims at identifying those objects in a database that are unusual, i.e., different from the majority of the data and therefore suspected of resulting from contamination, error, or fraud. In statistical modeling, the assessment of “being unusual” is typically based on a parametric model of the data: objects that do not fit the modeled distribution well are identified as outliers. In the database context, the statistical intuition of “being unusual” is typically modeled in an approximate but more efficient, nonparametric way, using (local) density estimates and a comparison to some reference set.
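The contrast between the two views can be illustrated with a minimal sketch. It is not taken from the entry: the function names, the toy data, the Gaussian assumption for the parametric score, and the choice of the distance to the k-th nearest neighbor as a simple density proxy are all assumptions made here for illustration.

```python
import math
import statistics

def gaussian_scores(values):
    """Parametric view (assumed Gaussian model): absolute z-score of each value."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # All values identical: nothing deviates from the model.
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]

def knn_distance_scores(points, k=2):
    """Nonparametric view: distance to the k-th nearest neighbor.

    A larger distance indicates a lower density around the point.
    The reference set here is simply all other points (brute force).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Hypothetical one-dimensional toy data with one obviously deviating value.
data = [1.0, 1.1, 0.9, 1.2, 1.05, 8.0]
z = gaussian_scores(data)
knn = knn_distance_scores([(v,) for v in data], k=2)

# Both scores rank the same object as most unusual in this toy example.
print(z.index(max(z)), knn.index(max(knn)))
```

In this toy case both scores single out the same object. The brute-force neighbor search is quadratic in the number of objects, so larger data sets would typically call for index support or approximation.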

Historical Background

Filtering out those observations that look suspiciously different from the majority of observations is a procedure that has probably been practiced tacitly for as long as people have studied data collections and tried to make sense of observations. In the eighteenth century,...

Recommended Reading

  1. Hawkins D. Identification of outliers. London: Chapman and Hall; 1980.

  2. Barnett V, Lewis T. Outliers in statistical data. 3rd ed. Chichester: Wiley; 1994.

  3. Rousseeuw PJ, Hubert M. Robust statistics for outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov. 2011;1(1):73–9.

  4. Knorr EM, Ng RT, Tucakov V. Distance-based outliers: algorithms and applications. VLDB J. 2000;8(3–4):237–53.

  5. Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 427–38.

  6. Angiulli F, Pizzuti C. Outlier mining in large high-dimensional data sets. IEEE Trans Knowl Data Eng. 2005;17(2):203–15.

  7. Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 93–104.

  8. Schubert E, Zimek A, Kriegel HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc. 2014;28(1):190–237.

  9. Orair GH, Teixeira C, Wang Y, Meira Jr W, Parthasarathy S. Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow. 2010;3(2):1469–80.

  10. Zimek A, Schubert E, Kriegel HP. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min. 2012;5(5):363–87.

  11. Zimek A, Campello RJGB, Sander J. Ensembles for unsupervised outlier detection: challenges and research questions. ACM SIGKDD Explor. 2013;15(1):11–22.

  12. Chandola V, Banerjee A, Kumar V. Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng. 2012;24(5):823–39.

  13. Akoglu L, Tong H, Koutra D. Graph-based anomaly detection and description: a survey. Data Min Knowl Disc. 2014; https://doi.org/10.1007/s10618-014-0365-y.

  14. Kriegel HP, Kröger P, Schubert E, Zimek A. Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM International Conference on Data Mining; 2011. p. 13–24.

  15. Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 1009–12.

Author information

Correspondence to Arthur Zimek.

Copyright information

© 2018 Springer Science+Business Media, LLC, part of Springer Nature

About this entry

Cite this entry

Zimek, A., Schubert, E. (2018). Outlier Detection. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80719