Synonyms
Anomaly detection; Fraud detection; Identification of outliers; Rejection of outliers
Definition
Outlier detection aims at identifying those objects in a database that are unusual, i.e., different from the majority of the data and therefore suspected of resulting from contamination, error, or fraud. In statistical modeling, the assessment of “being unusual” is typically based on a parametric model of the data: objects that do not fit the modeled distribution well are identified as outliers. In the database context, the statistical intuition of “being unusual” is typically captured in an approximate but more efficient nonparametric way, using (local) density estimates and a comparison with some reference set.
Historical Background
Filtering out observations that look suspiciously different from the majority of observations has probably been practiced tacitly for as long as people have studied data collections and tried to make sense of observations. In the eighteenth century,...
Recommended Reading
Hawkins D. Identification of outliers. London: Chapman and Hall; 1980.
Barnett V, Lewis T. Outliers in statistical data. 3rd ed. Chichester: Wiley; 1994.
Rousseeuw PJ, Hubert M. Robust statistics for outlier detection. Wiley Interdiscip Rev Data Min Knowl Discov. 2011;1(1):73–9.
Knorr EM, Ng RT, Tucakov V. Distance-based outliers: algorithms and applications. VLDB J. 2000;8(3–4):237–53.
Ramaswamy S, Rastogi R, Shim K. Efficient algorithms for mining outliers from large data sets. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 427–38.
Angiulli F, Pizzuti C. Outlier mining in large high-dimensional data sets. IEEE Trans Knowl Data Eng. 2005;17(2):203–15.
Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 93–104.
Schubert E, Zimek A, Kriegel HP. Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data Min Knowl Disc. 2014;28(1):190–237.
Orair GH, Teixeira C, Wang Y, Meira Jr W, Parthasarathy S. Distance-based outlier detection: consolidation and renewed bearing. Proc VLDB Endow. 2010;3(2):1469–80.
Zimek A, Schubert E, Kriegel HP. A survey on unsupervised outlier detection in high-dimensional numerical data. Stat Anal Data Min. 2012;5(5):363–87.
Zimek A, Campello RJGB, Sander J. Ensembles for unsupervised outlier detection: challenges and research questions. ACM SIGKDD Explor. 2013;15(1):11–22.
Chandola V, Banerjee A, Kumar V. Anomaly detection for discrete sequences: a survey. IEEE Trans Knowl Data Eng. 2012;24(5):823–39.
Akoglu L, Tong H, Koutra D. Graph-based anomaly detection and description: a survey. Data Min Knowl Disc. 2014; https://doi.org/10.1007/s10618-014-0365-y.
Kriegel HP, Kröger P, Schubert E, Zimek A. Interpreting and unifying outlier scores. In: Proceedings of the 11th SIAM International Conference on Data Mining; 2011. p. 13–24.
Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 1009–12.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Zimek, A., Schubert, E. (2018). Outlier Detection. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80719
DOI: https://doi.org/10.1007/978-1-4614-8265-9_80719
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science, Reference Module Computer Science and Engineering