Anomaly detection; Fraud detection; Identification of outliers; Rejection of outliers
Outlier detection aims at identifying those objects in a database that are unusual, i.e., different than the majority of the data and therefore suspicious resulting from a contamination, error, or fraud. In a statistical modeling, the assessment of “being unusual” is typically based on a parametric model of the data, identifying those objects that do not fit well to the modeled distribution as outliers. In the database context, the statistical intuition of “being unusual” is typically modeled in an approximate but more efficient, nonparametric way by (local) density estimates and comparison to some reference set.
Filtering out those observations that look suspiciously different than the majority of observations is a procedure probably tacitly practiced since people studied data collections and tried to make sense out of observations. In the eighteenth century,...
- 7.Breunig MM, Kriegel HP, Ng RT, Sander J. LOF: identifying density-based local outliers. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2000. p. 93–104.Google Scholar
- 15.Achtert E, Kriegel HP, Schubert E, Zimek A. Interactive data mining with 3D-parallel-coordinate-trees. In: Proceedings of the ACM SIGMOD International Conference on Management of Data; 2013. p. 1009–12.Google Scholar