Enhancing Outlier Detection by an Outlier Indicator
Outlier detection is an important data mining task with high practical value in numerous applications, such as astronomical observation, text detection, and fraud detection. Many popular outlier detection algorithms are available, including distribution-based, distance-based, density-based, and clustering-based approaches. However, traditional outlier detection algorithms face several challenges. First, most distance-based and density-based methods rely on k-nearest neighbors and are therefore very sensitive to the value of k. Second, some methods can detect only global outliers and fail to detect local outliers. Last but not least, most outlier detection algorithms do not accurately distinguish between boundary points and outliers. To partially address these problems, in this paper we propose augmenting classical outlier detection algorithms with boundary indicators. Experiments on both synthetic and real data sets demonstrate the efficacy of the enhanced outlier detection algorithms.
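To make the sensitivity to k concrete, the sketch below scores each point by the distance to its k-th nearest neighbor, a classical distance-based outlier score. It is an illustration only, not the authors' enhanced method; the function names `knn_distance_scores` and `top_outliers` and the small data set are hypothetical, chosen so that a tight group of three distant points hides from the score when k is small.

```python
import math


def knn_distance_scores(points, k):
    """Hypothetical helper: score each point by the distance to its
    k-th nearest neighbor (the point itself is excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Hypothetical helper: indices of the n points with the largest scores."""
    scores = knn_distance_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical data: a 3x3 grid cluster (indices 0-8) plus a tight
# group of three far-away points (indices 9, 10, 11).
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
group = [(10.0, 10.0), (10.0, 10.5), (10.5, 10.0)]
points = grid + group

# With k = 2 the distant points are each other's nearest neighbors,
# so none of them appears among the top-3 scores.
print(any(i >= 9 for i in top_outliers(points, 2, 3)))  # False

# With k = 3 each distant point must reach into the grid for its
# 3rd neighbor, and the whole group becomes the top-3 outliers.
print(sorted(top_outliers(points, 3, 3)))  # [9, 10, 11]
```

The same three points are invisible at k = 2 and the strongest outliers at k = 3, which is the kind of dependence on k that the abstract describes.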
Keywords: Outlier detection; Distance-based outlier detection; Density-based outlier detection; Boundary detection; k-Nearest neighbors
The authors would like to thank the Chinese National Science Foundation for its valuable support of this work under award 61473220 and all the anonymous reviewers for their valuable comments.