
Enhancing Outlier Detection by an Outlier Indicator

  • Xiaqiong Li
  • Xiaochun Wang
  • Xia Li Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10934)

Abstract

Outlier detection is an important data mining task with high practical value in applications such as astronomical observation, text detection, and fraud detection. Many popular outlier detection algorithms are available, including distribution-based, distance-based, density-based, and clustering-based approaches. However, traditional outlier detection algorithms face several challenges. First, most distance-based and density-based methods rely on k-nearest neighbors and are therefore very sensitive to the value of k. Second, some methods detect only global outliers and fail to detect local outliers. Last but not least, most outlier detection algorithms do not accurately distinguish between boundary points and outliers. To partially address these problems, we propose augmenting classical outlier detection algorithms with boundary indicators. Experiments on both synthetic and real data sets demonstrate the efficacy of the enhanced algorithms.
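The sensitivity to k noted above can be illustrated with a minimal sketch. It assumes a common distance-based score, the distance from each point to its k-th nearest neighbor, and a hand-built toy data set; it is not the enhanced algorithm proposed in the paper, and the names `knn_distance_scores` and `top_outliers` are hypothetical.

```python
import numpy as np

def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance,
    # so column k holds the distance to the k-th nearest neighbor.
    return np.sort(dists, axis=1)[:, k]

def top_outliers(points, k, n):
    """Return the indices of the n points with the largest scores."""
    scores = knn_distance_scores(points, k)
    return set(np.argsort(scores)[-n:].tolist())

# Toy data (an assumption for illustration): a 7 x 7 grid with spacing 0.5,
# plus a tight group of three points far away from the grid.
axis = np.arange(7) * 0.5
grid = np.array([[x, y] for x in axis for y in axis])        # indices 0..48
small_group = np.array([[8.0, 8.0], [8.1, 8.0], [8.0, 8.1]])  # indices 49..51
points = np.vstack([grid, small_group])

for k in (2, 5):
    print(f"k={k}: top-3 outlier indices = {sorted(top_outliers(points, k, 3))}")
```

With k = 2, each point in the remote group finds its two neighbors inside that group, so the group scores lower than every grid point and is not flagged. With k = 5, the fifth neighbor must come from the distant grid, and the same three points receive the highest scores.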

Keywords

Outlier detection · Distance-based outlier detection · Density-based outlier detection · Boundary detection · k-Nearest neighbors

Notes

Acknowledgment

The authors would like to thank the Chinese National Science Foundation for its valuable support of this work under award 61473220 and all the anonymous reviewers for their valuable comments.


Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Software Engineering, Xi’an Jiaotong University, Xi’an, China
  2. School of Information Engineering, Chang’an University, Xi’an, China
