ODRA: an outlier detection algorithm based on relevant attribute analysis method

Abstract

Advances in data acquisition have generated an enormous amount of data that captures business, commercial, technological and scientific information. However, some occurrences remain rare or unusual regardless of how much data is available. In data mining, these rare occurrences are usually referred to as outliers or anomalies. They are infrequent: their share of the data can vary from 0.01% to 10%, depending on the type of application. In recent years, outlier detection has become important in many applications and has attracted considerable attention among the growing number of data mining techniques. This attention has produced several outlier detection algorithms, mostly based on distance or density. However, each approach has inherent weaknesses: distance-based methods have problems with local density, and density-based methods have problems with low-density patterns. In this paper, we present a new outlier detection algorithm based on relevant attribute analysis (ODRA) for local outlier detection in high-dimensional datasets. The proposed algorithm has two phases. In the first phase, we present a data reduction method that shrinks the data set by pruning irrelevant attributes and data points. In the second phase, we propose an outlier detection method based on k-NN kernel density estimation. Experimental results on 15 datasets from the UCI machine learning repository show the superiority and effectiveness of the proposed approach over state-of-the-art outlier detection methods.
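
The general idea behind the second phase, scoring points by a kernel density estimate over their k nearest neighbours, can be illustrated with a minimal sketch. It is not the ODRA formulation itself: the Gaussian kernel, the Euclidean distance, the bandwidth set to the k-th neighbour distance, the inverse-density score, and the function name `knn_kde_scores` are all assumptions made for illustration. The first-phase pruning is not covered here.

```python
import math


def knn_kde_scores(points, k, bandwidth=None):
    """Return one outlier score per point (higher = more outlying).

    Illustrative assumptions: Gaussian kernel, Euclidean distance,
    and score = 1 / estimated density.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")
    dim = len(points[0])
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        neighbours = dists[:k]
        # Assumption: adaptive bandwidth equal to the k-th neighbour distance.
        h = bandwidth if bandwidth is not None else max(neighbours[-1], 1e-12)
        # Normalising constant of a d-dimensional Gaussian kernel.
        norm = (2 * math.pi) ** (dim / 2) * h ** dim
        density = sum(math.exp(-0.5 * (d / h) ** 2) for d in neighbours) / (k * norm)
        scores.append(1.0 / max(density, 1e-300))
    return scores


# Hypothetical toy data: a tight cluster plus one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = knn_kde_scores(points, k=3)
most_outlying = scores.index(max(scores))
```

On this toy data the distant point receives the largest score, because its neighbours are far away and its estimated density is therefore low. The sketch compares all pairs of points, so it is only suitable for small inputs.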

Notes

  1. http://www.archive.ics.uci.edu/ml/

Author information

Corresponding author

Correspondence to Abdul Wahid.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wahid, A., Rao, A. ODRA: an outlier detection algorithm based on relevant attribute analysis method. Cluster Comput (2020). https://doi.org/10.1007/s10586-020-03136-9

Keywords

  • Unsupervised outlier detection
  • Distance-based
  • Density-based
  • Data set reduction
  • Nearest neighbours
  • Kernel density estimation