Entropy Isolation Forest Based on Dimension Entropy for Anomaly Detection

  • Liefa Liao
  • Bin Luo
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 986)

Abstract

Anomaly detection is a fundamental research task in data mining and has attracted attention from both industry and academia. Among the many anomaly detection methods, iForest (isolation forest) offers low time complexity and good detection performance, and it adapts well to large-volume, high-dimensional data. However, iForest is not suitable for certain special high-dimensional data, is not stable enough, and is not robust to noise features. To address these problems, this paper proposes E-iForest (entropy-isolation forest), an improved anomaly detection method based on dimension entropy. The method uses dimension entropy as the basis for selecting the isolation attribute and the isolation point during training, applies three isolation strategies, and adjusts the path length calculation. Experiments show that E-iForest achieves better detection performance, runs faster on large-volume datasets, is more stable than iForest, and is more robust to noise features.
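The abstract does not give the definition of dimension entropy, so the following is only a rough sketch of what a per-dimension entropy score could look like. It assumes a Shannon entropy estimated from an equal-width histogram of each feature. The function names `dimension_entropy` and `rank_dimensions` and the `bins` parameter are hypothetical and are not taken from the paper. How E-iForest turns such a score into an isolation attribute, an isolation point, or one of its three isolation strategies is not specified here.

```python
import math


def dimension_entropy(values, bins=10):
    """Shannon entropy (in bits) of one feature, from an equal-width histogram.

    Assumption: this histogram estimator stands in for the paper's
    "dimension entropy", whose exact definition is not in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature has a single occupied bin
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)


def rank_dimensions(rows, bins=10):
    """Return (feature indices sorted by ascending entropy, entropy per feature)."""
    columns = list(zip(*rows))
    entropies = [dimension_entropy(col, bins) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: entropies[j])
    return order, entropies


if __name__ == "__main__":
    # Feature 0: tightly concentrated with two far-away values.
    # Feature 1: values spread evenly over the whole range.
    rows = [(0.0 if i < 98 else 10.0, float(i)) for i in range(100)]
    order, entropies = rank_dimensions(rows)
    print(order, [round(e, 3) for e in entropies])
```

In this toy data the concentrated feature gets a much lower entropy than the evenly spread one, which is the kind of contrast an entropy-guided attribute choice could exploit in place of iForest's uniformly random attribute selection.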

Keywords

Anomaly detection · Dimension entropy · Isolation strategies · Robustness

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Jiangxi University of Science and Technology, Ganzhou, China