Online Anomaly Detection Using Random Forest

  • Zhiruo Zhao
  • Kishan G. Mehrotra
  • Chilukuri K. Mohan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10868)

Abstract

In this paper, we focus on how to use random-forest-based methods to improve the anomaly detection rate for streaming datasets.

The key concept in recent work [12] is to build a random forest in which, at each internal node of a tree, a feature is selected at random and the associated data space is partitioned in half. However, the model parameters in [12] were predefined, and the efficiency of applying the model under different conditions was not discussed. In this paper, we first give a mathematical justification for the required tree height and number of trees by casting the problem as the classical coupon collector problem. We then design a majority-voting strategy to combine the scores of the individual anomaly detection trees. Finally, we apply feature clustering to group correlated features together, in order to find anomalies that are jointly determined by subsets of features.
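The abstract does not include the paper's algorithm or formulas, so the following is only an illustrative Python sketch under stated assumptions, not the authors' method. It shows three pieces: (a) a random tree that, at each internal node, picks a feature uniformly at random and halves that feature's current range, as described for [12]; (b) the classical coupon-collector expectation d · H_d, with H_d = 1 + 1/2 + ... + 1/d, which is the expected number of uniform random picks before every one of d features has been chosen at least once, used here as a hypothetical tree height; and (c) a majority vote across trees. The per-tree vote rule (a point is flagged when its leaf holds at most `mass_threshold` points from a reference window), the scaling of inputs to [0, 1], the lazy node creation, and all names (`HalfSpaceTree`, `majority_vote`, `coupon_collector_expectation`, `mass_threshold`) are assumptions. The feature-clustering step is not shown.

```python
import math
import random


def coupon_collector_expectation(d: int) -> float:
    """Expected number of uniform random feature picks until each of the d
    features has been picked at least once: d * H_d (H_d = d-th harmonic number)."""
    return d * sum(1.0 / i for i in range(1, d + 1))


class HalfSpaceTree:
    """Hypothetical random half-space tree: every internal node picks one
    feature uniformly at random and halves that feature's current range.
    Nodes are created lazily, only when a reference point reaches them."""

    def __init__(self, mins, maxs, height, rng):
        self.mins, self.maxs = list(mins), list(maxs)
        self.height = height
        self.rng = rng
        self.root = {}

    def _descend(self, x, create):
        node, lo, hi = self.root, self.mins[:], self.maxs[:]
        for _ in range(self.height):
            if "feature" not in node:
                if not create:
                    return None  # no reference point ever reached this region
                q = self.rng.randrange(len(lo))  # pick a feature at random
                node.update(feature=q, split=(lo[q] + hi[q]) / 2.0, left={}, right={})
            q, mid = node["feature"], node["split"]
            if x[q] < mid:
                hi[q], node = mid, node["left"]   # keep the lower half of feature q
            else:
                lo[q], node = mid, node["right"]  # keep the upper half of feature q
        return node  # leaf at depth == height

    def update(self, x):
        """Record one reference-window point in the leaf it falls into."""
        leaf = self._descend(x, create=True)
        leaf["mass"] = leaf.get("mass", 0) + 1

    def leaf_mass(self, x):
        """Number of reference points sharing x's leaf (0 if the leaf is empty)."""
        leaf = self._descend(x, create=False)
        return 0 if leaf is None else leaf.get("mass", 0)


def majority_vote(trees, x, mass_threshold):
    """Flag x as anomalous when more than half of the trees put it in a
    sparse leaf, i.e. one holding at most mass_threshold reference points."""
    votes = sum(t.leaf_mass(x) <= mass_threshold for t in trees)
    return votes > len(trees) / 2


if __name__ == "__main__":
    rng = random.Random(0)
    d = 2
    height = math.ceil(coupon_collector_expectation(d))  # 3 when d = 2
    trees = [HalfSpaceTree([0.0] * d, [1.0] * d, height, rng) for _ in range(25)]
    # Synthetic reference window: 200 "normal" points inside [0.4, 0.6]^2.
    for _ in range(200):
        point = (rng.uniform(0.4, 0.6), rng.uniform(0.4, 0.6))
        for tree in trees:
            tree.update(point)
    print(majority_vote(trees, (0.5, 0.5), mass_threshold=2))    # False
    print(majority_vote(trees, (0.95, 0.05), mass_threshold=2))  # True
```

Building every node up front would take on the order of 2^height nodes, and d · H_d grows roughly like d ln d, so the sketch creates a node only when a reference point first reaches it; memory then grows with the number of occupied cells rather than with 2^height.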

References

  1. Aggarwal, C.C.: On abnormality detection in spuriously populated data streams. In: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM 2005, pp. 80–91 (2005)
  2. Beckman, R.J., Cook, R.D.: Outlier..........s. Technometrics 25(2), 119–149 (1983). https://doi.org/10.1080/00401706.1983.10487840
  3. Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. (CSUR) 41(3), 15 (2009)
  4. Chen, Q., Luley, R., Wu, Q., Bishop, M., Linderman, R.W., Qiu, Q.: AnRAD: a neuromorphic anomaly detection framework for massive concurrent data streams. IEEE Trans. Neural Netw. Learn. Syst. 29(5), 1622–1636 (2017)
  5. Fawcett, T.: An introduction to ROC analysis. Pattern Recogn. Lett. 27(8), 861–874 (2006)
  6. Frey, B.J., Dueck, D.: Clustering by passing messages between data points. Science 315(5814), 972–976 (2007)
  7. Gupta, M., Gao, J., Aggarwal, C.C., Han, J.: Outlier detection for temporal data: a survey. IEEE Trans. Knowl. Data Eng. 26(9), 2250–2267 (2014)
  8. Han, J., Pei, J., Kamber, M.: Data Mining: Concepts and Techniques. Elsevier, New York (2011)
  9. Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
  10. Motwani, R., Raghavan, P.: Randomized Algorithms. Chapman & Hall/CRC, London (2010)
  11. Pokrajac, D., Lazarevic, A., Latecki, L.J.: Incremental local outlier detection for data streams. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 504–515. IEEE (2007)
  12. Tan, S.C., Ting, K.M., Liu, T.F.: Fast anomaly detection for streaming data. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 1, p. 1511 (2011)
  13. Yamanishi, K., Takeuchi, J.-I.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681. ACM (2002)
  14. Zhao, Z., Mehrotra, K.G., Mohan, C.K.: Ensemble algorithms for unsupervised anomaly detection. In: Ali, M., Kwon, Y.S., Lee, C.-H., Kim, J., Kim, Y. (eds.) IEA/AIE 2015. LNCS (LNAI), vol. 9101, pp. 514–525. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19066-2_50
  15. Zięba, M., Tomczak, S.K., Tomczak, J.M.: Ensemble boosted trees with synthetic features generation in application to bankruptcy prediction. Expert Syst. Appl. 58, 93–101 (2016)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Zhiruo Zhao (1)
  • Kishan G. Mehrotra (1)
  • Chilukuri K. Mohan (1)
  1. Syracuse University, Syracuse, USA

Personalised recommendations