Online Anomaly Detection Using Random Forest
In this paper, we focus on how to use random-forest-based methods to improve the anomaly detection rate on streaming datasets.
The key concept in a current work is to build a random forest in which, in every tree, each internal node randomly selects a feature and partitions the associated data space in half. However, the model parameters were predefined, and the efficiency of applying this model under various conditions was not discussed. In this paper, we first give a mathematical justification for the required tree height and number of trees by casting the problem as a classical coupon collector problem. Then we design a majority-voting score combination strategy to combine the results from different anomaly detection trees. Finally, we apply feature clustering to group correlated features together in order to find anomalies that are jointly determined by subsets of features.
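As a rough illustration of two of the ideas named above, the following minimal sketch computes the classical coupon collector expectation and applies a simple majority vote over per-tree decisions. It assumes that each random feature selection is a uniform draw with replacement, which is how the coupon collector result is usually stated; how the paper maps this quantity onto tree height and number of trees is not given here. The function names `expected_draws_to_cover` and `majority_vote`, and the strict-majority tie rule, are hypothetical choices made for illustration, not the authors' definitions.

```python
from fractions import Fraction
from typing import Iterable


def expected_draws_to_cover(n_features: int) -> float:
    """Expected number of uniform random draws (with replacement) needed
    to see each of n_features at least once: n * H_n, where H_n is the
    n-th harmonic number (classical coupon collector result)."""
    if n_features < 1:
        raise ValueError("n_features must be at least 1")
    harmonic = sum(Fraction(1, k) for k in range(1, n_features + 1))
    return float(n_features * harmonic)


def majority_vote(tree_flags: Iterable[bool]) -> bool:
    """Combine per-tree anomaly decisions (assumed boolean here): report
    an anomaly only if strictly more than half of the trees flag it."""
    flags = list(tree_flags)
    return 2 * sum(flags) > len(flags)


if __name__ == "__main__":
    for d in (5, 10, 50):
        print(d, round(expected_draws_to_cover(d), 2))
    print(majority_vote([True, True, False]))
```

Under this assumption, the expectation grows roughly as n ln n, so covering every feature at least once takes noticeably more random selections than there are features.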
- 1. Aggarwal, C.C.: On abnormality detection in spuriously populated data streams. In: Proceedings of the 2005 SIAM International Conference on Data Mining, SIAM 2005, pp. 80–91 (2005)
- 11. Pokrajac, D., Lazarevic, A., Latecki, L.J.: Incremental local outlier detection for data streams. In: IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2007, pp. 504–515. IEEE (2007)
- 12. Tan, S.C., Ting, K.M., Liu, T.F.: Fast anomaly detection for streaming data. In: IJCAI Proceedings-International Joint Conference on Artificial Intelligence, vol. 22, no. 1, p. 1511 (2011)
- 13. Yamanishi, K., Takeuchi, J.-I.: A unifying framework for detecting outliers and change points from non-stationary time series data. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 676–681. ACM (2002)