Abstract
Data-driven Anomaly Detection approaches have received increasing attention in many application areas over the past few years as a tool for monitoring complex systems, complementing classical univariate control charts. Tree-based approaches have proven particularly effective for high-dimensional Anomaly Detection problems and for data with non-Gaussian underlying distributions. The best-known approach in this family is the Isolation Forest, currently one of the most popular choices among scientists and practitioners for Anomaly Detection tasks. The Isolation Forest is a seminal algorithm: many extensions have been proposed in recent years, aiming either to improve the original method or to handle specific application scenarios. In this work, we review some of the most popular and powerful Tree-based approaches to Anomaly Detection (extensions of the Isolation Forest as well as other approaches), considering both batch and streaming data scenarios. We also cover several relevant aspects of these methods, such as computational cost and interpretability. To help practitioners, we report the relevant available libraries and open implementations, together with a review of real-world industrial applications of the considered approaches.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Barbariol, T., Chiara, F.D., Marcato, D., Susto, G.A. (2022). A Review of Tree-Based Approaches for Anomaly Detection. In: Tran, K.P. (eds) Control Charts and Machine Learning for Anomaly Detection in Manufacturing. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-83819-5_7
DOI: https://doi.org/10.1007/978-3-030-83819-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83818-8
Online ISBN: 978-3-030-83819-5
eBook Packages: Engineering, Engineering (R0)