A Review of Tree-Based Approaches for Anomaly Detection

Control Charts and Machine Learning for Anomaly Detection in Manufacturing

Part of the book series: Springer Series in Reliability Engineering (RELIABILITY)

Abstract

Data-driven Anomaly Detection approaches have received increasing attention in many application areas in recent years as a tool for monitoring complex systems, complementing classical univariate control charts. Tree-based approaches have proven particularly effective for high-dimensional Anomaly Detection problems and for data with underlying non-Gaussian distributions. The best-known approach in this family is the Isolation Forest, currently one of the most popular choices among scientists and practitioners for Anomaly Detection tasks. The Isolation Forest is a seminal algorithm: many extensions have been proposed in recent years, aiming either to improve the original method or to handle specific application scenarios. In this work, we review some of the most popular and powerful Tree-based approaches to Anomaly Detection (extensions of the Isolation Forest and other approaches), considering both batch and streaming data scenarios. We also review several relevant aspects of the methods, such as computational cost and interpretability. To help practitioners, we report relevant available libraries and open implementations, together with a review of real-world industrial applications of the considered approaches.

Author information

Corresponding author

Correspondence to Gian Antonio Susto.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Barbariol, T., Chiara, F.D., Marcato, D., Susto, G.A. (2022). A Review of Tree-Based Approaches for Anomaly Detection. In: Tran, K.P. (eds) Control Charts and Machine Learning for Anomaly Detection in Manufacturing. Springer Series in Reliability Engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-83819-5_7

  • DOI: https://doi.org/10.1007/978-3-030-83819-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-83818-8

  • Online ISBN: 978-3-030-83819-5

  • eBook Packages: Engineering, Engineering (R0)
