Skip to main content

A Novel Isolation-Based Outlier Detection Method

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9810))

Abstract

Outlier detection is one of the most important tasks in data analysis. It refers to the process of recognizing unusual characteristics which may provide useful insights in helping us to understand the behaviors of data. In the paper, an isolation-based outlier detection method, called Entropy-based Greedy Isolation Tree (EGiTree), is proposed. Unlike other tree-like detection methods, our method exploits a half-baked isolation tree, which is constructed via three entropy-based heuristics, to identify outliers. Specifically, the heuristics are used to guide the selection process of attribute and its split value when constructing the tree. Thus, the outlierness score of each data point is estimated based on the total partition cost of the isolation node in the tree, as well as the path length and complexity of partition. Experiment results on public real-world datasets show that our approach outperforms distanced-based, density-based, subspace-based as well as state-of-the-art isolation-based approaches.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

  1. Aggarwal, C.C.: Outlier Analysis. Springer, New York (2013)

    Book  MATH  Google Scholar 

  2. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufman Publishers, San Francisco (2000)

    MATH  Google Scholar 

  3. Zhang, J.: Advancements of outlier detection: a survey. ICST Trans. Scalable Inf. Syst. 13(1), 1–26 (2013)

    Article  Google Scholar 

  4. Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, Chichester (1994)

    MATH  Google Scholar 

  5. Barnett, V.: The ordering of multivariate data (with discussion). J. Roy. Stat. Soc. Ser. A 139, 318–354 (1976)

    Article  Google Scholar 

  6. Laurikkala, J., Juhola, M., Kentala, E.: Informal identification of outliers in medical data. In: Fifth International Workshop on Intelligent Data Analysis in Medicine and Pharmacology, pp. 20–24 (2000)

    Google Scholar 

  7. Solberg, H.E., Lahti, A.: Detection of outliers in reference distributions: performance of horn’s algorithm. Clin. Chem. 51(12), 2326–2332 (2005)

    Article  Google Scholar 

  8. Endler, D.: Intrusion detection: applying machine learning to solaris audit data. In: Proceedings of the 14th Annual Computer Security Applications Conference, p. 268 (1998)

    Google Scholar 

  9. Fawcett, T., Provost, F.: Activity monitoring: noticing interesting changes in behavior. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 53–62 (1999)

    Google Scholar 

  10. Kruegel, C., Toth, T., Kirda, E.: Service specific anomaly detection for network intrusion detection. In: Proceedings of the 2002 ACM Symposium on Applied Computing, pp. 201–208 (2002)

    Google Scholar 

  11. Yamanishi, K., Takeuchi, J.I.: Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 389–394 (2001)

    Google Scholar 

  12. Parzen, E.: On the estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962)

    Article  MathSciNet  MATH  Google Scholar 

  13. Bishop, C.: Novelty detection and neural network validation. In: Proceedings of IEEE Vision, Image and Signal Processing, vol. 141, pp. 217–222 (1994)

    Google Scholar 

  14. Tarassenko, L.: Novelty detection for the identification of masses in mammograms. In: Proceedings of the 4th IEEE International Conference on Artificial Neural Networks, vol. 4, pp. 442–447, Cambridge, UK (1995)

    Google Scholar 

  15. Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large dataset. In: Proceedings of 24th International Conference on Very Large Data Bases, VLDB 1998, pp 392–403, New York, NY (1998)

    Google Scholar 

  16. Knorr, E.M., Ng, R.T.: Finding intentional knowledge of distance-based outliers. In: Proceedings of 25th International Conference on Very Large Data Bases, VLDB 1999, pp. 211–222, Edinburgh, Scotland (1999)

    Google Scholar 

  17. Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: algorithms and applications. VLDB J. 8(3–4), 237–253 (2000)

    Article  Google Scholar 

  18. Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 427–438. ACM Press, New York (2000)

    Google Scholar 

  19. Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 813–822. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  20. Sugiyama, M., Borgwardt, K.M.: Rapid distance-based outlier detection via sampling. In: Proceedings of the Twenty-Seventh Annual Conference on Neural Information Processing Systems, Harrahs and Harveys, Lake Tahoe, 5–10 Dec 2013

    Google Scholar 

  21. Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22th International Conference on Very Large Data Bases, pp. 28–39 (1996)

    Google Scholar 

  22. Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*tree: an efficient and robust access method for points and rectangles. In: Proceedings of 1990 ACM SIGMOD International Conference on Management of Data, SIGMOD 1990, pp 322–331, Atlantic City, NJ (1990)

    Google Scholar 

  23. Tang, J., Chen, Z., Fu, AW.-c., Cheung, D.W.: Enhancing effectiveness of outlier detections for low density patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 535–548. Springer, Heidelberg (2002)

    Chapter  Google Scholar 

  24. Papadimitriou, S., Kitagawa, H., Gibbons, P., Faloutsos, C.: Loci: fast outlier detection using the local correlation integral. In: Proceedings. 19th International Conference on Data Engineering, pp. 315–326 (2003)

    Google Scholar 

  25. Fan, H., Zaïane, O.R., Foss, A., Wu, J.: A nonparametric outlier detection for effectively discovering Top-N outliers from engineering data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 557–566. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  26. Jin, W., Tung, A.K., Han, J., Wang, W.: Ranking outliers using symmetric neighborhood relationship. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 577–593. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  27. Keller, F., Muller, E., Bohm, K.: HiCS: high contrast subspaces for density-based outlier ranking. In: IEEE 28th International Conference on Data Engineering, ICDE 2012, pp: 1037–1048 (2012)

    Google Scholar 

  28. Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 831–838. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  29. Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6(1), 3:1–3:39 (2012)

    Article  Google Scholar 

  30. Liu, F.T., Ting, K.M., Zhou, Z.-H.: On detecting clustered anomalies using SCiForest. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part II. LNCS, vol. 6322, pp. 274–290. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  31. Gyorfi, L., van der Meulen, E.C.: Density-free convergence properties of various estimators of entropy. Comput. Stat. Data Anal. 5, 425–436 (1987)

    Article  MathSciNet  MATH  Google Scholar 

  32. Lichmanm M.: UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science (2013). http://archive.ics.uci.edu/ml

Download references

Acknowledgement

This work is partially supported by the NSF of China (No. 61272468, 61272007, 61572443), the NSF of Zhejiang Province (No. LY14F020012, No. LQ13F020026), and the Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University (No. ZSDZZZZXK27).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yanxia Wang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Shen, Y., Liu, H., Wang, Y., Chen, Z., Sun, G. (2016). A Novel Isolation-Based Outlier Detection Method. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science(), vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_37

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-42911-3_37

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42910-6

  • Online ISBN: 978-3-319-42911-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics