A Novel Isolation-Based Outlier Detection Method

Shen, Yanhui; Liu, Huawen; Wang, Yanxia; Chen, Zhongyu; Sun, Guanghua

doi:10.1007/978-3-319-42911-3_37

A Novel Isolation-Based Outlier Detection Method

Yanhui Shen¹⁵,
Huawen Liu¹⁵,
Yanxia Wang¹⁵,
Zhongyu Chen¹⁵ &
…
Guanghua Sun¹⁵

Conference paper
First Online: 10 August 2016

2721 Accesses
1 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9810))

Abstract

Outlier detection is one of the most important tasks in data analysis. It refers to the process of recognizing unusual characteristics which may provide useful insights in helping us to understand the behaviors of data. In the paper, an isolation-based outlier detection method, called Entropy-based Greedy Isolation Tree (EGiTree), is proposed. Unlike other tree-like detection methods, our method exploits a half-baked isolation tree, which is constructed via three entropy-based heuristics, to identify outliers. Specifically, the heuristics are used to guide the selection process of attribute and its split value when constructing the tree. Thus, the outlierness score of each data point is estimated based on the total partition cost of the isolation node in the tree, as well as the path length and complexity of partition. Experiment results on public real-world datasets show that our approach outperforms distanced-based, density-based, subspace-based as well as state-of-the-art isolation-based approaches.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Aggarwal, C.C.: Outlier Analysis. Springer, New York (2013)
Book MATH Google Scholar
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufman Publishers, San Francisco (2000)
MATH Google Scholar
Zhang, J.: Advancements of outlier detection: a survey. ICST Trans. Scalable Inf. Syst. 13(1), 1–26 (2013)
Article Google Scholar
Barnett, V., Lewis, T.: Outliers in Statistical Data, 3rd edn. Wiley, Chichester (1994)
MATH Google Scholar
Barnett, V.: The ordering of multivariate data (with discussion). J. Roy. Stat. Soc. Ser. A 139, 318–354 (1976)
Article Google Scholar
Laurikkala, J., Juhola, M., Kentala, E.: Informal identification of outliers in medical data. In: Fifth International Workshop on Intelligent Data Analysis in Medicine and Pharmacology, pp. 20–24 (2000)
Google Scholar
Solberg, H.E., Lahti, A.: Detection of outliers in reference distributions: performance of horn’s algorithm. Clin. Chem. 51(12), 2326–2332 (2005)
Article Google Scholar
Endler, D.: Intrusion detection: applying machine learning to solaris audit data. In: Proceedings of the 14^th Annual Computer Security Applications Conference, p. 268 (1998)
Google Scholar
Fawcett, T., Provost, F.: Activity monitoring: noticing interesting changes in behavior. In: Proceedings of the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 53–62 (1999)
Google Scholar
Kruegel, C., Toth, T., Kirda, E.: Service specific anomaly detection for network intrusion detection. In: Proceedings of the 2002 ACM Symposium on Applied Computing, pp. 201–208 (2002)
Google Scholar
Yamanishi, K., Takeuchi, J.I.: Discovering outlier filtering rules from unlabeled data: combining a supervised learner with an unsupervised learner. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 389–394 (2001)
Google Scholar
Parzen, E.: On the estimation of a probability density function and mode. Ann. Math. Stat. 33, 1065–1076 (1962)
Article MathSciNet MATH Google Scholar
Bishop, C.: Novelty detection and neural network validation. In: Proceedings of IEEE Vision, Image and Signal Processing, vol. 141, pp. 217–222 (1994)
Google Scholar
Tarassenko, L.: Novelty detection for the identification of masses in mammograms. In: Proceedings of the 4th IEEE International Conference on Artificial Neural Networks, vol. 4, pp. 442–447, Cambridge, UK (1995)
Google Scholar
Knorr, E.M., Ng, R.T.: Algorithms for mining distance-based outliers in large dataset. In: Proceedings of 24th International Conference on Very Large Data Bases, VLDB 1998, pp 392–403, New York, NY (1998)
Google Scholar
Knorr, E.M., Ng, R.T.: Finding intentional knowledge of distance-based outliers. In: Proceedings of 25th International Conference on Very Large Data Bases, VLDB 1999, pp. 211–222, Edinburgh, Scotland (1999)
Google Scholar
Knorr, E.M., Ng, R.T., Tucakov, V.: Distance-based outliers: algorithms and applications. VLDB J. 8(3–4), 237–253 (2000)
Article Google Scholar
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient algorithms for mining outliers from large data sets. In: Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, SIGMOD 2000, pp. 427–438. ACM Press, New York (2000)
Google Scholar
Zhang, K., Hutter, M., Jin, H.: A new local distance-based outlier detection approach for scattered real-world data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 813–822. Springer, Heidelberg (2009)
Chapter Google Scholar
Sugiyama, M., Borgwardt, K.M.: Rapid distance-based outlier detection via sampling. In: Proceedings of the Twenty-Seventh Annual Conference on Neural Information Processing Systems, Harrahs and Harveys, Lake Tahoe, 5–10 Dec 2013
Google Scholar
Berchtold, S., Keim, D.A., Kriegel, H.-P.: The X-tree: an index structure for high-dimensional data. In: Proceedings of the 22th International Conference on Very Large Data Bases, pp. 28–39 (1996)
Google Scholar
Beckmann, N., Kriegel, H.-P., Schneider, R., Seeger, B.: The R*tree: an efficient and robust access method for points and rectangles. In: Proceedings of 1990 ACM SIGMOD International Conference on Management of Data, SIGMOD 1990, pp 322–331, Atlantic City, NJ (1990)
Google Scholar
Tang, J., Chen, Z., Fu, AW.-c., Cheung, D.W.: Enhancing effectiveness of outlier detections for low density patterns. In: Chen, M.-S., Yu, P.S., Liu, B. (eds.) PAKDD 2002. LNCS (LNAI), vol. 2336, pp. 535–548. Springer, Heidelberg (2002)
Chapter Google Scholar
Papadimitriou, S., Kitagawa, H., Gibbons, P., Faloutsos, C.: Loci: fast outlier detection using the local correlation integral. In: Proceedings. 19th International Conference on Data Engineering, pp. 315–326 (2003)
Google Scholar
Fan, H., Zaïane, O.R., Foss, A., Wu, J.: A nonparametric outlier detection for effectively discovering Top-N outliers from engineering data. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 557–566. Springer, Heidelberg (2006)
Chapter Google Scholar
Jin, W., Tung, A.K., Han, J., Wang, W.: Ranking outliers using symmetric neighborhood relationship. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 577–593. Springer, Heidelberg (2006)
Chapter Google Scholar
Keller, F., Muller, E., Bohm, K.: HiCS: high contrast subspaces for density-based outlier ranking. In: IEEE 28th International Conference on Data Engineering, ICDE 2012, pp: 1037–1048 (2012)
Google Scholar
Kriegel, H.-P., Kröger, P., Schubert, E., Zimek, A.: Outlier detection in axis-parallel subspaces of high dimensional data. In: Theeramunkong, T., Kijsirikul, B., Cercone, N., Ho, T.-B. (eds.) PAKDD 2009. LNCS, vol. 5476, pp. 831–838. Springer, Heidelberg (2009)
Chapter Google Scholar
Liu, F.T., Ting, K.M., Zhou, Z.-H.: Isolation-based anomaly detection. ACM Trans. Knowl. Discov. Data 6(1), 3:1–3:39 (2012)
Article Google Scholar
Liu, F.T., Ting, K.M., Zhou, Z.-H.: On detecting clustered anomalies using SCiForest. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part II. LNCS, vol. 6322, pp. 274–290. Springer, Heidelberg (2010)
Chapter Google Scholar
Gyorfi, L., van der Meulen, E.C.: Density-free convergence properties of various estimators of entropy. Comput. Stat. Data Anal. 5, 425–436 (1987)
Article MathSciNet MATH Google Scholar
Lichmanm M.: UCI Machine Learning Repository, Irvine, CA: University of California, School of Information and Computer Science (2013). http://archive.ics.uci.edu/ml

Download references

Acknowledgement

This work is partially supported by the NSF of China (No. 61272468, 61272007, 61572443), the NSF of Zhejiang Province (No. LY14F020012, No. LQ13F020026), and the Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University (No. ZSDZZZZXK27).

Author information

Authors and Affiliations

Department of Computer Science, Zhejiang Normal University, Jinhua, 321004, China
Yanhui Shen, Huawen Liu, Yanxia Wang, Zhongyu Chen & Guanghua Sun

Authors

Yanhui Shen
View author publications
You can also search for this author in PubMed Google Scholar
Huawen Liu
View author publications
You can also search for this author in PubMed Google Scholar
Yanxia Wang
View author publications
You can also search for this author in PubMed Google Scholar
Zhongyu Chen
View author publications
You can also search for this author in PubMed Google Scholar
Guanghua Sun
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Yanxia Wang .

Editor information

Editors and Affiliations

Cardiff University, Cardiff, United Kingdom
Richard Booth
Southeast University , Nanjing, China
Min-Ling Zhang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Shen, Y., Liu, H., Wang, Y., Chen, Z., Sun, G. (2016). A Novel Isolation-Based Outlier Detection Method. In: Booth, R., Zhang, ML. (eds) PRICAI 2016: Trends in Artificial Intelligence. PRICAI 2016. Lecture Notes in Computer Science(), vol 9810. Springer, Cham. https://doi.org/10.1007/978-3-319-42911-3_37

Download citation

DOI: https://doi.org/10.1007/978-3-319-42911-3_37
Published: 10 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-42910-6
Online ISBN: 978-3-319-42911-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics