
Data Mining and Knowledge Discovery, Volume 31, Issue 2, pp 548–572

Outlier detection using binary decision diagrams

  • Takuro Kutsuna
  • Akihiro Yamamoto

Abstract

We propose a novel method for outlier detection using binary decision diagrams. Leave-one-out density is proposed as a new measure for detecting outliers, which is defined as a ratio of the number of data elements inside a region to the volume of the region after a focused datum is removed. We show that leave-one-out density can be evaluated very efficiently on a set of regions around each datum in a given dataset by using binary decision diagrams. The time complexity of the proposed method is nearly linear with respect to the size of the dataset, while the outlier detection accuracy is still comparable to that of other methods. Experimental results show the effectiveness of the proposed method.
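The definition of leave-one-out density given above can be illustrated with a small brute-force sketch. This is not the binary-decision-diagram algorithm of the paper: it assumes, purely for illustration, that the regions around a datum are axis-aligned cubes centred on it, and that the score of a datum is the largest density found over those cubes. The names `leave_one_out_density`, `outlier_scores`, and `half_widths` are hypothetical and do not come from the article.

```python
import numpy as np


def leave_one_out_density(data, index, half_widths):
    """Largest (count / volume) over cubes centred on data[index].

    The focused datum itself is removed before counting, which is the
    "leave-one-out" part. Cube-shaped regions and the max aggregation
    are assumptions made for this sketch.
    """
    data = np.asarray(data, dtype=float)
    x = data[index]
    others = np.delete(data, index, axis=0)  # drop the focused datum
    dim = data.shape[1]
    # A point lies inside the cube of half-width h iff its Chebyshev
    # distance to the centre is at most h.
    dist = np.max(np.abs(others - x), axis=1)
    best = 0.0
    for h in half_widths:
        count = int(np.sum(dist <= h))
        volume = (2.0 * h) ** dim
        best = max(best, count / volume)
    return best


def outlier_scores(data, half_widths):
    """Leave-one-out density for every datum; low values suggest outliers."""
    return np.array(
        [leave_one_out_density(data, i, half_widths) for i in range(len(data))]
    )


if __name__ == "__main__":
    points = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]]
    print(outlier_scores(points, half_widths=[0.5, 1.0]))
```

In this toy dataset the isolated point at (5, 5) has no neighbours in any of its cubes, so its leave-one-out density is zero, while the four clustered points score higher. The sketch compares every pair of points and is therefore quadratic in the dataset size; the nearly linear running time reported in the abstract comes from the binary decision diagram representation, which is not reproduced here.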

Keywords

Outlier detection, Binary decision diagram, Leave-one-out density

Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. Toyota Central R&D Labs. Inc., Nagakute, Japan
  2. Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, Kyoto, Japan
