A Comparative Study of Outlier Detection Algorithms

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

Data mining is the process of extracting interesting information from large data sets. Outliers are defined as events that occur very infrequently. Detecting outliers before they escalate with potentially catastrophic consequences is important in real-life applications such as fraud detection, network robustness analysis, and intrusion detection. This paper presents a comprehensive analysis of three outlier detection methods: Extensible Markov Model (EMM), Local Outlier Factor (LOF), and LSC-Mine. The analysis covers the time complexity and outlier detection accuracy of each algorithm. Experiments conducted with the Ozone Level Detection, IR video trajectories, and 1999 and 2000 DARPA DDoS datasets demonstrate that EMM outperforms both LOF and LSC-Mine in both time and outlier detection accuracy.

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Isaksson, C., Dunham, M.H. (2009). A Comparative Study of Outlier Detection Algorithms. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_33

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science, Computer Science (R0)