Abstract
Data mining is the process of extracting interesting information from large data sets. Outliers are events that occur very infrequently. Detecting them before they escalate, with potentially catastrophic consequences, is important in real-life applications such as fraud detection, network robustness analysis, and intrusion detection. This paper presents a comprehensive analysis of three outlier detection methods: Extensible Markov Model (EMM), Local Outlier Factor (LOF), and LSC-Mine. The analysis covers both time complexity and outlier detection accuracy. Experiments on the Ozone Level Detection, IR video trajectories, and 1999 and 2000 DARPA DDoS datasets demonstrate that EMM outperforms both LOF and LSC-Mine in both time and outlier detection accuracy.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Isaksson, C., Dunham, M.H. (2009). A Comparative Study of Outlier Detection Algorithms. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_33
DOI: https://doi.org/10.1007/978-3-642-03070-3_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03069-7
Online ISBN: 978-3-642-03070-3
eBook Packages: Computer Science, Computer Science (R0)