Abstract
Outlier detection has attracted considerable interest in several fields of research, including various sciences, medical diagnosis, fraud detection, and network intrusion detection. Most existing techniques are either distance-based or density-based. In this paper, we present an effective reference-point-based outlier detection technique (RODD) that performs satisfactorily on high-dimensional real-world datasets. The technique was evaluated in terms of detection rate and false positive rate over several synthetic and real-world datasets, and its performance was excellent.
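The two evaluation measures named in the abstract can be illustrated with a short sketch. It assumes the standard definitions: detection rate is the fraction of true outliers that are flagged, and false positive rate is the fraction of normal points that are wrongly flagged. The function name and the label vectors are hypothetical, chosen for illustration only; this is not the authors' evaluation code and says nothing about how RODD itself scores points.

```python
from typing import Sequence, Tuple


def detection_and_false_positive_rate(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (detection rate, false positive rate).

    Labels use 1 for an outlier and 0 for a normal point (an assumed convention).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal points wrongly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal points correctly passed

    # Guard against empty classes so the ratios are always defined.
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Hypothetical ground truth and detector output for eight points.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
dr, fpr = detection_and_false_positive_rate(y_true, y_pred)
print(f"detection rate = {dr:.3f}, false positive rate = {fpr:.3f}")
```

A detector is better when the first value is high and the second is low; reporting both guards against a method that flags nearly everything as an outlier.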
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K. (2011). RODD: An Effective Reference-Based Outlier Detection Technique for Large Datasets. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advanced Computing. CCSIT 2011. Communications in Computer and Information Science, vol 133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17881-8_8
DOI: https://doi.org/10.1007/978-3-642-17881-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17880-1
Online ISBN: 978-3-642-17881-8
eBook Packages: Computer Science, Computer Science (R0)