RODD: An Effective Reference-Based Outlier Detection Technique for Large Datasets

Bhuyan, Monowar H.; Bhattacharyya, D. K.; Kalita, J. K.

doi:10.1007/978-3-642-17881-8_8

Part of the book series: Communications in Computer and Information Science (CCIS, volume 133)

Included in the following conference series:

International Conference on Computer Science and Information Technology

Abstract

Outlier detection has attracted considerable interest in several fields of research, including the sciences, medical diagnosis, fraud detection, and network intrusion detection. Most existing techniques are either distance-based or density-based. In this paper, we present an effective reference-point-based outlier detection technique (RODD) that performs satisfactorily on high-dimensional real-world datasets. The technique was evaluated in terms of detection rate and false positive rate over several synthetic and real-world datasets, and its performance was excellent.
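
The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the conventional definitions (detection rate as the fraction of true outliers that are flagged, false positive rate as the fraction of normal points wrongly flagged); the chapter's exact definitions are not visible in this preview, and the function name and sample labels below are hypothetical.

```python
def detection_and_false_positive_rate(y_true, y_pred):
    """Return (detection rate, false positive rate) for binary labels.

    y_true, y_pred: equal-length sequences of 0/1, where 1 marks an outlier.
    Assumes the conventional definitions; not taken from the chapter itself.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # outliers flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # outliers missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normals flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normals passed

    # Guard against empty classes so the function never divides by zero.
    dr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return dr, fpr


# Hypothetical labels for eight points: three true outliers, five normal.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]
dr, fpr = detection_and_false_positive_rate(y_true, y_pred)
print(f"detection rate = {dr:.3f}, false positive rate = {fpr:.3f}")
```

In this toy case two of the three outliers are flagged and one of the five normal points is wrongly flagged, so a detector is better the closer the first value is to 1 and the second to 0.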

Author information

Authors and Affiliations

Dept. of Computer Science & Engineering, Tezpur University, Napaam, India
Monowar H. Bhuyan & D. K. Bhattacharyya
Dept. of Computer Science, University of Colorado, CO, 80918, USA
J. K. Kalita

Editor information

Editors and Affiliations

Jackson State University, 39217, Jackson, MS, USA
Natarajan Meghanathan
Dept. of Electronics and Computer Engineering, Indian Institute of Technology, Roorkee, India
Brajesh Kumar Kaushik
Wireilla Net Solutions PTY Ltd, Melbourne, Victoria, Australia
Dhinaharan Nagamalai

About this paper

Cite this paper

Bhuyan, M.H., Bhattacharyya, D.K., Kalita, J.K. (2011). RODD: An Effective Reference-Based Outlier Detection Technique for Large Datasets. In: Meghanathan, N., Kaushik, B.K., Nagamalai, D. (eds) Advanced Computing. CCSIT 2011. Communications in Computer and Information Science, vol 133. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17881-8_8

DOI: https://doi.org/10.1007/978-3-642-17881-8_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17880-1
Online ISBN: 978-3-642-17881-8
eBook Packages: Computer Science, Computer Science (R0)
