Improving Prediction of Distance-Based Outliers

Angiulli, Fabrizio; Basta, Stefano; Pizzuti, Clara

doi:10.1007/978-3-540-30214-8_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)

Included in the following conference series:

International Conference on Discovery Science

Abstract

An unsupervised distance-based outlier detection method that finds the top n outliers of a large, high-dimensional data set D is presented. The method provides a subset R of the data set, called the robust solving set, that contains the top n outliers and can be used to predict whether a new, unseen object p is an outlier by computing the distances from p to only the objects in R. Experimental results show that the prediction accuracy of the robust solving set is comparable to that obtained by using the whole data set.
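
The prediction step described in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: the outlier score of an object is taken to be the sum of its distances to its k nearest neighbours, the decision threshold is the score of the n-th top outlier of D, and the subset R is hand-picked for illustration, since the construction of the robust solving set is not given here. The names `knn_weight`, `top_n_outliers` and `is_outlier` are hypothetical.

```python
import math

def knn_weight(p, points, k):
    """Assumed score: sum of the distances from p to its k nearest points."""
    dists = sorted(math.dist(p, q) for q in points)
    return sum(dists[:k])

def top_n_outliers(data, k, n):
    """Brute-force top-n outliers of data, ranked by the assumed score."""
    scored = []
    for i, p in enumerate(data):
        others = data[:i] + data[i + 1:]  # exclude p itself
        scored.append((knn_weight(p, others, k), p))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

def is_outlier(p, reference_set, k, threshold):
    """Predict by comparing p only against the objects in reference_set."""
    return knn_weight(p, reference_set, k) >= threshold

# Toy data set D: a 5x5 grid of inliers plus two isolated points.
D = [(float(x), float(y)) for x in range(5) for y in range(5)]
D += [(20.0, 20.0), (-15.0, 10.0)]
k, n = 3, 2

top = top_n_outliers(D, k, n)
threshold = top[-1][0]  # score of the n-th top outlier (an assumption)

# Hand-picked stand-in for R: the top outliers plus a few grid points.
# This is NOT the paper's construction of the robust solving set.
R = [p for _, p in top] + [(0.0, 0.0), (0.0, 4.0), (4.0, 0.0), (4.0, 4.0), (2.0, 2.0)]

far_query = (30.0, 30.0)
near_query = (2.5, 1.5)
print(is_outlier(far_query, R, k, threshold), is_outlier(far_query, D, k, threshold))
print(is_outlier(near_query, R, k, threshold), is_outlier(near_query, D, k, threshold))
```

In this toy setting the predictions made against R agree with those made against all of D, while each query computes 7 distances instead of 27. That is only an illustration of the idea, not evidence for the accuracy reported in the paper.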

Author information

Authors and Affiliations

ICAR-CNR, Università della Calabria, Via Pietro Bucci 41C, 87036 Rende (CS), Italy
Fabrizio Angiulli, Stefano Basta & Clara Pizzuti

Editor information

Editors and Affiliations

Department of Informatics, Graduate School of Information Science and Electrical Engineering, Kyushu University, 744 Motooka, Nishi, 819-0395, Fukuoka, Japan
Einoshin Suzuki
Kyushu University, 6–10–1 Hakozaki Higashi-ku, 812–8581, Fukuoka, Japan
Setsuo Arikawa

About this paper

Cite this paper

Angiulli, F., Basta, S., Pizzuti, C. (2004). Improving Prediction of Distance-Based Outliers. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_7

DOI: https://doi.org/10.1007/978-3-540-30214-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive