Abstract
An unsupervised distance-based outlier detection method that finds the top n outliers of a large, high-dimensional data set D is presented. The method provides a subset R of the data set, called the robust solving set, that contains the top n outliers and can be used to predict whether a new, unseen object p is an outlier by computing the distances from p to the objects in R only. Experimental results show that the prediction accuracy of the robust solving set is comparable to that obtained by using the entire data set.
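The prediction step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how an object's outlier score is computed or how the decision is made, so the following are assumptions made only for illustration: the score of p is the sum of its Euclidean distances to its k nearest objects in R, and p is flagged when that score reaches a given threshold. The names `knn_weight`, `is_outlier`, `k`, and `threshold` are hypothetical and do not come from the paper.

```python
import math
from typing import Sequence

Point = Sequence[float]


def knn_weight(p: Point, reference: Sequence[Point], k: int) -> float:
    """Assumed score: sum of Euclidean distances from p to its k nearest objects."""
    distances = sorted(math.dist(p, q) for q in reference)
    return sum(distances[:k])


def is_outlier(p: Point, solving_set: Sequence[Point], k: int, threshold: float) -> bool:
    """Flag p using distances to the solving set R only, never the full data set D.

    The threshold is an assumed input, e.g. the score of the n-th ranked outlier.
    """
    return knn_weight(p, solving_set, k) >= threshold


# Toy solving set R: four points on the unit square (made-up data).
R = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

print(is_outlier((0.5, 0.5), R, k=2, threshold=5.0))    # close to R
print(is_outlier((10.0, 10.0), R, k=2, threshold=5.0))  # far from R
```

The cost of one prediction in this sketch depends on the size of R rather than the size of D, which is the practical benefit the abstract points to.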
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Angiulli, F., Basta, S., Pizzuti, C. (2004). Improving Prediction of Distance-Based Outliers. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_7
DOI: https://doi.org/10.1007/978-3-540-30214-8_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23357-2
Online ISBN: 978-3-540-30214-8
eBook Packages: Springer Book Archive