
Improving Prediction of Distance-Based Outliers

  • Conference paper
Discovery Science (DS 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3245)


Abstract

An unsupervised distance-based outlier detection method that finds the top n outliers of a large, high-dimensional data set D is presented. The method produces a subset R of the data set, called the robust solving set, that contains the top n outliers and can be used to predict whether a new, unseen object p is an outlier by computing the distances from p to the objects in R only. Experimental results show that the prediction accuracy of the robust solving set is comparable to that obtained by using the whole data set.
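To make the prediction idea concrete, here is a minimal brute-force sketch. It assumes Euclidean distance and scores each object by its weight, taken here as the sum of distances to its k nearest neighbours; the abstract does not spell out the scoring function, so treat that choice as an assumption. The construction of the robust solving set R is not reproduced: R is simply passed in, and `threshold` is assumed to be the weight of the n-th top outlier of D. All function names are hypothetical and chosen for illustration.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed metric; any metric could be used)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_weight(p: Point, reference: Sequence[Point], k: int,
               skip: Optional[int] = None) -> float:
    """Assumed outlier score: sum of distances from p to its k nearest
    neighbours in `reference`. `skip` excludes p's own index when p is
    itself a member of `reference`."""
    dists = [euclidean(p, q) for i, q in enumerate(reference) if i != skip]
    return sum(heapq.nsmallest(k, dists))


def top_n_outliers(data: Sequence[Point], k: int,
                   n: int) -> List[Tuple[float, int]]:
    """Brute-force top-n outliers of D: the n (weight, index) pairs with the
    largest weights. This is quadratic in |D|; it is only a reference point."""
    scored = [(knn_weight(p, data, k, skip=i), i) for i, p in enumerate(data)]
    return heapq.nlargest(n, scored)


def predict_outlier(p: Point, solving_set: Sequence[Point], k: int,
                    threshold: float) -> bool:
    """Hypothetical prediction rule: compare p's weight, computed against the
    (given) solving set only, with the weight of the n-th top outlier of D."""
    return knn_weight(p, solving_set, k) >= threshold


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
    k, n = 2, 1
    top = top_n_outliers(data, k, n)
    threshold = top[-1][0]          # weight of the n-th top outlier
    # A made-up subset standing in for R; it contains the top outlier.
    R = [data[0], data[3], data[4]]
    print(top, predict_outlier((0.5, 0.5), R, k, threshold),
          predict_outlier((30.0, 30.0), R, k, threshold))
```

One property motivates computing distances against a subset: when R has at least k objects, the k nearest neighbours of p within R are never closer than those within D, so the weight of p with respect to R is an upper bound on its weight with respect to D. A point that falls below the threshold when scored against R therefore also falls below it when scored against D.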




Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Angiulli, F., Basta, S., Pizzuti, C. (2004). Improving Prediction of Distance-Based Outliers. In: Suzuki, E., Arikawa, S. (eds) Discovery Science. DS 2004. Lecture Notes in Computer Science, vol 3245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30214-8_7


  • DOI: https://doi.org/10.1007/978-3-540-30214-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-23357-2

  • Online ISBN: 978-3-540-30214-8

  • eBook Packages: Springer Book Archive
