Research Issues in Outlier Detection

  • N. N. R. Ranga Suri
  • Narasimha Murty M
  • G. Athithan
Chapter
Part of the Intelligent Systems Reference Library book series (ISRL, volume 155)

Abstract

This chapter gives an overview of the outlier detection problem and identifies the main research issues associated with it. It then surveys the available literature in detail, organized around those issues. In doing so, the chapter sets the direction for the material presented in the remaining chapters of this book.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • N. N. R. Ranga Suri (1)
  • Narasimha Murty M (2)
  • G. Athithan (3)
  1. Centre for Artificial Intelligence and Robotics (CAIR), Bangalore, India
  2. Department of Computer Science and Automation, Indian Institute of Science (IISc), Bangalore, India
  3. Defence Research and Development Organization (DRDO), New Delhi, India
