
A Genetic Algorithm Based Technique for Outlier Detection with Fast Convergence

  • Xiaodong Zhu
  • Ji Zhang
  • Zewen Hu
  • Hongzhou Li
  • Liang Chang
  • Youwen Zhu
  • Jerry Chun-Wei Lin
  • Yongrui Qin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11323)

Abstract

In this paper, we study the problem of subspace outlier detection in high-dimensional data spaces and propose a new genetic algorithm (GA)-based technique for identifying outliers embedded in subspaces. The existing technique, which uses a GA to carry out the subspace search, is generally slow because of its expensive fitness evaluation and long solution encoding scheme. We improve the performance of this existing GA-based outlier detection method with a bit freezing approach that achieves faster convergence. By freezing converged bits in the solution encoding strings, the approach speeds up the crossover and mutation operations and allows the GA to stop early, which leads to a more accurate approximation of the fitness function. This work contributes to the development of a more efficient search method for detecting subspace outliers. Experimental results demonstrate the improved efficiency of our technique compared with the existing method.
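The bit freezing idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation; several details are assumptions made only for illustration: each solution is a bit string in which bit i = 1 is taken to mean "dimension i belongs to the subspace"; a bit position counts as converged once at least a `threshold` share of the population agrees on its value; and the fitness function is a caller-supplied placeholder, whereas the real method scores subspaces with a much more expensive outlier measure. The helper names `converged_bits` and `bit_freezing_ga` are hypothetical. Frozen positions are written into every individual and skipped by crossover and mutation, and the search stops early once no free positions remain.

```python
import random
from typing import Callable, List, Optional, Tuple

Bits = List[int]  # assumed encoding: bit i = 1 means dimension i is in the subspace


def converged_bits(pop: List[Bits], threshold: float) -> List[Optional[int]]:
    """Per bit position: the shared value if at least `threshold` of the
    population agrees on it, otherwise None (position is still free).
    Assumes threshold > 0.5 so at most one value can qualify."""
    n = len(pop)
    out: List[Optional[int]] = []
    for i in range(len(pop[0])):
        ones = sum(ind[i] for ind in pop)
        if ones >= threshold * n:
            out.append(1)
        elif n - ones >= threshold * n:
            out.append(0)
        else:
            out.append(None)
    return out


def bit_freezing_ga(fitness: Callable[[Bits], float], n_bits: int,
                    pop_size: int = 30, max_gens: int = 200,
                    threshold: float = 0.95, p_mut: float = 0.02,
                    seed: int = 0) -> Tuple[Bits, float, int, List[Optional[int]]]:
    """Hypothetical GA with bit freezing (requires pop_size >= 2).
    Returns (best bits, best fitness, generations run, frozen mask)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    frozen: List[Optional[int]] = [None] * n_bits
    best, best_fit = pop[0][:], fitness(pop[0])
    gen = 0
    for gen in range(1, max_gens + 1):
        # 1. Freeze newly converged positions; frozen positions never thaw.
        for i, v in enumerate(converged_bits(pop, threshold)):
            if frozen[i] is None and v is not None:
                frozen[i] = v
        for ind in pop:  # impose the frozen values on every individual
            for i, v in enumerate(frozen):
                if v is not None:
                    ind[i] = v
        free = [i for i, v in enumerate(frozen) if v is None]

        # 2. Evaluate once per individual and track the best seen so far.
        scores = [fitness(ind) for ind in pop]
        top = max(range(pop_size), key=scores.__getitem__)
        if scores[top] > best_fit:
            best, best_fit = pop[top][:], scores[top]
        if not free:  # every bit has converged: stop early
            break

        # 3. Breed the next generation, touching free positions only.
        def tournament() -> Bits:
            a, b = rng.sample(range(pop_size), 2)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        nxt = [pop[top][:]]  # elitism: keep the current best individual
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            child = p1[:]
            cut = rng.randrange(len(free) + 1)
            for i in free[cut:]:  # one-point crossover over the free bits
                child[i] = p2[i]
            for i in free:  # bit-flip mutation over the free bits
                if rng.random() < p_mut:
                    child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return best, best_fit, gen, frozen


if __name__ == "__main__":
    # Toy fitness (count of ones) stands in for a real subspace outlier score.
    best, fit, gens, frozen = bit_freezing_ga(sum, n_bits=20)
    print(fit, gens, frozen.count(None))
```

Because frozen positions are excluded from crossover and mutation, each generation operates on a shrinking set of free bits, which is the intuition behind the faster genetic operators and the early stop claimed above. The convergence threshold trades speed against the risk of freezing a bit too soon.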

Notes

Acknowledgment

This research was partially supported by the National Key Research and Development Program of China (No. 2017YFB0802300), the National Natural Science Foundation of China (No. 61602240), the Guangxi Key Laboratory of Trusted Software (No. kx201615), and the Capacity Building Project for Young University Staff in Guangxi Province, Department of Education, Guangxi Province (No. ky2016YB149).


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Xiaodong Zhu (1)
  • Ji Zhang (2), corresponding author
  • Zewen Hu (1)
  • Hongzhou Li (3)
  • Liang Chang (3)
  • Youwen Zhu (4)
  • Jerry Chun-Wei Lin (5)
  • Yongrui Qin (6)

  1. Nanjing University of Information Science and Technology, Nanjing, China
  2. University of Southern Queensland, Toowoomba, Australia
  3. Guilin University of Electronic Technology, Guilin, China
  4. Nanjing University of Aeronautics and Astronautics, Nanjing, China
  5. Western Norway University of Applied Sciences (HVL), Bergen, Norway
  6. University of Huddersfield, Huddersfield, UK
