
SIC-Means: A Semi-fuzzy Approach for Clustering Data Streams Using C-Means

  • Amr Magdy
  • Mahmoud K. Bassiouny
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5998)

Abstract

In recent years, data streaming has gained significant importance. Advances in both hardware devices and software technologies enable many applications to generate continuous flows of data, which increases the need for algorithms that can process data streams efficiently. Additionally, the real-time requirements and evolving nature of data streams make stream mining problems, including clustering, challenging research problems. Fuzzy solutions have been proposed in the literature for clustering data streams. In this work, we propose a Soft Incremental C-Means variant to enhance the performance of the fuzzy approach. The experimental evaluation shows that our approach achieves a better Xie-Beni index than the pure fuzzy approach as the different factors that affect the clustering results are varied. In addition, we have conducted a study to analyze the sensitivity of the clustering results to the allowed fuzziness level and to the size of the data history used. This study shows that different datasets behave differently as these factors change, and that a dataset's behavior is correlated with the separation between its clusters.
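The Xie-Beni index used above as the evaluation measure can be illustrated with a minimal sketch. It follows the commonly cited definition (membership-weighted compactness divided by the number of points times the minimum squared distance between cluster centers) and is not the authors' implementation; the function name `xie_beni`, the array layout, and the default fuzzifier `m=2.0` are assumptions made for this example.

```python
import numpy as np


def xie_beni(X, centers, U, m=2.0):
    """Xie-Beni validity index for a fuzzy (or crisp) partition.

    Assumed layout (hypothetical, chosen for this sketch):
      X       -- (n, d) array of data points
      centers -- (c, d) array of cluster centers
      U       -- (c, n) array of memberships, U[i, k] = degree of point k in cluster i
      m       -- fuzzifier applied to the memberships
    """
    X = np.asarray(X, dtype=float)
    centers = np.asarray(centers, dtype=float)
    U = np.asarray(U, dtype=float)

    # Squared Euclidean distance from every center to every point, shape (c, n).
    d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

    # Compactness: membership-weighted sum of squared distances.
    compactness = ((U ** m) * d2).sum()

    # Separation: smallest squared distance between two distinct centers.
    cd2 = ((centers[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(cd2, np.inf)
    separation = cd2.min()

    return compactness / (X.shape[0] * separation)
```

A lower value indicates more compact and better separated clusters, so it is natural for results measured this way to depend on how well separated the clusters of a dataset are.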

Keywords

Fuzzy Clustering, Soft Clustering, Data Streams, C-Means, Clustering Data Streams

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Amr Magdy (1)
  • Mahmoud K. Bassiouny (1)
  1. Computer and Systems Engineering, Alexandria University, Egypt
