Applied Intelligence, Volume 48, Issue 12, pp 4905–4922

Collaborative fuzzy clustering of distributed concept-drifting dynamic data using a gossip-based approach

  • Hoda Mashayekhi

Abstract

Clustering is a useful method for analyzing large data sets, such as the distributed data streams that are increasingly observed in various applications. In this paper, a collaborative gossip-based approach is proposed for deriving a fuzzy clustering model of distributed dynamic data that exhibit concept drift. The proposed algorithm consists of a local phase and a collaborative phase. During the two phases, data prototypes are constructed, which together constitute a summarized view of the distributed data. This summarized view enables each node to extract a custom subset of the overall clustering model. Scalability is achieved by using gossip as a robust method of communication and by preventing excessive data transfer among nodes. When concept drift is present, the clustering model evolves incrementally and outdated parts of the summarized view are removed. The experimental results, obtained under different data distribution scenarios, show that the proposed method detects fuzzy clusters efficiently and adapts to concept-drifting data, with bounded communication costs, in comparison with other state-of-the-art algorithms.
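The ideas named in the abstract can be illustrated with a minimal sketch. It is not the algorithm of the paper, whose details are not given here. It assumes the standard fuzzy c-means membership formula, a summarized view stored as a dictionary of prototypes keyed by a hypothetical identifier, and a simple age threshold for discarding outdated prototypes. The names `fuzzy_memberships`, `merge_views`, `gossip_round`, `max_age` and the prototype fields `c`, `w`, `t` are all assumptions made for illustration.

```python
import math
import random


def fuzzy_memberships(point, centres, m=2.0):
    """Standard fuzzy c-means membership of one point in each centre (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centres]
    # A point that coincides with one or more centres belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exp for d_k in dists) for d_i in dists]


def merge_views(mine, theirs, now, max_age):
    """Union two summaries by prototype id, keep the newer copy, drop stale entries.

    Each prototype is a dict with a centre "c", a weight "w" and a timestamp "t"
    (hypothetical fields chosen for this sketch).
    """
    merged = dict(mine)
    for pid, proto in theirs.items():
        if pid not in merged or proto["t"] > merged[pid]["t"]:
            merged[pid] = proto
    return {pid: p for pid, p in merged.items() if now - p["t"] <= max_age}


def gossip_round(views, now, max_age, rng):
    """Every node exchanges its summary with one randomly chosen peer (push-pull)."""
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = merge_views(views[node], views[peer], now, max_age)
        views[node] = dict(merged)
        views[peer] = dict(merged)


# Usage: two nodes, each starting with one local prototype.
views = {
    "n1": {("n1", 0): {"c": (0.0, 0.0), "w": 5.0, "t": 10}},
    "n2": {("n2", 0): {"c": (3.0, 0.0), "w": 2.0, "t": 10}},
}
gossip_round(views, now=10, max_age=5, rng=random.Random(0))
centres = [p["c"] for _, p in sorted(views["n1"].items())]
memberships = fuzzy_memberships((1.0, 0.0), centres)
```

Keying prototypes by identifier rather than averaging them on contact avoids counting the same prototype twice when two nodes have both received it from a third. Only prototypes, never raw data points, cross the network in this sketch, which is the property the abstract attributes to the summarized view.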

Keywords

Distributed knowledge discovery · Dynamic data · Collaborative fuzzy clustering · Concept drift · Granular prototype · Gossip-based communication

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran
