Clustering data stream with uncertainty using belief function theory and fading function

  • Javad Hamidzadeh
  • Reyhaneh Ghadamyari
Methodologies and Application

Abstract

Data stream clustering faces major constraints on memory and processing time, so traditional clustering methods are not suitable for this kind of data. Moreover, most data stream clustering methods do not account for uncertainty and ambiguity in the data: when an object lies close to several clusters, it cannot be assigned to a single cluster correctly and simply. This study proposes a new method, called clustering data stream using belief function, that addresses uncertain and ambiguous data. The proposed method uses belief function theory to assign each object either to a single cluster or to a set of clusters, and to determine the structure of the data. In addition, it uses a window, weighted centers, and a fading function to overcome the restrictions of data streams. In experiments, the proposed method is compared with state-of-the-art methods and shows superior results in terms of purity, error rate, and ambiguity rate.
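
The abstract does not give formulas, so the following is only a minimal sketch of the three ingredients it names, under stated assumptions rather than the authors' actual method. It assumes an exponential fading function of the form 2^(-decay * age), a cluster center computed as the fading-weighted mean of its points, and a simplified mass assignment in the spirit of evidential c-means, where an object can receive mass on a single cluster or on a pair of clusters. The names `fading_weight`, `weighted_center`, `evidential_masses`, and the parameters `decay`, `alpha`, and `eps` are hypothetical.

```python
from itertools import combinations


def fading_weight(age, decay=0.25):
    """Assumed exponential fading: the weight halves every 1/decay time units."""
    return 2.0 ** (-decay * age)


def weighted_center(points, timestamps, now, decay=0.25):
    """Mean of the points, each weighted by how recently it arrived."""
    weights = [fading_weight(now - t, decay) for t in timestamps]
    total = sum(weights)
    dim = len(points[0])
    return [
        sum(w * p[d] for w, p in zip(weights, points)) / total
        for d in range(dim)
    ]


def evidential_masses(x, centers, alpha=1.0, eps=1e-12):
    """Simplified evidential assignment (an assumption, not the paper's rule).

    Each single cluster and each pair of clusters gets a score that falls
    with squared distance to its barycenter and with the subset size;
    scores are then normalized so the masses sum to 1.
    """
    scores = {}
    for size in (1, 2):
        for subset in combinations(range(len(centers)), size):
            bary = [
                sum(centers[i][d] for i in subset) / size
                for d in range(len(x))
            ]
            # Clamp the squared distance to avoid division by zero.
            d2 = max(sum((a - b) ** 2 for a, b in zip(x, bary)), eps)
            scores[subset] = size ** (-alpha) / d2
    total = sum(scores.values())
    return {subset: score / total for subset, score in scores.items()}


if __name__ == "__main__":
    centers = [[0.0, 0.0], [2.0, 0.0]]
    # An object midway between both centers is ambiguous: most of its
    # mass goes to the pair (0, 1) rather than to either single cluster.
    print(evidential_masses([1.0, 0.0], centers))
    # An object near the first center gets most of its mass on (0,).
    print(evidential_masses([0.1, 0.0], centers))
```

In this sketch an object that lies between two clusters is not forced into one of them; the mass placed on the pair records the ambiguity, which matches the idea of assigning objects to a set of clusters.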

Keywords

Clustering, Data stream, Uncertainty, Belief function theory, Dempster–Shafer theory, Fading function

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with animals performed by any of the authors.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Faculty of Computer Engineering and Information Technology, Sadjad University of Technology, Mashhad, Iran