A Storm-Based Parallel Clustering Algorithm of Streaming Data

  • Fang-Zhu Xu
  • Zhi-Ying Jiang
  • Yan-Lin He
  • Ya-Jie Wang
  • Qun-Xiong Zhu (corresponding author)
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11304)


To address the shortcomings of traditional Single-Pass clustering algorithms, such as low accuracy and a large amount of computation, a novel Storm-based parallel Single-Pass clustering algorithm is proposed for discovering hot events in the food field. To solve the problem of data inconsistency in parallel computing, the Single-Pass algorithm is improved by dynamically acquiring cluster increments and introducing random delays. To validate the performance of the proposed method, a case study of news event classification is carried out. Simulation results show that the proposed algorithm effectively reduces cluster repetition in the clustering results and greatly improves clustering accuracy and efficiency compared with the traditional Single-Pass algorithm.
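For readers unfamiliar with the baseline, the traditional (sequential) Single-Pass scheme that the abstract refers to can be sketched as follows. This is only a minimal illustration of the general technique, not the authors' Storm-based parallel method: the bag-of-words vectors, the cosine similarity measure, the summed-vector cluster representation, the `threshold` value of 0.5, and the sample headlines are all assumptions chosen for the example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(docs, threshold=0.5):
    """Process documents once, in arrival order.

    Each document joins the most similar existing cluster if the
    similarity reaches `threshold` (an assumed value); otherwise it
    opens a new cluster. Returns one cluster label per document.
    """
    centroids = []  # one summed term-count vector per cluster
    labels = []
    for doc in docs:
        vec = Counter(doc.split())
        best, best_sim = -1, 0.0
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= threshold:
            centroids[best].update(vec)  # fold the document into the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


# Made-up headlines, for illustration only.
labels = single_pass([
    "milk powder recall announced",
    "recall of milk powder announced",
    "new restaurant hygiene rules",
])
print(labels)
```

Because every incoming document is compared against all existing clusters and the result depends on arrival order, this sequential form shows where the computation cost comes from, and why concurrent workers updating a shared cluster set could see inconsistent state, which is the problem the abstract says the proposed method addresses.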


Parallel clustering · Streaming data · Single-Pass algorithm



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Fang-Zhu Xu (1, 2)
  • Zhi-Ying Jiang (1, 2)
  • Yan-Lin He (1, 2)
  • Ya-Jie Wang (3)
  • Qun-Xiong Zhu (1, 2), corresponding author
  1. College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China
  2. Engineering Research Center of Intelligent PSE, Ministry of Education of China, Beijing, China
  3. Guizhou Food Safety Testing Engineering Technology Research Center Co., Ltd., Guizhou, China
