MicroGRID: An Accurate and Efficient Real-Time Stream Data Clustering with Noise

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10938)

Abstract

Data stream clustering aims to produce clusters from a data stream in real time. Many existing algorithms, however, focus on solving a single problem, leaving anomalous noise in data streams by the wayside. This paper describes MicroGRID, an approach to clustering single data streams that handles noisy data, accurately identifying noise-affected data points and separating them from outlier points. In particular, MicroGRID utilises a combination of micro-cluster and grid-based perspectives, an approach that has not previously been attempted when clustering data streams. The experimental results show that MicroGRID significantly outperforms the baseline methods: it is up to 87% faster and produces up to 80% more accurate clustering outputs.
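
As a rough illustration of how micro-cluster summaries and a grid can be combined, the following Python sketch keeps one decayed micro-cluster summary per occupied grid cell and flags low-weight cells as noise candidates. It is not the MicroGRID algorithm from the paper, whose details are not given in the abstract. The class name `GridMicroClusterSketch`, the parameters `cell_size`, `decay` and `dense_threshold`, and the density rule itself are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, ...]


@dataclass
class MicroCluster:
    """Decayed summary of the points that fell into one grid cell."""
    weight: float = 0.0
    linear_sum: List[float] = field(default_factory=list)
    last_update: int = 0

    def centroid(self) -> List[float]:
        return [s / self.weight for s in self.linear_sum]


class GridMicroClusterSketch:
    """Hypothetical grid + micro-cluster summary; NOT the paper's method."""

    def __init__(self, cell_size: float = 1.0, decay: float = 0.99,
                 dense_threshold: float = 3.0) -> None:
        self.cell_size = cell_size              # assumed fixed cell width
        self.decay = decay                      # assumed per-step fading factor
        self.dense_threshold = dense_threshold  # assumed density cut-off
        self.cells: Dict[Cell, MicroCluster] = {}
        self.t = 0                              # logical clock: points seen

    def _cell_of(self, point: Sequence[float]) -> Cell:
        return tuple(math.floor(x / self.cell_size) for x in point)

    def insert(self, point: Sequence[float]) -> Cell:
        """Fold one stream point into its cell's summary in O(d) time."""
        self.t += 1
        key = self._cell_of(point)
        mc = self.cells.get(key)
        if mc is None:
            mc = MicroCluster(linear_sum=[0.0] * len(point), last_update=self.t)
            self.cells[key] = mc
        # Fade the old summary by the time elapsed since its last update.
        fade = self.decay ** (self.t - mc.last_update)
        mc.weight = mc.weight * fade + 1.0
        mc.linear_sum = [s * fade + x for s, x in zip(mc.linear_sum, point)]
        mc.last_update = self.t
        return key

    def current_weight(self, key: Cell) -> float:
        mc = self.cells[key]
        return mc.weight * self.decay ** (self.t - mc.last_update)

    def dense_cells(self) -> Set[Cell]:
        return {k for k in self.cells
                if self.current_weight(k) >= self.dense_threshold}

    def sparse_cells(self) -> Set[Cell]:
        """Cells below the threshold: candidates for noise or outliers."""
        return set(self.cells) - self.dense_cells()


if __name__ == "__main__":
    grid = GridMicroClusterSketch(cell_size=1.0, decay=1.0, dense_threshold=3.0)
    for p in [(0.1, 0.2), (0.5, 0.5), (0.9, 0.1), (5.5, 5.5)]:
        grid.insert(p)
    print("dense:", grid.dense_cells())
    print("sparse:", grid.sparse_cells())
```

In this sketch each arriving point touches only one cell, so the per-point cost does not depend on how many points have been seen. A fuller method would also need to merge neighbouring dense cells into clusters and decide when a sparse cell is noise rather than the start of a new cluster; both steps are left out here.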

Author information

Corresponding author

Correspondence to Z. Tari.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Tari, Z., Thompson, A., Almusalam, N., Bertok, P., Mahmood, A. (2018). MicroGRID: An Accurate and Efficient Real-Time Stream Data Clustering with Noise. In: Phung, D., Tseng, V., Webb, G., Ho, B., Ganji, M., Rashidi, L. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2018. Lecture Notes in Computer Science, vol 10938. Springer, Cham. https://doi.org/10.1007/978-3-319-93037-4_38

  • DOI: https://doi.org/10.1007/978-3-319-93037-4_38

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93036-7

  • Online ISBN: 978-3-319-93037-4

  • eBook Packages: Computer Science, Computer Science (R0)
