
Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2013 (IDEAL 2013)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8206)

Abstract

In this big data era, the capability to mine and analyze large-scale datasets is imperative. As data become more abundant than ever before, data-driven methods are playing a critical role in areas such as decision support and business intelligence. In this paper, we demonstrate how state-of-the-art GPUs and the Dynamic Parallelism feature of the latest CUDA platform can bring significant benefits to BIRCH, one of the most well-known clustering techniques for streaming data. Experimental results show that, on a number of benchmark problems, the GPU-accelerated BIRCH can be made up to 154 times faster than the CPU version with good scalability and high accuracy. Our work suggests that massively parallel GPU computing is a promising and effective solution to the challenges of big data.
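For readers unfamiliar with BIRCH, the following is a minimal CPU-side sketch of the clustering feature (CF) summary that the original BIRCH algorithm maintains for each subcluster: a point count, a linear sum, and a sum of squared norms. It is not the authors' GPU implementation, and the class and method names (`ClusteringFeature`, `from_point`, `merge`, `centroid`, `radius`) are hypothetical, chosen only for illustration. The sketch shows why the summary suits streaming data: two CFs merge by simple addition, so points can be absorbed one at a time without being stored.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ClusteringFeature:
    """Illustrative BIRCH clustering feature (names are assumptions, not from the paper)."""
    n: int                    # number of points summarized
    linear_sum: List[float]   # per-dimension sum of the points
    square_sum: float         # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, point: Sequence[float]) -> "ClusteringFeature":
        # A single point is a CF with n = 1.
        return cls(1, list(point), sum(x * x for x in point))

    def merge(self, other: "ClusteringFeature") -> "ClusteringFeature":
        # CF additivity: merging two subclusters is a component-wise sum.
        return ClusteringFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        # Root-mean-square distance of the summarized points from the centroid.
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return math.sqrt(max(variance, 0.0))  # clamp tiny negative rounding error


# Usage: absorb two streamed points into one summary.
cf = ClusteringFeature.from_point([0.0, 0.0]).merge(
    ClusteringFeature.from_point([2.0, 0.0])
)
```

Because the radius is computed from the three sums alone, a threshold test on whether a new point fits an existing subcluster needs no access to the points already absorbed.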




Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dong, J., Wang, F., Yuan, B. (2013). Accelerating BIRCH for Clustering Large Scale Streaming Data Using CUDA Dynamic Parallelism. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2013. IDEAL 2013. Lecture Notes in Computer Science, vol 8206. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41278-3_50


  • DOI: https://doi.org/10.1007/978-3-642-41278-3_50

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-41277-6

  • Online ISBN: 978-3-642-41278-3

  • eBook Packages: Computer Science, Computer Science (R0)
