
Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1156)

Abstract

We live in a world flooded with data, and Big Data has become a key issue. Applications that produce huge amounts of data, often measured in petabytes (PB) or more, are found in many areas, such as biology, medicine, astronomy, geology, and geography, to name just a few, and this trend is steadily increasing. Data mining is the process of extracting useful information from large datasets. There are different approaches to discovering the properties of a dataset, and machine learning is one of them. In machine learning, unsupervised learning deals with unlabeled datasets, and one of its primary approaches is clustering: the process of grouping similar entities together. Because clustering algorithms repeatedly compute distances between data objects, or between objects and cluster centers, their cost grows quickly with the size of the data, so improving their performance is a challenge, especially for huge datasets. In this work, we present a survey of techniques that increase the efficiency of two well-known clustering algorithms, k-means and DBSCAN.
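As a concrete reference point for the two algorithms the survey targets, the following minimal pure-Python sketch (not taken from the chapter) shows Lloyd's k-means with its assignment step split into independent chunks whose per-cluster partial sums are merged in a reduce step, the kind of data decomposition that parallel versions distribute across workers, and a basic DBSCAN whose brute-force neighborhood queries dominate its cost. The function names (`kmeans`, `kmeans_map`, `dbscan`), the strided chunking, and the parameter values in the demo are illustrative assumptions.

```python
# Illustrative sketch only; hypothetical helper names, not the surveyed implementations.
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label for points DBSCAN assigns to no cluster


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (avoids a sqrt; ordering is unchanged)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_map(chunk: List[Point], centroids: List[Point]):
    """Map step (hypothetical helper): assign each point of one chunk to its
    nearest centroid and return per-cluster partial sums and counts."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        j = min(range(k), key=lambda c: _dist2(p, centroids[c]))
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += p[d]
    return sums, counts


def kmeans(points: List[Point], init: List[Point],
           n_chunks: int = 4, max_iter: int = 100) -> List[Point]:
    """Lloyd's k-means with a chunked assignment step. Each chunk is
    independent, so a parallel version could hand chunks to separate workers;
    here they run sequentially. Only k sums and counts per chunk are merged."""
    centroids = [tuple(c) for c in init]
    k, dim = len(centroids), len(centroids[0])
    chunks = [points[i::n_chunks] for i in range(n_chunks)]
    for _ in range(max_iter):
        partials = [kmeans_map(ch, centroids) for ch in chunks]  # "map"
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for s, c in partials:  # "reduce": combine partial results
            for j in range(k):
                counts[j] += c[j]
                for d in range(dim):
                    sums[j][d] += s[j][d]
        # An empty cluster keeps its previous centroid.
        new = [tuple(v / counts[j] for v in sums[j]) if counts[j]
               else centroids[j] for j in range(k)]
        if new == centroids:  # centroids unchanged -> converged
            break
        centroids = new
    return centroids


def dbscan(points: List[Point], eps: float, min_pts: int) -> List[int]:
    """Basic DBSCAN with brute-force O(n^2) range queries; min_pts counts the
    point itself. Returns a cluster id per point, or NOISE."""
    n, eps2 = len(points), eps * eps
    labels: List[Optional[int]] = [None] * n

    def region(i: int) -> List[int]:
        return [j for j in range(n) if _dist2(points[i], points[j]) <= eps2]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:  # j is also core: keep expanding
                seeds.extend(nb)
        cluster += 1
    return labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(kmeans(pts, init=[pts[0], pts[4]], n_chunks=3))
    # [(0.5, 0.5), (10.5, 10.5)]
    print(dbscan(pts + [(50.0, 50.0)], eps=1.5, min_pts=3))
    # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because each chunk passes only k sums and counts to the reduce step, moving `kmeans_map` onto separate processes or machines changes where the work runs, not the clustering (up to floating-point summation order). For DBSCAN, replacing the linear scan in `region` with a spatial index, or partitioning the data, are the usual routes to a speed-up; this sketch does neither.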



Acknowledgments

The reported study was funded by the Russian Foundation for Basic Research (RFBR) under research projects 19-01-246-a, 19-07-00329-a, 18-01-00402-a, and 18-08-00549-a.

Author information

Corresponding author

Correspondence to Ilias K. Savvas.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Savvas, I.K., Michos, C., Chernov, A., Butakova, M. (2020). High Performance Clustering Techniques: A Survey. In: Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Fourth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19). IITI 2019. Advances in Intelligent Systems and Computing, vol 1156. Springer, Cham. https://doi.org/10.1007/978-3-030-50097-9_26
