High Performance Clustering Techniques: A Survey

Savvas, Ilias K.; Michos, Christos; Chernov, Andrey; Butakova, Maria

doi:10.1007/978-3-030-50097-9_26

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1156)

Included in the following conference series:

International Conference on Intelligent Information Technologies for Industry

Abstract

We live in a world flooded with data, and Big Data has become a key issue. Applications that produce huge amounts of data (petabytes and more) span many areas, including Biology, Medicine, Astronomy, Geology, and Geography, to name just a few, and this trend is steadily increasing. Data Mining is the process of extracting useful information from large datasets. There are different approaches to discovering properties of datasets, and Machine Learning is one of them. In Machine Learning, unsupervised learning deals with unlabeled datasets. One of the primary approaches to unsupervised learning is clustering, the process of grouping similar entities together. Improving the performance of such techniques is a challenge, especially when dealing with huge amounts of data. In this work, we present a survey of techniques that increase the efficiency of two well-known clustering algorithms, k-means and DBSCAN.

Acknowledgments

The reported study was funded by RFBR under research projects 19-01-246-a, 19-07-00329-a, 18-01-00402-a, and 18-08-00549-a.

Author information

Authors and Affiliations

University of Thessaly, Larissa, Greece
Ilias K. Savvas & Christos Michos
Rostov State Transport University, Rostov-on-Don, Russia
Andrey Chernov & Maria Butakova

Corresponding author

Correspondence to Ilias K. Savvas.

Editor information

Editors and Affiliations

Rostovskogo Strelkovogo Polka Narodnogo, Rostov State Transport University, Rostov-on-Don, Russia
Sergey Kovalev
Bauman Moscow State Technical University, Moscow, Russia
Valery Tarassov
Department of Computer Science, VSB-Technical University of Ostrava, Ostrava-Poruba, Czech Republic
Vaclav Snasel
Rostovskogo Strelkovogo Polka, Rostov State Transport University, Rostov-on-Don, Russia
Andrey Sukhanov

About this paper

Cite this paper

Savvas, I.K., Michos, C., Chernov, A., Butakova, M. (2020). High Performance Clustering Techniques: A Survey. In: Kovalev, S., Tarassov, V., Snasel, V., Sukhanov, A. (eds) Proceedings of the Fourth International Scientific Conference “Intelligent Information Technologies for Industry” (IITI’19). IITI 2019. Advances in Intelligent Systems and Computing, vol 1156. Springer, Cham. https://doi.org/10.1007/978-3-030-50097-9_26

DOI: https://doi.org/10.1007/978-3-030-50097-9_26
Published: 23 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-50096-2
Online ISBN: 978-3-030-50097-9
eBook Packages: Intelligent Technologies and Robotics (R0)
