Abstract
DBSCAN is one of the most popular clustering algorithms, known for being efficient and highly resistant to noise. In this paper we propose a distributed implementation of it. Distributed computing is a rapidly growing approach to solving problems on big datasets with a multinode cluster rather than by parallelization on a single computer. Using its features properly can lead to higher performance and, probably more importantly, higher scalability. To show the added value of designing and implementing algorithms this way, we compare our results with GPU parallelization. On the basis of the obtained results, we formulate proposals for improving our solution.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Merk, A., Cal, P., Woźniak, M. (2018). Distributed DBSCAN Algorithm – Concept and Experimental Evaluation. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_49
DOI: https://doi.org/10.1007/978-3-319-59162-9_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59161-2
Online ISBN: 978-3-319-59162-9
eBook Packages: Engineering, Engineering (R0)