Research and Application of DBSCAN Algorithm Based on Hadoop Platform

Fu, Xiufen; Wang, Yaguang; Ge, Yanna; Chen, Peiwen; Teng, Shaohua

doi:10.1007/978-3-319-09265-2_9

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 8351)

Included in the following conference series:

Joint International Conference on Pervasive Computing and the Networked World

Abstract

With the rapid development of the information age, more and more data can be obtained from the Internet, and it is very difficult to extract useful information and knowledge from such huge amounts of data. Building on the existing DBSCAN algorithm, a new improved incremental DBSCAN clustering algorithm is proposed. The improved algorithm is combined with the open-source cloud computing framework Hadoop and uses the MapReduce programming model, which makes distributed applications easy to write and simplifies distributed programming. The large data set is divided into chunks, the chunks are distributed across the cluster, and the algorithm is run as a MapReduce job; in this way the improved DBSCAN-based data mining algorithm is integrated with the Hadoop framework. When data in the database are manipulated (added or deleted), only the changed data need to be mined and similar clusters merged to form the final mining result. Compared with serial computation on a single-node server and with mining the whole data set again, the time delay of data processing is reduced. In the last part, the paper verifies the effectiveness of the approach through experiments and data analysis.
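The chunk-then-merge idea described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation and does not use Hadoop: the names `dbscan`, `map_chunk`, `merge_clusters` and `chunked_dbscan`, the parameters `eps`, `min_pts` and `n_chunks`, and the merge rule (two local clusters are joined when any pair of their points lies within `eps`) are all assumptions made for illustration. Points labelled as noise inside a chunk are simply dropped, so the result only approximates DBSCAN on the full data set.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Plain DBSCAN on a list of coordinate tuples; -1 marks noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force range query; the point itself is included.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nj = neighbors(j)
            if len(nj) >= min_pts:       # only core points expand the cluster
                queue.extend(nj)
        cluster += 1
    return labels

def map_chunk(chunk, eps, min_pts):
    """Assumed 'map' step: cluster one chunk locally, return clusters as point lists."""
    clusters = {}
    for p, lab in zip(chunk, dbscan(chunk, eps, min_pts)):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(p)
    return list(clusters.values())

def merge_clusters(clusters, eps):
    """Assumed 'reduce' step: union clusters that have points within eps of each other."""
    parent = list(range(len(clusters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(len(clusters)):
        for b in range(a + 1, len(clusters)):
            if any(dist(p, q) <= eps for p in clusters[a] for q in clusters[b]):
                parent[find(a)] = find(b)
    merged = {}
    for i, c in enumerate(clusters):
        merged.setdefault(find(i), []).extend(c)
    return list(merged.values())

def chunked_dbscan(points, eps, min_pts, n_chunks):
    """Split the data into chunks, cluster each chunk, then merge nearby clusters."""
    size = max(1, -(-len(points) // n_chunks))   # ceiling division
    chunks = [points[k:k + size] for k in range(0, len(points), size)]
    local = [c for chunk in chunks for c in map_chunk(chunk, eps, min_pts)]
    return merge_clusters(local, eps)
```

Under the same assumptions, newly added data could be handled incrementally by passing only the new points through `map_chunk` and calling `merge_clusters` on the existing clusters plus the new local clusters, rather than re-clustering everything.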

Author information

Authors and Affiliations

School of Computer, Guangdong University of Technology, Guangzhou, P.R. China, 510006
Xiufen Fu, Yaguang Wang, Yanna Ge, Peiwen Chen & Shaohua Teng

Editor information

Editors and Affiliations

School of Logistics Engineering, Wuhan University of Technology, 430063, Wuhan, Hubei, China
Qiaohong Zu
Facultad de Ingenieria y Ciencias, Universidad Adolfo Ibanez, Vina del Mar, Chile
Maria Vargas-Vera
Fujitsu, Hayes, Middlesex, UK
Bo Hu

About this paper

Cite this paper

Fu, X., Wang, Y., Ge, Y., Chen, P., Teng, S. (2014). Research and Application of DBSCAN Algorithm Based on Hadoop Platform. In: Zu, Q., Vargas-Vera, M., Hu, B. (eds) Pervasive Computing and the Networked World. ICPCA/SWS 2013. Lecture Notes in Computer Science, vol 8351. Springer, Cham. https://doi.org/10.1007/978-3-319-09265-2_9

DOI: https://doi.org/10.1007/978-3-319-09265-2_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-09264-5
Online ISBN: 978-3-319-09265-2
eBook Packages: Computer Science, Computer Science (R0)
