SparkSCAN: A Structure Similarity Clustering Algorithm on Spark

Zhou, Qijun; Wang, Jingbin

doi:10.1007/978-981-10-0457-5_16

Part of the book series: Communications in Computer and Information Science (CCIS, volume 590)

Included in the following conference series:

National Conference on Big Data Technology and Applications

Abstract

Existing clustering algorithms for directed graphs suffer from high latency, resource exhaustion, and poor performance on iterative data processing. To address these problems, this paper proposes SparkSCAN, a distributed parallel structure similarity clustering algorithm on Spark. First, taking the interaction between nodes in the network into account, nodes with similar structure are clustered together. Second, to cope with the large scale of directed graphs, a data structure suited to distributed graph computing is designed, and a distributed parallel clustering algorithm is built on the Spark framework, which improves processing performance while preserving the accuracy of the clustering results. Experimental results show that SparkSCAN performs well and can effectively cluster large-scale directed graphs.
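
To make the idea of clustering nodes by similar structure concrete, the following is a minimal single-machine sketch in Python. It is not SparkSCAN itself: the abstract does not give the similarity measure or the distributed data structure, so the sketch assumes the structural similarity of the original SCAN algorithm (shared closed neighborhoods, normalized by the geometric mean of their sizes) and treats the graph as undirected. The function names, the thresholds `eps` and `mu`, and the toy graph are all illustrative assumptions.

```python
from math import sqrt


def closed_neighborhood(adj, v):
    """Return the neighbors of v together with v itself."""
    return set(adj.get(v, ())) | {v}


def structural_similarity(adj, u, v):
    """SCAN-style similarity: shared closed neighbors, normalized by the
    geometric mean of the two neighborhood sizes (assumed measure)."""
    nu = closed_neighborhood(adj, u)
    nv = closed_neighborhood(adj, v)
    return len(nu & nv) / sqrt(len(nu) * len(nv))


def eps_neighborhood(adj, v, eps):
    """Members of v's closed neighborhood whose similarity to v is >= eps."""
    return {w for w in closed_neighborhood(adj, v)
            if structural_similarity(adj, v, w) >= eps}


def is_core(adj, v, eps, mu):
    """A node is a core if its eps-neighborhood has at least mu members."""
    return len(eps_neighborhood(adj, v, eps)) >= mu


# Toy undirected graph (hypothetical data): two triangles joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

sim_inside = structural_similarity(adj, 0, 1)  # both nodes in one triangle
sim_bridge = structural_similarity(adj, 2, 3)  # across the bridging edge
core_2 = is_core(adj, 2, eps=0.7, mu=3)
```

In this toy graph, nodes inside a triangle share their whole neighborhood and score higher than the pair joined only by the bridging edge, which is the property a structure similarity clustering relies on. A distributed version would have to partition the adjacency data and compute these pairwise scores in parallel; the abstract does not describe how SparkSCAN does this.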

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Fuzhou University, Fuzhou, 350108, China
Qijun Zhou & Jingbin Wang

Corresponding author

Correspondence to Jingbin Wang.

Editor information

Editors and Affiliations

Tsinghua University, Dept. of Computer Science and Technology, Beijing, China
Wenguang Chen
Harbin Engineering University, China
Guisheng Yin
South China Normal University, Guangzhou, China
Gansen Zhao
Harbin Engineering University, China
Qilong Han
Northeast Forestry University, Harbin, China
Weipeng Jing
Harbin Univ. of Science and Technology, Harbin, China
Guanglu Sun
Harbin Sea of Clouds & Computer Tech., Harbin, China
Zeguang Lu

About this paper

Cite this paper

Zhou, Q., Wang, J. (2016). SparkSCAN: A Structure Similarity Clustering Algorithm on Spark. In: Chen, W., et al. Big Data Technology and Applications. BDTA 2015. Communications in Computer and Information Science, vol 590. Springer, Singapore. https://doi.org/10.1007/978-981-10-0457-5_16

DOI: https://doi.org/10.1007/978-981-10-0457-5_16
Published: 02 February 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-0456-8
Online ISBN: 978-981-10-0457-5
eBook Packages: Computer Science, Computer Science (R0)
