Abstract
A great deal of data today is stored in graph format. For example, knowledge graphs use vertices to represent entities and edges to represent the relations between them, and graph data in microbiology describe microorganisms and the relations among them. Graph mining can extract information from such data, and graph clustering is one part of graph mining. Many graph clustering algorithms have been proposed in recent years, but most of them are sequential algorithms that cannot run in a distributed environment, which limits the volume of data they can process. In this paper we propose a new parallelized graph clustering algorithm based on Spark, and we adopt several methods to improve its running speed. Experimental results show that the proposed algorithm performs better than the parallelized graph clustering algorithm used for comparison.
Acknowledgements
The work is supported by the National Key Research and Development Plan under grant No. 2016YFB 1000600.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, D., Li, J. (2019). PGCAS: A Parallelized Graph Clustering Algorithm Based on Spark. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science(), vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_20
DOI: https://doi.org/10.1007/978-3-030-28061-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28060-4
Online ISBN: 978-3-030-28061-1
eBook Packages: Computer Science, Computer Science (R0)