PGCAS: A Parallelized Graph Clustering Algorithm Based on Spark

  • Conference paper
Big Scientific Data Management (BigSDM 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11473)

Abstract

Nowadays, plenty of data are in graph format. For example, knowledge graphs use vertices to represent entities and edges to represent the relations between entities, and graph data in microbiology contain microorganisms and the relations between them. Information can therefore be obtained from these data by graph mining, of which graph clustering is one part. In recent years, many graph clustering algorithms have been proposed, but most of them are sequential algorithms that cannot run in a distributed environment, which limits the volume of data they can process. In this paper we propose a new parallelized graph clustering algorithm based on Spark, and we adopt several methods in the algorithm to improve its running speed. The experimental results show that the proposed algorithm performs better than the parallelized graph clustering algorithm used for comparison.

Acknowledgements

This work is supported by the National Key Research and Development Plan under grant No. 2016YFB1000600.

Author information

Corresponding author

Correspondence to Dongjiang Liu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, D., Li, J. (2019). PGCAS: A Parallelized Graph Clustering Algorithm Based on Spark. In: Li, J., Meng, X., Zhang, Y., Cui, W., Du, Z. (eds) Big Scientific Data Management. BigSDM 2018. Lecture Notes in Computer Science, vol 11473. Springer, Cham. https://doi.org/10.1007/978-3-030-28061-1_20

  • DOI: https://doi.org/10.1007/978-3-030-28061-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-28060-4

  • Online ISBN: 978-3-030-28061-1

  • eBook Packages: Computer Science, Computer Science (R0)
