• Yingxia Shao
• Bin Cui
• Lei Chen
Part of the Big Data Management book series (BIGDM)


With the rapid development of the Internet of Things (IoT), mobile devices, and social networks, our world has become more connected than ever before, resulting in ubiquitous linked data or, more generally, graphs. Graph analysis is the de facto technique for discovering knowledge in this connected world. In its consensus study report (National Research Council, Frontiers in Massive Data Analysis. The National Academies Press, Washington, DC, 2013), the National Research Council of the US National Academies points out that graph analysis is one of the seven major computational methods of massive data analysis. A wide array of applications, such as social network analysis, recommendation, the semantic web, bioinformatics, intelligence surveillance, and image processing, use graph analysis techniques to discover helpful insights. However, unlike the graphs of decades ago, today's graphs are large, sparse, and highly dynamic, so the classical methods of graph analysis become inefficient or even infeasible. Large-scale graph analysis has become a problem that both industry and academia are trying to solve. In this chapter, we first introduce the background of large-scale graph analysis and briefly review existing solutions. We then introduce three advanced graph analysis tasks that are popular and fundamental but do not yet have efficient solutions on large graphs. Next, we summarize the research issues of large-scale graph analysis, especially for the advanced graph analysis tasks. Finally, we present an overview of this book.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  3. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
