Part of the book series: Big Data Management (BIGDM)

Abstract

With the rapid development of the Internet of Things (IoT), mobile devices, and social networks, our world has become more connected than ever before, resulting in ubiquitous linked data or, more generally, graphs. Graph analysis is the de facto technique for discovering knowledge from this connected world. In its consensus study report (National Research Council, Frontiers in massive data analysis. The National Academies Press, Washington, DC, 2013), the National Research Council of the US National Academies points out that graph analysis is one of the seven major computational methods of massive data analysis. A wide array of applications, such as social network analysis, recommendation, the semantic web, bioinformatics, intelligence surveillance, and image processing, use graph analysis techniques to discover helpful insights. However, unlike those of decades ago, today's graphs are large, sparse, and highly dynamic, and the classical methods of graph analysis become inefficient or even infeasible. Large-scale graph analysis has therefore become a problem that both industry and academia are trying to solve. In this chapter, we first introduce the background of large-scale graph analysis and briefly review existing solutions. Second, we introduce three advanced graph analysis tasks that are popular and fundamental but do not yet have efficient solutions on large graphs. Third, we summarize the research issues of large-scale graph analysis, especially for the advanced graph analysis tasks. Finally, we present an overview of this book.

Notes

  1. https://twitter.com/.

  2. http://law.di.unimi.it/webdata/uk-2007-05/.

  3. http://snap.stanford.edu/biodata/datasets/10028/10028-PP-Miner.html.

  4. In this book, we use subgraph matching and subgraph enumeration interchangeably.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Shao, Y., Cui, B., Chen, L. (2020). Introduction. In: Large-scale Graph Analysis: System, Algorithm and Optimization. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-15-3928-2_1

  • DOI: https://doi.org/10.1007/978-981-15-3928-2_1

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-3927-5

  • Online ISBN: 978-981-15-3928-2

  • eBook Packages: Computer Science (R0)
