Abstract
With the rapid development of the Internet of Things (IoT), mobile devices, and social networks, our world has become more connected than ever before, resulting in ubiquitous linked data or, more generally, graphs. Graph analysis is the de facto technique for discovering knowledge in this connected world. In its consensus study report (National Research Council, Frontiers in massive data analysis. The National Academies Press, Washington, DC, 2013), the National Research Council of the US National Academies points out that graph analysis is one of the seven major computational methods of massive data analysis. A wide array of applications, such as social network analysis, recommendations, the semantic web, bioinformatics, intelligence surveillance, and image processing, use graph analysis techniques to discover helpful insights. However, unlike the graphs of decades ago, today's graphs are large, sparse, and highly dynamic, so classical graph analysis methods become inefficient or even infeasible. Large-scale graph analysis has therefore become a problem that both industry and academia are trying to solve. In this chapter, we first introduce the background of large-scale graph analysis and briefly review existing solutions; second, we introduce three advanced graph analysis tasks that are popular and fundamental but do not yet have efficient solutions on large graphs; third, we summarize the research issues of large-scale graph analysis, especially for the advanced graph analysis tasks. Finally, we present an overview of this book.
Notes
- 4. In this book, we use subgraph matching and subgraph enumeration interchangeably.
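To make the term concrete: subgraph matching (or enumeration) asks for every occurrence of a small pattern graph inside a larger data graph. The sketch below is a minimal, single-machine backtracking illustration under stated assumptions (undirected graphs stored as adjacency sets, non-induced matches, a fixed matching order). It is not the parallel method developed in this book; the helper names `build_graph` and `enumerate_matches` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def build_graph(edges: List[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def enumerate_matches(pattern: Graph, data: Graph) -> List[Dict[int, int]]:
    """Return every injective mapping of pattern vertices to data vertices
    that maps each pattern edge onto a data edge (non-induced matching)."""
    order = sorted(pattern)  # simple fixed matching order, not optimized
    matches: List[Dict[int, int]] = []
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def extend(i: int) -> None:
        if i == len(order):
            matches.append(dict(mapping))  # record a complete match
            return
        p = order[i]
        for d in data:
            if d in used:
                continue  # keep the mapping injective
            # Every already-mapped neighbour of p must map to a neighbour of d.
            if all(mapping[q] in data[d] for q in pattern[p] if q in mapping):
                mapping[p] = d
                used.add(d)
                extend(i + 1)
                del mapping[p]  # backtrack
                used.discard(d)

    extend(0)
    return matches


if __name__ == "__main__":
    triangle = build_graph([(0, 1), (1, 2), (2, 0)])
    data = build_graph([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])
    found = enumerate_matches(triangle, data)
    print(len(found))  # 12
```

For a triangle pattern in a 4-cycle with one chord, the sketch returns 12 mappings covering two distinct triangles, {0, 1, 2} and {0, 2, 3}. Each triangle is reported six times because of the pattern's automorphisms; scalable systems typically remove such duplicates with symmetry-breaking constraints and replace the exhaustive candidate loop with far more selective filtering.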
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Shao, Y., Cui, B., Chen, L. (2020). Introduction. In: Large-scale Graph Analysis: System, Algorithm and Optimization. Big Data Management. Springer, Singapore. https://doi.org/10.1007/978-981-15-3928-2_1
DOI: https://doi.org/10.1007/978-981-15-3928-2_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3927-5
Online ISBN: 978-981-15-3928-2
eBook Packages: Computer Science, Computer Science (R0)