Efficient Parallel Graph Extraction

  • Yingxia Shao
  • Bin Cui
  • Lei Chen
Part of the Big Data Management book series (BIGDM)


In this chapter, we introduce the homogeneous graph extraction task, which extracts homogeneous graphs from heterogeneous graphs. In an extracted homogeneous graph, each relation is defined by a line pattern on the heterogeneous graph, and the attribute values of the relation are computed by user-defined aggregate functions. For large-scale heterogeneous graphs, the two key challenges are how to efficiently enumerate the paths matched by the line pattern and how to aggregate values for each pair of vertices from the matched paths. To address these challenges, we propose a parallel graph extraction framework. The framework compiles the line pattern into a path concatenation plan, which is selected by a cost model. To make aggregate computation efficient, we first classify aggregate functions as distributive, algebraic, or holistic; we then speed up distributive and algebraic aggregations by computing partial aggregate values during path enumeration. Experimental results demonstrate the effectiveness of the proposed graph extraction framework.
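The idea of folding partial aggregates into path concatenation can be illustrated with a minimal single-machine sketch. It is not the chapter's framework: the toy graph, the line pattern Author -> Paper -> Venue -> Topic, the helper names (`edges_to_partials`, `concat`, `enumerate_values`), the fixed left-to-right concatenation order (no cost model), and the choice of "path value = product of edge weights" are all assumptions made for illustration. Under those assumptions, SUM and COUNT (distributive) and AVG (algebraic, derived from SUM and COUNT) can be carried as one small pair per vertex pair, whereas MEDIAN (holistic) needs every path value.

```python
from collections import defaultdict
from statistics import median


def edges_to_partials(edges):
    """Collapse (src, dst, weight) edges into a partial aggregate (sum, count) per vertex pair."""
    partial = defaultdict(lambda: (0.0, 0))
    for u, v, w in edges:
        s, c = partial[(u, v)]
        partial[(u, v)] = (s + w, c + 1)
    return dict(partial)


def concat(left, right):
    """Join two path segments on the shared middle vertex, merging partial aggregates.

    Because the assumed path value is a product of edge weights, the sum over all
    concatenated paths equals (left sum) * (right sum), and the path count equals
    (left count) * (right count). No individual path is materialised.
    """
    by_start = defaultdict(list)
    for (v, w), agg in right.items():
        by_start[v].append((w, agg))
    out = defaultdict(lambda: (0.0, 0))
    for (u, v), (ls, lc) in left.items():
        for w, (rs, rc) in by_start.get(v, []):
            s, c = out[(u, w)]
            out[(u, w)] = (s + ls * rs, c + lc * rc)
    return dict(out)


def enumerate_values(segments):
    """Baseline for holistic aggregates: keep every path value per (start, end) pair."""
    paths = defaultdict(list)
    for u, v, w in segments[0]:
        paths[(u, v)].append(w)
    for seg in segments[1:]:
        nxt = defaultdict(list)
        for (u, v), vals in paths.items():
            for x, y, w in seg:
                if x == v:
                    nxt[(u, y)].extend(val * w for val in vals)
        paths = nxt
    return dict(paths)


# Hypothetical toy data for the assumed line pattern Author -> Paper -> Venue -> Topic.
author_paper = [("a1", "p1", 1.0), ("a1", "p2", 2.0)]
paper_venue = [("p1", "v1", 1.0), ("p2", "v1", 3.0)]
venue_topic = [("v1", "t1", 2.0)]

# Distributive/algebraic: concatenate segments while carrying (sum, count) only.
author_venue = concat(edges_to_partials(author_paper), edges_to_partials(paper_venue))
extracted = concat(author_venue, edges_to_partials(venue_topic))
total, count = extracted[("a1", "t1")]
average = total / count  # AVG is algebraic: derived from SUM and COUNT

# Holistic: MEDIAN needs the full list of path values.
all_values = enumerate_values([author_paper, paper_venue, venue_topic])
median_value = median(all_values[("a1", "t1")])
```

In this sketch the two Author-to-Venue paths are collapsed into a single (sum, count) pair before the final concatenation, so the intermediate result grows with the number of vertex pairs rather than the number of paths; the holistic baseline cannot make that reduction.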



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
  2. School of Electronics Engineering and Computer Science, Peking University, Beijing, China
  3. Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
