Advertisement

The VLDB Journal

, Volume 28, Issue 6, pp 871–896 | Cite as

Efficient distributed reachability querying of massive temporal graphs

  • Tianming Zhang
  • Yunjun GaoEmail author
  • Lu Chen
  • Wei Guo
  • Shiliang Pu
  • Baihua Zheng
  • Christian S. Jensen
Regular Paper
  • 123 Downloads

Abstract

Reachability computation is a fundamental graph functionality with a wide range of applications. In spite of this, little work has as yet been done on efficient reachability queries over temporal graphs, which are used extensively to model time-varying networks, such as communication networks, social networks, and transportation schedule networks. Moreover, we are faced with increasingly large real-world temporal networks that may be distributed across multiple data centers. This state of affairs motivates the paper’s study of efficient reachability queries on distributed temporal graphs. We propose an efficient index, called Temporal Vertex Labeling (TVL), which is a labeling scheme for distributed temporal graphs. We also present algorithms that exploit TVL to achieve efficient support for distributed reachability querying over temporal graphs in Pregel-like systems. The algorithms exploit several optimizations that hinge upon non-trivial lemmas. Extensive experiments using massive real and synthetic temporal graphs are conducted to provide detailed insight into the efficiency and scalability of the proposed methods, covering both index construction and query processing. Compared with the state-of-the-art methods, the TVL based query algorithms are capable of up to an order of magnitude speedup with lower index construction overhead.

Keywords

Graph Reachability Distributed processing Query processing Algorithm 

Notes

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Grant No. 2018YFB1004003, the NSFC under Grant No. 61972338, the NSFC-Zhejiang Joint Fund under Grant No. U1609217, the ZJU-Hikvision Joint Project, and the National Research Foundation, Prime Minister’s Office, Singapore under its International Research Centres in Singapore Funding Initiative. Yunjun Gao is the corresponding author of the work.

References

  1. 1.
    Agrawal, R., Borgida, A., Jagadish, H.V.: Efficient management of transitive relationships in large data and knowledge bases. In: SIGMOD, pp. 253–262 (1989)Google Scholar
  2. 2.
    Batarfi, O., Shawi, R.E., Fayoumi, A.G., Nouri, R., Beheshti, S., Barnawi, A., Sakr, S.: Large scale graph processing systems: survey and an experimental evaluation. Clust. Comput. 18(3), 1189–1213 (2015)CrossRefGoogle Scholar
  3. 3.
    Casteigts, A., Flocchini, P., Quattrociocchi, W., Santoro, N.: Time-varying graphs and dynamic networks. IJPEDS 27(5), 387–408 (2012)Google Scholar
  4. 4.
    Chen, L., Gupta, A., Kurul, M.E.: Stack-based algorithms for pattern matching on dags. In: VLDB, pp. 493–504 (2005)Google Scholar
  5. 5.
    Chen, Y., Chen, Y.: An efficient algorithm for answering graph reachability queries. In: ICDE, pp. 893–902 (2008)Google Scholar
  6. 6.
    Cheng, J., Huang, S., Wu, H., Fu, A.W.: Tf-label: A topological-folding labeling scheme for reachability querying in a large graph. In: SIGMOD, pp. 193–204 (2013)Google Scholar
  7. 7.
    Dean, J., Ghemawat, S.: Mapreduce: Simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)CrossRefGoogle Scholar
  8. 8.
    Fan, W., Wang, X., Wu, Y.: Performance guarantees for distributed reachability queries. PVLDB 5(11), 1304–1315 (2012)Google Scholar
  9. 9.
    Gao, Y., Miao, X., Chen, G., Zheng, B., Cai, D., Cui, H.: On efficiently finding reverse k-nearest neighbors over uncertain graphs. VLDB J. 26(4), 467–492 (2017)CrossRefGoogle Scholar
  10. 10.
    Gonzalez, J.E., Xin, R.S., Dave, A., Crankshaw, D., Franklin, M.J., Stoica, I.: Graphx: Graph processing in a distributed dataflow framework. In: OSDI, pp. 599–613 (2014)Google Scholar
  11. 11.
    Gurajada, S., Theobald, M.: Distributed set reachability. In: SIGMOD, pp. 1247–1261 (2016)Google Scholar
  12. 12.
    Holme, P., Saramäki, J.: Temporal networks. Phys. Rep. 519(3), 97–125 (2012)CrossRefGoogle Scholar
  13. 13.
    Huang, S., Cheng, J., Wu, H.: Temporal graph traversals: definitions, algorithms, and applications. CoRR arxiv:1401.1919 (2014)
  14. 14.
    Huang, S., Fu, A.W., Liu, R.: Minimum spanning trees in temporal graphs. In: SIGMOD, pp. 419–430 (2015)Google Scholar
  15. 15.
    Jagadish, H.V.: A compression technique to materialize transitive closure. ACM Trans. Database Syst. 15(4), 558–598 (1990)MathSciNetCrossRefGoogle Scholar
  16. 16.
    Jin, R., Ruan, N., Dey, S., Yu, J.X.: SCARAB: scaling reachability computation on large graphs. In: SIGMOD, pp. 169–180 (2012)Google Scholar
  17. 17.
    Jin, R., Ruan, N., Xiang, Y., Wang, H.: Path-tree: An efficient reachability indexing scheme for large directed graphs. ACM Trans. Database Syst. 36(1), 7:1–7:44 (2011)CrossRefGoogle Scholar
  18. 18.
    Jin, R., Wang, G.: Simple, fast, and scalable reachability oracle. PVLDB 6(14), 1978–1989 (2013)Google Scholar
  19. 19.
    Jin, R., Xiang, Y., Ruan, N., Wang, H.: Efficiently answering reachability queries on very large directed graphs. In: SIGMOD, pp. 595–608 (2008)Google Scholar
  20. 20.
    Kostakos, V.: Temporal graphs. Phys. A Stat. Mech. Appl. 388(6), 1007–1023 (2009)MathSciNetCrossRefGoogle Scholar
  21. 21.
    Koubarakis, M., Stamou, G.B., Stoilos, G., Horrocks, I., Kolaitis, P.G., Lausen, G., Weikum, G. (eds.): Reasoning Web. Reasoning on the Web in the Big Data Era. Lecture Notes in Computer Science, vol. 8714. Springer (2014)Google Scholar
  22. 22.
    Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.: Distributed graphlab: a framework for machine learning in the cloud. PVLDB 5(8), 716–727 (2012)Google Scholar
  23. 23.
    Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A system for large-scale graph processing. In: SIGMOD, pp. 135–146 (2010)Google Scholar
  24. 24.
    Michail, O., Spirakis, P.G.: Traveling salesman problems in temporal graphs. Theor. Comput. Sci. 634, 1–23 (2016)MathSciNetCrossRefGoogle Scholar
  25. 25.
    Nicosia, V., Tang, J.K., Musolesi, M., Russo, G., Mascolo, C., Latora, V.: Components in time-varying graphs. CoRR arxiv:1106.2134 (2011)
  26. 26.
    Pan, R.K., Saramäki, J.: Path lengths, correlations, and centrality in temporal networks. CoRR arxiv:1101.5913 (2011)
  27. 27.
    Redmond, U., Cunningham, P.: Temporal subgraph isomorphism. In: ASONAM, pp. 1451–1452 (2013)Google Scholar
  28. 28.
    Redmond, U., Cunningham, P.: Subgraph isomorphism in temporal networks. CoRR arxiv:1605.02174 (2016)
  29. 29.
    van Schaik, S.J., de Moor, O.: A memory efficient reachability data structure through bit vector compression. In: SIGMOD, pp. 913–924 (2011)Google Scholar
  30. 30.
    Seufert, S., Anand, A., Bedathur, S.J., Weikum, G.: FERRARI: flexible and efficient reachability range assignment for graph indexing. In: ICDE, pp. 1009–1020 (2013)Google Scholar
  31. 31.
    Shao, B., Wang, H., Li, Y.: Trinity: A distributed graph engine on a memory cloud. In: SIGMOD, pp. 505–516 (2013)Google Scholar
  32. 32.
    Su, J., Zhu, Q., Wei, H., Yu, J.X.: Reachability querying: can it be even faster? TKDE 29(3), 683–697 (2017)Google Scholar
  33. 33.
    Tian, Y., Balmin, A., Corsten, S.A., Tatikonda, S., McPherson, J.: From think like a vertex to think like a graph. PVLDB 7(3), 193–204 (2013)Google Scholar
  34. 34.
    Trißl, S., Leser, U.: Fast and practical indexing and querying of very large graphs. In: SIGMOD, pp. 845–856 (2007)Google Scholar
  35. 35.
    Ueno, K., Suzumura, T., Maruyama, N., Fujisawa, K., Matsuoka, S.: Efficient breadth-first search on massively parallel and distributed-memory machines. Data Sci. Eng. 2(1), 22–35 (2017)CrossRefGoogle Scholar
  36. 36.
    Wang, H., He, H., Yang, J., Yu, P.S., Yu, J.X.: Dual labeling: Answering graph reachability queries in constant time. In: ICDE, p. 75 (2006)Google Scholar
  37. 37.
    Wang, S., Lin, W., Yang, Y., Xiao, X., Zhou, S.: Efficient route planning on public transportation networks: a labelling approach. In: SIGMOD, pp. 967–982 (2015)Google Scholar
  38. 38.
    Wei, H., Yu, J.X., Lu, C., Jin, R.: Reachability querying: an independent permutation labeling approach. PVLDB 7(12), 1191–1202 (2014)Google Scholar
  39. 39.
    Wu, H., Cheng, J., Huang, S., Ke, Y., Lu, Y., Xu, Y.: Path problems in temporal graphs. PVLDB 7(9), 721–732 (2014)Google Scholar
  40. 40.
    Wu, H., Huang, Y., Cheng, J., Li, J., Ke, Y.: Efficient processing of reachability and time-based path queries in a temporal graph. CoRR arxiv:1601.05909 (2016)
  41. 41.
    Wu, H., Huang, Y., Cheng, J., Li, J., Ke, Y.: Reachability and time-based path queries in temporal graphs. In: ICDE, pp. 145–156 (2016)Google Scholar
  42. 42.
    Yan, D., Cheng, J., Lu, Y., Ng, W.: Blogel: a block-centric framework for distributed computation on real-world graphs. PVLDB 7(14), 1981–1992 (2014)Google Scholar
  43. 43.
    Yan, D., Cheng, J., Lu, Y., Ng, W.: Effective techniques for message reduction and load balancing in distributed graph computation. In: WWW, pp. 1307–1317 (2015)Google Scholar
  44. 44.
    Yan, D., Tian, Y., Cheng, J.: Systems for Big Graph Analytics. Springer Briefs in Computer Science. Springer, Berlin (2017)CrossRefGoogle Scholar
  45. 45.
    Yang, Y., Yan, D., Wu, H., Cheng, J., Zhou, S., Lui, J.C.S.: Diversified temporal subgraph pattern mining. In: SIGKDD, pp. 1965–1974 (2016)Google Scholar
  46. 46.
    Yano, Y., Akiba, T., Iwata, Y., Yoshida, Y.: Fast and scalable reachability queries on graphs by pruned labeling with landmarks and paths. In: CIKM, pp. 1601–1606 (2013)Google Scholar
  47. 47.
    Yildirim, H., Chaoji, V., Zaki, M.J.: GRAIL: a scalable index for reachability queries in very large graphs. VLDB J. 21(4), 509–534 (2012)CrossRefGoogle Scholar
  48. 48.
    Yildirim, H., Chaoji, V., Zaki, M.J.: DAGGER: a scalable index for reachability queries in large dynamic graphs. CoRR arxiv:1301.0977 (2013)
  49. 49.
    Yu, J.X., Cheng, J.: Graph reachability queries: a survey. In: Managing and Mining Graph Data, pp. 181–215 (2010)CrossRefGoogle Scholar
  50. 50.
    Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauly, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: NSDI, pp. 15–28 (2012)Google Scholar
  51. 51.
    Zhang, X., Chen, L.: Distance-aware selective online query processing over large distributed graphs. Data Sci. Eng. 2(1), 2–21 (2017)CrossRefGoogle Scholar
  52. 52.
    Zhu, A.D., Lin, W., Wang, S., Xiao, X.: Reachability queries on large dynamic graphs: a total order approach. In: SIGMOD, pp. 1323–1334 (2014)Google Scholar

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Tianming Zhang
    • 1
  • Yunjun Gao
    • 1
    • 2
    Email author
  • Lu Chen
    • 3
  • Wei Guo
    • 1
  • Shiliang Pu
    • 4
  • Baihua Zheng
    • 5
  • Christian S. Jensen
    • 3
  1. 1.College of Computer ScienceZhejiang UniversityHangzhouChina
  2. 2.The Key Lab of Big Data Intelligent Computing of Zhejiang ProvinceZhejiang UniversityHangzhouChina
  3. 3.Department of Computer ScienceAalborg UniversityAalborgDenmark
  4. 4.Hangzhou Hikvision Digital Technology Co., Ltd.HangzhouChina
  5. 5.School of Information SystemsSingapore Management UniversitySingaporeSingapore

Personalised recommendations