Abstract
We first introduce Google’s Pregel, a framework for distributed graph computation that pioneered the idea of vertex-centric programming. In the vertex-centric model, a user writes a distributed graph algorithm simply by specifying the behavior of a generic vertex. In recent years, many Pregel-like systems have been developed, and we will investigate how these systems improve on the basic framework of Pregel. The background knowledge to be introduced in this chapter lays the foundation of hands-on experiencing (in the next chapter) of our Pregel-like system implementation, BigGraph@CUHK, that is both highly efficient, and very suitable for educational and research purposes due to its neat design.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
Writing to a DFS generates network overhead since each data block are replicated to multiple machines to tolerate machine failures.
- 2.
- 3.
- 4.
- 5.
- 6.
References
S. Brin and L. Page. The anatomy of a large-scale hypertextual web search engine. In Proceedings of the Seventh International World-Wide Web Conference (WWW), pages 107–117, 1998.
Y. Bu, V. R. Borkar, J. Jia, M. J. Carey, and T. Condie. Pregelix: Big(ger) graph analytics on a dataflow engine. PVLDB, 8(2):161–172, 2014.
Y. Bu, V. R. Borkar, G. H. Xu, and M. J. Carey. A bloat-aware design for big data applications. In ISMM, pages 119–130, 2013.
K. M. Chandy and L. Lamport. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst., 3(1):63–75, 1985.
A. Ching, S. Edunov, M. Kabiljo, D. Logothetis, and S. Muthukrishnan. One trillion edges: Graph processing at facebook-scale. PVLDB, 8(12):1804–1815, 2015.
J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In OSDI, pages 137–150, 2004.
E. N. M. Elnozahy, L. Alvisi, Y.-M. Wang, and D. B. Johnson. A survey of rollback-recovery protocols in message-passing systems. ACM Comput. Surv., 34(3):375–408, Sept. 2002.
S. Ghemawat, H. Gobioff, and S. Leung. The Google file system. In SOSP, pages 29–43, 2003.
J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. Powergraph: Distributed graph-parallel computation on natural graphs. In OSDI, pages 17–30, 2012.
J. E. Gonzalez, R. S. Xin, A. Dave, D. Crankshaw, M. J. Franklin, and I. Stoica. Graphx: Graph processing in a distributed dataflow framework. In OSDI, pages 599–613, 2014.
U. Kang, C. E. Tsourakakis, and C. Faloutsos. PEGASUS: A peta-scale graph mining system. In ICDM, pages 229–238, 2009.
Z. Khayyat, K. Awara, A. Alonazi, H. Jamjoom, D. Williams, and P. Kalnis. Mizan: a system for dynamic load balancing in large-scale graph processing. In EuroSys, pages 169–182, 2013.
A. Kyrola, G. E. Blelloch, and C. Guestrin. GraphChi: Large-scale graph computation on just a PC. In OSDI, pages 31–46, 2012.
J. Lin and M. Schatz. Design patterns for efficient graph algorithms in mapreduce. In MLG, pages 78–85. ACM, 2010.
Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. Distributed GraphLab: A framework for machine learning in the cloud. PVLDB, 5(8):716–727, 2012.
G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In SIGMOD Conference, pages 135–146, 2010.
L. Quick, P. Wilkinson, and D. Hardcastle. Using pregel-like large scale graph processing frameworks for social network analysis. In ASONAM, pages 457–463, 2012.
A. Roy, L. Bindschaedler, J. Malicevic, and W. Zwaenepoel. Chaos: scale-out graph processing from secondary storage. In SOSP, pages 410–424, 2015.
A. Roy, I. Mihailovic, and W. Zwaenepoel. X-stream: edge-centric graph processing using streaming partitions. In SOSP, pages 472–488, 2013.
S. Salihoglu and J. Widom. GPS: a graph processing system. In SSDBM, pages 22:1–22:12, 2013.
S. Salihoglu and J. Widom. Optimizing graph algorithms on pregel-like systems. PVLDB, 7(7):577–588, 2014.
S. Schelter, S. Ewen, K. Tzoumas, and V. Markl. “All roads lead to Rome”: optimistic recovery for distributed iterative data processing. In CIKM, pages 1919–1928, 2013.
Z. Shang and J. X. Yu. Catch the wind: Graph workload balancing on cloud. In ICDE, pages 553–564, 2013.
Y. Shao, B. Cui, and L. Ma. PAGE: A partition aware engine for parallel graph computation. IEEE Trans. Knowl. Data Eng., 27(2):518–530, 2015.
Y. Shen, G. Chen, H. V. Jagadish, W. Lu, B. C. Ooi, and B. M. Tudor. Fast failure recovery in distributed graph processing systems. PVLDB, 8(4):437–448, 2014.
Y. Shiloach and U. Vishkin. An o(log n) parallel connectivity algorithm. J. Algorithms, 3(1):57–67, 1982.
P. Wang, K. Zhang, R. Chen, H. Chen, and H. Guan. Replication-based fault-tolerance for large-scale graph processing. In DSN, pages 562–573, 2014.
D. Yan, Y. Bu, Y. Tian, and A. Deshpande. Big graph analytics platforms. Foundations and Trends in Databases, 7(1–2):1–195, 2017.
D. Yan, J. Cheng, Y. Lu, and W. Ng. Effective techniques for message reduction and load balancing in distributed graph computation. In WWW, pages 1307–1317, 2015.
D. Yan, J. Cheng, M. T. Özsu, F. Yang, Y. Lu, J. C. S. Lui, Q. Zhang, and W. Ng. A general-purpose query-centric framework for querying big graphs. PVLDB, 9(7):564–575, 2016.
D. Yan, J. Cheng, K. Xing, Y. Lu, W. Ng, and Y. Bu. Pregel algorithms for graph connectivity problems with performance guarantees. PVLDB, 7(14):1821–1832, 2014.
D. Yan, J. Cheng, and F. Yang. Lightweight fault tolerance in large-scale distributed graph processing. CoRR, abs/1601.06496, 2016.
D. Yan, Y. Huang, J. Cheng, and H. Wu. Efficient processing of very large graphs in a small cluster. CoRR, abs/1601.05590, 2016.
M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauly, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In NSDI, pages 15–28, 2012.
C. Zhou, J. Gao, B. Sun, and J. X. Yu. Mocgraph: Scalable distributed graph processing using message online computing. PVLDB, 8(4):377–388, 2014.
X. Zhu, W. Han, and W. Chen. Gridgraph: Large-scale graph processing on a single machine using 2-level hierarchical partitioning. In USENIX ATC, pages 375–386, 2015.
Author information
Authors and Affiliations
Rights and permissions
Copyright information
© 2017 The Author(s)
About this chapter
Cite this chapter
Yan, D., Tian, Y., Cheng, J. (2017). Pregel-Like Systems. In: Systems for Big Graph Analytics. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-58217-7_2
Download citation
DOI: https://doi.org/10.1007/978-3-319-58217-7_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58216-0
Online ISBN: 978-3-319-58217-7
eBook Packages: Computer ScienceComputer Science (R0)