Abstract
Graph computing plays an important role in mining data at large scale. Partitioning is the first step in processing a large graph in a distributed system. A good partition incurs low communication and memory cost and balances load well enough to make use of the whole system. Traditional edge-cut methods introduce high communication cost on realistic power-law graphs. Current vertex-cut methods pay little attention to load balance and therefore perform poorly, especially in online streaming vertex-cut partitioning. In this paper, we formulate the total cost of graph computing (partition cost, communication cost, and computing cost), particularly for iterative algorithms, and analyze the cost of current partitioning methods. We also explore a novel vertex-cut method that aims to lower the total cost by achieving a more balanced load with less communication. Experiments show that our method outperforms existing methods in state-of-the-art graph computing frameworks by an average of 10 percent.
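The trade-off between communication and load balance in vertex-cut partitioning can be illustrated with a small sketch. This is not the method proposed in the paper. It compares two generic placement rules, a hash of the edge endpoints and a simplified greedy streaming heuristic, using two commonly used proxies: replication factor (average number of partitions holding a copy of a vertex, a rough stand-in for communication cost) and load imbalance (largest partition's edge count divided by the mean). The function names, the hash rule, and the integer vertex ids are all assumptions made for this illustration.

```python
from collections import defaultdict


def hash_vertex_cut(edges, num_parts):
    """Place each edge by a simple deterministic hash of its endpoints.

    Assumes integer vertex ids; the (u + v) % num_parts rule is illustrative.
    """
    return [(u + v) % num_parts for u, v in edges]


def greedy_vertex_cut(edges, num_parts):
    """Simplified streaming heuristic: reuse partitions that already hold
    an endpoint of the edge, breaking ties by the lowest current load."""
    load = [0] * num_parts
    replicas = defaultdict(set)  # vertex -> partitions holding a copy
    assignment = []
    for u, v in edges:
        shared = replicas[u] & replicas[v]
        either = replicas[u] | replicas[v]
        # Prefer partitions holding both endpoints, then either, then any.
        candidates = shared or either or set(range(num_parts))
        p = min(candidates, key=lambda i: (load[i], i))
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
        assignment.append(p)
    return assignment


def evaluate(edges, assignment, num_parts):
    """Return (replication factor, load imbalance) for an edge assignment."""
    load = [0] * num_parts
    replicas = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        load[p] += 1
        replicas[u].add(p)
        replicas[v].add(p)
    rf = sum(len(s) for s in replicas.values()) / len(replicas)
    imbalance = max(load) / (len(edges) / num_parts)
    return rf, imbalance


if __name__ == "__main__":
    # A tiny star graph: vertex 0 is a high-degree hub.
    star = [(0, 1), (0, 2), (0, 3), (0, 4)]
    for name, fn in [("hash", hash_vertex_cut), ("greedy", greedy_vertex_cut)]:
        print(name, evaluate(star, fn(star, 2), 2))
```

On this four-edge star with two partitions, the hash rule splits the edges evenly (imbalance 1.0) but replicates the hub, giving a replication factor of 1.2. The greedy rule avoids replication (factor 1.0) but places every edge on one partition (imbalance 2.0). A balanced vertex-cut method has to weigh these two effects against each other.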
Acknowledgment
This work has been supported by the National High Technology Research and Development (863) Program of China under Grant No. 2013AA013205 and the Program of the State Key Laboratory of High-end Server & Storage Technology under Grant No. 2014HSSA16.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Sun, R., Zhang, L., Chen, Z., Hao, Z. (2015). A Balanced Vertex Cut Partition Method in Distributed Graph Computing. In: He, X., et al. Intelligence Science and Big Data Engineering. Big Data and Machine Learning Techniques. IScIDE 2015. Lecture Notes in Computer Science, vol 9243. Springer, Cham. https://doi.org/10.1007/978-3-319-23862-3_5
DOI: https://doi.org/10.1007/978-3-319-23862-3_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23861-6
Online ISBN: 978-3-319-23862-3
eBook Packages: Computer Science, Computer Science (R0)