Online Graph Partitioning with an Affine Message Combining Cost Function

Chen, Xiang; Huan, Jun

doi:10.1007/978-81-322-3628-3_6

Xiang Chen⁴ &
Jun Huan⁴

5023 Accesses

Abstract

Graph partitioning is a key step in developing scalable data mining algorithms on massive graph data such as web graphs and social networks. Graph partitioning is often formalized as an optimization problem where we assign graph vertices to computing nodes with the objection to both minimize the communication cost between computing nodes and to balance the load of computing nodes. Such optimization was specified using a cost function to measure the quality of graph partition. Current graph systems such as Pregel, Graphlab take graph cut, i.e. counting the number of edges that cross different partitions, as the cost function of graph partition. We argue that graph cut ignores many characteristics of modern computing cluster and to develop better graph partitioning algorithm we should revise the cost function. In particular we believe that message combing, a new technique that was recently developed in order to minimize communication of computing nodes, should be considered in designing new cost functions for graph partitioning. In this paper, we propose a new cost function for graph partitioning which considers message combining. In this new cost function, we consider communication cost from three different sources: (1) two computing nodes establish a message channel between them; (2) a process creates a message utilize the channel and (3) the length of the message. Based on this cost function, we develop several heuristics for large graph partitioning. We have performed comprehensive experiments using real-world graphs. Our results demonstrate that our algorithms yield significant performance improvements over state-of-the-art approaches. The new cost function developed in this paper should help design new graph partition algorithms for better graph system performance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 109.00; Price excludes VAT (USA)

Softcover Book: USD 139.99; Price excludes VAT (USA)

Hardcover Book: USD 139.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Malewicz G, Austern MH, Bik AJ, Dehnert JC, Horn I, Leiser N et al (2010) Pregel: a system for large-scale graph processing. In: Proceedings of the 2010 ACM SIGMOD International conference on management of data, 2010, pp 135–146
Google Scholar
Avery C (2011) Giraph: Large-scale graph processing infrastruction on Hadoop. In: Proceedings of Hadoop Summit. Santa Clara, USA
Google Scholar
Zaharia M, Chowdhury M, Franklin MJ, Shenker S, Stoica I (2010) Spark: cluster computing with working sets. In: Proceedings of the 2nd USENIX conference on Hot topics in cloud computing,pp. 10–10
Google Scholar
Low Y, Gonzalez J, Kyrola A, Bickson D, Guestrin C, Hellerstein JM (2010) Graphlab: a new framework for parallel machine learning. arXiv:1006.4990
Shao B, Wang H, Li Y (2013) Trinity: a distributed graph engine on a memory cloud. In: Proceedings of the 2013 international conference on Management of data, 2013, pp 505–516
Google Scholar
Ke Q, Prabhakaran V, Xie Y, Yu Y, Wu J, Yang J (2011) Optimizing data partitioning for data-parallel computing. HotOS XIII
Google Scholar
Kernighan BW, Lin S (1970) An efficient heuristic procedure for partitioning graphs. Bell Syst Tech J 49:291–307
Article MATH Google Scholar
Fiduccia CM, Mattheyses RM (1982) A linear-time heuristic for improving network partitions. In: 19th Conference on Design Automation, 1982, pp 175–181
Google Scholar
Feige U, Krauthgamer R (2002) A polylogarithmic approximation of the minimum bisection. SIAM J Comput 31:1090–1118
Article MathSciNet MATH Google Scholar
Ng AY (2002) On spectral clustering: analysis and an algorithm
Google Scholar
Karypis G, Kumar V (1998) A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM J Sci Comput 20:359–392
Article MathSciNet MATH Google Scholar
LÜcking T, Monien B, Elsässer R (2001) New spectral bounds on k-partitioning of graphs. In: Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, pp 255–262
Google Scholar
Abou-Rjeili A, Karypis G (2006) Multilevel algorithms for partitioning power-law graphs. In: 20th International parallel and distributed processing symposium, 2006. IPDPS 2006, 10 pp
Google Scholar
Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51:107–113
Article Google Scholar
Kang U, Tsourakakis CE, Faloutsos C (2009) Pegasus: a peta-scale graph mining system implementation and observations. In: Ninth IEEE International conference on data mining, 2009. ICDM’09, 2009, pp 229–238
Google Scholar
Gonzalez JE, Low Y, Gu H, Bickson D, Guestrin C (2012) Powergraph: Distributed graph-parallel computation on natural graphs. In: Proceedings of the 10th USENIX symposium on operating systems design and implementation (OSDI), 2012, pp 17–30
Google Scholar
Bourse F, Lelarge M, Vojnovic M (2014) Balanced graph edge partition
Google Scholar
Stanton I, Kliot G (2012) Streaming graph partitioning for large distributed graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, 2012, pp 1222–1230
Google Scholar
Tsourakakis C, Gkantsidis C, Radunovic B, Vojnovic M (2014) Fennel: Streaming graph partitioning for massive scale graphs. In: Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp 333–342
Google Scholar
Leskovec J, Krevl A (2014) SNAP Datasets: Stanford Large Network Dataset Collection
Google Scholar

Download references

Acknowledgments

The work described in this paper was supported by the US NSF grant CNS 1337899: MRI: Acquisition of Computing Equipment for Supporting Data-intensive Bioinformatics Research at the University of Kansas, 2013–2016.

Author information

Authors and Affiliations

Information and Telecommunication Technology Center, The University of Kansas, Lawrence, 66047, USA
Xiang Chen & Jun Huan

Authors

Xiang Chen
View author publications
You can also search for this author in PubMed Google Scholar
Jun Huan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jun Huan .

Editor information

Editors and Affiliations

Indian Institute of Public Health , Hyderabad, India
Saumyadipta Pyne
CRRao AIMSCS, University of Hyderabad Campus CRRao AIMSCS, Hyderabad, India
B.L.S. Prakasa Rao
CRRao AIMSCS, University of Hyderabad Campus CRRao AIMSCS, Hyderabad, India
S.B. Rao

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Chen, X., Huan, J. (2016). Online Graph Partitioning with an Affine Message Combining Cost Function. In: Pyne, S., Rao, B., Rao, S. (eds) Big Data Analytics. Springer, New Delhi. https://doi.org/10.1007/978-81-322-3628-3_6

Download citation

DOI: https://doi.org/10.1007/978-81-322-3628-3_6
Published: 13 October 2016
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-3626-9
Online ISBN: 978-81-322-3628-3
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics