On Characterizing the Performance of Distributed Graph Computation Platforms

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 8904)

Abstract

Graphs are widely used to model complex data in application domains such as social networks, protein networks, transportation networks, bibliographical networks, and knowledge bases. Graphs with millions or billions of nodes and edges are now common. Designing scalable systems for processing and analyzing large-scale graphs has therefore become one of the most timely problems facing the big data research community. In practice, distributed processing of large-scale graphs is challenging because of their size, their inherently irregular structure, and the iterative nature of graph processing algorithms. In recent years, several distributed graph processing systems have been presented to tackle this challenge, most notably Pregel and GraphLab. Both systems use a vertex-centric computation model, which lets the user design a program that is executed locally for each vertex in parallel. In this paper, we analyze the performance characteristics of distributed graph processing systems and provide an experimental comparison of the performance of two popular systems in this area.
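
To make the vertex-centric model concrete, the following is a minimal single-process sketch in Python. It is not the API of Pregel or GraphLab. The names `pagerank_vertex` and `run_supersteps`, the choice of PageRank as the example algorithm, the damping factor of 0.85, and the synchronous round structure are all assumptions made for illustration. The sequential loop over vertices stands in for what a distributed system would run in parallel across workers.

```python
from collections import defaultdict


def pagerank_vertex(step, rank, out_neighbors, messages, n, damping=0.85):
    """Hypothetical vertex program: it sees only its own value,
    its out-edges, and the messages delivered to it this round."""
    if step > 0:
        # Combine the rank shares received from in-neighbors.
        rank = (1.0 - damping) / n + damping * sum(messages)
    # Split the current rank evenly among out-neighbors.
    share = rank / len(out_neighbors) if out_neighbors else 0.0
    return rank, [(target, share) for target in out_neighbors]


def run_supersteps(graph, vertex_program, supersteps):
    """Drive a vertex program in synchronous rounds.

    graph maps each vertex to the list of its out-neighbors.
    Messages sent in one round are delivered in the next round.
    """
    n = len(graph)
    values = {v: 1.0 / n for v in graph}
    inbox = {v: [] for v in graph}
    for step in range(supersteps):
        outbox = defaultdict(list)
        # Conceptually parallel: each call touches only local state.
        for v, neighbors in graph.items():
            values[v], sent = vertex_program(step, values[v], neighbors, inbox[v], n)
            for target, payload in sent:
                outbox[target].append(payload)
        # Barrier: swap message buffers before the next round.
        inbox = {v: outbox[v] for v in graph}
    return values
```

For example, `run_supersteps({"a": ["b", "c"], "b": ["c"], "c": ["a"]}, pagerank_vertex, 30)` returns a dictionary of approximate ranks for the three vertices. The user writes only the per-vertex function; scheduling and message delivery belong to the driver, which is the part a distributed platform would replace.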

Acknowledgment

This work was supported by King Abdulaziz City for Science and Technology (KACST) project 11-INF1990-03.

Author information

Corresponding author

Correspondence to Sherif Sakr.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Barnawi, A. et al. (2015). On Characterizing the Performance of Distributed Graph Computation Platforms. In: Nambiar, R., Poess, M. (eds) Performance Characterization and Benchmarking. Traditional to Big Data. TPCTC 2014. Lecture Notes in Computer Science, vol 8904. Springer, Cham. https://doi.org/10.1007/978-3-319-15350-6_3

  • DOI: https://doi.org/10.1007/978-3-319-15350-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-15349-0

  • Online ISBN: 978-3-319-15350-6

  • eBook Packages: Computer Science, Computer Science (R0)
