Quantifying Communication in Graph Analytics

  • Conference paper
High Performance Computing (ISC High Performance 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Abstract

Data analytics require complex processing, often in the form of parallel graph-based workloads. To run these applications efficiently, it is key to understand where the bottlenecks lie, and in particular to what extent performance is computation-bound or communication-bound. In this work, we analyze a reference workload in graph-based analytics, the Graph 500 benchmark. We conduct a wide array of tests on a high-performance computing system, the MareNostrum III supercomputer, using a custom high-precision profiling methodology. We show that the application performance is communication-bound, with up to 80% of the execution time spent enabling communication. We also show that the importance of communication grows with the increase in scale and concurrency expected in future big data systems and applications. Finally, we characterize this representative data-analytics workload and show that the dominant data exchange is uniform all-to-all communication, which opens avenues for workload and network optimization.

Acknowledgments

This work is partially conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe. We would like to thank the Barcelona Supercomputing Center for providing support and access to the MareNostrum III supercomputing cluster.

Author information

Corresponding author

Correspondence to Andreea Anghel.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Anghel, A., Rodriguez, G., Prisacari, B., Minkenberg, C., Dittmann, G. (2015). Quantifying Communication in Graph Analytics. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_33

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer Science, Computer Science (R0)
