Quantifying Communication in Graph Analytics

Anghel, Andreea; Rodriguez, German; Prisacari, Bogdan; Minkenberg, Cyriel; Dittmann, Gero

doi:10.1007/978-3-319-20119-1_33

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

Data analytics requires complex processing, often in the form of parallel graph-based workloads. To make these applications efficient, it is key to understand where the bottlenecks lie, and in particular to what extent performance is computation-bound or communication-bound. In this work, we analyze a reference workload in graph-based analytics, the Graph 500 benchmark. We conduct a wide array of tests on a high-performance computing system, the MareNostrum III supercomputer, using a custom high-precision profiling methodology. We show that application performance is communication-bound, with up to 80% of the execution time spent enabling communication. We also show that the importance of communication grows with the increase in scale and concurrency expected in future big data systems and applications. Finally, we characterize this representative data-analytics workload and show that the dominant data exchange is uniform all-to-all communication, which opens avenues for workload and network optimization.
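The two quantities the abstract reports, the share of execution time spent on communication and the uniformity of the all-to-all exchange, can be illustrated with a minimal Python sketch. This is not the authors' profiling methodology, which the abstract does not describe in detail. The function names, the per-process timing lists, and the traffic matrix are hypothetical, and the numbers are made up for illustration.

```python
from statistics import mean, pstdev


def communication_fraction(compute_s, comm_s):
    """Fraction of total time spent in communication.

    compute_s and comm_s are hypothetical per-process timings in seconds,
    e.g. as they might be aggregated from a profiler trace.
    """
    total = sum(compute_s) + sum(comm_s)
    if total == 0:
        raise ValueError("no time recorded")
    return sum(comm_s) / total


def all_to_all_uniformity(traffic):
    """Coefficient of variation of the off-diagonal traffic entries.

    traffic[i][j] is the (hypothetical) number of bytes sent from process i
    to process j. A value of 0.0 means every pair exchanges the same volume,
    i.e. perfectly uniform all-to-all traffic; larger values mean more skew.
    """
    n = len(traffic)
    off_diag = [traffic[i][j] for i in range(n) for j in range(n) if i != j]
    mu = mean(off_diag)
    if mu == 0:
        raise ValueError("no traffic recorded")
    return pstdev(off_diag) / mu


if __name__ == "__main__":
    # Illustrative numbers only: 4 processes, 2 s compute and 8 s communication each.
    print(communication_fraction([2.0] * 4, [8.0] * 4))
    # Every pair of distinct processes exchanges the same volume.
    uniform = [[0, 100, 100], [100, 0, 100], [100, 100, 0]]
    print(all_to_all_uniformity(uniform))
```

The coefficient of variation is only one possible uniformity measure, chosen here because it is scale-independent. A real analysis would derive both inputs from measured traces rather than hand-written lists.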

Acknowledgments

This work is partially conducted in the context of the joint ASTRON and IBM DOME project and is funded by the Netherlands Organisation for Scientific Research (NWO), the Dutch Ministry of EL&I, and the Province of Drenthe. We would like to thank the Barcelona Supercomputing Center for providing support and access to the MareNostrum III supercomputing cluster.

Author information

Authors and Affiliations

IBM Research — Zurich, Zurich, Switzerland
Andreea Anghel, German Rodriguez, Bogdan Prisacari, Cyriel Minkenberg & Gero Dittmann

Corresponding author

Correspondence to Andreea Anghel.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Thomas Ludwig

About this paper

Cite this paper

Anghel, A., Rodriguez, G., Prisacari, B., Minkenberg, C., Dittmann, G. (2015). Quantifying Communication in Graph Analytics. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_33

DOI: https://doi.org/10.1007/978-3-319-20119-1_33
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)