Abstract
Graphics Processing Units (GPUs) have fundamentally altered the approach to parallel computing despite the substantial PCIe data-transfer overheads that they incur. To maximize performance-per-dollar, systems are now being deployed with multiple GPUs in the same node. However, multiple GPUs exacerbate these PCIe overheads, because moving non-local data incurs additional data-movement penalties.
In this paper, we first evaluate the PCIe performance loss that occurs due to improper affinity between CPUs and GPUs, using a PCIeBandwidth benchmark developed specifically for systems with multiple GPUs. Our experiments demonstrate that the performance loss can be up to 2.5\(\times\) on a single GPU and up to 4.4\(\times\) when four GPUs are used. We then apply the lessons from the PCIe studies to optimize and accelerate the Graph500 benchmark on a 4-GPU, multi-socket system. Our optimization techniques include binding the CPU threads to appropriate cores and carefully partitioning the data for every GPU. We achieve a speedup of 1.8\(\times\) over a single-GPU implementation.
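The two optimizations named above, binding CPU threads to cores near each GPU and partitioning data per GPU, can be illustrated with a minimal sketch. This is not the paper's implementation. The topology tables `GPU_TO_SOCKET` and `SOCKET_TO_CORES`, the function names, and the even contiguous vertex split are all assumptions made for illustration. On a real system the topology would have to be queried from the platform, and the actual partitioning scheme may differ.

```python
import os

# Hypothetical topology (assumption): GPUs 0-1 attach to socket 0,
# GPUs 2-3 attach to socket 1, with four cores per socket.
GPU_TO_SOCKET = {0: 0, 1: 0, 2: 1, 3: 1}
SOCKET_TO_CORES = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}


def local_cores(gpu_id):
    """Return the cores on the socket that is PCIe-local to gpu_id."""
    return SOCKET_TO_CORES[GPU_TO_SOCKET[gpu_id]]


def bind_to_gpu(gpu_id):
    """Pin the calling thread to the cores local to gpu_id.

    Uses os.sched_setaffinity, which exists only on Linux. Binding is
    skipped when none of the local cores are currently available.
    """
    cores = local_cores(gpu_id)
    if hasattr(os, "sched_setaffinity"):
        usable = cores & os.sched_getaffinity(0)
        if usable:
            os.sched_setaffinity(0, usable)
    return cores


def partition_vertices(num_vertices, num_gpus):
    """Split [0, num_vertices) into contiguous, near-equal ranges.

    Illustrative even split only; one (start, end) range per GPU.
    """
    base, extra = divmod(num_vertices, num_gpus)
    ranges = []
    start = 0
    for g in range(num_gpus):
        size = base + (1 if g < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

In such a scheme, the host thread that drives GPU `g` would call `bind_to_gpu(g)` before issuing transfers, so that its buffers and copies stay on the socket closest to that GPU, and it would process only the vertex range `partition_vertices(n, 4)[g]`.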
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Daga, M. (2017). On the Acceleration of Graph500: Characterizing PCIe Overheads with Multi-GPUs. In: Dutra, I., Camacho, R., Barbosa, J., Marques, O. (eds) High Performance Computing for Computational Science – VECPAR 2016. VECPAR 2016. Lecture Notes in Computer Science, vol 10150. Springer, Cham. https://doi.org/10.1007/978-3-319-61982-8_12
DOI: https://doi.org/10.1007/978-3-319-61982-8_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61981-1
Online ISBN: 978-3-319-61982-8
eBook Packages: Computer Science, Computer Science (R0)