Load balancing in a heterogeneous world: CPU-Xeon Phi co-execution of data-parallel kernels

Nozal, Raúl; Perez, Borja; Bosque, Jose Luis; Beivide, Ramón

doi:10.1007/s11227-018-2318-5

Published: 17 March 2018

Volume 75, pages 1123–1136, (2019)

The Journal of Supercomputing

Raúl Nozal ORCID: orcid.org/0000-0002-4927-9829¹,
Borja Perez¹,
Jose Luis Bosque¹ &
Ramón Beivide¹

Abstract

Heterogeneous systems composed of a CPU and a set of different hardware accelerators are very compelling thanks to their excellent performance and energy consumption features. One of the most important problems in these systems is the distribution of the workload among their devices. This paper describes an extension of the Maat library that allows the co-execution of a data-parallel OpenCL kernel on a heterogeneous system composed of a CPU and an Intel Xeon Phi. Maat provides an abstract view of the heterogeneous system as well as a set of load balancing algorithms to squeeze the performance out of the node. It automatically performs the data partition and distribution among the devices, generates the kernels and efficiently merges the partial outputs together. Experimental results show that this approach always outperforms the baseline with only a Xeon Phi, giving excellent performance and energy efficiency. Furthermore, it is essential to select the right load balancing algorithm because it has a huge impact on the system performance and energy consumption.
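
As an illustration of the partition-and-merge idea summarised in the abstract, the following is a minimal sketch of a static, speed-proportional split of a one-dimensional index range between two devices. It is not Maat's API and is not taken from the paper: the function names, the relative speeds and the granularity parameter (standing in for something like a work-group size) are all assumptions, and the devices are simulated by running each chunk sequentially on the host.

```python
from typing import Callable, List, Sequence, Tuple


def static_partition(total_items: int, speeds: Sequence[float],
                     granularity: int = 1) -> List[Tuple[int, int]]:
    """Split [0, total_items) into one contiguous (offset, size) chunk per device.

    Chunk sizes are proportional to the (hypothetical) relative speeds and are
    rounded down to a multiple of `granularity`; the last device absorbs
    whatever remains so that every item is assigned exactly once.
    """
    if total_items < 0 or granularity < 1 or not speeds or min(speeds) <= 0:
        raise ValueError("invalid partition parameters")
    total_speed = sum(speeds)
    chunks: List[Tuple[int, int]] = []
    offset = 0
    for speed in speeds[:-1]:
        size = int(total_items * speed / total_speed) // granularity * granularity
        size = min(size, total_items - offset)  # guard against rounding overshoot
        chunks.append((offset, size))
        offset += size
    chunks.append((offset, total_items - offset))
    return chunks


def co_execute(kernel: Callable, data: Sequence, speeds: Sequence[float],
               granularity: int = 1) -> list:
    """Apply `kernel` to every element, one chunk per simulated device,
    writing each partial result into its slot of a single output buffer."""
    out = [None] * len(data)
    for offset, size in static_partition(len(data), speeds, granularity):
        # A real runtime would launch these ranges concurrently on each device;
        # here they run one after another on the host for illustration only.
        out[offset:offset + size] = [kernel(x) for x in data[offset:offset + size]]
    return out


if __name__ == "__main__":
    # Assumed relative speeds: the second device is three times faster.
    print(static_partition(1000, [1.0, 3.0], granularity=64))
    print(co_execute(lambda x: x * x, list(range(10)), [1.0, 2.0]))
```

A static split like this is only one possible policy. It depends on knowing the relative speeds in advance, which is one reason the choice of load balancing algorithm matters as much as the abstract states.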

Acknowledgements

This work has been supported by the Spanish Ministry of Education, FPU grant FPU16/03299, the University of Cantabria, grant CVE-2014-18166, the Spanish Science and Technology Commission under contracts TIN2016-76635-C2-2-R and TIN2016-81840-REDT (CAPAP-H6 network), the European Research Council (G.A. No. 321253) and the European HiPEAC Network of Excellence. The Mont-Blanc project has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 671697.

Author information

Authors and Affiliations

Computer Science and Electronics Department, University of Cantabria, Santander, Spain
Raúl Nozal, Borja Perez, Jose Luis Bosque & Ramón Beivide

Corresponding author

Correspondence to Raúl Nozal.

About this article

Cite this article

Nozal, R., Perez, B., Bosque, J.L. et al. Load balancing in a heterogeneous world: CPU-Xeon Phi co-execution of data-parallel kernels. J Supercomput 75, 1123–1136 (2019). https://doi.org/10.1007/s11227-018-2318-5

Published: 17 March 2018
Issue Date: 01 March 2019
DOI: https://doi.org/10.1007/s11227-018-2318-5
