Abstract
Beginning with an existing well-optimized lattice quantum chromodynamics solver using OpenMP+MPI, we develop two task-based implementations, one with OpenMP tasking and one with hand-coded “untasking”. We achieve better overlap of MPI communication and computation with both methods, and expose some performance issues in OpenMP tasking. Both task-based implementations outperform the original implementation when strong scaling.
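The overlap described above can be pictured as a small task graph. First the halo exchange starts. While it is in flight, the interior sites, which need no remote data, are updated. The boundary sites are finished once the halos arrive. The sketch below is a minimal illustration under stated assumptions and is not the authors' implementation. Python's `concurrent.futures` stands in for OpenMP tasks, a simulated periodic exchange stands in for MPI, and a hypothetical 1D three-point stencil stands in for the lattice QCD operator. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def exchange_halos(parts):
    """Hypothetical stand-in for MPI_Isend/MPI_Irecv: each part receives the
    edge values of its neighbours, with periodic boundaries."""
    n = len(parts)
    return [(parts[(r - 1) % n][-1], parts[(r + 1) % n][0]) for r in range(n)]


def stencil(left, centre, right):
    # Hypothetical 3-point operator standing in for the lattice operator.
    return left - 2.0 * centre + right


def apply_interior(part):
    # Interior sites use only local data, so this can run during communication.
    return [stencil(part[i - 1], part[i], part[i + 1]) for i in range(1, len(part) - 1)]


def apply_boundary(part, halo):
    # Boundary sites need the ghost values delivered by the exchange.
    left_ghost, right_ghost = halo
    first = stencil(left_ghost, part[0], part[1])
    last = stencil(part[-2], part[-1], right_ghost)
    return first, last


def apply_operator(parts, pool):
    """Apply the stencil to a field split into parts (simulated ranks),
    overlapping the halo exchange with interior computation.
    Each part must hold at least two sites."""
    # Communication task: submitted first so that it overlaps the interior work.
    halo_future = pool.submit(exchange_halos, parts)
    # Computation tasks: one interior task per part, independent of the halos.
    interior_futures = [pool.submit(apply_interior, p) for p in parts]
    # Dependency: boundary work waits for the communication to complete.
    halos = halo_future.result()
    result = []
    for part, halo, fut in zip(parts, halos, interior_futures):
        first, last = apply_boundary(part, halo)
        result.append([first] + fut.result() + [last])
    return result


if __name__ == "__main__":
    field = [float(i * i) for i in range(8)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(apply_operator([field[:4], field[4:]], pool))
```

In OpenMP, the same structure would typically be written with `task` constructs and `depend` clauses, so that the boundary task depends on the communication task while interior tasks run freely.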
Notes
1. The quality and even the availability of MPI_THREAD_MULTIPLE vary between implementations, and it is a good place for a taskyield.
2. Since there are 68 cores, we should have been able to run up to 4 MPI ranks per node, but we were unable to persuade the MPI implementation to avoid the first 4 cores.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Meadows, L., Ishikawa, Ki. (2017). OpenMP Tasking and MPI in a Lattice QCD Benchmark. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_6
DOI: https://doi.org/10.1007/978-3-319-65578-9_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)