OpenMP Tasking and MPI in a Lattice QCD Benchmark

  • Conference paper
Scaling OpenMP for Exascale Performance and Portability (IWOMP 2017)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 10468)

Abstract

Beginning with an existing well-optimized lattice quantum chromodynamics solver using OpenMP+MPI, we develop two task-based implementations, one with OpenMP tasking and one with hand-coded “untasking”. We achieve better overlap of MPI communication and computation with both methods, and expose some performance issues in OpenMP tasking. Both task-based implementations outperform the original implementation under strong scaling.

Notes

  1. The quality and even the availability of MPI_THREAD_MULTIPLE vary between implementations, and it is a good place for taskyield.

  2. Since there are 68 cores, we should have been able to run up to 4 MPI ranks per node, but we were unable to persuade the MPI implementation to avoid the first 4 cores.

Author information

Corresponding author

Correspondence to Larry Meadows.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Meadows, L., Ishikawa, Ki. (2017). OpenMP Tasking and MPI in a Lattice QCD Benchmark. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_6

  • DOI: https://doi.org/10.1007/978-3-319-65578-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-65577-2

  • Online ISBN: 978-3-319-65578-9

  • eBook Packages: Computer Science, Computer Science (R0)
