Abstract
Emerging novel architectures for shared memory parallel computing are incorporating increasingly creative innovations to deliver higher memory performance. A notable exemplar of this phenomenon is the Multi-Channel DRAM (MCDRAM) that is included in the Intel® Xeon Phi™ processors. In this paper, we examine techniques to use OpenMP to exploit the high bandwidth of MCDRAM by staging data. In particular, we implement double buffering using OpenMP sections and tasks to explicitly manage movement of data into MCDRAM. We compare our double-buffered approach to a non-buffered implementation and to Intel’s cache mode, in which the system manages the MCDRAM as a transparent cache. We also demonstrate the sensitivity of performance to parameters such as dataset size and the distribution of threads between compute and copy operations.
(“The rights of this work are transferred to the extent transferable according to title 17 § 105 U.S.C.”).
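The double-buffering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the sum-of-squares kernel, and the use of plain `malloc` for the staging buffers are assumptions made for illustration. On a system with MCDRAM exposed as addressable memory, the two staging buffers would instead come from a high-bandwidth allocator (for example `hbw_malloc` from the memkind library). The sketch uses OpenMP `sections` so that one thread copies the next chunk while another computes on the current one; a fuller version would devote several threads to each role.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical kernel: sum of squares over one staged chunk. */
static double compute_chunk(const double *buf, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += buf[i] * buf[i];
    return s;
}

/* Process src[0..n) in chunks, overlapping the copy of chunk c+1
 * with the computation on chunk c. Returns -1.0 if allocation fails. */
double double_buffered_sum_squares(const double *src, size_t n, size_t chunk) {
    assert(chunk > 0);
    /* Assumption: plain malloc stands in for fast-memory allocation. */
    double *buf[2];
    buf[0] = malloc(chunk * sizeof(double));
    buf[1] = malloc(chunk * sizeof(double));
    if (!buf[0] || !buf[1]) { free(buf[0]); free(buf[1]); return -1.0; }

    size_t nchunks = (n + chunk - 1) / chunk;
    double total = 0.0;

    /* Prologue: stage the first chunk before any compute starts. */
    if (nchunks > 0) {
        size_t len0 = n < chunk ? n : chunk;
        memcpy(buf[0], src, len0 * sizeof(double));
    }

    for (size_t c = 0; c < nchunks; c++) {
        size_t cur_len = (c + 1 < nchunks) ? chunk : n - c * chunk;
        double partial = 0.0;

        #pragma omp parallel sections
        {
            #pragma omp section
            {
                /* Copy side: stage the next chunk into the idle buffer. */
                if (c + 1 < nchunks) {
                    size_t off = (c + 1) * chunk;
                    size_t len = (n - off < chunk) ? n - off : chunk;
                    memcpy(buf[(c + 1) % 2], src + off, len * sizeof(double));
                }
            }
            #pragma omp section
            {
                /* Compute side: work on the chunk staged previously. */
                partial = compute_chunk(buf[c % 2], cur_len);
            }
        }
        /* The implicit barrier above makes the staged chunk and
         * partial result visible before the buffers swap roles. */
        total += partial;
    }

    free(buf[0]);
    free(buf[1]);
    return total;
}
```

Calling `double_buffered_sum_squares(data, n, chunk)` with a chunk size small enough for both buffers to fit in fast memory gives the same result as a single pass over `data`. Compiled without OpenMP support, the pragmas are ignored and the two sections simply run one after the other.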
Acknowledgments
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. We wish to acknowledge our appreciation for the use of the Advanced Architecture Test Bed, Bowman, at Sandia National Laboratories. The test beds are provided by NNSA’s Advanced Simulation and Computing (ASC) program for research and development of advanced architectures for exascale computing.
Disclaimers: Intel, Xeon, and Xeon Phi are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
* Other brands and names are the property of their respective owners.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Olivier, S.L., Hammond, S.D., Duran, A. (2017). Double Buffering for MCDRAM on Second Generation Intel® Xeon Phi™ Processors with OpenMP. In: de Supinski, B., Olivier, S., Terboven, C., Chapman, B., Müller, M. (eds) Scaling OpenMP for Exascale Performance and Portability. IWOMP 2017. Lecture Notes in Computer Science, vol 10468. Springer, Cham. https://doi.org/10.1007/978-3-319-65578-9_21
DOI: https://doi.org/10.1007/978-3-319-65578-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-65577-2
Online ISBN: 978-3-319-65578-9
eBook Packages: Computer Science, Computer Science (R0)