Using C++ AMP to Accelerate HPC Applications on Multiple Platforms

  • M. Graham Lopez
  • Christopher Bergstrom
  • Ying Wai Li
  • Wael Elwasif
  • Oscar Hernandez
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)

Abstract

Many high-end HPC systems support accelerators in their compute nodes to target a variety of workloads, including high-performance computing simulations, big data / data analytics codes, and visualization. To program both the CPU cores and the attached accelerators, users can choose among multiple programming models such as CUDA, OpenMP 4, OpenACC, and C++14, but some of these models fall short in their support for C++ on accelerators because they have difficulty supporting advanced C++ features, e.g. templates, class members, loops with iterators, lambdas, and deep copy. Typically, they either rely on unified memory, or the language itself is not aware of accelerators (e.g. C++14). In this paper, we explore a base-language solution called C++ Accelerated Massive Parallelism (AMP), which was developed by Microsoft and is implemented by the PathScale ENZO compiler to program GPUs on a variety of HPC architectures, including OpenPOWER and Intel Xeon. We report preliminary, in-progress results from using C++ AMP to accelerate a matrix multiplication kernel and a quantum Monte Carlo application kernel, examining its expressiveness and performance on NVIDIA GPUs with the PathScale ENZO compiler. We hope that this preliminary report will provide a data point that informs the functionality needed for future C++ standards to support accelerators with discrete memory spaces.
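For context, the matrix multiplication kernel mentioned above can be expressed in C++ AMP roughly as follows. This is an illustrative sketch in the style of Microsoft's public C++ AMP walkthrough, not the paper's actual code; C++ AMP requires an AMP-capable compiler (e.g. MSVC or PathScale ENZO) and will not build with a stock GCC/Clang toolchain:

```cpp
#include <amp.h>      // C++ AMP: concurrency::array_view, parallel_for_each
#include <vector>

using namespace concurrency;

// Multiply two N x N matrices stored in row-major std::vectors: C = A * B.
void matmul_amp(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, int N) {
    // array_view wraps host memory; the runtime copies data to the
    // accelerator's discrete memory as needed (no manual cudaMemcpy).
    array_view<const float, 2> a(N, N, A);
    array_view<const float, 2> b(N, N, B);
    array_view<float, 2> c(N, N, C);
    c.discard_data();  // output only: skip the host-to-device copy

    // The lambda runs once per element of c; restrict(amp) marks it as
    // compilable for the accelerator.
    parallel_for_each(c.extent, [=](index<2> idx) restrict(amp) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += a(idx[0], k) * b(k, idx[1]);
        c[idx] = sum;
    });

    c.synchronize();   // copy the result back to the host vector
}
```

The key point for the discrete-memory discussion in the abstract is that `array_view` and `synchronize()` make the host/accelerator copies explicit in the type system rather than relying on unified memory.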

Keywords

HPC · C++ for Accelerators · C++ AMP · Accelerator programming

Acknowledgements

This material is based upon work supported by the U.S. Department of Energy, Office of Science. This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • M. Graham Lopez (1)
  • Christopher Bergstrom (2)
  • Ying Wai Li (3)
  • Wael Elwasif (1)
  • Oscar Hernandez (1)

  1. Computer Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, USA
  2. PathScale Inc., Wilmington, USA
  3. National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, USA