Exploiting GPUs with the Super Instruction Architecture
The Super Instruction Architecture (SIA) is a parallel programming environment designed for problems in computational chemistry involving complicated expressions defined in terms of tensors. Tensors are represented by multidimensional arrays, which are typically very large. The SIA consists of a domain-specific programming language, the Super Instruction Assembly Language (SIAL), and its runtime system, the Super Instruction Processor. An important feature of SIAL is that algorithms are expressed in terms of blocks (or tiles) of multidimensional arrays rather than individual floating point numbers. In this paper, we describe how the SIA was enhanced to exploit GPUs, obtaining speedups ranging from two to nearly four on computational chemistry calculations and thus saving hours of elapsed time in large-scale computations. The results provide evidence that the "programming-with-blocks" approach embodied in the SIA will remain successful in modern, heterogeneous computing environments.
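To make the "programming-with-blocks" idea concrete, the following sketch shows a block-oriented contraction in Python/NumPy. This is an illustration of the general technique, not SIAL code or the SIA's actual implementation; the function name, block size, and tiling scheme are hypothetical. The point is that each block-level contraction is a self-contained unit of work, which is what makes it a natural candidate for dispatch to a GPU kernel.

```python
import numpy as np

def blocked_contract(A, B, block=64):
    """Compute C[i,k] = sum_j A[i,j] * B[j,k] one tile at a time.

    Each innermost update touches only block-sized tiles, so the runtime
    could, in principle, schedule each tile contraction on a GPU as a
    single "super instruction" (illustrative sketch only).
    """
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, "inner dimensions must match"
    C = np.zeros((m, p))
    for i0 in range(0, m, block):
        for k0 in range(0, p, block):
            for j0 in range(0, n, block):
                # One block-level contraction: the unit of work that a
                # block-oriented runtime hands to a worker or accelerator.
                C[i0:i0 + block, k0:k0 + block] += (
                    A[i0:i0 + block, j0:j0 + block]
                    @ B[j0:j0 + block, k0:k0 + block]
                )
    return C
```

Because the loop body depends only on the three tiles it reads and writes, the same structure generalizes to higher-order tensor contractions and to distributing tiles across nodes, which is the regime the SIA targets.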
Keywords: Parallel programming, Tensors, GPU, Domain-specific language
Shawn McDowell provided the CUDA implementation of the contraction operator. This work was supported by the National Science Foundation under Grant OCI-0725070 and by the Office of Science of the U.S. Department of Energy under Grant DE-SC0002565. The development of the SIA and ACES III has also been supported by the U.S. Department of Defense's High Performance Computing Modernization Program (HPCMP) under two programs: the Common High Performance Computing Software Initiative (CHSSI), Project CBD-03, and User Productivity Enhancement and Technology Transfer (PET). We also thank the University of Florida High Performance Computing Center for the use of its facilities.