Analysing the Performance Improvements of Optimizations on Modern HPC Systems
Recently, many supercomputing systems have been equipped with vector processors, scalar processors, or accelerators as their processing elements. Although no single HPC system can execute every kind of calculation effectively, some calculations can exploit the potential of a particular system when both the calculations and the system are taken into account. This tendency, in which each HPC system is designed for and suited to specific fields of calculation, will continue as a way to achieve higher performance for target HPC codes. Consequently, even when the same HPC code is executed on multiple HPC systems, the sustained performance differs from system to system. Because the characteristics of an HPC code mainly depend on its optimization methods, clarifying the performance of these optimization methods on multiple HPC systems is important for developing performance-portable HPC codes, which can exploit the potential of every HPC system. By considering both the optimization methods and the HPC systems, this paper clarifies the performance of the optimization methods on multiple HPC systems.
The authors would like to thank the Information Initiative Center, Hokkaido University; the Cyberscience Center, Tohoku University; the Information Technology Center, University of Tokyo; and the Information Technology Center, Nagoya University, for the supercomputing resources used in the performance evaluation.
This research was partially supported by Grant-in-Aid for Scientific Research (S) #21226018, Grant-in-Aid for Scientific Research (B) #25280041, and Core Research for Evolutional Science and Technology of the Japan Science and Technology Agency (JST CREST), “An Evolutionary Approach to Construction of a Software Development Environment for Massively-Parallel Heterogeneous Systems”.