Performance Evaluation and Scalability Analysis of NPB-MZ on Intel Xeon Phi Coprocessor
Intel Many Integrated Core (Intel MIC) is a novel architecture for high performance computing (HPC). It features a high degree of thread parallelism and wide vector processing units, and targets highly parallel applications. HPC communities now face the problem of porting their applications to MIC platforms, yet how current HPC applications can exploit the capabilities of MIC remains an open question. This paper evaluates the performance of the NPB-MZ programs, which are derived from real-world Computational Fluid Dynamics (CFD) applications, on the Intel Xeon Phi coprocessor, the first MIC product. The strong scaling behavior of the applications is investigated for different process-thread combinations. The performance obtained on Intel Xeon Phi coprocessors is compared against that obtained on Sandy Bridge CPU based compute nodes. The results show that these programs can achieve good parallel scalability when run with appropriate combinations of processes and threads. However, their absolute performance on the Intel Xeon Phi coprocessor is significantly lower than on the CPU node, due primarily to the much lower single thread performance. These findings can help guide the performance optimization of other applications on MIC.
Keywords: Intel MIC, NPB-MZ, performance evaluation, scalability, single thread performance