Abstract
This paper reports our experience optimizing the performance of HOSTA, a high-order, high-accuracy Computational Fluid Dynamics (CFD) application, on a state-of-the-art multicore processor and the emerging Intel Many Integrated Core (MIC) coprocessor. We focus on effective loop vectorization and memory access optimization. We explore a series of techniques, including data structure transformations, procedure inlining, compiler SIMDization, OpenMP loop collapsing, and the use of Huge Pages. We measure detailed execution times and event counts from the Performance Monitoring Units. The results show that our optimizations improve the performance of HOSTA by 1.61× on a compute node with two Intel Sandy Bridge processors and by 1.97× on an Intel Knights Corner coprocessor, the publicly available MIC product. The microarchitecture-level effects of these optimizations are also discussed.
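As an illustration of three of the techniques named in the abstract (data structure transformation, compiler SIMDization, and OpenMP loop collapsing), the following is a minimal sketch in C. It is not taken from HOSTA: the grid sizes `NI`, `NJ`, `NK`, the types `CellAoS` and `FieldSoA`, and the functions `aos_to_soa` and `momentum_soa` are hypothetical names chosen for this example, and the array-of-structures starting layout is an assumption.

```c
#include <stddef.h>

/* Hypothetical grid dimensions, chosen only for this example. */
#define NI 8
#define NJ 4
#define NK 4

/* Array-of-structures layout: the fields of one cell are adjacent,
   so a loop over cells reads each field with a stride. */
typedef struct { double rho, u; } CellAoS;

/* Structure-of-arrays layout: each field is contiguous along i,
   which gives the inner loop unit-stride accesses. */
typedef struct {
    double rho[NK][NJ][NI];
    double u[NK][NJ][NI];
} FieldSoA;

/* Data structure transformation: copy the AoS cells into SoA arrays. */
void aos_to_soa(const CellAoS *cells, FieldSoA *f)
{
    for (int k = 0; k < NK; ++k)
        for (int j = 0; j < NJ; ++j)
            for (int i = 0; i < NI; ++i) {
                size_t idx = ((size_t)k * NJ + j) * NI + i;
                f->rho[k][j][i] = cells[idx].rho;
                f->u[k][j][i]   = cells[idx].u;
            }
}

/* Computes rho * u per cell. The two outer loops are collapsed into one
   iteration space for thread-level parallelism, and the unit-stride
   inner loop is marked for SIMD. Without OpenMP enabled the pragmas are
   ignored and the loops run serially with the same result. */
void momentum_soa(const FieldSoA *f, double out[NK][NJ][NI])
{
    #pragma omp parallel for collapse(2)
    for (int k = 0; k < NK; ++k)
        for (int j = 0; j < NJ; ++j) {
            #pragma omp simd
            for (int i = 0; i < NI; ++i)
                out[k][j][i] = f->rho[k][j][i] * f->u[k][j][i];
        }
}
```

Collapsing the `k` and `j` loops yields `NK * NJ` parallel iterations instead of `NK`, which can help when the outermost extent is small relative to the number of hardware threads, as may be the case on a manycore coprocessor. Whether either change pays off for a given kernel has to be measured.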
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Che, Y., Zhang, L., Wang, Y., Xu, C., Liu, W., Cheng, X. (2014). Performance Optimization of a CFD Application on Intel Multicore and Manycore Architectures. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_7
DOI: https://doi.org/10.1007/978-3-662-44491-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44490-0
Online ISBN: 978-3-662-44491-7
eBook Packages: Computer Science, Computer Science (R0)