Performance Optimization of a CFD Application on Intel Multicore and Manycore Architectures

Che, Yonggang; Zhang, Lilun; Wang, Yongxian; Xu, Chuanfu; Liu, Wei; Cheng, Xinghua

doi:10.1007/978-3-662-44491-7_7

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 451))

Abstract

This paper reports our experience optimizing the performance of HOSTA, a high-order, high-accuracy Computational Fluid Dynamics (CFD) application, on a state-of-the-art multicore processor and the emerging Intel Many Integrated Core (MIC) coprocessor. We focus on effective loop vectorization and memory access optimization. We explore a series of techniques, including data structure transformations, procedure inlining, compiler SIMDization, OpenMP loop collapsing, and the use of huge pages. We measure detailed execution times and event counts from the Performance Monitoring Units. The results show that our optimizations improve the performance of HOSTA by 1.61× on a compute node with two Intel Sandy Bridge processors and by 1.97× on an Intel Knights Corner coprocessor, the publicly available MIC product. The microarchitecture-level effects of these optimizations are also discussed.
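
To make three of the listed techniques concrete, the following is a minimal C sketch, not HOSTA's actual code, that combines a structure-of-arrays data layout, OpenMP loop collapsing over the two outer grid loops, and compiler SIMDization of the unit-stride inner loop. The names `FieldSoA` and `scale_energy`, the field variables, and the per-cell computation are all hypothetical and chosen only for illustration; the sketch also assumes that `out` does not alias the input arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays field: one contiguous array per
 * variable, so consecutive i-cells of the same variable are adjacent
 * in memory (unit stride), which is what a vectorizer prefers. */
typedef struct {
    size_t ni, nj, nk;  /* grid extents; i is the fastest-varying index */
    const double *rho;  /* illustrative variable 1, ni*nj*nk values */
    const double *e;    /* illustrative variable 2, ni*nj*nk values */
} FieldSoA;

/* Illustrative kernel: out[c] = rho[c] * e[c] for every cell c.
 * Assumption: out does not overlap rho or e. */
void scale_energy(const FieldSoA *f, double *restrict out)
{
    assert(f != NULL && out != NULL);
    const size_t ni = f->ni, nj = f->nj, nk = f->nk;
    const double *restrict rho = f->rho;
    const double *restrict e = f->e;

    /* collapse(2) fuses the k and j loops into a single iteration space
     * of nk*nj iterations, giving the runtime more work items to spread
     * across many threads than the k loop alone would. */
    #pragma omp parallel for collapse(2)
    for (size_t k = 0; k < nk; ++k) {
        for (size_t j = 0; j < nj; ++j) {
            const size_t base = (k * nj + j) * ni;
            /* Ask the compiler to vectorize the unit-stride i loop. */
            #pragma omp simd
            for (size_t i = 0; i < ni; ++i) {
                out[base + i] = rho[base + i] * e[base + i];
            }
        }
    }
}
```

Built without OpenMP support, the pragmas are ignored and the loops run serially with the same result. With a flag such as `-fopenmp` (GCC, Clang) or `-qopenmp` (Intel), the collapsed outer loops are distributed across threads. Collapsing is mainly useful when the outermost extent alone is small relative to the thread count, as can happen on a manycore coprocessor.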

Author information

Authors and Affiliations

Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China
Yonggang Che, Lilun Zhang, Yongxian Wang, Chuanfu Xu, Wei Liu & Xinghua Cheng

Editor information

Editors and Affiliations

National University of Defense Technology, 410073, Changsha, China
Junjie Wu
Shanghai Jiao Tong University, 200240, Shanghai, China
Haibo Chen
College of Information Science and Engineering, Northeastern University, Shenyang, China
Xingwei Wang

About this paper

Cite this paper

Che, Y., Zhang, L., Wang, Y., Xu, C., Liu, W., Cheng, X. (2014). Performance Optimization of a CFD Application on Intel Multicore and Manycore Architectures. In: Wu, J., Chen, H., Wang, X. (eds) Advanced Computer Architecture. Communications in Computer and Information Science, vol 451. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44491-7_7

DOI: https://doi.org/10.1007/978-3-662-44491-7_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44490-0
Online ISBN: 978-3-662-44491-7
eBook Packages: Computer Science, Computer Science (R0)
