Performance Characterization and Optimization for Intel Xeon Phi Coprocessor

Zhang, Cheng; Liu, Li; Li, Ruizhe; Yang, Guangwen

doi:10.1007/978-3-319-27119-4_2


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9528)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing


Abstract

The Intel Xeon Phi is a many-core accelerator aimed at high-performance applications. To characterize its performance, a system with dual 8-core Intel Xeon E5-2670 processors is used as the control platform, and a subset of the PARSEC benchmark suite is selected as the benchmark applications. The first evaluation in this paper shows that the applications run, on average, 2.06x slower on the Intel Xeon Phi than on the dual Intel Xeon E5-2670 system. A further, more detailed characterization quantifies how various architectural parameters affect performance on the Intel Xeon Phi. As an example of how the Intel Xeon Phi architecture could be improved for better performance, a hardware optimization that adds an extra set of vector processing units is discussed, and a simple emulator is developed to evaluate it. The evaluation results show that this optimization can provide an average speedup of 1.10x.
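The abstract reports averaged figures (a 2.06x slowdown and a 1.10x speedup) but does not state here which kind of mean was used. As a minimal sketch, assuming per-benchmark wall-clock runtimes are available for a baseline and a target configuration, the helper below computes each benchmark's runtime ratio and both the geometric and arithmetic means of those ratios. The function name `average_ratio` and its inputs are hypothetical illustrations, not the authors' tooling, and no numbers from the paper are used.

```python
import math


def average_ratio(baseline_times, target_times):
    """Hypothetical helper: compare target runtimes against baseline runtimes.

    Returns (ratios, geometric_mean, arithmetic_mean), where each ratio is
    target_time / baseline_time for one benchmark. A ratio > 1 means the
    target ran slower than the baseline (a slowdown); < 1 means a speedup
    for the target when read as baseline / target instead.
    """
    if not baseline_times or len(baseline_times) != len(target_times):
        raise ValueError("need two non-empty runtime lists of equal length")
    ratios = [t / b for b, t in zip(baseline_times, target_times)]
    # Geometric mean computed via logs to avoid overflow for many benchmarks.
    geometric = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    arithmetic = sum(ratios) / len(ratios)
    return ratios, geometric, arithmetic


if __name__ == "__main__":
    # Placeholder runtimes in seconds, purely illustrative.
    ratios, geo, arith = average_ratio([1.0, 2.0], [2.0, 16.0])
    print(ratios, geo, arith)
```

The geometric mean is a common choice for averaging ratios because it is symmetric: a 2x slowdown on one benchmark and a 2x speedup on another average to exactly 1, whereas the arithmetic mean would report 1.25.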



Acknowledgments

This work is supported in part by the Natural Science Foundation of China (no. 41275098), the National Grand Fundamental Research 973 Program of China (no. 2013CB956603), and the Tsinghua University Initiative Scientific Research Program (no. 20131089356).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Cheng Zhang, Ruizhe Li & Guangwen Yang
Center for Earth System Science, Tsinghua University, Beijing, 100084, China
Li Liu & Guangwen Yang


Corresponding authors

Correspondence to Cheng Zhang or Guangwen Yang.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Guojun Wang
The University of Sydney, Sydney, New South Wales, Australia
Albert Zomaya
University of Murcia, Murcia, Murcia, Spain
Gregorio Martinez
Hunan University, Changsha, China
Kenli Li


About this paper

Cite this paper

Zhang, C., Liu, L., Li, R., Yang, G. (2015). Performance Characterization and Optimization for Intel Xeon Phi Coprocessor. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_2


DOI: https://doi.org/10.1007/978-3-319-27119-4_2
Published: 16 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27118-7
Online ISBN: 978-3-319-27119-4
eBook Packages: Computer Science, Computer Science (R0)
