Abstract
The Intel Xeon Phi is a many-core accelerator that targets high-performance applications. To characterize its performance, a system with dual 8-core Intel Xeon E5-2670 processors is used as the control platform, and a subset of the PARSEC benchmark suite is selected as the benchmark applications. The first evaluation in this paper shows that the applications run on average 2.06x slower on the Intel Xeon Phi than on the dual Intel Xeon E5-2670. A further detailed performance characterization quantifies the performance impact of various architecture parameters of the Intel Xeon Phi. As an example of how the architecture of the Intel Xeon Phi could be improved for better performance, a hardware optimization that adds an extra set of vector processing units is discussed, and a simple emulator is developed accordingly. The evaluation results show that this optimization can provide an average speedup of 1.10x.
Acknowledgments
This work is supported in part by the Natural Science Foundation of China (no. 41275098), the National Grand Fundamental Research 973 Program of China (no. 2013CB956603), and the Tsinghua University Initiative Scientific Research Program (no. 20131089356).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Zhang, C., Liu, L., Li, R., Yang, G. (2015). Performance Characterization and Optimization for Intel Xeon Phi Coprocessor. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_2
DOI: https://doi.org/10.1007/978-3-319-27119-4_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27118-7
Online ISBN: 978-3-319-27119-4
eBook Packages: Computer Science (R0)