Performance Characterization and Optimization for Intel Xeon Phi Coprocessor

  • Conference paper
Algorithms and Architectures for Parallel Processing (ICA3PP 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9528)

Abstract

The Intel Xeon Phi is a many-core accelerator that targets high-performance applications. To characterize its performance, a system with dual 8-core Intel Xeon E5-2670 processors is used as a control platform, and a subset of the PARSEC benchmark suite is selected as the benchmark applications. The first evaluation in this paper shows that the applications run, on average, 2.06x slower on the Intel Xeon Phi than on the dual Intel Xeon E5-2670. Further detailed performance characterization quantifies the performance impact of various architecture parameters on the Intel Xeon Phi. As an example of how the architecture of the Intel Xeon Phi could be improved for better performance, a hardware optimization that adds a set of vector processing units is discussed, and a simple emulator is developed accordingly. The evaluation results show that this optimization can provide an average speedup of 1.10x.

Acknowledgments

This work is supported in part by the Natural Science Foundation of China (no. 41275098), the National Grand Fundamental Research 973 Program of China (no. 2013CB956603), and the Tsinghua University Initiative Scientific Research Program (no. 20131089356).

Author information

Corresponding authors

Correspondence to Cheng Zhang or Guangwen Yang.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, C., Liu, L., Li, R., Yang, G. (2015). Performance Characterization and Optimization for Intel Xeon Phi Coprocessor. In: Wang, G., Zomaya, A., Martinez, G., Li, K. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2015. Lecture Notes in Computer Science, vol 9528. Springer, Cham. https://doi.org/10.1007/978-3-319-27119-4_2

  • DOI: https://doi.org/10.1007/978-3-319-27119-4_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27118-7

  • Online ISBN: 978-3-319-27119-4

  • eBook Packages: Computer Science, Computer Science (R0)
