
Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

  • Conference paper
High Performance Computing (ISC High Performance 2016)

Abstract

The Roofline Performance Model is a visually intuitive method for bounding the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (floating-point operations per byte of data moved). In this study we determine the Roofline for the Intel Knights Landing (KNL) processor, measuring the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy in all the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL-based supercomputer Cori, and make comparisons to current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center’s (NERSC) next-generation supercomputer. Scheduled for deployment in mid-2016, it will be one of the earliest and largest KNL deployments in the world.


Notes

  1. We added a #pragma unroll (8) around the inner loop to enable vectorization.


Acknowledgments

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

This material is based upon work supported by the Advanced Scientific Computing Research Program in the U.S. Department of Energy, Office of Science, under Award Number DE-AC02-05CH11231.

J.D. was supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U.S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory.

Author information

Corresponding author

Correspondence to Douglas Doerfler.


Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Doerfler, D. et al. (2016). Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_24


  • DOI: https://doi.org/10.1007/978-3-319-46079-6_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer Science, Computer Science (R0)
