Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor

Doerfler, Douglas; Deslippe, Jack; Williams, Samuel; Oliker, Leonid; Cook, Brandon; Kurth, Thorsten; Lobet, Mathieu; Malas, Tareq; Vay, Jean-Luc; Vincenti, Henri

doi:10.1007/978-3-319-46079-6_24

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

The Roofline Performance Model is a visually intuitive method used to bound the sustained peak floating-point performance of any given arithmetic kernel on any given processor architecture. In the Roofline, performance is nominally measured in floating-point operations per second as a function of arithmetic intensity (operations per byte of data). In this study we construct the Roofline for the Intel Knights Landing (KNL) processor, determining the sustained peak memory bandwidth and floating-point performance for all levels of the memory hierarchy, in all the different KNL cluster modes. We then determine arithmetic intensity and performance for a suite of application kernels being targeted for the KNL-based supercomputer Cori, and compare the results with current Intel Xeon processors. Cori is the National Energy Research Scientific Computing Center’s (NERSC) next-generation supercomputer. Scheduled for deployment in mid-2016, it will be one of the earliest and largest KNL deployments in the world.
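The bound described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the standard Roofline formulation in which attainable performance is the lesser of the peak floating-point rate and the product of arithmetic intensity and memory bandwidth. The function names and the machine parameters below are hypothetical placeholders chosen for readability; they are not measurements from the paper.

```python
def roofline_bound(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Return the Roofline upper bound in GFLOP/s.

    arithmetic_intensity: FLOPs per byte moved to or from memory
    peak_gflops: peak floating-point rate of the machine (GFLOP/s)
    bandwidth_gbs: sustained bandwidth of one memory level (GB/s)
    """
    if arithmetic_intensity < 0 or peak_gflops <= 0 or bandwidth_gbs <= 0:
        raise ValueError("intensity must be >= 0; peak and bandwidth must be > 0")
    # A kernel is limited either by compute or by data movement,
    # whichever ceiling is lower at its arithmetic intensity.
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity (FLOPs/byte) at which the two ceilings meet."""
    return peak_gflops / bandwidth_gbs


# Placeholder machine parameters for illustration only (not KNL measurements).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

if __name__ == "__main__":
    ridge = ridge_point(PEAK_GFLOPS, BANDWIDTH_GBS)
    for ai in (0.25, 1.0, 10.0, 40.0):
        bound = roofline_bound(ai, PEAK_GFLOPS, BANDWIDTH_GBS)
        regime = "memory-bound" if ai < ridge else "compute-bound"
        print(f"AI={ai:6.2f} FLOPs/byte -> bound={bound:7.1f} GFLOP/s ({regime})")
```

Calling `roofline_bound` once per memory level, each with its own sustained bandwidth, gives one ceiling per level of the hierarchy, which is how a multi-level Roofline of the kind the abstract describes would be assembled.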

Notes

1. We added a #pragma unroll (8) around the inner loop to enable vectorization.


Acknowledgments

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

This material is based upon work supported by the Advanced Scientific Computing Research Program in the U.S. Department of Energy, Office of Science, under Award Number DE-AC02-05CH11231.

J.D. was supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U.S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory.

Author information

Authors and Affiliations

Lawrence Berkeley National Laboratory, Berkeley, USA
Douglas Doerfler, Jack Deslippe, Samuel Williams, Leonid Oliker, Brandon Cook, Thorsten Kurth, Mathieu Lobet, Tareq Malas, Jean-Luc Vay & Henri Vincenti

Corresponding author

Correspondence to Douglas Doerfler.

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

About this paper

Cite this paper

Doerfler, D. et al. (2016). Applying the Roofline Performance Model to the Intel Xeon Phi Knights Landing Processor. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_24

DOI: https://doi.org/10.1007/978-3-319-46079-6_24
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)