A Comparative Study of Application Performance and Scalability on the Intel Knights Landing Processor

  • Carlos Rosales
  • John Cazes
  • Kent Milfeld
  • Antonio Gómez-Iglesias
  • Lars Koesterke
  • Lei Huang
  • Jerome Vienne
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)


Intel Knights Landing represents a qualitative change in the Many Integrated Core architecture. It is a self-hosted option and includes high-speed integrated memory together with a two-dimensional mesh that interconnects the cores. This leads to a number of possible runtime configurations, each with different characteristics and implications for application performance. This paper presents a study of the performance differences observed when using the three available MCDRAM configurations in combination with the three possible memory access, or cluster, modes. We analyze the effects that memory affinity and process pinning have on different applications. The Mantevo suite of mini-applications and the NAS Parallel Benchmarks are used to analyze the behavior of very different application kernels, from molecular dynamics to CFD mini-applications. Two full applications, the Weather Research and Forecasting (WRF) application and a Lattice Boltzmann Suite (LBS3D), are also analyzed in detail to complete the study and present scalability results for a variety of applications.


Keywords: KNL, MCDRAM, Scalability, MIC



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Carlos Rosales (1)
  • John Cazes (1)
  • Kent Milfeld (1)
  • Antonio Gómez-Iglesias (1)
  • Lars Koesterke (1)
  • Lei Huang (1)
  • Jerome Vienne (1)

  1. Texas Advanced Computing Center, The University of Texas at Austin, Austin, USA
