Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

  • Jack Deslippe (Email author)
  • Felipe H. da Jornada
  • Derek Vigil-Fowler
  • Taylor Barnes
  • Nathan Wichmann
  • Karthik Raman
  • Ruchira Sasanka
  • Steven G. Louie
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9945)


We profile and optimize calculations performed with the BerkeleyGW [2, 3] code on the Xeon Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels and on BLAS and FFT libraries. We describe the optimization process and the performance improvements achieved. We discuss a layered parallelization strategy that takes advantage of vector-, thread-, and node-level parallelism. We discuss locality changes (including the consequences of the lack of an L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights Landing, including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well suited to many-core architectures because it exposes a large amount of parallelism over plane-wave components, band pairs, and frequencies.
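The roofline study mentioned above rests on a simple bound: attainable performance is the smaller of the compute ceiling and the product of arithmetic intensity and memory bandwidth. The following is a minimal sketch of that bound in Python. The function names and all machine parameters are assumptions chosen for illustration; they are not measurements from this paper or specifications of Knights Landing.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory.
    peak_gflops:          compute ceiling of the machine (GFLOP/s).
    bandwidth_gbs:        memory bandwidth ceiling (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    # A kernel is limited by whichever ceiling it reaches first.
    return min(peak_gflops, arithmetic_intensity * bandwidth_gbs)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being bandwidth bound."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine parameters (placeholders, not KNL figures).
PEAK_GFLOPS = 2000.0
FAST_MEMORY_GBS = 400.0   # stands in for on-package high-bandwidth memory
SLOW_MEMORY_GBS = 100.0   # stands in for off-package memory

if __name__ == "__main__":
    for ai in (0.5, 2.0, 8.0, 32.0):
        fast = attainable_gflops(ai, PEAK_GFLOPS, FAST_MEMORY_GBS)
        slow = attainable_gflops(ai, PEAK_GFLOPS, SLOW_MEMORY_GBS)
        print(f"AI={ai:5.1f} FLOP/byte  fast={fast:7.1f}  slow={slow:7.1f} GFLOP/s")
```

Under these placeholder numbers, a kernel with low arithmetic intensity gains directly from the faster memory, while a kernel past the ridge point is limited by the compute ceiling regardless of which memory it uses.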


Keywords: Many Integrate Core; Quantum Espresso; Math Library; Arithmetic Intensity; Trip Count
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Supported by the SciDAC Program on Excited State Phenomena in Energy Materials funded by the U.S. Department of Energy, Office of Basic Energy Sciences and of Advanced Scientific Computing Research, under Contract No. DE-AC02-05CH11231 at Lawrence Berkeley National Laboratory. Derek Vigil-Fowler is supported by NREL’s LDRD Director’s Postdoctoral Fellowship. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

We acknowledge helpful conversations with Mike Greenfield, Paul Kent, David Prendergast and Pierre Carrier.


  1.
  2. Deslippe, J., Samsonidze, G., Strubbe, D.A., Jain, M., Cohen, M.L., Louie, S.G.: BerkeleyGW: a massively parallel computer package for the calculation of the quasiparticle and optical properties of materials and nanostructures. Comput. Phys. Commun. 183(6), 1269–1289 (2012)
  3.
  4. Frigo, M., Johnson, S.G.: FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 3, pp. 1381–1384. IEEE (1998)
  5. Giannozzi, P., Baroni, S., Bonini, N., Calandra, M., Car, R., Cavazzoni, C., Ceresoli, D., Chiarotti, G.L., Cococcioni, M., Dabo, I., Dal Corso, A., Fabris, S., Fratesi, G., de Gironcoli, S., Gebauer, R., Gerstmann, U., Gougoussis, C., Kokalj, A., Lazzeri, M., Martin-Samos, L., Marzari, N., Mauri, F., Mazzarello, R., Paolini, S., Pasquarello, A., Paulatto, L., Sbraccia, C., Scandolo, S., Sclauzero, G., Seitsonen, A.P., Smogunov, A., Umari, P., Wentzcovitch, R.M.: J. Phys.: Condens. Matter 21, 395502 (2009)
  6. Hybertsen, M.S., Louie, S.G.: Electron correlation in semiconductors and insulators: band gaps and quasiparticle energies. Phys. Rev. B 34(8), 5390 (1986)
  7. Hybertsen, M.S., Louie, S.G.: First-principles theory of quasiparticles: calculation of band gaps in semiconductors and insulators. Phys. Rev. Lett. 55(13), 1418 (1985)
  8.
  9. Kronik, L., Makmal, A., Tiago, M.L., Alemany, M.M.G., Jain, M., Huang, X., Saad, Y., Chelikowsky, J.R.: PARSEC - the pseudopotential algorithm for real-space electronic structure calculations: recent advances and novel applications to nanostructures. Phys. Status Solidi (b) 243(5), 1063–1079 (2006)
  10.
  11.
  12.
  13. Pfrommer, B., Raczkowski, D., Canning, A., Louie, S.G.: PARATEC (PARAllel Total Energy Code), Lawrence Berkeley National Laboratory (with contributions from Mauri, F., Cote, M., Yoon, Y., Pickard, C., Heynes, P.)
  14. Raman, K.: Calculating “flop” using Intel Software Development Emulator (Intel SDE), March 2015
  15. Soler, J.M., Artacho, E., Gale, J.D., García, A., Junquera, J., Ordejón, P., Sánchez-Portal, D.: The SIESTA method for ab initio order-N materials simulation. J. Phys. Condens. Matter 14(11), 2745 (2002)
  16.
  17. Williams, S.: Auto-tuning Performance on Multicore Computers. Ph.D. thesis, EECS Department, University of California, Berkeley, December 2008
  18. Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for floating-point programs and multicore architectures. Commun. ACM 52(4), April 2009
  19.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Jack Deslippe (1, Email author)
  • Felipe H. da Jornada (2)
  • Derek Vigil-Fowler (3)
  • Taylor Barnes (1)
  • Nathan Wichmann (4)
  • Karthik Raman (5)
  • Ruchira Sasanka (5)
  • Steven G. Louie (2)
  1. NERSC, Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Department of Physics, University of California at Berkeley, and Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, USA
  3. National Renewable Energy Laboratory, Golden, USA
  4. Cray, Saint Paul, USA
  5. Intel, Hillsboro, USA
