Performance Tuning of Fock Matrix and Two-Electron Integral Calculations for NWChem on Leading HPC Platforms

  • Hongzhang Shan
  • Brian Austin
  • Wibe De Jong
  • Leonid Oliker
  • N. J. Wright
  • Edoardo Apra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8551)


Attaining high performance in evaluating two-electron repulsion integrals and constructing the Fock matrix is of considerable importance to the computational chemistry community. Because of the numerical complexity of these methods and the significant diversity of high-performance computing architectures, improving their performance across leading supercomputing platforms is an increasing challenge. In this paper, we present our successful tuning methodology for these important numerical methods on the Cray XE6, the Cray XC30, the IBM BG/Q, and the Intel Xeon Phi. Our optimization schemes leverage key architectural features, including vectorization and simultaneous multithreading, and result in speedups of up to 2.5x compared with the original implementation.


Keywords: Performance Tune, Dynamic Load Balance, Hardware Thread, Compiler Directive, Task Granularity
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hongzhang Shan (1)
  • Brian Austin (1)
  • Wibe De Jong (1)
  • Leonid Oliker (1)
  • N. J. Wright (1)
  • Edoardo Apra (2)

  1. CRD and NERSC, Lawrence Berkeley National Laboratory, Berkeley, USA
  2. WR Wiley Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, USA
