Benefits from using mixed precision computations in the ELPA-AEO and ESSEX-II eigensolver projects

  • Andreas Alvermann
  • Achim Basermann
  • Hans-Joachim Bungartz
  • Christian Carbogno
  • Dominik Ernst
  • Holger Fehske
  • Yasunori Futamura
  • Martin Galgon
  • Georg Hager
  • Sarah Huber
  • Thomas Huckle
  • Akihiro Ida
  • Akira Imakura
  • Masatoshi Kawai
  • Simone Köcher
  • Moritz Kreutzer
  • Pavel Kus
  • Bruno Lang
  • Hermann Lederer
  • Valeriy Manin
  • Andreas Marek
  • Kengo Nakajima
  • Lydia Nemec
  • Karsten Reuter
  • Michael Rippl
  • Melven Röhrig-Zöllner
  • Tetsuya Sakurai
  • Matthias Scheffler
  • Christoph Scheurer
  • Faisal Shahzad
  • Danilo Simoes Brambila
  • Jonas Thies
  • Gerhard Wellein
Special Feature: Original Paper
International Workshop on Eigenvalue Problems: Algorithms; Software and Applications, in Petascale Computing (EPASA2018)

Abstract

We first briefly report on the status and recent achievements of the ELPA-AEO (Eigenvalue Solvers for Petaflop Applications: Algorithmic Extensions and Optimizations) and ESSEX-II (Equipping Sparse Solvers for Exascale) projects. In both collaborative efforts, application scientists, mathematicians, and computer scientists work together to develop and make available efficient, highly parallel methods for solving eigenvalue problems. We then focus on a topic addressed in both projects: the use of mixed precision computations to enhance efficiency. We describe in more detail our approaches for benefiting from either lower or higher precision in three selected contexts, together with the results obtained.

Keywords

ELPA-AEO, ESSEX, Eigensolver, Parallel, Mixed precision

Mathematics Subject Classification

65F15 65F25 65Y05 65Y99 

Notes

Acknowledgements

The authors thank the anonymous referees for their valuable comments, which helped to improve and clarify the presentation.

Copyright information

© The JJIAM Publishing Committee and Springer Japan KK, part of Springer Nature 2019

Authors and Affiliations

  • Andreas Alvermann (1)
  • Achim Basermann (2)
  • Hans-Joachim Bungartz (3)
  • Christian Carbogno (4)
  • Dominik Ernst (5)
  • Holger Fehske (1)
  • Yasunori Futamura (6)
  • Martin Galgon (7)
  • Georg Hager (5)
  • Sarah Huber (7)
  • Thomas Huckle (3)
  • Akihiro Ida (8)
  • Akira Imakura (6)
  • Masatoshi Kawai (8)
  • Simone Köcher (9)
  • Moritz Kreutzer (5)
  • Pavel Kus (10)
  • Bruno Lang (7), corresponding author
  • Hermann Lederer (10)
  • Valeriy Manin (7)
  • Andreas Marek (10)
  • Kengo Nakajima (8)
  • Lydia Nemec (9)
  • Karsten Reuter (9)
  • Michael Rippl (3)
  • Melven Röhrig-Zöllner (2)
  • Tetsuya Sakurai (6)
  • Matthias Scheffler (4)
  • Christoph Scheurer (9)
  • Faisal Shahzad (5)
  • Danilo Simoes Brambila (4)
  • Jonas Thies (2)
  • Gerhard Wellein (5)
  1. Institute of Physics, University of Greifswald, Greifswald, Germany
  2. German Aerospace Center (DLR), Cologne, Germany
  3. Department of Informatics, Technical University of Munich, Munich, Germany
  4. Fritz Haber Institute of the Max Planck Society, Berlin, Germany
  5. High Performance Computing, University of Erlangen-Nuremberg, Erlangen, Germany
  6. Applied Mathematics, University of Tsukuba, Tsukuba, Japan
  7. Mathematics and Natural Sciences, University of Wuppertal, Wuppertal, Germany
  8. Computer Science, The University of Tokyo, Tokyo, Japan
  9. Department of Theoretical Chemistry, Technical University of Munich, Munich, Germany
  10. Max Planck Computing and Data Facility, Garching, Germany