Multigrid at Scale?

  • Conference paper

Part of the book series: Lecture Notes in Computational Science and Engineering ((LNCSE,volume 112))

Abstract

The reduced reliability of next-generation exascale systems means that the resiliency properties of a numerical algorithm will become an important factor in both the choice of algorithm and its analysis. The multigrid algorithm is the workhorse for the distributed solution of linear systems, but little is known about its resiliency properties and convergence behavior in a fault-prone environment. In the current work, we propose a probabilistic model for the effect of faults involving random diagonal matrices. We summarize the results of a theoretical analysis of this model for the rate of convergence of fault-prone multigrid methods, which show that the standard multigrid method will not be resilient. Finally, we present a modification of the standard multigrid algorithm that will be resilient.
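To make the phrase "faults involving random diagonal matrices" concrete, the following is a minimal sketch of one way such a model could look. It is an illustration under stated assumptions, not the model defined in the paper: we assume each component of a computed vector is independently lost (set to zero) with probability q, so a fault-prone operation returns X v with X a random 0/1 diagonal matrix. The names `random_fault_matrix`, `faulty_matvec` and the parameter `q` are hypothetical.

```python
import numpy as np


def random_fault_matrix(n, q, rng):
    """Random diagonal matrix with independent 0/1 entries (assumed model).

    Each diagonal entry is 0 with probability q (a fault) and 1 otherwise.
    """
    survived = rng.random(n) >= q
    return np.diag(survived.astype(float))


def faulty_matvec(A, x, q, rng):
    """Compute A @ x, then zero each component independently with probability q."""
    X = random_fault_matrix(A.shape[0], q, rng)
    return X @ (A @ x)


# Small 1D Laplacian-like test matrix (illustrative only).
n = 8
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x = np.arange(1.0, n + 1.0)

rng = np.random.default_rng(0)
exact = A @ x
faulty = faulty_matvec(A, x, q=0.25, rng=rng)

# Every component is either the exact value or has been zeroed by a fault.
print(np.column_stack([exact, faulty]))
```

With q = 0 the operation reduces to the ordinary matrix-vector product, and with q = 1 every component is lost; intermediate values of q give a random mixture of exact and zeroed components on each call.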

Corresponding author

Correspondence to Mark Ainsworth.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ainsworth, M., Glusa, C. (2016). Multigrid at Scale?. In: Karasözen, B., Manguoğlu, M., Tezer-Sezgin, M., Göktepe, S., Uğur, Ö. (eds) Numerical Mathematics and Advanced Applications ENUMATH 2015. Lecture Notes in Computational Science and Engineering, vol 112. Springer, Cham. https://doi.org/10.1007/978-3-319-39929-4_24
