A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10777)

Abstract

The effect of two soft-fault error models on the convergence of the parallel flexible GMRES (FGMRES) iterative method, applied to an elliptic PDE problem on a regular grid, is evaluated. We consider two types of preconditioners: an incomplete LU factorization with dual threshold (ILUT), and an algebraic recursive multilevel solver (ARMS) combined with random butterfly transformation (RBT). The experiments quantify the difference between the two soft-fault error models and compare their potential impact on convergence.

This work was supported in part by the Air Force Office of Scientific Research under the AFOSR award FA9550-12-1-0476, by the U.S. Department of Energy, Office of Advanced Scientific Computing Research, through the Ames Laboratory, operated by Iowa State University under contract No. DE-AC02-07CH11358, and by the U.S. Department of Defense High Performance Computing Modernization Program, through a HASI grant, and the ILIR/IAR program at NSWC Dahlgren. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, and resources of Old Dominion University, which operates the Turing High Performance Computing Cluster.

Author information

Corresponding author

Correspondence to Amal Khabou.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Coleman, E., Jamal, A., Baboulin, M., Khabou, A., Sosonkina, M. (2018). A Comparison of Soft-Fault Error Models in the Parallel Preconditioned Flexible GMRES. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2017. Lecture Notes in Computer Science, vol 10777. Springer, Cham. https://doi.org/10.1007/978-3-319-78024-5_4

  • DOI: https://doi.org/10.1007/978-3-319-78024-5_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-78023-8

  • Online ISBN: 978-3-319-78024-5

  • eBook Packages: Computer Science, Computer Science (R0)
