FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World

  • Conference paper
Recent Advances in Parallel Virtual Machine and Message Passing Interface (EuroPVM/MPI 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1908)

Abstract

Initial versions of MPI were designed to work efficiently on multiprocessors that had very little job control and thus static process models. Subsequently forcing them to support dynamic process operations would have affected their performance. As current HPC systems increase in size, with higher potential levels of individual node failure, the need arises for new fault-tolerant systems to be developed. Here we present a new implementation of MPI called FT-MPI that allows the semantics and associated failure modes to be completely controlled by the application. We give an overview of the FT-MPI semantics, design, and some performance issues, as well as the HARNESS g_hcore implementation it is built upon.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fagg, G.E., Dongarra, J.J. (2000). FT-MPI: Fault Tolerant MPI, Supporting Dynamic Applications in a Dynamic World. In: Dongarra, J., Kacsuk, P., Podhorszki, N. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2000. Lecture Notes in Computer Science, vol 1908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45255-9_47

  • DOI: https://doi.org/10.1007/3-540-45255-9_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41010-2

  • Online ISBN: 978-3-540-45255-3

  • eBook Packages: Springer Book Archive
