
Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)


Abstract

To schedule precedence task graphs in a more realistic framework, we introduce an efficient fault-tolerant scheduling algorithm that is both contention-aware and capable of tolerating ε arbitrary fail-silent (fail-stop) processor failures. The design of the proposed algorithm, which we call Iso-Level CAFT, is motivated by (i) the search for a better load balance and (ii) the generation of fewer communications. These goals are achieved by scheduling a chunk of ready tasks simultaneously, which provides a global view of the potential communications. Our goal is to minimize the total execution time, or latency, while tolerating an arbitrary number of processor failures. Our approach is based on an active replication scheme to mask failures, so that there is no need to detect and handle such failures. Major achievements include low complexity and a drastic reduction in the number of additional communications induced by the replication mechanism. The experimental results fully demonstrate the usefulness of Iso-Level CAFT.
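To make the ingredients of the abstract concrete, here is a minimal sketch, not the authors' Iso-Level CAFT algorithm. It groups the task graph into levels of tasks that become ready together, treats each level as one chunk, and places ε + 1 active replicas of every task on distinct processors so that any ε fail-silent failures leave at least one replica alive. It also counts the inter-processor messages produced when every replica of a predecessor sends its result to every replica of a successor. That naive scheme is the source of the extra communications the paper sets out to reduce. Everything here is an assumption for illustration: the function names `iso_levels` and `replicated_schedule`, the cost tables `w` and `comm`, the heaviest-task-first ordering within a level, and the contention-free links (the paper's model is contention-aware).

```python
from collections import defaultdict


def iso_levels(tasks, succ):
    """Split a DAG into levels: level k holds tasks whose predecessors all lie in levels < k."""
    indeg = {t: 0 for t in tasks}
    for u in tasks:
        for v in succ.get(u, []):
            indeg[v] += 1
    level = [t for t in tasks if indeg[t] == 0]
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for u in level:
            for v in succ.get(u, []):
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return levels


def replicated_schedule(tasks, succ, w, comm, eps):
    """Hypothetical scheduler: eps + 1 replicas per task on distinct processors, level by level.

    w[t][p]      -- execution time of task t on processor p (heterogeneous, assumed table)
    comm[(u, v)] -- transfer time of edge u -> v between two distinct processors
    Returns (replicas, messages, makespan) with replicas[t] = [(proc, finish), ...].
    Links are assumed contention-free; timings describe a failure-free run.
    """
    num_procs = len(w[tasks[0]])
    if eps + 1 > num_procs:
        raise ValueError("need at least eps + 1 processors")
    preds = defaultdict(list)
    for u in tasks:
        for v in succ.get(u, []):
            preds[v].append(u)
    proc_free = [0.0] * num_procs  # time at which each processor finishes its last replica
    replicas, messages = {}, 0
    for level in iso_levels(tasks, succ):
        # The whole level is one chunk; heavier tasks choose processors first (assumed heuristic).
        for t in sorted(level, key=lambda x: -max(w[x])):
            options = []
            for p in range(num_procs):
                data_ready = 0.0
                for u in preds[t]:
                    # Every replica of u sends to p; the first copy to arrive is used.
                    arrival = min(f if q == p else f + comm[(u, t)] for q, f in replicas[u])
                    data_ready = max(data_ready, arrival)
                start = max(proc_free[p], data_ready)
                options.append((start + w[t][p], p))
            chosen = sorted(options)[: eps + 1]  # the eps + 1 earliest-finishing processors
            replicas[t] = [(p, f) for f, p in chosen]
            for f, p in chosen:
                proc_free[p] = f
            # Naive scheme: up to (eps + 1)^2 inter-processor messages per edge.
            for u in preds[t]:
                messages += sum(1 for q, _ in replicas[u] for p, _ in replicas[t] if q != p)
    makespan = max(f for reps in replicas.values() for _, f in reps)
    return replicas, messages, makespan


# Usage on a small diamond graph a -> {b, c} -> d, 3 processors, eps = 1.
tasks = ["a", "b", "c", "d"]
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
w = {"a": [2, 3, 4], "b": [3, 3, 3], "c": [4, 2, 5], "d": [1, 2, 2]}
comm = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 2}
replicas, messages, makespan = replicated_schedule(tasks, succ, w, comm, eps=1)
print(makespan, messages)  # prints "9 11"
```

On this four-edge graph, duplicating every task already costs 11 inter-processor messages, against at most 4 without replication. Counting messages this way shows why reducing the communications induced by replication is worth a dedicated heuristic.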




Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hakem, M. (2009). Iso-Level CAFT: How to Tackle the Combination of Communication Overhead Reduction and Fault Tolerance Scheduling. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_20


  • DOI: https://doi.org/10.1007/978-3-642-03644-6_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03643-9

  • Online ISBN: 978-3-642-03644-6

  • eBook Packages: Computer Science, Computer Science (R0)
