A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments

  • Conference paper
Digital Information Processing and Communications (ICDIPC 2011)

Abstract

Fault tolerance is an essential requirement for systems whose applications must continue executing even when some system components fail. This paper proposes a fault tolerant task scheduling algorithm for mapping task graphs onto heterogeneous processing nodes in cluster computing systems. The algorithm takes as input a directed acyclic graph (DAG) representing the application, annotated with information about its tasks: the execution time of each task on each processor of the target system, the communication times between tasks that have data dependencies, and the number of processor failures (ε) that the schedule must tolerate. The algorithm is based on the active replication scheme and schedules ε+1 replicas of each task to achieve the required fault tolerance. Simulation results show that the proposed algorithm is efficient despite its lower complexity.
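To make the inputs and the ε+1 replication idea concrete, the following is a minimal sketch of a greedy list scheduler with active replication. It is not the algorithm proposed in the paper, whose details are not given in the abstract. The function name `schedule_with_replication`, the data layout (`exec_time`, `comm_time`, `preds`), the earliest-finish-time placement rule, and the pessimistic assumption that a task waits for the latest-arriving replica of each predecessor are all illustrative assumptions.

```python
from graphlib import TopologicalSorter


def schedule_with_replication(exec_time, comm_time, preds, epsilon):
    """Hypothetical greedy sketch: place epsilon + 1 replicas of every task.

    exec_time[t][p]  : execution time of task t on processor p
    comm_time[(u, t)]: communication time from u to t across processors
    preds[t]         : predecessors of task t in the DAG
    Returns {task: [(processor, start, finish), ...]}.
    """
    n_procs = len(next(iter(exec_time.values())))
    if epsilon + 1 > n_procs:
        raise ValueError("need at least epsilon + 1 processors")

    proc_free = [0.0] * n_procs  # time at which each processor becomes idle
    schedule = {}

    # Visit tasks so that every predecessor is scheduled before its successors.
    for t in TopologicalSorter(preds).static_order():
        candidates = []
        for p in range(n_procs):
            ready = 0.0
            for u in preds.get(t, ()):
                # Assumption: be pessimistic and wait for the slowest replica
                # of u, since any of the others may have been lost to a failure.
                # Communication is free when both replicas share a processor.
                arrival = max(
                    finish + (0 if q == p else comm_time[(u, t)])
                    for q, _, finish in schedule[u]
                )
                ready = max(ready, arrival)
            start = max(ready, proc_free[p])
            candidates.append((start + exec_time[t][p], start, p))

        # Keep the epsilon + 1 processors with the earliest finish times.
        # Each candidate is a different processor, so replicas never collide.
        candidates.sort()
        schedule[t] = []
        for finish, start, p in candidates[: epsilon + 1]:
            schedule[t].append((p, start, finish))
            proc_free[p] = finish
    return schedule


# Illustrative input: two dependent tasks, three processors, one tolerated failure.
example = schedule_with_replication(
    exec_time={"A": [2, 3, 4], "B": [3, 2, 1]},
    comm_time={("A", "B"): 1},
    preds={"A": [], "B": ["A"]},
    epsilon=1,
)
for task, replicas in example.items():
    print(task, replicas)
```

Because the ε+1 replicas of a task always run on distinct processors, at least one replica of every task survives any ε processor failures, which is the property the active replication scheme relies on.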




Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tabbaa, N., Entezari-Maleki, R., Movaghar, A. (2011). A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22389-1_18

  • DOI: https://doi.org/10.1007/978-3-642-22389-1_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22388-4

  • Online ISBN: 978-3-642-22389-1

  • eBook Packages: Computer Science, Computer Science (R0)
