A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments

Tabbaa, Nabil; Entezari-Maleki, Reza; Movaghar, Ali

doi:10.1007/978-3-642-22389-1_18

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 188))

Included in the following conference series:

International Conference on Digital Information Processing and Communications

Abstract

Fault tolerance is an essential requirement for systems whose applications must continue executing even when some system components fail. In this paper, a fault-tolerant task scheduling algorithm is proposed for mapping task graphs onto heterogeneous processing nodes in cluster computing systems. The input to the algorithm is a directed acyclic graph (DAG) representing an application, annotated with the execution time of each task on each processor of the target system, the communication times between tasks that have data dependencies, and the number of processor failures (ε) that the schedule must tolerate. The algorithm is based on the active replication scheme: it schedules ε+1 replicas of each task to achieve the required fault tolerance. Simulation results show that the proposed algorithm is efficient despite its lower complexity.
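
The abstract does not give the algorithm's details, so the following is only a minimal sketch of active replication in a list scheduler, not the authors' method. It assumes that the ε+1 replicas of a task are placed on distinct processors, that tasks are visited in topological order, that each task takes the ε+1 processors with the earliest finish times, and that a replica without a local copy of a predecessor waits for the latest remote copy plus the communication time (a pessimistic bound). The function name `schedule_with_replication` and the data layout (`exec_time`, `comm_time`, `preds`) are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_with_replication(exec_time, comm_time, preds, epsilon):
    """Greedy list scheduling with epsilon + 1 replicas per task.

    exec_time[task][proc] : execution time of task on processor proc
    comm_time[(u, v)]     : communication time on DAG edge u -> v
    preds[task]           : list of predecessor tasks (may be missing)
    Returns task -> list of (proc, start, finish), one entry per replica.
    """
    num_procs = len(next(iter(exec_time.values())))
    if epsilon + 1 > num_procs:
        raise ValueError("need at least epsilon + 1 processors")

    proc_free = [0.0] * num_procs  # time at which each processor becomes idle
    placement = {}                 # task -> replicas scheduled so far

    # TopologicalSorter takes a mapping node -> predecessors.
    graph = {task: preds.get(task, []) for task in exec_time}
    for task in TopologicalSorter(graph).static_order():
        candidates = []
        for proc in range(num_procs):
            ready = 0.0
            for pred in preds.get(task, []):
                replicas = placement[pred]
                local = [f for (p, _, f) in replicas if p == proc]
                if local:
                    # A replica of the predecessor ran here: data is local.
                    arrival = local[0]
                else:
                    # Assumption: wait for the slowest remote replica.
                    latest = max(f for (_, _, f) in replicas)
                    arrival = latest + comm_time[(pred, task)]
                ready = max(ready, arrival)
            start = max(ready, proc_free[proc])
            finish = start + exec_time[task][proc]
            candidates.append((finish, proc, start))

        # Keep the epsilon + 1 processors with the earliest finish times.
        chosen = sorted(candidates)[: epsilon + 1]
        placement[task] = [(proc, start, finish) for (finish, proc, start) in chosen]
        for finish, proc, _ in chosen:
            proc_free[proc] = finish
    return placement


# Hypothetical example: tasks A and B feed task C, three processors, epsilon = 1.
example_exec = {"A": [2, 3, 4], "B": [3, 2, 4], "C": [2, 2, 2]}
example_comm = {("A", "C"): 1, ("B", "C"): 1}
example_preds = {"C": ["A", "B"]}
example_schedule = schedule_with_replication(example_exec, example_comm, example_preds, 1)
for name, replicas in example_schedule.items():
    print(name, replicas)
```

Because every task in this sketch runs on ε+1 different processors, at least one replica of each task remains after any ε processor failures. The cost is the extra processor time spent on redundant copies.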

Author information

Authors and Affiliations

Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Nabil Tabbaa, Reza Entezari-Maleki & Ali Movaghar

Editor information

Editors and Affiliations

Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, VŠB-TUO, 17. listopadu 15, 708 33, Ostrava-Poruba, Czech Republic
Vaclav Snasel & Jan Platos
Information Systems Department, King Saud University, 11543, Riyadh, Saudi Arabia
Eyas El-Qawasmeh

Cite this paper

Tabbaa, N., Entezari-Maleki, R., Movaghar, A. (2011). A Fault Tolerant Scheduling Algorithm for DAG Applications in Cluster Environments. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22389-1_18

DOI: https://doi.org/10.1007/978-3-642-22389-1_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22388-4
Online ISBN: 978-3-642-22389-1
eBook Packages: Computer Science (R0)
