FTM²: Fault Tolerant Batch Mode Heuristics in Computational Grid

Panda, Sanjaya Kumar; Khilar, Pabitra Mohan; Mohapatra, Durga Prasad

doi:10.1007/978-3-319-04483-5_11

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8337)

Included in the following conference series:

International Conference on Distributed Computing and Internet Technology

Abstract

Task scheduling is difficult in a grid computing environment because the resources are highly unpredictable. In addition, there are many resources with varying functionalities. More importantly, resources and users generally belong to different domains. They may join or leave at any time because of administrative reasons, network failure, or machine failure, which may degrade the performance of the computational grid. A fault tolerant approach is therefore needed so that the grid keeps working smoothly in the presence of failures. In this paper, we address the problem of machine failure in a computational grid. The proposed system model uses the Round Trip Time (RTT) to detect a failure and a checkpointing strategy to recover from it. This model is applied to traditional batch mode heuristics such as Min-Min and Max-Min. The resulting Fault Tolerant Min-Min (FTMin-Min) and Fault Tolerant Max-Min (FTMax-Min) heuristics (together, FTM²) are simulated using MATLAB. The experimental results are discussed and compared with those of the traditional heuristics. The results show that these approaches bypass permanent machine failures and reduce the makespan.
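
The abstract does not give the algorithms themselves, so the following is only a minimal Python sketch of the general idea, not the authors' FTMin-Min or FTMax-Min. It assumes an expected-time-to-compute matrix `etc[task][machine]`, treats a machine as failed when its measured RTT is missing or exceeds a timeout, and then runs the textbook Min-Min or Max-Min rule over the surviving machines. The names `detect_failed` and `batch_schedule`, the timeout, and the matrix layout are all hypothetical; checkpoint-based recovery is not modelled.

```python
from typing import Dict, List, Optional, Set, Tuple


def detect_failed(rtt: Dict[int, Optional[float]], timeout: float) -> Set[int]:
    """Assumed detection rule: a machine has failed if its RTT probe
    returned nothing (None) or took longer than `timeout` seconds."""
    return {m for m, r in rtt.items() if r is None or r > timeout}


def batch_schedule(
    etc: List[List[float]],
    failed: Set[int] = frozenset(),
    use_max: bool = False,
) -> Tuple[Dict[int, int], float]:
    """Textbook Min-Min (use_max=False) or Max-Min (use_max=True).

    etc[t][m] is the expected time to compute task t on machine m.
    Machines in `failed` are never selected.
    Returns (task -> machine assignment, makespan).
    """
    n_machines = len(etc[0])
    alive = [m for m in range(n_machines) if m not in failed]
    if not alive:
        raise ValueError("no working machine left")

    ready = [0.0] * n_machines          # time at which each machine becomes free
    unassigned = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unassigned:
        # Step 1: for each task, find its earliest completion time and machine.
        best = {}
        for t in unassigned:
            m = min(alive, key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)

        # Step 2: Min-Min takes the task with the smallest of these times,
        # Max-Min the task with the largest. The task id breaks ties.
        pick = max if use_max else min
        task = pick(best, key=lambda t: (best[t][0], t))

        finish, machine = best[task]
        assignment[task] = machine
        ready[machine] = finish
        unassigned.remove(task)

    return assignment, max(ready)
```

For example, with `etc = [[2, 4], [3, 1], [5, 6]]`, calling `batch_schedule(etc, failed=detect_failed({0: 0.02, 1: None}, timeout=0.5))` places every task on machine 0, because machine 1 never answered its probe. The sketch only shows how a failed machine can be excluded from the mapping; it says nothing about the makespan results reported in the paper.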

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, India
Sanjaya Kumar Panda
Department of Computer Science and Engineering, National Institute of Technology, Rourkela, India
Pabitra Mohan Khilar & Durga Prasad Mohapatra

Editor information

Editors and Affiliations

School of Technology & Computer Science, Tata Institute of Fundamental Research, Homi Bhabha Road, Colaba, 400005, Mumbai, India
Raja Natarajan

About this paper

Cite this paper

Panda, S.K., Khilar, P.M., Mohapatra, D.P. (2014). FTM²: Fault Tolerant Batch Mode Heuristics in Computational Grid. In: Natarajan, R. (eds) Distributed Computing and Internet Technology. ICDCIT 2014. Lecture Notes in Computer Science, vol 8337. Springer, Cham. https://doi.org/10.1007/978-3-319-04483-5_11

DOI: https://doi.org/10.1007/978-3-319-04483-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04482-8
Online ISBN: 978-3-319-04483-5
eBook Packages: Computer Science, Computer Science (R0)
