Abstract
Task scheduling is difficult in a grid computing environment because the resources are highly unpredictable and vary widely in functionality. More importantly, resources and users generally belong to different administrative domains, and a resource may join or leave at any time for administrative reasons or because of a network or machine failure. Such departures can degrade the performance of a computational grid, so a fault-tolerant approach is needed to keep the grid working smoothly in the presence of failures. In this paper, we address the problem of machine failure in a computational grid. The proposed system model uses the Round Trip Time (RTT) to detect a failure and a checkpointing strategy to recover from it. The model is applied to the traditional batch mode heuristics Min-Min and Max-Min. The resulting Fault Tolerant Min-Min (FTMin-Min) and Fault Tolerant Max-Min (FTMax-Min) heuristics, collectively called FTM2, are simulated using MATLAB. The experimental results are discussed and compared with those of the traditional heuristics. The results show that the proposed approaches bypass permanent machine failures and reduce the makespan.
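For readers unfamiliar with the batch mode heuristics named above, the following is a minimal Python sketch of the standard Min-Min and Max-Min mapping rules, with a simple way to bypass machines already known to have failed. It is an illustration under stated assumptions, not the FTM2 algorithm from the paper: the names `batch_schedule`, `etc`, `failed`, and `use_max` are hypothetical, the set of failed machines is assumed to be supplied by some external detector (for example an RTT timeout), and checkpoint-based recovery is not modeled.

```python
from typing import Dict, List, Set, Tuple


def batch_schedule(
    etc: List[List[float]],
    failed: Set[int] = frozenset(),
    use_max: bool = False,
) -> Tuple[Dict[int, int], float]:
    """Map independent tasks to machines with Min-Min or Max-Min.

    etc[t][m] is the expected time to compute task t on machine m.
    `failed` lists machine indices to skip (assumed to be detected elsewhere).
    Returns (task -> machine assignment, makespan).
    """
    alive = [m for m in range(len(etc[0])) if m not in failed]
    if not alive:
        raise ValueError("no working machine is available")

    ready = {m: 0.0 for m in alive}      # time at which each machine is free
    unscheduled = set(range(len(etc)))
    assignment: Dict[int, int] = {}

    while unscheduled:
        # For every unscheduled task, find its minimum completion time
        # over the working machines.
        best = {}
        for t in unscheduled:
            m = min(alive, key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)

        # Min-Min picks the task with the smallest such time,
        # Max-Min the task with the largest.
        pick = max if use_max else min
        t = pick(sorted(unscheduled), key=lambda i: best[i][0])

        completion, m = best[t]
        assignment[t] = m
        ready[m] = completion
        unscheduled.remove(t)

    return assignment, max(ready.values())


if __name__ == "__main__":
    etc = [[2, 4], [3, 1], [5, 5]]       # 3 tasks, 2 machines (made-up values)
    print(batch_schedule(etc))                   # Min-Min
    print(batch_schedule(etc, use_max=True))     # Max-Min
    print(batch_schedule(etc, failed={1}))       # Min-Min, machine 1 bypassed
```

The only fault-handling step in this sketch is filtering failed machines out of the candidate list before mapping; a fuller model would also reschedule the tasks that were running on a machine when it failed, resuming them from their last checkpoint.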
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Panda, S.K., Khilar, P.M., Mohapatra, D.P. (2014). FTM2: Fault Tolerant Batch Mode Heuristics in Computational Grid. In: Natarajan, R. (eds) Distributed Computing and Internet Technology. ICDCIT 2014. Lecture Notes in Computer Science, vol 8337. Springer, Cham. https://doi.org/10.1007/978-3-319-04483-5_11
DOI: https://doi.org/10.1007/978-3-319-04483-5_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-04482-8
Online ISBN: 978-3-319-04483-5
eBook Packages: Computer Science, Computer Science (R0)