Abstract
Fault tolerance plays a key role in computational grids. It enables a system to keep working in the presence of one or more failed components. Components fail for unavoidable reasons such as power failure, network failure, and system failure. In this chapter, we address the problem of machine failure in a computational grid. The proposed system model uses the round-trip time to detect a failure and a checkpointing strategy to recover from it. The model is applied to the traditional immediate mode heuristics minimum execution time (MET) and minimum completion time (MCT), collectively referred to as MXT. The proposed Fault-Tolerant MET (FTMET) and Fault-Tolerant MCT (FTMCT) heuristics, collectively referred to as FTMXT, are simulated using MATLAB. The experimental results are discussed and compared with those of the traditional heuristics. The results show that the proposed approaches bypass permanent failures and reduce the makespan.
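To make the two baseline heuristics concrete, the following is a minimal Python sketch, not the chapter's MATLAB implementation. It assumes an expected-time-to-compute matrix `etc` (one row per task, one column per machine) and a hypothetical `failed` set standing in for the machines that a round-trip-time probe has flagged as down. MET picks the machine with the smallest execution time regardless of load, while MCT picks the machine with the smallest ready time plus execution time. Checkpoint-based recovery is not modelled here; the sketch only shows how skipping failed machines changes the mapping and the makespan.

```python
from typing import FrozenSet, List, Sequence, Tuple


def met(etc_row: Sequence[float], failed: FrozenSet[int] = frozenset()) -> int:
    """MET: choose the live machine with the smallest execution time, ignoring load."""
    candidates = [m for m in range(len(etc_row)) if m not in failed]
    return min(candidates, key=lambda m: etc_row[m])


def mct(etc_row: Sequence[float], ready: Sequence[float],
        failed: FrozenSet[int] = frozenset()) -> int:
    """MCT: choose the live machine with the smallest ready time + execution time."""
    candidates = [m for m in range(len(etc_row)) if m not in failed]
    return min(candidates, key=lambda m: ready[m] + etc_row[m])


def schedule(etc: Sequence[Sequence[float]], heuristic: str,
             failed: FrozenSet[int] = frozenset()) -> Tuple[List[int], float]:
    """Map tasks one at a time in arrival order (immediate mode).

    `failed` is an assumption of this sketch: the machines already known
    to be down. Returns the task-to-machine assignment and the makespan.
    """
    ready = [0.0] * len(etc[0])  # time at which each machine becomes free
    assignment = []
    for row in etc:
        if heuristic == "MET":
            m = met(row, failed)
        else:
            m = mct(row, ready, failed)
        ready[m] += row[m]
        assignment.append(m)
    return assignment, max(ready)


# Illustrative data only: 4 tasks, 3 machines.
sample_etc = [[4, 6, 9], [3, 5, 8], [2, 7, 4], [5, 3, 6]]
print(schedule(sample_etc, "MET"))
print(schedule(sample_etc, "MCT"))
print(schedule(sample_etc, "MET", failed=frozenset({0})))
```

On this sample data, MET piles the first three tasks onto the fastest machine, whereas MCT spreads them out because it accounts for each machine's ready time. Passing a non-empty `failed` set removes those machines from consideration before the choice is made.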
Copyright information
© 2015 Springer India
About this paper
Cite this paper
Panda, S.K., Khilar, P.M., Mohapatra, D.P. (2015). FTMXT: Fault-Tolerant Immediate Mode Heuristics in Computational Grid. In: Rajsingh, E., Bhojan, A., Peter, J. (eds) Informatics and Communication Technologies for Societal Development. Springer, New Delhi. https://doi.org/10.1007/978-81-322-1916-3_11
DOI: https://doi.org/10.1007/978-81-322-1916-3_11
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-1915-6
Online ISBN: 978-81-322-1916-3
eBook Packages: Engineering, Engineering (R0)