
FTM2: Fault Tolerant Batch Mode Heuristics in Computational Grid

  • Conference paper
Distributed Computing and Internet Technology (ICDCIT 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8337)

Abstract

Task scheduling in a grid computing environment is difficult because resources are highly unpredictable and vary widely in functionality. Moreover, resources and users generally belong to different administrative domains, and they may join or leave the grid at any time because of administrative decisions, network failure, or machine failure. Such events can degrade the performance of a computational grid, so a fault-tolerant approach is needed to keep the grid working smoothly in the presence of failures. In this paper, we address the problem of machine failure in a computational grid. The proposed system model uses the Round Trip Time (RTT) to detect failures and a checkpointing strategy to recover from them. The model is applied to two traditional batch mode heuristics, Min-Min and Max-Min. The resulting Fault Tolerant Min-Min (FTMin-Min) and Fault Tolerant Max-Min (FTMax-Min) heuristics, together called FTM2, are simulated using MATLAB. The experimental results are discussed and compared with those of the traditional heuristics. The results show that the proposed approaches bypass permanent machine failures and reduce the makespan.
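The abstract does not give the algorithms themselves, so the following is only a minimal Python sketch of the general idea, not the authors' MATLAB implementation. It assumes the usual expected-time-to-compute (ETC) matrix formulation of Min-Min and Max-Min, and it models failure handling in the simplest possible way: a machine whose probe gets no reply, or whose RTT exceeds a timeout, is excluded before scheduling. The names `batch_schedule`, `detect_failed`, `etc`, and `timeout` are hypothetical, and checkpoint-based recovery is not modelled.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def detect_failed(rtt: Dict[int, Optional[float]], timeout: float) -> Set[int]:
    """Assumed failure rule: no reply (None) or an RTT above `timeout`."""
    return {m for m, r in rtt.items() if r is None or r > timeout}


def batch_schedule(
    etc: List[List[float]],
    failed: Iterable[int] = (),
    use_max: bool = False,
) -> Tuple[Dict[int, int], float]:
    """Min-Min (or Max-Min when use_max=True) over an ETC matrix.

    etc[t][m] is the expected time to compute task t on machine m.
    Machines listed in `failed` are skipped.
    Returns (task -> machine mapping, makespan).
    """
    down = set(failed)
    machines = [m for m in range(len(etc[0])) if m not in down]
    if not machines:
        raise ValueError("no working machine left")

    ready = {m: 0.0 for m in machines}   # time at which each machine becomes free
    unscheduled = set(range(len(etc)))
    mapping: Dict[int, int] = {}

    while unscheduled:
        # For every unscheduled task, find its earliest completion time
        # and the machine that achieves it.
        best = {}
        for t in unscheduled:
            m = min(machines, key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)

        # Min-Min takes the task with the smallest such time,
        # Max-Min the task with the largest.
        pick = max if use_max else min
        t = pick(sorted(unscheduled), key=lambda k: best[k][0])

        completion, m = best[t]
        mapping[t] = m
        ready[m] = completion
        unscheduled.remove(t)

    return mapping, max(ready.values())


if __name__ == "__main__":
    etc = [[2, 4], [3, 1], [6, 5]]              # 3 tasks, 2 machines (made-up values)
    print(batch_schedule(etc))                   # Min-Min
    print(batch_schedule(etc, use_max=True))     # Max-Min
    down = detect_failed({0: 0.02, 1: None}, timeout=0.5)
    print(batch_schedule(etc, failed=down))      # machine 1 excluded
```

On this made-up 3-task, 2-machine matrix, Min-Min yields a makespan of 6 and Max-Min a makespan of 5. Once machine 1 is marked as failed, all tasks fall back to machine 0 and the makespan rises to 11. The example only illustrates how a batch mode heuristic can route around a permanently failed machine; it says nothing about the results reported in the paper.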



Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Panda, S.K., Khilar, P.M., Mohapatra, D.P. (2014). FTM2: Fault Tolerant Batch Mode Heuristics in Computational Grid. In: Natarajan, R. (eds) Distributed Computing and Internet Technology. ICDCIT 2014. Lecture Notes in Computer Science, vol 8337. Springer, Cham. https://doi.org/10.1007/978-3-319-04483-5_11

  • DOI: https://doi.org/10.1007/978-3-319-04483-5_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-04482-8

  • Online ISBN: 978-3-319-04483-5

  • eBook Packages: Computer Science, Computer Science (R0)
