Reliability-Aware Workflow Scheduling Using Monte Carlo Failure Estimation in Cloud

  • Conference paper
Proceedings of International Conference on Communication and Networks

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 508)

Abstract

Cloud computing is a paradigm that offers large-scale resources and services over the Internet. These services are backed by large data centers containing thousands of servers. A core issue in the cloud is making proper use of the available computation power, and efficient task scheduling helps utilize cloud resources up to their capacity. In real-world scenarios it is also important to consider the reliability of computation resources at scheduling time, since task failures can be critical to both the cloud service provider and the user. In this paper, we propose a cloud computing framework that models the failure characteristics of a cloud environment. We develop two algorithms: a Monte Carlo Failure Estimation (MCFE) algorithm, which assumes Weibull-distributed failures in the cloud and uses Monte Carlo simulation to estimate the probable occurrence of failures, and a Failure-Aware Resource Scheduling (FARS) algorithm, which takes the reliability of task execution into account when assigning the tasks of a workflow application to virtual machines. To analyze the performance of the proposed algorithm, we compare it with the widely used HEFT scheduling algorithm. The simulation analysis uses randomly generated task graphs as well as task graphs of real-world numerical problems, namely Gaussian Elimination (GE) and the Fast Fourier Transform (FFT). The simulation results show that the proposed algorithm performs better in real-world scenarios where reliability is a critical issue.
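
As a rough illustration of the Monte Carlo idea described in the abstract, the Python sketch below estimates the probability that a resource with a Weibull-distributed time to failure fails before a task of a given duration completes. It is not the authors' MCFE algorithm, whose details are not given on this page. The function names, the parameter values, and the comparison against the closed-form Weibull CDF are all assumptions made for illustration.

```python
import math
import random


def estimate_failure_probability(duration, shape, scale, trials=100_000, seed=0):
    """Monte Carlo estimate of P(time to failure <= duration).

    Hypothetical helper: time to failure is assumed to follow a
    Weibull(shape, scale) distribution.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    failures = 0
    for _ in range(trials):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        time_to_failure = rng.weibullvariate(scale, shape)
        if time_to_failure <= duration:
            failures += 1
    return failures / trials


def exact_failure_probability(duration, shape, scale):
    """Closed-form Weibull CDF, used only to sanity-check the estimate."""
    return 1.0 - math.exp(-((duration / scale) ** shape))


if __name__ == "__main__":
    # Illustrative values only: a task of 10 time units on a resource
    # whose failures follow Weibull(shape=1.5, scale=100).
    estimate = estimate_failure_probability(10.0, shape=1.5, scale=100.0)
    exact = exact_failure_probability(10.0, shape=1.5, scale=100.0)
    print(f"estimated: {estimate:.4f}, closed form: {exact:.4f}")
```

For a single Weibull distribution the closed form makes simulation unnecessary. Sampling becomes useful when the failure behaviour of a whole workflow has no convenient analytic expression, which is presumably the setting the paper targets.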

Author information

Corresponding author

Correspondence to Nidhi Rehani.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Rehani, N., Garg, R. (2017). Reliability-Aware Workflow Scheduling Using Monte Carlo Failure Estimation in Cloud. In: Modi, N., Verma, P., Trivedi, B. (eds) Proceedings of International Conference on Communication and Networks. Advances in Intelligent Systems and Computing, vol 508. Springer, Singapore. https://doi.org/10.1007/978-981-10-2750-5_15

  • DOI: https://doi.org/10.1007/978-981-10-2750-5_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2749-9

  • Online ISBN: 978-981-10-2750-5

  • eBook Packages: Engineering, Engineering (R0)
