Reliability-Aware Workflow Scheduling Using Monte Carlo Failure Estimation in Cloud

Rehani, Nidhi; Garg, Ritu

doi:10.1007/978-981-10-2750-5_15

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 508)

Abstract

Cloud computing is a paradigm that offers large-scale resources and services over the Internet. These services are supported by huge data centers with thousands of servers. One of the core issues in the cloud is the proper utilization of computation power, and efficient task scheduling can help utilize cloud resources up to their capacity. Moreover, in real-world scenarios it is important to consider the reliability of computation resources at scheduling time, since task failures can be critical to both the cloud service provider and the user. In this paper, we propose a cloud computing framework that models the failure characteristics of a cloud environment. We develop a Monte Carlo Failure Estimation (MCFE) algorithm, which considers Weibull-distributed failures in the cloud and uses Monte Carlo simulation to determine the probable occurrence of failures, and a Failure-Aware Resource Scheduling (FARS) algorithm, which considers the reliability of task execution while assigning the tasks of a workflow application to virtual machines. To analyze the performance of our algorithm, we compare it with the popular scheduling algorithm HEFT. The simulation analysis uses randomly generated task graphs as well as task graphs for real-world numerical problems such as Gaussian Elimination (GE) and the Fast Fourier Transform (FFT). The simulation results show that the proposed algorithm performs better in real-world scenarios where reliability is a critical issue.
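The general idea of estimating failure occurrence by Monte Carlo sampling from a Weibull distribution can be illustrated with a minimal sketch. This is not the MCFE or FARS algorithm from the chapter, whose details are not given in the abstract. The function names, the VM fields (`speed`, `shape`, `scale`), and the "pick the VM with the highest estimated reliability" rule are all assumptions made for illustration only.

```python
import math
import random


def weibull_reliability(duration, shape, scale):
    """Closed-form probability that no failure occurs within `duration`."""
    return math.exp(-((duration / scale) ** shape))


def estimate_reliability(duration, shape, scale, trials=100_000, seed=0):
    """Monte Carlo estimate: fraction of sampled failure times beyond `duration`."""
    rng = random.Random(seed)
    survived = 0
    for _ in range(trials):
        # weibullvariate(alpha, beta): alpha is the scale, beta is the shape.
        time_to_failure = rng.weibullvariate(scale, shape)
        if time_to_failure > duration:
            survived += 1
    return survived / trials


def pick_most_reliable_vm(task_length, vms, trials=20_000, seed=0):
    """Hypothetical selection rule (an assumption, not FARS itself):
    choose the VM whose estimated reliability for this task is highest."""
    best_name, best_reliability = None, -1.0
    for vm in vms:
        # Execution time on this VM, assuming time = length / speed.
        duration = task_length / vm["speed"]
        reliability = estimate_reliability(
            duration, vm["shape"], vm["scale"], trials=trials, seed=seed
        )
        if reliability > best_reliability:
            best_name, best_reliability = vm["name"], reliability
    return best_name, best_reliability


if __name__ == "__main__":
    # Illustrative VM parameters; the values are invented for the example.
    vms = [
        {"name": "fast_flaky", "speed": 2.0, "shape": 1.2, "scale": 20.0},
        {"name": "slow_stable", "speed": 1.0, "shape": 1.2, "scale": 500.0},
    ]
    name, reliability = pick_most_reliable_vm(40.0, vms)
    print(name, round(reliability, 3))
```

With these invented parameters, the slower VM is selected because its much larger Weibull scale outweighs its longer execution time. A scheduler that ranks on finish time alone would favour the faster VM instead.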

Author information

Authors and Affiliations

Computer Engineering Department, National Institute of Technology, Kurukshetra, Haryana, India
Nidhi Rehani & Ritu Garg

Corresponding author

Correspondence to Nidhi Rehani.

Editor information

Editors and Affiliations

Professor and Head, Narsinhbhai Inst. of Comp. Stud. & Mngmt, Kadi, Mehsana, Gujarat, India
Nilesh Modi
Director, Telecommunication Engineering, The University of Oklahoma, Norman, Oklahoma, USA
Pramode Verma
Dean & Faculty of Computer Technology, GLS University, Ahmedabad, Gujarat, India
Bhushan Trivedi

About this paper

Cite this paper

Rehani, N., Garg, R. (2017). Reliability-Aware Workflow Scheduling Using Monte Carlo Failure Estimation in Cloud. In: Modi, N., Verma, P., Trivedi, B. (eds) Proceedings of International Conference on Communication and Networks. Advances in Intelligent Systems and Computing, vol 508. Springer, Singapore. https://doi.org/10.1007/978-981-10-2750-5_15

DOI: https://doi.org/10.1007/978-981-10-2750-5_15
Published: 08 April 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2749-9
Online ISBN: 978-981-10-2750-5
eBook Packages: Engineering, Engineering (R0)
