Abstract
Cloud computing is a paradigm that offers large-scale resources and services over the Internet. These services are backed by large data centers with thousands of servers. One of the core issues in the cloud is making proper use of computation power, and efficient task scheduling can help utilize cloud resources up to their capacity. In real-world scenarios, it is also important to consider the reliability of computation resources at scheduling time, since task failures can be critical to both the cloud service provider and the user. In this paper, we propose a cloud computing framework that models the failure characteristics of a cloud environment. We develop two algorithms. The first, Monte Carlo Failure Estimation (MCFE), models failures in the cloud as Weibull distributed and uses Monte Carlo simulation to determine the probable occurrence of failures. The second, Failure-Aware Resource Scheduling (FARS), considers the reliability of task execution when assigning the tasks of a workflow application to virtual machines. To analyze the performance of our algorithm, we compared it with the widely used HEFT (Heterogeneous Earliest Finish Time) scheduling algorithm. The simulation analysis used randomly generated task graphs as well as task graphs for real-world numerical problems such as Gaussian Elimination (GE) and the Fast Fourier Transform (FFT). The simulation results show that the proposed algorithm performs better than HEFT in real-world scenarios where reliability is a critical issue.
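The abstract does not spell out the MCFE or FARS procedures, so the following is only a minimal Python sketch of the two ideas it names, built on illustrative assumptions: each VM's time to failure follows a Weibull distribution with made-up scale and shape parameters; a VM is treated as fresh at the start of every task (no ageing carried between tasks); tasks are independent (name, work) pairs in a precomputed order with communication costs ignored. The names `VM`, `mc_failure_probability`, `failure_aware_schedule` and `max_fail_prob`, and the selection rule "earliest finish among sufficiently reliable VMs, otherwise the most reliable VM", are all hypothetical and are not the paper's FARS rule.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical VM model; all parameters are illustrative assumptions."""
    name: str
    speed: float           # work units processed per second
    weibull_scale: float   # Weibull scale (lambda), in seconds
    weibull_shape: float   # Weibull shape (k); k > 1 means failure rate grows with age
    ready_at: float = 0.0  # time at which the VM is next free


def mc_failure_probability(vm, duration, trials=20_000, rng=None):
    """Monte Carlo estimate of P(time to failure < duration).

    Assumes the VM is fresh at task start (no ageing carried over).
    """
    rng = rng or random.Random()
    failures = 0
    for _ in range(trials):
        # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape
        if rng.weibullvariate(vm.weibull_scale, vm.weibull_shape) < duration:
            failures += 1
    return failures / trials


def analytic_failure_probability(vm, duration):
    """Weibull CDF 1 - exp(-(d/lambda)^k), used only to sanity-check the estimate."""
    return 1.0 - math.exp(-((duration / vm.weibull_scale) ** vm.weibull_shape))


def failure_aware_schedule(tasks, vms, max_fail_prob=0.05, trials=20_000, seed=0):
    """Hypothetical greedy rule (not the paper's FARS): for each task, pick the
    earliest-finishing VM whose estimated failure probability is at most
    max_fail_prob; if none qualifies, pick the most reliable VM.

    tasks: (name, work) pairs, assumed already in a valid dependency order;
    communication costs between dependent tasks are ignored.
    """
    rng = random.Random(seed)
    plan = []
    for name, work in tasks:
        candidates = []
        for vm in vms:
            duration = work / vm.speed
            p_fail = mc_failure_probability(vm, duration, trials, rng)
            candidates.append((vm.ready_at + duration, p_fail, vm))
        reliable = [c for c in candidates if c[1] <= max_fail_prob]
        if reliable:
            finish, p_fail, vm = min(reliable, key=lambda c: c[0])
        else:
            finish, p_fail, vm = min(candidates, key=lambda c: c[1])
        vm.ready_at = finish  # the chosen VM is busy until this task finishes
        plan.append((name, vm.name, round(finish, 2), round(p_fail, 4)))
    return plan


if __name__ == "__main__":
    vms = [
        VM("fast-flaky", speed=4.0, weibull_scale=50.0, weibull_shape=1.5),
        VM("slow-stable", speed=2.0, weibull_scale=5000.0, weibull_shape=1.5),
    ]
    tasks = [("t1", 40.0), ("t2", 20.0), ("t3", 60.0)]
    for row in failure_aware_schedule(tasks, vms):
        print(row)
```

With these made-up parameters, the longer tasks t1 and t3 should be placed on slow-stable, because their estimated failure probability on fast-flaky exceeds the 5% threshold, while the short task t2 should go to fast-flaky, which finishes it first at acceptable risk. The closed-form Weibull CDF is included only to check the Monte Carlo estimate against a known answer.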
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Rehani, N., Garg, R. (2017). Reliability-Aware Workflow Scheduling Using Monte Carlo Failure Estimation in Cloud. In: Modi, N., Verma, P., Trivedi, B. (eds) Proceedings of International Conference on Communication and Networks. Advances in Intelligent Systems and Computing, vol 508. Springer, Singapore. https://doi.org/10.1007/978-981-10-2750-5_15
DOI: https://doi.org/10.1007/978-981-10-2750-5_15
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2749-9
Online ISBN: 978-981-10-2750-5
eBook Packages: Engineering, Engineering (R0)