Abstract
Edge computing extends computation and storage resources to the edge of the network, which largely alleviates the performance problems that cloud computing suffers from because of bandwidth limitations. However, edge computing still has to address challenges in energy consumption and reliability. In this paper, we propose an energy-aware fault-tolerant resource scheduling algorithm that improves system reliability while minimizing energy consumption. We first allocate resources to tasks using a reliability- and energy-aware resource scheduling method. Then, CPU temperature prediction and time between failures (TBF) prediction are used to trigger a proactive fault-tolerance mechanism (VM migration). The experimental results show that reliability is greatly improved, while the energy consumed by VM migration remains modest compared with other methods.
1 Introduction
Recently, edge computing has been seen as an effective solution to the problem of ever-growing data volumes, offering shorter response times and better quality of service [1]. However, reliability problems remain to be solved. Existing fault-tolerant methods can be divided into two categories: reactive and proactive. It is well known that reactive schemes yield low average resource utilization when application behavior is highly dynamic. Proactive schemes, which rely on fault prediction [2,3,4,5], can instead use resources more effectively. However, these methods consider only a single factor when predicting failures, which greatly affects the accuracy of their predictions.
In this paper, we jointly consider the CPU temperature and the time between failures (TBF) of each host to predict failures, and we propose an energy-aware fault-tolerant resource scheduling algorithm that improves reliability while reducing energy consumption. Specifically, we first use reliability- and energy-aware resource scheduling [2] to allocate resources to tasks. During task execution, the fault-tolerance mechanism (VM migration) is triggered once the CPU temperature reaches the upper threshold or the predicted failure time is reached.
The rest of this paper is organized as follows. Section 2 presents the system model, and Sect. 3 describes the resource scheduling algorithm. Section 4 reports the simulation experiments, and Sect. 5 concludes the paper.
2 Fault-Tolerance Resource Scheduling Model
As shown in Fig. 1, the system is divided into two main layers. The Users Layer produces and consumes data. The Edge Cloud Layer is the data processing layer and consists of physical resources. Users submit their applications to the Edge Cloud Layer, and the resource management system (RMS) allocates physical resources to their tasks. To improve system reliability, the RMS can also migrate a running VM from a deteriorating host to another host.
In this paper, we consider Bag-of-Tasks (BoT) applications, each consisting of a set of independent tasks. The tasks in each BoT are defined as \(T = \left\{ {tas{k_i}|1 \le i \le n} \right\} \). \(l_i\) is the length of task \({task}_i\), which directly affects its execution time \(T_i^{ex}\). Each task \({task}_i\) is allocated to a virtual machine \({vm_j} \in VM\), and each virtual machine \(vm_j\) runs a set of tasks \({T_j} \subseteq T\). In addition, \(N = \left\{ {nod{e_k}|1 \le k \le x} \right\} \) denotes the set of physical hosts in the edge cloud.
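The notation above can be mirrored in a small data model. The following Python sketch is illustrative only: the class and field names (`Task`, `VM`, `Node`, `is_valid_assignment`) are hypothetical, task IDs are assumed to be unique, and the check simply encodes the assumption that each task of a BoT runs on exactly one VM.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    length: float  # l_i; the unit (e.g. million instructions) is an assumption


@dataclass
class VM:
    vm_id: int
    tasks: list[Task] = field(default_factory=list)  # T_j, a subset of T


@dataclass
class Node:
    node_id: int
    vms: list[VM] = field(default_factory=list)  # VMs hosted on node_k


def is_valid_assignment(bot: list[Task], vms: list[VM]) -> bool:
    """True if every task of the BoT is placed on exactly one VM (assumption)."""
    assigned = sorted(t.task_id for vm in vms for t in vm.tasks)
    return assigned == sorted(t.task_id for t in bot)


# Usage: two tasks split across two VMs hosted on one node.
bot = [Task(1, 100.0), Task(2, 250.0)]
vms = [VM(1, [bot[0]]), VM(2, [bot[1]])]
edge_node = Node(1, vms)
print(is_valid_assignment(bot, vms))  # True
```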
2.1 Failure Prediction Model
CPU Temperature Prediction: We use the simulated CPU temperature prediction function model of [3] as one of the methods for predicting host failure time, as follows:
where i ranges over the positive integers; \(t_i\) is a fixed value given by \(e^{t_i}=35\), where \(e^{t_i}\) is the idle CPU temperature, which is always \({35{\,}^\circ }\mathrm{{C}}\); \(t_{i+1}\) is a random value; \(t_{i+2}\) is calculated as \(t_{i+2}=\pi /\omega + t_{i+1}\); A is the amplitude (lower than \({68{\,}^\circ }\mathrm{{C}}\)); and \(\omega \) represents the duration of the CPU execution load.
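The prediction function itself is not reproduced in this text, so the sketch below does not implement it. It only shows, under stated assumptions, how a predicted temperature trace could be turned into a trigger time: given (time, temperature) samples produced by some prediction model, it returns the first time the temperature reaches an upper threshold. The function name and the 60 °C threshold used in the example are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_threshold_time(
    trace: Sequence[Tuple[float, float]], upper_threshold: float
) -> Optional[float]:
    """Return the earliest time at which the predicted CPU temperature reaches
    upper_threshold, or None if it never does within the trace.

    trace: (time, predicted temperature in degrees C) pairs, sorted by time.
    """
    for time, temp in trace:
        if temp >= upper_threshold:
            return time
    return None


# Usage with a hypothetical predicted trace starting from the 35 C idle level.
trace = [(0.0, 35.0), (10.0, 50.0), (20.0, 66.0), (30.0, 40.0)]
print(first_threshold_time(trace, 60.0))  # 20.0
print(first_threshold_time(trace, 70.0))  # None
```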
Time Between Failures Prediction: In addition to CPU temperature prediction, exponential smoothing [2] is used to predict the TBF. Suppose host \({node}_k\) has a set of observed TBFs, \({TBF}_k = \left\{ {{tbf}_t|1 \le t \le n} \right\} \). Then, the predicted value of \({tbf}_{t+1}\) can be calculated as:
\[ {({tbf}_k)}_{t+1}^{\prime } = \alpha {({tbf}_k)}_t + (1-\alpha ){({tbf}_k)}_t^{\prime } \]
where \({({tbf}_k)}_t\) is the actual value of the TBF, \({({tbf}_k)}_t^{\prime }\) is the predicted value of the TBF at time t. \(\alpha \) is the smoothing constant.
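As a sketch, the exponential smoothing update can be applied iteratively over a host's TBF history. This assumes the standard simple exponential smoothing form \({({tbf}_k)}_{t+1}^{\prime } = \alpha {({tbf}_k)}_t + (1-\alpha ){({tbf}_k)}_t^{\prime }\) with \(0 < \alpha \le 1\), and it initializes the first prediction to the first observed TBF, a common convention that the text does not specify. The function name is hypothetical.

```python
from typing import Optional, Sequence


def predict_next_tbf(
    tbfs: Sequence[float], alpha: float, initial: Optional[float] = None
) -> float:
    """Predict the next time between failures with simple exponential smoothing.

    tbfs: observed TBFs (tbf_k)_1 .. (tbf_k)_n of one host, oldest first.
    alpha: smoothing constant, assumed to lie in (0, 1].
    initial: first prediction (tbf_k)'_1; defaults to the first observation.
    """
    if not tbfs:
        raise ValueError("need at least one observed TBF")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    prediction = tbfs[0] if initial is None else initial
    for actual in tbfs:
        # (tbf_k)'_{t+1} = alpha * (tbf_k)_t + (1 - alpha) * (tbf_k)'_t
        prediction = alpha * actual + (1.0 - alpha) * prediction
    return prediction


# Usage: three observed TBFs (hours, say) and alpha = 0.5.
print(predict_next_tbf([100.0, 120.0, 80.0], alpha=0.5))  # 95.0
```

A larger \(\alpha\) weights the most recent failures more heavily, while a smaller \(\alpha\) smooths out noise in the failure history.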
2.2 Energy Consumption Model
Let \({vm}_j\) be the VM running on \({node}_k\) with utilization \(u_j\). Then the energy consumption of the task \({task}_i\) running on \({vm}_j\) can be calculated as
where \(E_{{extra}_{ij}}\) is the energy consumed by VM migration, which can be calculated with the VM migration overhead model in [2]. Similar to [6], \({P_k}({u_j})\) can be calculated as
\[ {P_k}({u_j}) = P_{min} + (P_{max} - P_{min})\,{u_j} \]
where \(P_{min}\) and \(P_{max}\) are the power consumption of the node at minimum and maximum utilization, respectively. The utilization \(u_j\) of \({vm}_j\) is the sum of the utilizations \(u_i\) of its tasks, where each \(u_i\) is obtained by normalizing the task length \(l_i\) by the maximum task length \(l_{max}\) in the BoT B.
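The power and utilization definitions above can be sketched as follows. Two parts are assumptions: the linear form \(P_k(u_j) = P_{min} + (P_{max} - P_{min})\,u_j\), which is the usual linear model in the style of [6], and the task energy \(E_{ij} = P_k(u_j)\,T_i^{ex} + E_{{extra}_{ij}}\), since the task energy equation itself is not reproduced here. Function names and the example wattages are hypothetical.

```python
from typing import Sequence


def task_utilization(length: float, l_max: float) -> float:
    """u_i: task length l_i normalized by the maximum task length in the BoT."""
    return length / l_max


def vm_utilization(task_lengths: Sequence[float], l_max: float) -> float:
    """u_j: sum of the utilizations of the tasks running on vm_j."""
    return sum(task_utilization(l, l_max) for l in task_lengths)


def node_power(u: float, p_min: float, p_max: float) -> float:
    """Assumed linear power model P_k(u) = P_min + (P_max - P_min) * u (watts)."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return p_min + (p_max - p_min) * u


def task_energy(u: float, p_min: float, p_max: float,
                exec_time_s: float, extra_j: float = 0.0) -> float:
    """Assumed E_ij = P_k(u_j) * T_i^ex + E_extra_ij, in joules."""
    return node_power(u, p_min, p_max) * exec_time_s + extra_j


# Usage: BoT lengths 1000, 2000, 4000; vm_j runs the first two tasks.
lengths = [1000.0, 2000.0, 4000.0]
l_max = max(lengths)
u_j = vm_utilization(lengths[:2], l_max)           # 0.25 + 0.5 = 0.75
print(node_power(u_j, p_min=100.0, p_max=250.0))  # 212.5
print(task_energy(u_j, 100.0, 250.0, exec_time_s=3600.0, extra_j=5000.0))  # 770000.0
```

Because \(u_j\) is a plain sum, it can exceed 1 when many tasks share a VM; the sketch rejects such values rather than guessing how the model handles them.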
3 Energy-Aware Fault-Tolerant Resource Scheduling Algorithm
Given a BoT B and the resource configuration of the data center, Algorithm 1 allocates resources to tasks. First, the Best Fit bin-packing algorithm [2] assigns the tasks to VMs. Then, the reliability- and energy-aware strategy allocates physical resources to the VMs (lines 1–10). During task execution, once the temperature of a node reaches the upper threshold or the node's predicted failure time arrives, VM migration is triggered, and the VM running on the deteriorating node selects a destination node through Algorithm 1.
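The two steps described above can be sketched roughly as follows. The first function is a plain Best Fit bin-packing pass that places each task on the open VM with the smallest remaining capacity that still fits, assuming each VM's capacity is 1.0 in the normalized utilization units \(u_i\); this capacity unit is an assumption, and the reliability- and energy-aware placement of VMs onto nodes is not sketched. The second function encodes the migration trigger: migrate once the current time reaches the earlier of the temperature-threshold time and the predicted failure time. All names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

EPS = 1e-9  # tolerance for floating-point capacity checks


def best_fit(task_utils: Sequence[float],
             capacity: float = 1.0) -> Tuple[List[int], List[float]]:
    """Best Fit bin packing of tasks onto VMs (tasks taken in the given order).

    Returns (assignment, loads): assignment[i] is the VM index of task i,
    loads[j] is the total utilization placed on VM j.
    """
    loads: List[float] = []
    assignment: List[int] = []
    for u in task_utils:
        if u > capacity + EPS:
            raise ValueError("task larger than VM capacity")
        best: Optional[int] = None
        best_residual = float("inf")
        for j, load in enumerate(loads):
            residual = capacity - load - u
            # Keep the VM that would be left with the least spare capacity.
            if residual >= -EPS and residual < best_residual:
                best, best_residual = j, residual
        if best is None:
            loads.append(u)  # no existing VM fits: open a new one
            assignment.append(len(loads) - 1)
        else:
            loads[best] += u
            assignment.append(best)
    return assignment, loads


def should_migrate(now: float,
                   temp_threshold_time: Optional[float],
                   predicted_failure_time: Optional[float]) -> bool:
    """Trigger VM migration at the earlier of the two predicted events."""
    events = [t for t in (temp_threshold_time, predicted_failure_time)
              if t is not None]
    return bool(events) and now >= min(events)


# Usage
assignment, loads = best_fit([0.5, 0.7, 0.5, 0.2])
print(assignment)                        # [0, 1, 0, 1]
print(should_migrate(50.0, 60.0, 45.0))  # True
print(should_migrate(30.0, 60.0, 45.0))  # False
```

Tasks are packed in the order given; sorting them by decreasing utilization first would turn this into Best Fit Decreasing, which is a different variant.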
4 Performance Evaluation
We conduct the simulation experiments by extending the simulator ‘CloudSim’ [3]. We download the Grid5000 failure dataset from the Failure Trace Archive (FTA) [2] and select the cluster G1/site1/c1 as the edge cloud data center. The parameter configuration model in [2] is used to configure each node and to generate the BoT workloads, each consisting of between 2000 and 3000 tasks. To evaluate the performance of the proposed algorithm (Tem/Tbf), we compare it with other fault-tolerant strategies. Specifically, ‘NoFT’ denotes the method with no fault-tolerance mechanism, and ‘Restr’, ‘Pre-Tem’, ‘Pre-Tbf’, and ‘Tem/Tbf’ denote the methods that use resubmission, CPU temperature prediction, TBF prediction, and combined CPU temperature and TBF prediction as the fault-tolerant strategy, respectively.
4.1 Experimental Results
Figure 2 shows the task completion rate, and Fig. 3 shows the energy consumption. Both the task completion rate and the energy consumption are highest under the Restr method. Among the strategies that use fault prediction, Tem/Tbf consumes only 30 kWh more energy than the other two. If the task completion rate is used to measure system reliability, Restr is the most reliable, but its excessive energy consumption greatly harms the interests of operators. With Tem/Tbf, reliability is much higher than with the other two proactive strategies, while the increase in energy consumption is small. Therefore, the proposed method (Tem/Tbf) is more effective.
5 Conclusions
In this paper, we study how to improve the reliability of the edge cloud system while reducing energy consumption as much as possible. We first use a reliability- and energy-aware resource scheduling algorithm to allocate physical resources to tasks. Then, CPU temperature prediction and time between failures prediction are used to achieve proactive fault tolerance. Compared with other fault-tolerant strategies, the proposed method is more effective.
References
1. Mukherjee, M., Shu, L., Wang, D., et al.: Survey of fog computing: fundamental, network applications, and research challenges. IEEE Commun. Surv. Tutorials 20(3), 1826–1857 (2018)
2. Sharma, Y., Si, W., Sun, D., et al.: Failure-aware energy-efficient VM consolidation in cloud computing systems. Future Gener. Comput. Syst. 94, 620–633 (2019)
3. Liu, J., Wang, S., Zhou, A., et al.: Using proactive fault-tolerance approach to enhance cloud service reliability. IEEE Trans. Cloud Comput. 6(4), 1191–1202 (2018)
4. Charity, T.J., Hua, G.C.: Resource reliability using fault tolerance in cloud computing. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), Dehradun, pp. 65–71 (2016)
5. Liu, J., Wang, S., Zhou, A., et al.: PFT-CCKP: a proactive fault tolerance mechanism for data center network. In: 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS), Portland, pp. 79–80 (2015)
6. Beloglazov, A., Abawajy, J., Buyya, R.: Energy-aware resource allocation heuristics for efficient management of data centers for Cloud computing. Future Gener. Comput. Syst. 28(5), 755–768 (2012)
Acknowledgements
This work was partially supported by the NSF of China under Grant nos. 61702334 and 61772200, Shanghai Pujiang Talent Program under Grant no. 17PJ1401900, Shanghai Municipal Natural Science Foundation under Grant nos. 17ZR1406900 and 17ZR1429700, Action Plan for Innovation on Science and Technology Projects of Shanghai under Grant no. 16511101000, Collaborative Innovation Foundation of Shanghai Institute of Technology under Grant no. XTCX2016-20, and Educational Research Fund of ECUST under Grant no. ZH1726108.
© 2019 IFIP International Federation for Information Processing
Cite this paper
Xue, Y., Fan, G., Yu, H., Sun, H. (2019). Energy-Aware Resource Scheduling with Fault-Tolerance in Edge Computing. In: Tang, X., Chen, Q., Bose, P., Zheng, W., Gaudiot, JL. (eds) Network and Parallel Computing. NPC 2019. Lecture Notes in Computer Science(), vol 11783. Springer, Cham. https://doi.org/10.1007/978-3-030-30709-7_28
DOI: https://doi.org/10.1007/978-3-030-30709-7_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30708-0
Online ISBN: 978-3-030-30709-7
eBook Packages: Computer Science, Computer Science (R0)