Energy-Aware Resource Scheduling with Fault-Tolerance in Edge Computing

Xue, Yanfen; Fan, Guisheng; Yu, Huiqun; Sun, Huaiying

doi:10.1007/978-3-030-30709-7_28


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11783)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing


Abstract

Edge computing extends computation and storage resources to the edge of the network, which largely alleviates the performance problems that bandwidth limitations cause in cloud computing. However, it still needs to address the challenges of energy consumption and reliability. In this paper, we propose an energy-aware fault-tolerant resource scheduling algorithm that improves system reliability while minimizing energy consumption. We first allocate resources to tasks using a reliability- and energy-aware resource scheduling method. Then, CPU temperature prediction and time between failures (TBF) prediction are used to trigger a proactive fault tolerance mechanism (VM migration). The experimental results show that, compared with other methods, reliability is greatly improved while the energy consumed by VM migration is not very large.


1 Introduction

Recently, edge computing has come to be seen as an effective solution to the problem of ever-larger data volumes, offering shorter response times and better service quality [1]. However, reliability problems remain urgent. Existing fault-tolerant methods fall into two categories: reactive and proactive. Reactive schemes are known to yield low average resource utilization when application behavior is highly dynamic. By contrast, proactive schemes based on fault prediction [2,3,4,5] can effectively improve resource utilization. However, these schemes consider only a single factor when predicting failures, which greatly limits the accuracy of the predictions.

In this paper, we jointly consider the CPU temperature and the time between failures (TBF) of the host to predict faults, and we propose an energy-aware fault-tolerant resource scheduling algorithm that improves reliability while reducing energy consumption. Specifically, we first use reliability- and energy-aware resource scheduling [2] to allocate resources to tasks. During task execution, the fault tolerance mechanism (VM migration) is triggered once the CPU temperature reaches the upper threshold or the predicted failure time is reached.

The rest of this paper is organized as follows. The system model is presented in Sect. 2, followed by the resource scheduling algorithm in Sect. 3. The simulation experiments are described in Sect. 4. Section 5 summarizes the paper.

2 Fault-Tolerance Resource Scheduling Model

As shown in Fig. 1, the system is divided into two layers. The Users Layer is the producer and consumer of data. The Edge Cloud Layer is the data processing layer and consists of physical resources. Users submit their applications to the Edge Cloud Layer, where the resource management system (RSM) allocates physical resources to tasks. To improve system reliability, the RSM can migrate a running VM from a deteriorating host to another host.

In this paper, we use the Bag-of-Tasks (BoT) application model, in which an application consists of a set of independent tasks. The tasks in each BoT are defined as $T = \left\{ {task}_i \mid 1 \le i \le n \right\}$. $l_i$ is the length of the task ${task}_i$, which directly affects its execution time $T_i^{ex}$. Each task ${task}_i$ is allocated to a virtual machine ${vm}_j \in VM$, and each virtual machine ${vm}_j$ runs a set of tasks $T_j \subseteq T$. In addition, $N = \left\{ {node}_k \mid 1 \le k \le x \right\}$ denotes the set of physical hosts in the edge cloud.

2.1 Failure Prediction Model

CPU Temperature Prediction: We use the simulated CPU temperature prediction function of [3] as one of the methods for predicting the host failure time, as follows:

$$\begin{aligned} f(t|A,\omega ,{t_i},{t_{i + 1}}) = \left\{ {\begin{array}{*{20}{c}} {{e^t}}&{}{0 \le t \le {t_i}}\\ {{e^{{t_i}}}}&{}{{t_i} \le t \le {t_{i + 1}}}\\ {A\sin (\omega t - \omega {t_{i + 1}}) + {e^{{t_i}}}}&{}{{t_{i + 1}} \le t \le {t_{i + 2}}} \end{array}} \right. \end{aligned}$$

(1)

where $i$ is a positive integer; $t_i$ is a fixed value obtained from $e^{t_i}=35$; $e^{t_i}$ is the temperature when the CPU is idle, which is always ${35{\,}^\circ }\mathrm{{C}}$; $t_{i+1}$ is a random value; $t_{i+2}$ is calculated as $t_{i+2}=\pi /\omega + t_{i+1}$; $A$ is the amplitude (lower than ${68{\,}^\circ }\mathrm{{C}}$); and $\omega $ determines the duration of the CPU execution load, which lasts $\pi /\omega $.
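As an illustration, Eq. (1) can be evaluated for a single warm-up, idle, and load cycle with a few lines of Python. This is only a sketch under stated assumptions: the function name `cpu_temperature`, the parameter name `t_load_start` (standing for $t_{i+1}$), and the choice to reject times outside the modelled cycle are ours and do not come from [3].

```python
import math

IDLE_TEMP_C = 35.0              # e^{t_i}: idle CPU temperature stated in the text
T_IDLE = math.log(IDLE_TEMP_C)  # t_i, obtained from e^{t_i} = 35


def cpu_temperature(t, amplitude, omega, t_load_start):
    """Evaluate Eq. (1) at time t for one cycle.

    t_load_start stands for t_{i+1} (hypothetical parameter name);
    the load phase ends at t_{i+2} = pi / omega + t_{i+1}.
    """
    if omega <= 0:
        raise ValueError("omega must be positive")
    if t_load_start < T_IDLE:
        raise ValueError("t_load_start must not precede t_i")
    t_load_end = t_load_start + math.pi / omega  # t_{i+2}
    if t < 0 or t > t_load_end:
        raise ValueError("t lies outside the modelled cycle")
    if t <= T_IDLE:
        return math.exp(t)       # warm-up branch: e^t
    if t <= t_load_start:
        return IDLE_TEMP_C       # idle branch: e^{t_i}
    # load branch: A sin(omega t - omega t_{i+1}) + e^{t_i}
    return amplitude * math.sin(omega * (t - t_load_start)) + IDLE_TEMP_C
```

In this sketch the temperature peaks halfway through the load phase, at $t_{i+1} + \pi/(2\omega)$, where it equals the idle temperature plus $A$.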

Time Between Failures Prediction: In addition to CPU temperature prediction, exponential smoothing [2] is used to predict the TBF. Suppose the host ${node}_k$ has a set of observed TBFs, ${TBF}_k = \left\{ {{tbf}_t|1 \le t \le n} \right\} $. Then the prediction for ${tbf}_{t+1}$ can be calculated as:

$$\begin{aligned} \left( {tb{f_k}} \right) _{t + 1}^\prime = \left\{ {\begin{array}{*{20}{c}} {\alpha \times {{(tb{f_k})}_t} + ((1 - \alpha ) \times \left( {tb{f_k}} \right) _t^\prime ),}&{}{n > 1}\\ {\left( {tb{f_k}} \right) _t^\prime }&{}{otherwise} \end{array}} \right. \end{aligned}$$

(2)

where ${({tbf}_k)}_t$ is the actual value of the TBF at time $t$, ${({tbf}_k)}_t^{\prime }$ is the predicted value of the TBF at time $t$, and $\alpha $ is the smoothing constant.
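The recurrence in Eq. (2) can be sketched as a short loop over the observed TBF history of one host. The text does not say how the first prediction is initialised, so the sketch assumes it is seeded with the first observation; the function name `predict_next_tbf` is likewise hypothetical.

```python
def predict_next_tbf(tbf_history, alpha):
    """Predict the next TBF of one host by exponential smoothing (Eq. 2).

    tbf_history: observed TBF values, oldest first.
    alpha: smoothing constant in [0, 1].
    Assumption: the first prediction is seeded with the first observation.
    """
    if not tbf_history:
        raise ValueError("at least one observed TBF is required")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    prediction = tbf_history[0]  # seed (assumption, not from the paper)
    for observed in tbf_history:
        # prediction_{t+1} = alpha * tbf_t + (1 - alpha) * prediction_t
        prediction = alpha * observed + (1.0 - alpha) * prediction
    return prediction
```

With a single observation the function returns that observation unchanged, which corresponds to the "otherwise" branch of Eq. (2). A larger $\alpha$ weights recent failures more heavily; with $\alpha = 1$ the prediction is simply the last observed TBF.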

2.2 Energy Consumption Model

Let ${vm}_j$ be the VM running on ${node}_k$ with utilization $u_j$. Then the energy consumption of the task ${task}_i$ running on ${vm}_j$ can be calculated as

$$\begin{aligned} {E_{ij}} = ({P_k}({u_j}) \times T_{ij}^{ex}) + {E_{extr{a_{ij}}}} \end{aligned}$$

(3)

where $E_{{extra}_{ij}}$ is the energy consumed by VM migration, which can be calculated with the VM migration overhead model in [2]. Similar to [6], ${P_k}({u_j})$ can be calculated as

$$\begin{aligned} {P_k}({u_j}) = {P_{{{\min }_k}}} + ({P_{{{\max }_k}}} - {P_{{{\min }_k}}}) \times {u_j} \end{aligned}$$

(4)

where ${P_{{{\min }_k}}}$ and ${P_{{{\max }_k}}}$ are the power of ${node}_k$ at minimum and maximum utilization, respectively. The utilization $u_j$ of the VM ${vm}_j$ is the sum of the utilizations $u_i$ of its tasks, where each $u_i$ is calculated by normalizing the task length $l_i$ by the maximum task length $l_{max}$ in the BoT B.
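Equations (3) and (4), together with the utilization definition above, amount to a few lines of arithmetic. The following sketch shows one way to compute them; the function names are hypothetical, the migration energy is passed in as a plain number because the overhead model of [2] is not reproduced here, and the units (watts, seconds, joules) are an assumption.

```python
def vm_utilization(task_lengths, max_length):
    """u_j: sum of task utilizations u_i = l_i / l_max for the tasks on one VM."""
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    return sum(length / max_length for length in task_lengths)


def node_power(utilization, p_min, p_max):
    """Eq. (4): linear interpolation between the node's minimum and maximum power."""
    return p_min + (p_max - p_min) * utilization


def task_energy(utilization, exec_time, p_min, p_max, extra_energy=0.0):
    """Eq. (3): power at the VM's utilization times execution time,
    plus the migration energy E_extra (supplied by the caller)."""
    return node_power(utilization, p_min, p_max) * exec_time + extra_energy
```

For example, a node with a minimum power of 100 W and a maximum power of 200 W draws 150 W at 50% utilization, so a task that runs for 10 s with 25 J of migration overhead is charged 1525 J. The sketch does not cap the summed utilization at 1; whether such a cap applies depends on how tasks are packed onto VMs.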

3 Energy-Aware Fault-Tolerant Resource Scheduling Algorithm

Given a BoT B and the resource configuration of the data center, Algorithm 1 is used to configure resources for the tasks. First, the Best Fit Bin Packing algorithm [2] is used to allocate the tasks to VMs. Then, the reliability- and energy-aware strategy is used to configure physical resources for the VMs (lines 1–10). During task execution, VM migration is triggered once the temperature of the node reaches the upper threshold or the predicted fault time is reached. The VM running on the deteriorating node then selects another node through Algorithm 1 to carry out the migration.
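Two steps of this procedure are simple enough to sketch in Python: the best-fit placement of tasks on VMs and the migration trigger. The sketch below is a generic best-fit bin packing heuristic, not a reproduction of Algorithm 1 or of the variant in [2]. The assumptions are ours: each VM is a bin with utilization capacity 1.0, tasks are placed in arrival order, and all function and parameter names are hypothetical.

```python
def best_fit(task_utils, capacity=1.0):
    """Place each task on the VM with the least free capacity that still fits.

    Returns a list of VMs, each given as a list of task indices.
    Generic best-fit heuristic; VM capacity of 1.0 is an assumption.
    """
    vms = []   # task indices per VM
    free = []  # remaining capacity per VM
    for idx, util in enumerate(task_utils):
        if util > capacity:
            raise ValueError("task does not fit on an empty VM")
        best = None
        for vm, room in enumerate(free):
            # keep the tightest VM that can still host the task
            if util <= room and (best is None or room < free[best]):
                best = vm
        if best is None:
            vms.append([idx])            # open a new VM
            free.append(capacity - util)
        else:
            vms[best].append(idx)
            free[best] -= util
    return vms


def should_migrate(cpu_temp, temp_threshold, now, predicted_failure_time):
    """Trigger migration when the temperature reaches the upper threshold
    or the current time reaches the predicted failure time."""
    return cpu_temp >= temp_threshold or now >= predicted_failure_time
```

In a simulation loop, `should_migrate` would be checked for every node at each time step, and a positive result would hand the affected VMs back to the node selection step.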

4 Performance Evaluation

We conduct the simulation experiments by extending the simulator ‘CloudSim’ [3]. We download the Grid5000 failure dataset from the Failure Trace Archive (FTA) [2] and select the cluster G1/site1/c1 as the edge cloud data center. The parameter configuration model in [2] is used to configure each node and to generate the BoT workload, which consists of between 2000 and 3000 tasks. To evaluate the performance of the proposed algorithm (Tem/Tbf), we compare it with other fault-tolerant strategies. Specifically, ‘NoFT’ denotes the method with no fault tolerance mechanism, while ‘Restr’, ‘Pre-Tem’, ‘Pre-Tbf’, and ‘Tem/Tbf’ denote the methods that use resubmission, CPU temperature prediction, TBF prediction, and combined CPU temperature and TBF prediction as the fault-tolerant strategy, respectively.

4.1 Experimental Results

Figure 2 shows the task completion rate, and Fig. 3 shows the energy consumption. Both the task completion rate and the energy consumption are highest with the Restr method. Among the strategies based on fault prediction, the extra energy consumed by Tem/Tbf prediction is only 30 kWh higher than in the other two cases. If the task completion rate is used to measure system reliability, the Restr method is the most reliable, but its excessive energy consumption would greatly harm the interests of operators. With the Tem/Tbf method, the reliability is much higher than with the other two proactive strategies, and the increase in energy is not large. Therefore, the proposed method (Tem/Tbf) is more effective.

5 Conclusions

In this paper, we study how to improve the reliability of the edge cloud system while reducing energy consumption as much as possible. We first use the reliability- and energy-aware resource scheduling algorithm to allocate physical resources to tasks. Then, CPU temperature prediction and time between failures prediction are used to achieve fault tolerance. Compared with other fault-tolerant strategies, the proposed method is more effective.

References

1. Mukherjee, M., Shu, L., Wang, D., et al.: Survey of fog computing: fundamental, network applications, and research challenges. IEEE Commun. Surv. Tutorials 20(3), 1826–1857 (2018)
2. Sharma, Y., Si, W., Sun, D., et al.: Failure-aware energy-efficient VM consolidation in cloud computing systems. Future Gener. Comput. Syst. 94, 620–633 (2019)
3. Liu, J., Wang, S., Zhou, A., et al.: Using proactive fault-tolerance approach to enhance cloud service reliability. IEEE Trans. Cloud Comput. 6(4), 1191–1202 (2018)
4. Charity, T.J., Hua, G.C.: Resource reliability using fault tolerance in cloud computing. In: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), Dehradun, pp. 65–71 (2016)
5. Liu, J., Wang, S., Zhou, A., et al.: PFT-CCKP: a proactive fault tolerance mechanism for data center network. In: 2015 IEEE 23rd International Symposium on Quality of Service (IWQoS), Portland, pp. 79–80 (2015)
6. Beloglazov, A., Abawajy, J., Buyya, R.: Energy-aware resource allocation heuristics for efficient management of data centers for Cloud computing. Future Gener. Comput. Syst. 28(5), 755–768 (2012)

Acknowledgements

This work was partially supported by the NSF of China under Grant nos. 61702334 and 61772200, Shanghai Pujiang Talent Program under Grant no. 17PJ1401900, Shanghai Municipal Natural Science Foundation under Grant nos. 17ZR1406900 and 17ZR1429700, Action Plan for Innovation on Science and Technology Projects of Shanghai under Grant no. 16511101000, Collaborative Innovation Foundation of Shanghai Institute of Technology under Grant no. XTCX2016-20, and Educational Research Fund of ECUST under Grant no. ZH1726108.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, China
Yanfen Xue, Guisheng Fan, Huiqun Yu & Huaiying Sun
Shanghai Key Laboratory of Computer Software Evaluating and Testing, Shanghai, China
Yanfen Xue

Corresponding author

Correspondence to Guisheng Fan.

Editor information

Editors and Affiliations

Shanghai University of Finance and Economics, Shanghai, China
Xiaoxin Tang
Shanghai Jiao Tong University, Shanghai, China
Quan Chen
IBM T. J. Watson Research Center, Yorktown Heights, NY, USA
Pradip Bose
Tsinghua University, Beijing, China
Weiming Zheng
University of California, Irvine, CA, USA
Jean-Luc Gaudiot

About this paper

Cite this paper

Xue, Y., Fan, G., Yu, H., Sun, H. (2019). Energy-Aware Resource Scheduling with Fault-Tolerance in Edge Computing. In: Tang, X., Chen, Q., Bose, P., Zheng, W., Gaudiot, JL. (eds) Network and Parallel Computing. NPC 2019. Lecture Notes in Computer Science, vol 11783. Springer, Cham. https://doi.org/10.1007/978-3-030-30709-7_28

DOI: https://doi.org/10.1007/978-3-030-30709-7_28
Published: 29 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30708-0
Online ISBN: 978-3-030-30709-7
eBook Packages: Computer Science, Computer Science (R0)
