Models and algorithms for energy-efficient scheduling with immediate start of jobs
Abstract
We study a scheduling model with speed scaling for machines and the immediate start requirement for jobs. Speed scaling improves the system performance, but incurs an energy cost. The immediate start condition implies that each job should be started exactly at its release time. Such a condition is typical for modern Cloud computing systems with abundant resources. We consider two cost functions, one that represents the quality of service and the other that corresponds to the cost of running. We demonstrate that the basic scheduling model to minimize the aggregated cost function with n jobs is solvable in \(O(n\log n)\) time in the single-machine case and in \(O(n^{2}m)\) time in the case of m parallel machines. We also address additional features, e.g., the cost of job rejection or the cost of initiating a machine. In the case of a single machine, we present algorithms for minimizing one of the cost functions subject to an upper bound on the value of the other, as well as for finding a Pareto-optimal solution.
Keywords
Speed scaling, Energy minimization, Immediate start, Bicriteria optimization
1 Introduction
In this paper, we study scheduling models that address two important aspects of modern computing systems: machine speed scaling for time and energy optimization and the requirement to start jobs immediately at the time they are submitted to the system. The first aspect, speed scaling, has been the subject of intensive research since the 1990s, see Yao et al. (1995), and has become particularly important recently, with the increased attention to energy-saving demands, see surveys Albers (2009, 2010a), Jing et al. (2013), Gerards et al. (2016). It reflects the ability of modern computing systems to change their clock speeds through the technique known as Dynamic Voltage and Frequency Scaling (DVFS). The higher the speed, the better the performance from users’ perspective, but the energy usage and other computation costs increase. The goal is to select the right speed value from the full spectrum of speeds to achieve a desired trade-off between performance and energy. DVFS techniques have been successfully applied in Cloud data centers to reduce the energy usage, see, e.g., VonLaszewski et al. (2009), Wu et al. (2014), DoLago et al. (2011).
The second aspect, the immediate start condition, is motivated by the advancements of modern Cloud computing systems, and it is widely accepted by practitioners. This feature is not typical for the traditional scheduling research dealing with scenarios arising from manufacturing. In such systems, jobs compete for limited resources. They often have to wait until resources become available, and job starting times can be delayed if the system is busy.
In modern computing systems (Clouds and data centers), processing units are no longer scarce resources but, quite the opposite, abundant resources; see, e.g., Kushida et al. (2015). Clouds give the illusion of infinite computing resources available on demand. Cloud providers agree with customers on a service-level agreement (SLA) and sell computing services to customers as utilities. Special mechanisms allow Cloud providers to ensure that the actual demand for resources is met at practically any point in time (Armbrust et al. 2009, 2010; Jennings and Stadler 2015). In the modern competitive market, Cloud providers achieve high availability of resources, promise their customers instant access to resources and allow customers to monitor how that promise is kept (Aceto et al. 2013). These features are unprecedented in the history of IT and have now become a standard.
The infrastructure for Cloud computing systems is provided by data centers. Data centers execute a large number of computing processes (which we call jobs from now on). In order to guarantee on-demand access, the execution of a job needs to be started immediately upon its submission to the system. Customers experiencing waiting times in order to get their jobs started become unsatisfied with the service and are likely to change the provider next time (Armbrust et al. 2010). It is therefore in the interest of providers to start job execution as soon as jobs are submitted to the system. This phenomenon is our motivation for what we call the immediate start condition.
The optimization criteria are typically of two types: those related to the system performance and the quality-of-service (QoS) provision, as well as those related to the operational cost of the processing system. The criteria of the first type may represent the mean flow times of jobs, total (or mean) tardiness or a more general function F defined as the sum of penalty functions of job completion times. The second objective G is the sum of operational costs for using individual resources, each of which depends on the time the resource is used. It can be linear, to model the monetary resource usage cost, or convex, to model the energy consumption cost.
Observing the immediate start condition is one of the key priorities for resource providers, and it is usually included in the QoS protocols. The case study presented by Garg et al. (2013) characterizes a possible waste of execution time due to the resource unavailability. It is estimated as low as 0.5% for customers of Amazon EC2 and 0.1% for Windows Azure customers. The Rackspace web hosting company guarantees 100% network availability and 99.95–99.99% platform availability; see Rackspace (2015); in reality Rackspace often achieves 100% availability of its resources (Garg et al. 2013). Nowadays, special software is being developed in order to strengthen service-level agreements (SLAs) for customers by fixing the maximum response time, which can be as small as a few seconds (Iqbal et al. 2009). Due to strong competition in the area, customers choose providers who are prepared to demonstrate that handling the submitted jobs is their top priority.
The immediate start requirement is not a seemingly strong assumption, but a fact of today’s life. It is widely accepted in distributed computing, but generally overlooked by the scheduling community, where the traditional perception remains, that of limited resources and acceptable delayed starting times.
In this paper, we initiate the study of the immediate start offline scheduling models, assuming that accurate job characteristics can be available in advance through historical analysis, predicting techniques or a combination of both; see, e.g., Moreno et al. (2014) for offline scenarios in Cloud computing. To satisfy the immediate start condition, we recommend a policy of changing the processing speeds, so that a certain measure of the schedule quality and the cost of speeds (normally understood as energy) are both taken into consideration. The owners of submitted tasks and the providers of processing facilities both want an early completion of tasks, as recorded in the SLAs, and this in practice leads to a no-preemption requirement; see Tian and Zhao (2015). We understand that the models that we address in this paper are rather ideal and simple, but we see our work as a necessary step that should be made before more advanced and practically relevant models are investigated.
In the remainder of this section, we provide a formal definition of the model under study and discuss the relevant literature.
1.1 Definitions and notation
Formally, in the models under consideration, we are given a set of jobs \( N=\left\{ 1,2,\ldots ,n\right\} \). A job \(j\in N\) can be understood as a computational task characterized by its volume or work \(\gamma _{j}\), measured in millions of instructions to be performed on a computing device. Each job \(j\in N\) is associated with a release date \(r_{j}\) before which it is not available. For completeness, assume that \(r_{n+1}=+\,\infty \).
It is also possible that job j is given a due date \(d_{j}\), before which it is desired to complete that job, and/or a deadline \(\bar{d}_{j}\). The due dates \(d_{j}\) are seen as “soft”, i.e., it is possible to violate them, and usually a certain penalty is associated with such a violation. On the other hand, the deadlines \(\bar{d}_{j}\) are “hard”, i.e., in any feasible schedule job j must be completed by time \(\bar{d}_{j}\). If for a job \(j\in N\) no deadline is given, we assume that \(\bar{d}_{j}=+\,\infty \). Additionally, job j can be given weight \(w_{j}\), which indicates its relative importance.
Each job \(j\in N\) is processed at a speed \(s_{j}\), so that its actual processing time is \(p_{j}=\gamma _{j}/s_{j}\) and, under the immediate start condition, its completion time is \(C_{j}=r_{j}+p_{j}\). Two types of cost are considered:
 (i) a traditional scheduling cost function \(f_{j}\left( C_{j}\right) \), where \(f_{j}\) is a non-decreasing function, so that \(f_{j}\left( C_{j}\right) \) represents the penalty for completing job j at time \(C_{j}\); the total cost is then $$\begin{aligned} F=\sum _{j=1}^{n}f_{j}\left( C_{j}\right) =\sum _{j=1}^{n}f_{j}\left( r_{j}+p_{j}\right) ; \end{aligned}$$
 (ii) the speed cost function \(g_{j}\left( s_{j}\right) \), which is often interpreted as the energy that is required for running job j for one time unit at speed \(s_{j}\); the operational cost is then $$\begin{aligned} G=\sum _{j=1}^{n}\frac{\gamma _{j}}{s_{j}}g_{j}\left( s_{j}\right) =\sum _{j=1}^{n}p_{j}g_{j}\left( \frac{\gamma _{j}}{p_{j}}\right) . \end{aligned}$$

\(\Pi _{+}\): it is required to find a feasible schedule that minimizes the aggregated cost \(F+G\);

\(\Pi _{1}\): it is required to find a feasible schedule that minimizes one of the cost functions subject to an upper bound on the other function, e.g., to minimize total energy G subject to an upper bound on the value of F;

\(\Pi _{2}\): it is required to find feasible schedules that simultaneously minimize two cost components, e.g., to find the Pareto-optimal solutions for the problem of minimizing total cost F and total energy G.
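To make the two cost components concrete, the following minimal Python sketch evaluates F and G for a given vector of actual processing times. It assumes, purely for illustration, that \(f_{j}(C_{j})=C_{j}\) (total completion time) and that the speed cost is cubic, \(g_{j}(s)=s^{3}\); the function name `evaluate_costs` and its parameters are hypothetical and not part of the original text.

```python
def evaluate_costs(release, work, proc, f=lambda c: c, g=lambda s: s ** 3):
    """Return (F, G) for jobs that start exactly at their release dates.

    release[j] = r_j, work[j] = gamma_j, proc[j] = p_j (actual processing time).
    f and g are illustrative defaults (assumptions): f(C) = C and g(s) = s^3.
    """
    # F sums the penalties of the completion times C_j = r_j + p_j.
    F = sum(f(r + p) for r, p in zip(release, proc))
    # G sums p_j * g_j(gamma_j / p_j): energy per time unit times the duration.
    G = sum(p * g(w / p) for w, p in zip(work, proc))
    return F, G


if __name__ == "__main__":
    print(evaluate_costs(release=[0, 2], work=[1, 2], proc=[1, 1]))
```

In problem \(\Pi _{+}\) the two returned values would simply be added, whereas in \(\Pi _{1}\) one of them would be bounded and the other minimized.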
1.2 Related work
Both features, machine speed scaling and the immediate start condition, have a long history of study. However, so far they have been considered separately and in different contexts. One point of difference is related to preemption, the ability to interrupt and resume job processing at any time. This feature is typically accepted in speed scaling research in order to avoid intractable cases, while it is forbidden in the immediate start model on a single machine and on parallel identical machines. Notice that a preemptive version with immediate start would need an additional condition of immediate migration and restart, which makes preemption redundant. In what follows, we provide further details about the two streams of research.
The speed scaling research stems from the seminal paper by Yao et al. (1995), who developed an \(O(n^{3})\)-time algorithm for preemptive scheduling of n jobs on a single machine within the time windows \(\left[ r_{j},\bar{d}_{j}\right] \) given for each job \(j\in N\). Note that in that paper time windows are treated in the traditional sense, without the immediate start requirement. Subsequent papers by Li et al. (2006, 2014), Albers et al. (2011, 2014) and Angel et al. (2012) proposed improved algorithms for the single-machine problem and extended this line of research to the multi-machine model. The running times of the current fastest algorithms are \(O(n^{2})\) and \(O(n^{4})\) for the single-machine and parallel-machine cases, respectively; see Shioura et al. (2015).
Speed scaling problems which involve not only the speed cost function G, but also a scheduling cost function \(F=\sum _{j=1}^{n}f_{j}(C_{j})\) have been under study since the paper by Pruhs et al. (2008). The two most popular functions are the total completion time \(F_{1}=\sum _{j=1}^{n}C_{j}\) and the total rejection cost \(F_{2}=\sum _{j=1}^{n}w_{j}\mathrm {sgn}\left( \max \left\{ C_{j}-\overline{d}_{j},0\right\} \right) \), where \(w_{j}\) is the cost incurred if job j cannot be processed before its deadline and therefore is rejected. Without the immediate start condition, the tractable cases of problems \(\Pi _{+}\) and \(\Pi _{1}\) with objectives \(F_{1}\) and \( F_{2}\) are very limited.
In the case of function \(F_{1}\), the version of problem \(\Pi _{+}\) with equal release dates is solvable in \(O\left( n^{2}m^{2}(n+\log m)\right) \) time, where m is the number of machines; see Bampis et al. (2015). Notice that preemptions are redundant in that model. If jobs are available at arbitrary release times \(r_{j}\), then problem \(\Pi _{1}\) is NP-hard even if there is only one machine and preemption is allowed, see Barcelo (2015). For problems with arbitrary release dates and equal-work jobs, preemption allowance makes no difference to an optimal solution, and due to the nonlinear nature of the problem an optimal value of the objective can be found within a chosen accuracy \(\varepsilon \). For example, for problem \(\Pi _{1}\) on a single machine an algorithm by Pruhs et al. (2008) takes \(O(n^{2}\log \frac{\overline{G} }{\varepsilon })\) time, where \(\overline{G}\) is the upper bound on the speed cost function (energy), while for problem \(\Pi _{+}\) on parallel machines an algorithm by Albers and Fujiwara (2007) requires \(O(n^{3}\log \frac{1}{ \varepsilon })\) time. The difficulties associated with arbitrary-length jobs are discussed by Pruhs et al. (2008), Bunde (2009), Barcelo et al. (2013). For the problem of preemptive scheduling on a single discretely controllable machine, Antoniadis et al. (2014) provide an algorithm with time complexity \(O(n^{4}k)\), where k is the number of possible speed values of the processor.
In the speed scaling research, the problems of minimizing the total rejection cost \(F_{2}\) are typically studied as those of maximizing the throughput, defined as the number of jobs that can be processed by their deadlines. Polynomial-time algorithms are known only for special cases, where various conditions are imposed, in addition to the assumption that all jobs have equal weights \(w_{j}\). Notice that strict assumptions of those models make preemption redundant. The single-machine problem \(\Pi _{1}\) with \(w_{j}=1\) for all \(j\in N\) is solvable in \(O\left( n^{4}\log n\log \left( \sum \gamma _{j}\right) \right) \) time and in \(O\left( n^{6}\log n\log \left( \sum \gamma _{j}\right) \right) \) time, depending on whether the jobs are available simultaneously (\(r_{j}=0\) for all \(j\in N\)) or not; in the latter case, it is further required that release dates and deadlines are agreeable, see Angel et al. (2013). The parallel-machine problem \(\Pi _{1}\) with the jobs of equal size and equal weight (\(\gamma _{j}=w_{j}=1\), \(j\in N\)) is solvable in \( O(n^{12m+9})\) time or in \(O(n^{4m+4}m)\) time, if additionally release dates and deadlines are agreeable, see Angel et al. (2016).
Research on speed scaling problems extends to the design of approximation algorithms and the study of their online versions. Without providing a comprehensive list of results of this type, we refer an interested reader to the survey papers by Albers (2009, 2010a, b) and Bampis (2016).
As far as the immediate start condition is concerned, the most relevant problems studied in the literature fall into the category of interval scheduling. In such models, each job is characterized by time intervals where it can be processed (Kovalyov et al. 2007). One of the most well-studied versions of interval scheduling assumes that there is only one interval per job \([r_j,\overline{d}_j]\). In interval scheduling, there is no freedom in selecting job starting times or in using preemption: every job \(j\in N\) should start precisely at a given time \(r_{j}\) and complete at a given deadline \(\bar{d}_{j}\). There is also no control over machine speeds, which are fixed and cannot be changed. The decision making consists in (i) selecting a subset of jobs that can be processed within their time intervals and (ii) assigning them to the machines for processing without preemption. The two typical objectives are the job rejection cost, which is defined similarly to the function \(F_{2}\), and the machine usage cost defined typically as the (weighted) number of machines which are selected to process the jobs. Note that unlike the operational cost function G used in our model, the machine usage cost in interval scheduling does not take into account the actual time of using a machine.
Within the broad range of interval scheduling results (see the survey papers by Kolen et al. (2007) and Kovalyov et al. (2007)), those relevant to our study deal with identical parallel machines or uniform machines. In the case of identical parallel machines, the fastest algorithms for minimizing the job rejection cost have time complexity \(O(n\log n)\) if all jobs have equal weights (Carlisle and Lloyd 1995) and \(O(mn\log n)\) if job weights are allowed to be different (Bouzina and Emmons 1996); the fastest algorithm for minimizing the machine usage cost is of time complexity \(O(n\log n)\) if machine weights are equal (Gupta et al. 1979).
The version of the problem with uniform machines is less studied. For uniform machines, both problems, with job rejection cost and machine usage cost, are strongly NP-hard; see Nakajima et al. (1982) and Bekki and Azizoğlu (2008). Polynomial-time algorithms, all of time complexity \(O(n\log n)\), are known for the problem of minimizing the machine usage cost, if there are only two types of machines, slow and fast (Nakajima et al. 1982), and for the problem of minimizing the job rejection cost, in one of the following two cases: if all jobs are available simultaneously and have equal weights, or if all jobs have equal volume and there are only two processing machines (Bekki and Azizoğlu 2008).
One more problem related to our study is a relaxed version of interval scheduling, where the jobs are allowed to start at any time after their release dates \(r_{j}\), but they are required to complete exactly at their deadlines \(\overline{d}_{j}\). Such a problem can be considered as a counterpart of our problem, where the jobs are required to start at release dates \(r_{j}\), but they are allowed to complete at any time before deadlines \(\overline{d}_{j}\).
As demonstrated in Leyvand et al. (2010), the counterpart of problem \(\Pi _{+}\) with fixed completion times can be solved in \(O(mn^{2})\) time, while the counterparts of problems \(\Pi _{1}\) and \(\Pi _{2}\) are NP-hard. For discrete versions of NP-hard problems, Leyvand et al. (2010) develop algorithms of time complexity \(O(mn^{m+1}X_{\max })\), where \(X_{\max }\) is the maximum resource usage cost, \(X_{\max }=\sum _{j=1}^{n}\beta _{j}^{ \mathrm {contr}}\max \left\{ x_{j}\right\} \), assuming that resource amounts \( x_{j}\) are allowed to take only discrete values from a given range.
We study the most general versions of \(\Pi _{+}\), \(\Pi _{1}\) and \(\Pi _{2}\) with arbitrary functions \(f_{j}\left( C_{j}\right) \), reflecting diverse needs of customer-oriented quality-of-service provisioning in distributed systems. Problem \(\Pi _{+}\) is solvable in O(n) time on a single machine, provided that the jobs are numbered in accordance with (2) (Sect. 2), and in \(O(n^{2}m)\) time on m parallel machines (Sect. 3). The \(\Pi _{1}\) model of minimizing energy G on a single machine subject to an upper bound on the total flow time is handled in Sect. 4; we formulate it as a nonlinear resource allocation problem with continuous variables and explain how it can be solved in \( O(n\log n)\) time. In Sect. 5, we present a method, also of time complexity \(O(n\log n)\), for finding Pareto-optimal solutions for the \( \Pi _{2}\) model, in which the functions F and G have to be simultaneously minimized on a single machine. Conclusions are presented in Sect. 6.
2 Problem \(\Pi _{+}\) on a single machine
In this section, we consider the problem of minimizing the sum of the performance cost function F and total energy G on a single machine, provided that each job \(j\in N\) starts immediately at time \(r_{j}\).
For most practically relevant cases, we may assume that for each \(j\in N\) problem (6) can be solved in constant time. Under this assumption, we obtain the following statement.
Theorem 1
The problem \(\Pi _{+}\) of minimizing the sum of total cost F and total energy G on a single machine is solvable in \(O\left( n\right) \) time, provided that the jobs are numbered in accordance with (2) and for each \(j\in N\) problem (6) can be solved in constant time.
Below we present several illustrations, taking two popular scheduling performance measures and, as agreed in Sect. 1, a cubic speed cost function (3). Notice that for the latter function, \( pg_{j}\left( \frac{\gamma _{j}}{p}\right) =\frac{\beta _{j}\gamma _{j}^{3}}{p^{2}}\), \(j\in N\).
For another illustration, assume that job \(j\in N\) is given a “soft” due date \(d_{j}\), but no “hard” deadline \(\bar{d}_{j}\), i.e., \( D_{j}=r_{j+1},\) \(1\le j\le n-1\). Suppose that \(f_{j}\left( C_{j}\right) =w_{j}\max \left\{ C_{j}-d_{j},0\right\} \), i.e., F represents total weighted tardiness.
In the presented examples, which can be extended to most traditionally used objective functions, the actual processing time \(p_{j}^{*}\) of each job is essentially written in closed form, which justifies our assumption that each problem (6) can be solved in constant time.
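The closed-form idea can be sketched in Python as follows. Problem (6) is not reproduced in this text, so the sketch assumes that it is the one-dimensional problem of minimizing \(f_{j}(r_{j}+p)+\beta _{j}\gamma _{j}^{3}/p^{2}\) over \(0<p\le u_{j}\), where \(u_{j}\) is the length of the interval available to job j; this reading, the function names and the restriction to jobs without deadlines are assumptions made for illustration only.

```python
import math


def p_completion_time(gamma, beta, u):
    """Minimize p + beta*gamma^3/p^2 over 0 < p <= u (f_j(C) = C).

    The derivative 1 - 2*beta*gamma^3/p^3 vanishes at p = (2*beta)^(1/3)*gamma;
    the function is convex for p > 0, so the optimum is that point capped by u.
    """
    return min(u, (2 * beta) ** (1 / 3) * gamma)


def p_weighted_tardiness(gamma, beta, u, w, slack):
    """Minimize w*max(p - slack, 0) + beta*gamma^3/p^2 over 0 < p <= u.

    slack = d_j - r_j and w > 0. Before the due date only the (decreasing)
    energy term matters; after it the stationary point is (2*beta/w)^(1/3)*gamma.
    """
    if u <= slack:
        return u
    stationary = (2 * beta / w) ** (1 / 3) * gamma
    return min(u, max(slack, stationary))


def solve_single_machine(release, work, beta=1.0):
    """Processing times for total completion time plus cubic energy.

    Assumes jobs are sorted by distinct release dates and have no deadlines,
    so job j must finish by r_{j+1} and the last job is unrestricted.
    """
    n = len(release)
    procs = []
    for j in range(n):
        u = release[j + 1] - release[j] if j + 1 < n else math.inf
        procs.append(p_completion_time(work[j], beta, u))
    return procs


if __name__ == "__main__":
    print(solve_single_machine([0, 1, 10], [1, 1, 1]))
```

Each job is handled independently in constant time, so the loop is linear in n once the jobs are sorted.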
3 Problem \(\Pi _{+}\) on parallel machines
Suppose a flow of value m in network H is found. Since the network is acyclic, the arcs with a flow equal to 1 will form m paths from s to t, and the order of arcs of set \(T_{BA}\) in each path defines the sequence of jobs on a machine. A path starts with an arc \(\left( s,A_{j}\right) \), proceeds with pairs of arcs of the form \((A_{j},B_{j})\), \((B_{j},A_{k})\), and concludes with the final pair \((A_{\ell },B_{\ell })\), \((B_{\ell },t)\). An arc \(\left( s,A_{j}\right) \) implies that job j is the first on some machine. A pair \((A_{j},B_{j})\), \((B_{j},A_{k})\) corresponds to scheduling two jobs, j and k, one after another on the same machine, while a pair \( (A_{\ell },B_{\ell })\), \((B_{\ell },t)\) corresponds to assigning job \(\ell \) as the last job on a machine.
The arc costs reflect the selected sequence of jobs on a machine. If a job \( j\in N\) has no “hard” deadline, define \( \bar{d}_{j}=+\,\infty \). For the final pair of the chain \((A_{\ell },B_{\ell }) \), \((B_{\ell },t)\), the cost of scheduling job \(\ell \) as the last job on a machine is equal to the contribution of job \(\ell \in N\) to the objective function. It can be found as the optimal value \(Z_{\ell }^{*}\) for the problem (6) with \(j=\ell \) and \(u_{j}=\bar{d}_{j}\). Thus, for each \(j\in N\), we compute the value \(Z_{j}^{*}\) and assign this value as a cost of the arc \((B_{j},t)\).
For each arc \((A_{j},B_{j})\in T_{AB}\), the cost is set equal to \(-M\), where M is a large positive number. This guarantees that every arc \( (A_{j},B_{j})\in T_{AB}\) receives a flow of 1, so that each job \(j\in N\) will be scheduled. If we ignore the costs of the arcs \((A_{j},B_{j})\in T_{AB} \), the total cost of the found flow is equal to the optimal value of the function \(F+G\).
Thus, if one of the paths from s to t visits the sequence of nodes \( (s,A_{j_{1}},B_{j_{1}},A_{j_{2}},B_{j_{2}}, \ldots , \) \(A_{j_{y}}, B_{j_{y}},t) \), then in the associated schedule on some machine the sequence of jobs \((j_{1},j_{2},\ldots ,\) \(j_{y}) \) is processed. The actual processing time \(p_{j_{i}}^{*}\) of job \(j_{i}\), \(1\le i\le y-1\), is equal to the value of p that delivers the smallest value of \( Z_{j_{i},j_{i+1}}^{*}\), while for the last job \(j_{y}\) the actual processing time \(p_{j_{y}}^{*}\) is defined by the value of p that delivers the smallest value of \(Z_{j_{y}}^{*}\).
As in Sect. 2, we may assume that determining the cost of each arc of network H takes constant time, so that all the costs will be found in \(O\left( n^{2}\right) \) time. The required flow can be found in \( O\left( n^{2}m\right) \) time by applying the successive shortest path algorithm, similar to the Ford–Fulkerson algorithm; see Ahuja et al. (1993).
Theorem 2
The problem \(\Pi _{+}\) of minimizing the sum of total cost F and total energy G on m parallel machines is solvable in \(O\left( n^{2}m\right) \) time by finding the minimum-cost flow of value m in network H, provided that the cost of each arc of H can be computed in constant time.
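The flow-based approach can be illustrated by the Python sketch below. The full construction of network H is only partly reproduced in this text, so the sketch makes several assumptions: unit-capacity arcs \((s,A_{j})\) of cost 0, \((A_{j},B_{j})\) of cost \(-M\), \((B_{j},A_{k})\) for \(r_{j}<r_{k}\), and \((B_{j},t)\); objective \(f_{j}(C_{j})=C_{j}\) with cubic energy \(\gamma _{j}^{3}/p^{2}\); no deadlines; and \(n\ge m\). It uses a plain Bellman-Ford search inside the successive shortest path loop, so it does not attain the \(O(n^{2}m)\) bound; all names are hypothetical.

```python
import math


def job_cost(r, gamma, u):
    """min over 0 < p <= u of (r + p) + gamma^3/p^2 (assumed form of the arc cost)."""
    p = min(u, 2 ** (1 / 3) * gamma)
    return r + p + gamma ** 3 / p ** 2


def schedule_cost(release, work, m, big_m=1e4):
    """Optimal F + G on m machines via a min-cost flow of value m."""
    n = len(release)
    size, s, t = 2 * n + 2, 0, 2 * n + 1
    arcs = []  # (tail, head, cost); node A_j is 1 + 2j, node B_j is 2 + 2j
    for j in range(n):
        a_j, b_j = 1 + 2 * j, 2 + 2 * j
        arcs.append((s, a_j, 0.0))                      # job j is first on a machine
        arcs.append((a_j, b_j, -big_m))                 # forces job j to be scheduled
        arcs.append((b_j, t, job_cost(release[j], work[j], math.inf)))  # j is last
        for k in range(n):
            if release[k] > release[j]:                 # k follows j on one machine
                gap = release[k] - release[j]
                arcs.append((b_j, 1 + 2 * k, job_cost(release[j], work[j], gap)))
    # Residual graph: arc 2i is forward, arc 2i + 1 is its reverse.
    head, cap, cost, out = [], [], [], [[] for _ in range(size)]
    for u, v, c in arcs:
        out[u].append(len(head)); head.append(v); cap.append(1); cost.append(c)
        out[v].append(len(head)); head.append(u); cap.append(0); cost.append(-c)
    total = 0.0
    for _ in range(m):                                  # one augmentation per machine
        dist, pred = [math.inf] * size, [-1] * size
        dist[s] = 0.0
        for _ in range(size - 1):                       # Bellman-Ford relaxation rounds
            changed = False
            for u in range(size):
                if dist[u] == math.inf:
                    continue
                for e in out[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]] - 1e-9:
                        dist[head[e]] = dist[u] + cost[e]
                        pred[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[t] == math.inf:
            raise ValueError("no flow of value m exists (is n >= m?)")
        v = t
        while v != s:                                   # push one unit along the path
            e = pred[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            total += cost[e]
            v = head[e ^ 1]
    return total + n * big_m                            # drop the -M terms


if __name__ == "__main__":
    print(schedule_cost([0, 1], [1, 1], m=1), schedule_cost([0, 1], [1, 1], m=2))
```

The constant `big_m` must dominate every achievable saving in the real arc costs; the value used here is only suitable for small illustrative instances.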
The described approach can be extended to the problem of determining the optimal number of parallel machines to be used. This aspect is particularly important in modern computing systems, as there are overheads related to initialization of virtual machines in Clouds, and overheads for activating the machines which are in the sleep mode.
Suppose that using v parallel machines incurs cost \(\sigma _{v}\), \(1\le v\le m\), and we are interested in minimizing \(F+G\) plus additionally the cost \(\sigma _{v}\) of all used machines. This can be done by solving the sequence of flow problems in network H, trying flow values 1, then 2, etc. up to an upper bound m on the machine number. For each tried value of v, \(1\le v\le m\), the function \(F+G+\sigma _{v}\) is evaluated and the best option is taken. The running time for solving the resulting problem remains \( O\left( n^{2}m\right) \), since the successive shortest path algorithm for finding the min-cost flow of value m will iteratively find the min-cost flows with all required intermediate values 1, \(2,\ldots ,m-1\).
Theorem 3
The problem \(\Pi _{+}\) of minimizing the sum of total cost F, total energy G and the cost \(\sigma _{v }\) for using \(v \le m\) machines, where v is a decision variable, is solvable in \(O\left( n^{2}m\right) \) time, under the assumptions of Theorem 2.
A drawback of the model with the aggregated objective function is that it schedules all arrived jobs. In the case of a rather short interval available for processing a job, this can only be achieved if a very high speed is applied, which may be unacceptably expensive. It may appear to be beneficial not to accept certain jobs and to pay an agreed rejection fee.
For example, suppose that the minimum-cost flow of value \(\ell \), \(\ell \le m\), in the modified network is found, and one of the paths from s to t visits the sequence of nodes \((s,A_{j_{1}},B_{j_{1}},A_{j_{2}},B_{j_{2}}, \ldots ,A_{j_{y}},B_{j_{y}},t)\). Then, the sequence of accepted jobs \(\left( j_{1},j_{2},\ldots ,j_{y}\right) \) is processed on some machine, and the contribution of job \(j_{i}\) is equal to the cost of the arc that leaves node \(B_{j_{i}}\), found by solving problem (9), plus the cost \(-\,\delta _{j_{i}}\) of the arc that enters node \(B_{j_{i}}\), \(1\le i\le y\). The described adjustments do not change the time complexity of the approach.
Theorem 4
The problem \(\Pi _{+}\) in which it is required to determine the set \(N_{R}\) of rejected jobs to minimize the sum of total cost F, total energy G and the cost \(\sum _{j\in N_{R}}\delta _{j}\) is solvable in \(O\left( n^{2}m\right) \) time, under the assumptions of Theorem 2.
4 Problem \(\Pi _{1}\) on a single machine
In this section, we consider the problem of minimizing total energy G subject to a constraint on total cost F on a single machine. The presented solution approach is based on Karush–Kuhn–Tucker (KKT) reasoning in relation to the associated Lagrange function. This approach works for a wide range of functions G and F; however, below for simplicity it is presented for the case that \(F=\sum _{j\in N}\left( C_{j}-r_{j}\right) \), i.e., F represents total flow time. Moreover, a natural interpretation of the obtained results occurs if for each \(j\in N\) the energy function \(g_{j}\) is polynomial, strictly convex, decreasing in \( p_{j}\) and job-independent, e.g., satisfies (3) with \(\beta _{j}=1 \).
The KKT conditions guarantee that there exists a value \(\lambda ^{*}\) such that \(Q^{\prime }\left( \lambda ^{*}\right) =0\). Such a multiplier \( \lambda ^{*}\) and vector \({\mathbf {p}}\left( \lambda ^{*}\right) \) deliver the minimum to the Lagrangian function, so that vector \({\mathbf {p}} \left( \lambda ^{*}\right) \) is a solution to problem (10), i.e., defines the optimal values of the actual processing times.
Theorem 5
The problem \(\Pi _{1}\) of minimizing total energy G on a single machine, subject to the bounded total flow time \(F\le P\), reduces to the nonlinear resource allocation problem and can be solved in \(O\left( n\log n\right) \) time, provided that energy functions \(g_{j}\) are polynomial, strictly convex, decreasing in \(p_{j}\) and job-independent.
The following remark is useful for justifying the solution method for the bicriteria problem, presented in the next section. Simultaneous equations (13) imply that in an optimal solution for each job \(\pi \left( j\right) \), \(1\le j\le k^{*}\), the equality \(p_{\pi \left( j\right) }\left( \lambda ^{*}\right) =u_{\pi \left( j\right) }\) holds, i.e., each of these jobs fully uses the interval \(\left[ r_{\pi \left( j\right) }, r_{\pi \left( j\right) }+u_{\pi \left( j\right) }\right] \) available for its processing. The processing speed of job \(\pi \left( j\right) \), \(1\le j\le k^{*}\), is \(\frac{\gamma _{\pi \left( j\right) }}{u_{\pi \left( j\right) }}=\sqrt[3]{\lambda _{\pi \left( j\right) }/2}\). Besides, for \(k^{*}+1\le j\le n\), due to (13), it follows that \(G_{\pi \left( j\right) }^{\prime }\left( p_{\pi \left( j\right) }^{*}\right) =-\,\lambda ^{*}\), so that all jobs \(\pi \left( j\right) \), \(k^{*}+1\le j\le n\), are processed at the same speed \(\sqrt[3]{\lambda ^{*}/2}\) and none of these jobs fully uses the available interval. Moreover, since \(\lambda _{\pi \left( 1\right) }\ge \cdots \ge \lambda _{\pi \left( k^{*}\right) }>\lambda ^{*}>\lambda _{\pi \left( k^{*}+1\right) }\ge \cdots \ge \lambda _{\pi \left( n\right) }\), we conclude that the common speed at which each job \(\pi \left( j\right) \), \(k^{*}+1\le j\le n,\) is processed is less than the processing speed of the jobs \(\pi \left( j\right) \), \(1\le j\le k^{*}\).
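The structure described in this remark (some jobs pinned to their upper bounds, the rest sharing a common speed) can be sketched in Python as follows. Since equations (10) to (14) are not reproduced in this text, the sketch assumes that the problem is to minimize \(\sum _{j}\gamma _{j}^{3}/p_{j}^{2}\) subject to \(\sum _{j}p_{j}\le P\) and \(0<p_{j}\le u_{j}\); the function name and its arguments are hypothetical, and the routine is an illustration of the idea rather than the authors' algorithm.

```python
def min_energy_processing_times(work, upper, total_time):
    """Processing times minimizing sum(gamma^3/p^2) with sum(p) <= total_time.

    work[j] = gamma_j, upper[j] = u_j (may be float('inf')), total_time = P.
    Jobs are scanned by non-increasing minimum speed gamma_j/u_j; a job whose
    minimum speed exceeds the current common speed is pinned to p_j = u_j.
    """
    n = len(work)
    if sum(upper) <= total_time:
        return list(upper)          # the bound on F is not binding
    order = sorted(range(n), key=lambda j: work[j] / upper[j], reverse=True)
    work_left, time_left = sum(work), total_time
    proc = [0.0] * n
    for idx, j in enumerate(order):
        speed = work_left / time_left       # common speed of the unpinned jobs
        if work[j] / upper[j] <= speed:
            # All remaining jobs fit within their intervals at this speed.
            for i in order[idx:]:
                proc[i] = work[i] / speed
            return proc
        proc[j] = upper[j]                  # job j uses its whole interval
        work_left -= work[j]
        time_left -= upper[j]
    return proc


if __name__ == "__main__":
    print(min_energy_processing_times([2, 1], [1, float("inf")], 3))
```

After the initial sort the scan is linear, which is in line with the \(O(n\log n)\) bound stated in Theorem 5.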
5 Problem \(\Pi _{2}\) on a single machine
In this section, we describe an approach to solving the bicriteria problem, in which it is required to simultaneously minimize total cost F and total energy G on a single machine. Recall that a schedule \(S^{\prime }\) is called Pareto-optimal if there exists no schedule \(S^{\prime \prime }\) such that \(F(S^{\prime \prime })\le F(S^{\prime })\) and \(G(S^{\prime \prime })\le G(S^{\prime })\), where at least one of these inequalities is strict.
Although the outlined approach can be extended to deal with rather general cost functions, below we present it for \(F=\sum _{j=1}^{n}\left( C_{j}-r_{j}\right) \) and \(G=\sum _{j=1}^{n}p_{j}g_{j}\left( \frac{\gamma _{j} }{p_{j}}\right) =\sum _{j=1}^{n}\frac{\gamma _{j}^{3}}{p_{j}^{2}}\). The solution of the problem of finding the Pareto optimum is given in the space of variables F and G by (i) a sequence of breakpoints \( F_{0},F_{1},F_{2},\ldots ,F_{\nu }\) of the variable F and (ii) an explicit formula that expresses variable G as a function of variable \(F\in \left[ F_{k},F_{k+1}\right] \) for all \(k=0,1,\ldots ,\nu -1\). As we show below, \( \nu = n\).
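The dominance relation in this definition can be checked mechanically. The short Python sketch below filters a finite list of (F, G) pairs down to the non-dominated ones; it is a brute-force illustration of the definition, with a hypothetical function name, and is unrelated to the efficient method developed in this section.

```python
def pareto_front(points):
    """Return the (F, G) pairs that are not dominated by any other pair."""
    front = []
    for i, (f1, g1) in enumerate(points):
        # (f2, g2) dominates (f1, g1) if it is no worse in both criteria
        # and strictly better in at least one of them.
        dominated = any(
            f2 <= f1 and g2 <= g1 and (f2 < f1 or g2 < g1)
            for k, (f2, g2) in enumerate(points)
            if k != i
        )
        if not dominated:
            front.append((f1, g1))
    return front


if __name__ == "__main__":
    print(pareto_front([(1, 5), (2, 3), (3, 4), (4, 1)]))
```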
Theorem 6
Proof
The fact that the values \(F_{k}\), \(0\le k\le \nu \), are indeed breakpoints and that \(\nu = n\) follows from the structure of an optimal solution of the problem of minimizing total energy G subject to an upper bound on the sum of actual processing times; see (11) and (14) from Sect. 4. For \(F\in (F_{k},F_{k+1}]\) considering the jobs in accordance with the permutation \( \pi \), the actual processing times of the first k jobs are fixed to their upper bounds, while the actual processing times of the remaining jobs are obtained by running these jobs at a common speed s, that decreases starting from \(s_{\pi \left( k\right) }\). The next breakpoint \(F_{k+1}\) occurs when s becomes equal to \(s_{\pi \left( k+1\right) }\). Note that breakpoints \(F_{k}\) and \(F_{k+1}\) coincide if \(s_{\pi \left( k\right) } = s_{\pi \left( k+1\right) }\), but we count them separately so that indeed \( \nu = n\). The last breakpoint \(F_{n}\) corresponds to the situation that the actual processing time of job \(\pi (n)\) is equal to its largest possible value \(u_{\pi (n)}\).
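The breakpoint structure described in the proof can be sketched in Python. The statement of Theorem 6 is not reproduced in this text, so the sketch only follows the verbal description: at the k-th breakpoint the first \(k-1\) jobs of permutation \(\pi \) are at their upper bounds and all remaining jobs run at the speed \(s_{\pi (k)}=\gamma _{\pi (k)}/u_{\pi (k)}\). It assumes \(G=\sum _{j}\gamma _{j}^{3}/p_{j}^{2}\) and finite upper bounds \(u_{j}\) for all jobs; the function name is hypothetical and \(F_{0}\) is not computed.

```python
def pareto_breakpoints(work, upper):
    """Return [(F_1, G_1), ..., (F_n, G_n)] for total flow time vs cubic energy.

    work[j] = gamma_j, upper[j] = u_j (assumed finite). Jobs are taken in
    non-increasing order of their minimum speeds gamma_j / u_j.
    """
    order = sorted(range(len(work)), key=lambda j: work[j] / upper[j], reverse=True)
    points = []
    fixed_time, fixed_energy, work_left = 0.0, 0.0, sum(work)
    for j in order:
        speed = work[j] / upper[j]          # common speed when job j hits u_j
        # Unpinned jobs contribute gamma/speed to F and gamma*speed^2 to G.
        flow = fixed_time + work_left / speed
        energy = fixed_energy + work_left * speed ** 2
        points.append((flow, energy))
        # From now on job j stays at its upper bound.
        fixed_time += upper[j]
        fixed_energy += work[j] ** 3 / upper[j] ** 2
        work_left -= work[j]
    return points


if __name__ == "__main__":
    print(pareto_breakpoints([2, 1], [1, 2]))
```

Consistent with the text, the last breakpoint equals the sum of all upper bounds, and equal minimum speeds produce coinciding breakpoints that are still counted separately.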
6 Conclusions
In this paper, we address several versions of the scheduling model that combines a well-established feature of speed scaling and a requirement of immediate job starting times, which is typical for modern Cloud computing systems. Both objectives are of the min-sum type, one depending on the job completion times, and the other on the machine usage cost. We show that the single-machine model with n jobs can be solved in \(O(n\log n)\) time for two single-criterion versions of our problem, \(\Pi _{+}\) and \(\Pi _{1}\), or for the most general bicriteria version \(\Pi _{2}\). The single-criterion version \(\Pi _{+}\) of the multi-machine model with n jobs and m machines is solvable in \(O(n^{2}m)\) time.

For problem \(\Pi _{1}^{\max }\) (minimizing energy G subject to an upper bound \(\overline{F}\) on the value of \(F^{\max }\)), define deadlines induced by a given value of \(\overline{F}\), eliminate \(F^{\max }\) from consideration by setting \(f_{j}(C_{j})=0\), \(j\in N\), and solve problem \( \Pi _{+}\) to minimize \(G+0\) using the techniques from Sects. 2, 3.

As far as problem \(\Pi _{+}^{\max }\) is concerned, function \(F^{\max }\) is convex in \(p_{j}\) for the most popular min-max scheduling objectives, such as \(F^{\max }\in \left\{ C_{\max },L_{\max }\right\} \). Since the energy component G is also convex in \(p_{j}\), it follows that the objective \(F^{\max }+G\) is convex and its minimum can be found by a numerical method of convex optimization.
Notes
Acknowledgements
This research was supported by the EPSRC funded project EP/J019755/1 “Submodular Optimisation Techniques for Scheduling with Controllable Parameters”. The first author was partially supported by JSPS KAKENHI Grant Numbers 15K00030 and 15H00848.
References
 Aceto, G., Botta, A., de Donato, W., & Pescapè, A. (2013). Cloud monitoring: A survey. Computer Networks, 57, 2093–2115.CrossRefGoogle Scholar
 Ahuja, R. K., Magnanti, T. L., & Orlin, J. B. (1993). Network flows: Theory, algorithms and applications. Englewood Cliffs: Prentice Hall.Google Scholar
 Albers, S. (2009). Algorithms for energy saving. Lecture Notes in Computer Science, 5760, 173–186.CrossRefGoogle Scholar
 Albers, S. (2010a). Energyefficient algorithms. Communications of the ACM, 53, 86–96.CrossRefGoogle Scholar
 Albers, S. (2010b). Algorithms for energy management. Lecture Notes in Computer Science, 6072, 1–11.Google Scholar
 Albers, S., & Fujiwara, H. (2007). Energyefficient algorithms for flow time minimization. ACM Transactions on Algorithms, 3, 49:1–49:17.CrossRefGoogle Scholar
 Albers, S., Antoniadis, A., & Geiner, G. (2011). On multiprocessor speed scaling with migration. In Proceedings of the symposium on parallelism in algorithms and architectures (SPAA) (pp. 279–288).Google Scholar
 Albers, S., Müller, F., & Schmelzer, S. (2014). Speed scaling on parallel processors. Algorithmica, 68, 404–425.CrossRefGoogle Scholar
 Angel, E., Bampis, E., Kacem, F., & Letsios, D. (2012). Speed scaling on parallel processors with migration. Lecture Notes in Computer Science, 7484, 128–140.CrossRefGoogle Scholar
 Angel, E., Bampis, E., Chau, V., & Letsios, D. (2013). Throughput maximization for speedscaling with agreeable deadlines. Lecture Notes in Computer Science, 7876, 10–19.CrossRefGoogle Scholar
 Angel, E., Bampis, E., Chau, V., & Thang, N. K. (2016). Throughput maximization in multiprocessor speedscaling. Theoretical Computer Science, 630, 1–12.CrossRefGoogle Scholar
 Antoniadis, A., Barcelo, N., Consuegra, M., Kling, P., Nugen, M., Pruhs, K., & Scquizzatok, M. (2014). Efficient computation of optimal energy and fractional weighted flow tradeoff schedules. In Proceedings of the 31st international symposium on theoretical aspects of computer science (STACS ’14) (pp. 63–74).Google Scholar
 Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., et al. (2009). Above the clouds: A Berkeley view of cloud computing (p. 28). UCB/EECS, vol: Technical Report, EECS Department, University of California, Berkeley.Google Scholar
 Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., et al. (2010). A view of cloud computing. Communications of the ACM, 53, 55–58.CrossRefGoogle Scholar
 Bampis, E. (2016). Algorithmic issues in energyefficient computation. Lecture Notes in Computer Science, 9869, 3–14.CrossRefGoogle Scholar
 Bampis, E., Letsios, D., & Lucarelli, G. (2015). Green scheduling, flows and matchings. Theoretical Computer Science, 579, 126–136.CrossRefGoogle Scholar
 Bansal, N., Pruhs, K., & Stein, C. (2010). Speed scaling for weighted flow time. SIAM Journal on Computing, 39, 1294–1308.CrossRefGoogle Scholar
 Barcelo, N. (2015). The complexity of speedscaling. Ph.D. thesis, University of Pittsburgh.Google Scholar
 Barcelo, N., Cole, D., Letsios, D., Nugent, M., & Pruhs, K. (2013). Optimal energy tradeoff schedules. Sustainable Computing: Informatics and Systems, 3, 207–217.Google Scholar
 Bekki, Ö. B., & Azizoğlu, M. (2008). Operational fixed interval scheduling problem on uniform parallel machines. International Journal of Production Economics, 112, 756–768.CrossRefGoogle Scholar
 Bouzina, K. I., & Emmons, H. (1996). Interval scheduling on identical machines. Journal of Global Optimization, 9, 379–393.CrossRefGoogle Scholar
 Brooks, D. M., Bose, P., Schuster, S. E., Jacobson, H., Kudva, P. N., Buyuktosunoglu, A., et al. (2000). Poweraware microarchitecture: Design and modeling challenges for nextgeneration microprocessors. IEEE Micro, 20, 26–44.CrossRefGoogle Scholar
 Bunde, D. P. (2009). Poweraware scheduling for makespan and flow. Journal of Scheduling, 12, 489–500.CrossRefGoogle Scholar
 Carlisle, M. C., & Lloyd, E. L. (1995). On the \(k\)coloring of intervals. Discrete Applied Mathematics, 59, 225–235.CrossRefGoogle Scholar
 Chan, S.H., Lam, T.W., & Lee, L.K. (2013). Scheduling for weighted flow time and energy with rejection penalty. Theoretical Computer Science, 470, 93–104.CrossRefGoogle Scholar
 Do Lago, D.G., Madeira, E.R.M., & Bittencourt, L.F. (2011). Poweraware virtual machine scheduling on clouds using active cooling control and DVFS. In Proceedings of the 9th international workshop on middleware for grids, clouds and escience (MGC ’11) (pp. 1–6).Google Scholar
 Garg, S. K., Versteeg, S., & Buyya, R. (2013). A framework for ranking of cloud computing services. Future Generation Computer Systems, 29, 1012–1023.CrossRefGoogle Scholar
 Gerards, M. E. T., Hurink, J. L., & Hölzenspies, P. K. F. (2016). A survey of offline algorithms for energy minimization under deadline constraints. Journal of Scheduling, 19, 3–19.CrossRefGoogle Scholar
 Gupta, U. I., Lee, D. T., & Leung, J. Y.T. (1979). An optimal solution for the channelassignment problem. IEEE Transactions on Computers, 28, 807–810.CrossRefGoogle Scholar
 Hiraishi, K., Levner, E., & Vlach, M. (2002). Scheduling of parallel identical machines to maximize the weighted number of justintime jobs. Computers and Operations Research, 29, 841–848.CrossRefGoogle Scholar
 Iqbal, W., Dailey, M., & Carrera, D. (2009). SLAdriven adaptive resource management for web applications on a heterogeneous compute cloud. Lecture Notes in Computer Science, 5931, 243–253.CrossRefGoogle Scholar
 Jennings, B., & Stadler, R. (2015). Resource management in clouds: Survey and research challenges. Journal of Network and Systems Management, 23, 567–619.CrossRefGoogle Scholar
 Jing, S.Y., Ali, S., She, K., & Zhong, Y. (2013). Stateoftheart research study for green cloud computing. Journal of Supercomputing, 65, 445–468.CrossRefGoogle Scholar
 Kolen, A. W. J., Lenstra, J. K., Papadimitriou, C. H., & Spieksma, F. C. R. (2007). Interval scheduling: A survey. Naval Research Logistics, 54, 530–543.CrossRefGoogle Scholar
 Kovalyov, M. Y., Ng, C. T., & Cheng, T. C. E. (2007). Fixed interval scheduling: Models, applications, computational complexity and algorithms. European Journal of Operational Research, 178, 331–342.CrossRefGoogle Scholar
 Kushida, K. E., Murray, J., & Zysman, J. (2015). Cloud computing: From scarcity to abundance. Journal of Industry, Competition and Trade, 15, 5–19.CrossRefGoogle Scholar
 Lam, T.W., Lee, L.K., To, I. K. K., & Wong, P. W. H. (2008). Speed scaling functions for flow time scheduling based on active job count. Lecture Notes in Computer Science, 5193, 647–659.CrossRefGoogle Scholar
 Lam, T. W., Lee, L. K., To, I. K. K., & Wong, P. W. H. (2012). Improved multiprocessor scheduling for flow time and energy. Journal of Scheduling, 15, 105–116.CrossRefGoogle Scholar
 Lann, A., & Mosheiov, G. (2003). A note on the maximum number of ontime jobs on parallel identical machines. Computers and Operations Research, 30, 1745–1749.CrossRefGoogle Scholar
 Leyvand, Y., Shabtay, D., Steiner, G., & Yedidsion, L. (2010). Justintime scheduling with controllable processing times on parallel machines. Journal of Combinatorial Optimization, 19, 347–368.CrossRefGoogle Scholar
 Li, M., Yao, A. C., & Yao, F. F. (2006). Discrete and continuous minenergy schedules for variable voltage processors. Proceedings of the National Academy of Sciences of the United States of America, 103, 3983–3987.CrossRefGoogle Scholar
 Li, M., Yao, F. F., & Yuan, H. (2014). An \(O(n^{2})\) algorithm for computing optimal continuous voltage schedules. arXiv:1408.5995v1.
 Moreno, I. S., Garraghan, P., Townend, P., & Xu, J. (2014). Analysis, modeling and simulation of workload patterns in a largescale utility cloud. IEEE Transactions on Cloud Computing, 2, 208–221.CrossRefGoogle Scholar
 Nakajima, K., Hakimi, S. L., & Lenstra, J. K. (1982). Complexity results for scheduling tasks in fixed intervals on two types of machines. SIAM Journal on Computing, 11, 512–520.CrossRefGoogle Scholar
 Patriksson, M. (2008). A survey on the continuous nonlinear resource allocation. European Journal of Operational Research, 185, 1–46.CrossRefGoogle Scholar
 Pruhs, K., Uthaisombut, P., & Woeginger, G. (2008). Getting the best response for your erg. ACM Transactions on Algorithms, 4, 38:1–38:17.CrossRefGoogle Scholar
 Rackspace: Our 100% Network Uptime Guarantee, Resource Document. Rackspace US Inc. http://www.rackspace.co.uk/aboutus/datacentres. Accessed December 17, 2015.
 Shabtay, D., Bensoussan, Y., & Kaspi, M. (2012). A bicriteria approach to maximize the weighted number of justintime jobs and to minimize the total resource consumption cost in a twomachine flowshop scheduling system. International Journal of Production Economics, 136, 67–74.CrossRefGoogle Scholar
 Shabtay, D., & Steiner, G. (2007). A survey of scheduling with controllable processing times. Discrete Applied Mathematics, 155, 1643–1666.CrossRefGoogle Scholar
 Shioura, A., Shakhlevich, N.V., & Strusevich, V.A. (2015). Energy saving computational models with speed scaling via submodular optimization. In Proceedings of the 3rd international conference on green computing, technology and innovation (ICGCTI2015) Google Scholar
 Tian, W., & Zhao, Y. (2015). Optimized cloud resource management and scheduling: Theory and practices. Los Altos: Morgan Kaufmann.Google Scholar
 Von Laszewski, G., Wang, L., Younge, A. J., & He, X. (2009). Poweraware scheduling of virtual machines in DVFSenabled clusters. In IEEE international conference on cluster computing and workshops (CLUSTER ’09) (pp. 1–10).Google Scholar
 Wu, C.M., Chang, R.S., & Chan, H.Y. (2014). A green energyefficient scheduling algorithm using the DVFS technique for cloud datacenters. Future Generation Computer Systems, 37, 141–147.CrossRefGoogle Scholar
 Yao, F. F., Demers, A. J., & Shenker, S. (1995). A scheduling model for reduced CPU energy. In Proceedings of the 36th IEEE symposium on foundations of computer science (FOCS ’95) (pp. 374–382).Google Scholar
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.