
1 Introduction

Service composition has been adopted as a standard computing paradigm for rapidly constructing large-scale distributed applications within and across organizational boundaries. Many modern enterprises construct flexible applications by dynamically composing a wide variety of atomic services with different quality of service (QoS) attributes.

Recently, with the rise of cloud computing, affordable access to reliable, high-performance hardware and software resources, together with the avoidance of maintenance costs, has encouraged many decision makers to migrate enterprise applications, partially or entirely, from traditional distributed computing platforms (e.g., grids) to clouds.

The rapid adoption of cloud computing has led to more and more cloud services being published in the worldwide service pool. Because real-world applications are complex and diverse, it is often necessary to combine a group of simple services that work with each other to meet their practical requirements. Therefore, there is a strong need to deploy a service composition system in cloud computing [14].

Figure 1 gives a system architecture for modeling and executing composite services in a public cloud. At the design phase, the provider of the composite service defines the set of required atomic services and their relations using a workflow-like language such as WS-BPEL or YAWL to fulfill the business goal. After that, service discovery is performed by exploiting the existing computing infrastructure (e.g., UDDI) to locate available atomic services for each task in the workflow. As a result, a collection of functionally equivalent atomic services (referred to as candidate services) is obtained for each task.

Fig. 1. Architecture of a cloud computing service composition system

At the runtime phase, QoS-aware service selection is performed upon each service request in order to select a set of appropriate atomic services from the collections of candidate services such that the aggregated QoS values satisfy the client’s QoS requirements.

As shown in Fig. 1, there are many similar services that are located in different places, implemented by different service providers, and have distinct values in terms of the QoS parameters. Therefore, the cloud computing service composition (CCSC) problem [14] is one of the most important problems in providing cloud services. The CCSC problem is to select appropriate, ideally optimal, atomic services and combine them into composite services such that each resulting composite service satisfies both the functional and the QoS requirements of its end users. Substantial research effort has been devoted to this problem.

More importantly, the open and shared nature of cloud computing brings a brand-new challenge to the service composition system: it must be able to handle many diverse requests from many concurrent clients. More concretely, users look up and invoke their desired composite services independently, and these requests are then received, queued, and processed by the service composition system. Since response time is one of the most important QoS indicators for invoking composite services from the clients’ perspective, an efficient, high-throughput selection algorithm, i.e., one that can handle more requests within a fixed period of time, is critical for a service composition system to fulfill its performance requirements.

To deal with large numbers of requests, administrators of service composition systems typically apply simple, ad hoc dispatching algorithms, such as packing, striping, and load balancing, to distribute these requests to different solvers. These solvers independently select appropriate atomic services for their own dispatched composite services from the global service pool. Although this approach has advantages in terms of simplicity and scalability, it performs poorly and completely overlooks the possible similarities and connections among a batch of requests, which, according to our experiments, are very useful for improving both the performance of the algorithms and the quality of the solutions. This observation motivates us to solve multiple CCSC tasks simultaneously by using a single optimizer in a single run.

Multifactorial optimization (MFO) is a recently developed algorithmic paradigm in the field of optimization and evolutionary computation. Instead of solving a pool of similar optimization problems one by one, it handles multiple optimization problems at the same time by using a shared evolving population [13]. Because the MFO paradigm is a natural fit for the CCSC problem, in this paper we propose an evolutionary multitasking algorithm (EMA) to solve the CCSC problem and use it to test the feasibility of optimizing multiple CCSC tasks simultaneously. The main contributions of this paper can be summarized as follows:

  1. We propose a novel CCSC solver, named EMA-CCSC, based on the evolutionary multitasking algorithm, and explain its detailed process, including the representation of solutions, quality assignment, and the reproduction and selection of new solutions.

  2. We design new experimental and analytical methodologies to compare the evolutionary multitasking algorithm with conventional single-objective CCSC solvers.

2 Related Work

2.1 Evolutionary Multitasking Algorithm and Its Application

Evolutionary multitasking algorithm (EMA) is a new optimization paradigm recently proposed by Ong [19]. In contrast to traditional evolutionary optimization approaches, which focus on solving only a single optimization problem at a time, EMA was proposed to solve multiple optimization problems simultaneously. Ong and Gupta [20] presented a simple evolutionary methodology capable of cross-domain multitask optimization in a unified genotype space, and showed that its application in practical domains has many potential benefits. Zhou et al. [38] proposed a permutation-based EMA to improve multitasking performance in the context of the vehicle routing problem. Gupta et al. [13] developed a cross-domain optimization platform that allows one to solve diverse problems concurrently. Gupta et al. [12] showed that the practicality of population-based bi-level optimization can be considerably enhanced by simply incorporating the novel concept of evolutionary multitasking into the search process. Yuan et al. [33] focused on the evolutionary multitasking of permutation-based combinatorial optimization problems (PCOPs), considering four well-known PCOPs: the traveling salesman problem (TSP), the quadratic assignment problem (QAP), the linear ordering problem (LOP), and the job-shop scheduling problem (JSP). Da et al. [7] suggested solving a target single-objective optimization (SOO) task in conjunction with a closely related (but artificially generated) multiobjective optimization (MOO) task in the form of evolutionary multitasking. Sagarna and Ong [22] focused on branch testing and explored the capability of EMA to guide the search by exploiting inter-branch information.

2.2 Algorithms for CCSC Problem

Service composition techniques were first applied in cloud computing systems by Kofler et al. [17] and Zeng et al. [34] in 2009 [14]. The CCSC problem is often defined as an instance of the classic multiple-choice multidimensional knapsack problem (MMKP) [32], which searches for the composition with the best composite QoS values while satisfying the QoS constraints of atomic services. Such a problem is NP-hard [16] and usually includes numerous constraints, so exact approaches cannot perform well when large numbers of services are involved. Zeng et al. [34] presented a matching algorithm that considers the semantic similarity of multiple input and output parameters based on WordNet in the process of service matching. Cui et al. [6] proposed a service graph construction algorithm that transforms the CCSC problem into an optimal path finding problem in a graph. Torkashvan and Haghighi [28] proposed a greedy approach that solves the CCSC problem by mapping workflows to composed services while considering all flow structures. Zhang et al. [37] proposed a cluster-based algorithm for the DSP scheduling problem in service composition in multi-domain environments with time constraints. Wang et al. [29] presented a graph model that takes both the QoS of Web services and the QoS of the network into consideration. Ba [3] proposed an approach to automating service composition that takes as input the abstract specification of a composition and the definitions of concrete services. Syu et al. [26] proposed an automatic composition approach based on a genetic algorithm (GA) that satisfies the user’s functional requirements, QoS criteria and transactional requirements. Guidara et al. [11] presented a heuristic-based, time-aware service selection approach to select a close-to-optimal combination of services. Deng et al. [9] used a constraint-based service filtering process to reduce the search space and adopted a differential evolution based algorithm to form a service combination. Rodriguez-Mier et al. [21] presented an A* algorithm that solves the problem of semantic input-output message structure matching for service composition. Mabrouk et al. [18] presented QASSA, a service selection algorithm that provides the appropriate ground for QoS-aware service composition in ubiquitous environments.

According to our previous investigation, one of the most important issues in the CCSC problem is how to improve the throughput of the optimizers, because public clouds often receive a large number of independent composite service requests. This observation motivates us to explore the possibilities of, and solutions for, optimizing multiple CCSC tasks concurrently.

3 Problem Definition

3.1 Quality Criteria of an Atomic Service

Many services are available in public clouds; some have distinct functions, while others have similar functions. Services with similar capabilities can be distinguished by their QoS values. QoS describes the overall performance of a service: the QoS values of a service indicate whether it is reliable, available, or efficient.

Services are usually advertised with multiple QoS values, each of which represents a quality aspect of the service called a QoS criterion. QoS criteria are the most commonly used characteristics for measuring the quality of services, because they indicate whether a service is capable of living up to users’ expectations [24]. Based on the Web service benchmark [1, 2] and previous studies [23, 24], we identify six generic quality criteria [4] for an atomic service s in this paper, namely price (\(q_{p}(s)\)), duration (\(q_{d}(s)\)), availability (\(q_{a}(s)\)), throughput (\(q_{t}(s)\)), success rate (\(q_{s}(s)\)), and reliability (\(q_{r}(s)\)).

3.2 Structures of a Composite Service

A composite service cs can be modeled as a directed acyclic graph (DAG) \(cs = (S,E)\), where vertices \(S = \{s_{1}, s_{2},...,s_{n}\}\) represent a set of atomic services and \(s_{1}, s_{n}\) represent the starting and ending services respectively. The dependency between a pair of adjacent services \(s_{i}, s_{j}\) is denoted by a directed edge \(e_{ij} \in E\) between them.

To construct a composite service, atomic services need to be connected by different structures. In this paper, we consider four service composition structures: Sequence, Concurrency, Condition and Loop. For more details, see [4].

3.3 Service Selection

Given a composite service cs containing n atomic services \(S = \{s_{1}, s_{2},\cdots ,s_{n}\}\), we must select, for each \(s_{i}\in S\), a candidate from a candidate service set to implement \(s_{i}\). A candidate service set is a collection of atomic services having the same functionality but different QoS properties. The typical process of service selection for a composite service with three atomic services is illustrated in Fig. 2. Note that candidates may be chosen from the same candidate service set for different atomic services if those atomic services have the same function.

Fig. 2. Service selection for a composite service

Generally, let a composite service \(cs = (S,E)\) have m candidate service sets \(C = \{C_{1}, C_{2},\cdots ,C_{m}\}\), where each \(C_{j}\in C\) contains a set of k candidate services \(C_{j} = \{c^{j}_{1},c^{j}_{2},\cdots ,c^{j}_{k}\}\). We first define a function \(\mathcal {M}(s_{i}) : S\rightarrow C\) that maps each atomic service \(s_{i}\in S\) to its corresponding candidate service set \(C_{j}\in C\), i.e., \(\mathcal {M}(s_{i})=C_{j}\). Because the mapping from an atomic service to its corresponding candidate service set is often determined manually by the designer of a composite service, we assume that \(\mathcal {M}(\cdot )\) is given in this paper. Finally, we define a matrix \(\varvec{x} = [x_{i,j}]_{|S|\times \max \limits _{k=1}^{|S|}(|\mathcal {M}(s_{k})|)}\) to represent the service selection scheme for the atomic services in S, where \(\varvec{x}\) has |S| rows and \(\max \limits _{k=1}^{|S|}(|\mathcal {M}(s_{k})|)\) columns; the max function aligns the columns of \(\varvec{x}\) when the candidate sets have different sizes. For example, if we choose the candidate service \(c^{j}_{k} \in C_{j}\) to implement an atomic service \(s_{i} \in S\) with \(\mathcal {M}(s_{i})=C_{j}\), then \(x_{i,k}=1\).
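As an illustration of this encoding, the short Python sketch below builds \(\varvec{x}\) from one chosen candidate index per atomic service and checks that each row selects exactly one candidate. The function names `build_selection_matrix` and `is_valid_scheme`, and the convention of describing \(\mathcal {M}\) only through the sizes \(|\mathcal {M}(s_{i})|\), are hypothetical choices for this sketch, not part of the original formulation.

```python
from typing import List, Sequence


def build_selection_matrix(set_sizes: Sequence[int], choices: Sequence[int]) -> List[List[int]]:
    """Build the |S| x max_k |M(s_k)| binary selection matrix x.

    set_sizes[i] is |M(s_i)|, the size of the candidate set mapped to s_i.
    choices[i] is the 1-based index k of the candidate chosen for s_i,
    so the row for s_i gets x_{i,k} = 1 (stored at Python index k - 1).
    """
    if len(set_sizes) != len(choices):
        raise ValueError("exactly one choice is needed per atomic service")
    width = max(set_sizes)  # the max aligns rows of different candidate-set sizes
    x = [[0] * width for _ in set_sizes]
    for i, (size, k) in enumerate(zip(set_sizes, choices)):
        if not 1 <= k <= size:
            raise ValueError(f"choice {k} is outside M(s_{i + 1}) of size {size}")
        x[i][k - 1] = 1
    return x


def is_valid_scheme(x: List[List[int]]) -> bool:
    """True if every row is 0/1 and selects exactly one candidate."""
    return all(set(row) <= {0, 1} and sum(row) == 1 for row in x)


# s_1 and s_3 share a 4-candidate set; s_2 maps to a 2-candidate set.
x = build_selection_matrix([4, 2, 4], [2, 1, 4])
print(x)  # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

Padding columns beyond \(|\mathcal {M}(s_{i})|\) simply stay 0, which is what the max-based alignment implies.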

3.4 Execution of a Composite Service

As shown in Fig. 2, once we finish selecting the candidate for each atomic service in a composite service cs, we can generate an execution plan for cs. Given the execution plan, cs becomes executable and its QoS properties can be calculated by aggregating QoS properties of its component candidates. The detailed aggregation rules for four structures and six different QoS properties adopted in this work have been discussed in [4].
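The paper defers the aggregation rules to [4]. Purely as an assumption for illustration, the following Python sketch uses rules that are common in the QoS-aware composition literature for the Sequence and Concurrency structures: sums for price, sum or max for duration, products for the probability-like criteria, and min for throughput. The rules actually used in [4] may differ, and Condition and Loop are omitted.

```python
import math
from typing import Dict, List

# Keys: p=price, d=duration, a=availability, t=throughput, s=success rate, r=reliability.
QoS = Dict[str, float]


def aggregate(structure: str, parts: List[QoS]) -> QoS:
    """Aggregate sub-flow QoS under a Sequence or Concurrency structure.

    The rules below are common textbook choices, assumed for illustration;
    they are not necessarily the exact rules of [4].
    """
    if structure not in ("sequence", "concurrency"):
        raise ValueError("only sequence and concurrency are sketched here")
    durations = [q["d"] for q in parts]
    return {
        "p": sum(q["p"] for q in parts),              # every invoked service is paid for
        # sequential parts add up; parallel branches overlap, so the longest dominates
        "d": sum(durations) if structure == "sequence" else max(durations),
        "a": math.prod(q["a"] for q in parts),        # all parts must be available
        "t": min(q["t"] for q in parts),              # the slowest part is the bottleneck
        "s": math.prod(q["s"] for q in parts),        # all parts must succeed
        "r": math.prod(q["r"] for q in parts),
    }


# s1 followed by the parallel pair (s2, s3); all numbers are made up.
s1 = {"p": 1.0, "d": 2.0, "a": 0.9, "t": 10.0, "s": 0.95, "r": 0.9}
s2 = {"p": 2.0, "d": 3.0, "a": 0.8, "t": 5.0, "s": 0.9, "r": 0.8}
s3 = {"p": 0.5, "d": 1.0, "a": 1.0, "t": 8.0, "s": 1.0, "r": 1.0}
total = aggregate("sequence", [s1, aggregate("concurrency", [s2, s3])])
print(total["p"], total["d"])  # 3.5 5.0
```

Nesting calls mirrors how a composite service is built from nested structures; `math.prod` requires Python 3.8 or later.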

To represent the preference for QoS properties of different users, we define a utility function \(\mathcal {F}(cs, \varvec{x})\) for a composite service cs by a weighted sum of six quality criteria:

$$\begin{aligned} \mathcal {F}(cs,\varvec{x}) = {} & -\alpha _{1} Q_{p}(cs,\varvec{x}) - \alpha _{2} Q_{d}(cs,\varvec{x}) \\ & +\, \alpha _{3} Q_{a}(cs,\varvec{x}) + \alpha _{4} Q_{t}(cs,\varvec{x}) \\ & +\, \alpha _{5} Q_{s}(cs,\varvec{x}) + \alpha _{6} Q_{r}(cs,\varvec{x}) \end{aligned}$$

where \(\alpha _{i}\) (\(0<\alpha _{i}<1\)) is the weight for each QoS property and \(\sum _{i=1}^{6}\alpha _{i}=1\), \(\varvec{x}\) denotes the service selection scheme for cs, and we have:

$$Q_{*}(cs,\varvec{x})=\mathop {aggr}\limits _{s_{i}\in S \,\wedge \, x_{i,k}=1}q_{*}(c_{k}^{j}), \quad C_{j}=\mathcal {M}(s_{i}),$$

where \(*\in \{p,d,a,t,s,r\}\) and the function \(Q_{*}(\cdot )\) calculates the corresponding QoS value of cs by aggregating the QoS values of all candidates selected for its atomic services according to the composition structures [4].
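A minimal Python sketch of the utility function, assuming the six aggregated values \(Q_{*}(cs,\varvec{x})\) are already computed and stored in a dictionary keyed by p, d, a, t, s, r (a hypothetical layout). Because raw QoS values have different units, normalizing them to [0, 1] before weighting is a common practice; that normalization is an assumption here and is not shown.

```python
import math
from typing import Dict

KEYS = ("p", "d", "a", "t", "s", "r")
NEGATIVE = {"p", "d"}  # price and duration are costs, so they enter F with a minus sign


def utility(Q: Dict[str, float], alpha: Dict[str, float]) -> float:
    """F(cs, x) as the weighted sum of the six aggregated QoS values Q_*(cs, x)."""
    if not all(0 < alpha[k] < 1 for k in KEYS):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if not math.isclose(sum(alpha[k] for k in KEYS), 1.0):
        raise ValueError("weights must sum to 1")
    return sum((-1.0 if k in NEGATIVE else 1.0) * alpha[k] * Q[k] for k in KEYS)


alpha = {k: 1 / 6 for k in KEYS}  # equal preference for all six criteria
Q = {k: 1.0 for k in KEYS}        # e.g. QoS values already normalized to [0, 1]
print(utility(Q, alpha))          # approximately 1/3: (-1 - 1 + 1 + 1 + 1 + 1) / 6
```

Different users can express their preferences simply by passing different `alpha` dictionaries for the same composite service.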

3.5 Problem Formulation

Given a set of composite services \(CS = \{cs_{1},cs_{2},\cdots ,cs_{n}\}\), the cloud computing service composition (CCSC) problem is to select service candidates from a group of candidate service sets to construct the execution plan for each \(cs_{i}=(S_{i},E_{i})\in CS\) so as to maximize the total QoS gain of CS.

Mathematically, the CCSC problem can be formulated as follows:

$$\begin{aligned} \text{Maximize} \quad & \sum \limits _{cs_{i}\in CS}\mathcal {F}(cs_{i},\varvec{x^{(i)}}) \end{aligned} \tag{1}$$

Subject to

$$\begin{aligned} & \varvec{x^{(i)}}=[x^{(i)}_{j,k}]_{|S_{i}|\times \max \limits _{l=1}^{|S_{i}|}(|\mathcal {M}(s_{l})|)} \end{aligned} \tag{2}$$

$$\begin{aligned} & \sum \limits _{k=1}^{\max \limits _{l=1}^{|S_{i}|}(|\mathcal {M}(s_{l})|)}x^{(i)}_{j,k}=1, \quad j=1,2,\ldots ,|S_{i}|, \quad x^{(i)}_{j,k} \in \{0,1\} \end{aligned} \tag{3}$$

where Eq. (1) states that the goal of the CCSC problem is to maximize the sum of the utility function \(\mathcal {F}\) over all composite services \(cs_{i}\in CS\). Equation (2) defines the service selection scheme \(\varvec{x^{(i)}}\) for \(cs_{i}\), as explained in Sect. 3.3. Equation (3) ensures that exactly one candidate is selected for each atomic service in \(cs_{i}\).

According to this definition, the CCSC problem aims to maximize the total QoS values of a set of composite services instead of a single composite service. The latter is usually known as the classic QoS-aware service composition (QoS-SC) problem and has been well studied in previous work. Because [27] has proven that QoS-SC is an NP-hard problem, our CCSC problem, which contains QoS-SC as the special case with a single composite service (\(|CS|=1\)), is also considered to be NP-hard.
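To give a feel for why exact enumeration does not scale, the Python sketch below counts and enumerates all selection schemes of a single composite service. The number of schemes is \(\prod _{i}|\mathcal {M}(s_{i})|\), so exhaustive search is only practical for toy instances. This only illustrates the size of the search space and is not a proof of NP-hardness; the `score` callback is a hypothetical stand-in for \(\mathcal {F}\).

```python
import itertools
import math
from typing import Callable, Optional, Sequence, Tuple


def search_space_size(set_sizes: Sequence[int]) -> int:
    """Number of selection schemes for one composite service: prod_i |M(s_i)|."""
    return math.prod(set_sizes)


def exhaustive_select(
    set_sizes: Sequence[int], score: Callable[[Tuple[int, ...]], float]
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Try every scheme (1-based candidate index per atomic service) and keep the best.

    Only usable for toy instances: the loop runs search_space_size(set_sizes) times.
    """
    best, best_val = None, -math.inf
    for choice in itertools.product(*(range(1, k + 1) for k in set_sizes)):
        val = score(choice)
        if val > best_val:
            best, best_val = choice, val
    return best, best_val


# A toy score standing in for F: prefer high candidate indices.
print(exhaustive_select([2, 3], score=sum))   # ((2, 3), 5)
print(search_space_size([20] * 50) > 10**64)  # True: 20**50 is about 1.1e65
```

With 50 atomic services and 20 candidates each, enumeration is clearly hopeless, which is why the baselines and EMA-CCSC are all metaheuristics.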

4 Evolutionary Multitasking Algorithm for CCSC Problem

Given an instance of the CCSC problem consisting of n composite services, a conventional service composition system treats it as n independent optimization tasks (i.e., CCSC tasks) and solves only a single task at a time using an optimizer. In this paper, we propose a novel method, namely the evolutionary multitasking algorithm for CCSC (EMA-CCSC), that handles two CCSC tasks at the same time, as suggested in [13]. The key motivation is to use the evolutionary multitasking algorithm (EMA) for implicit knowledge transfer of useful traits across the two optimization tasks, thereby enhancing the evolutionary search [38]. We first introduce some related terms and then discuss EMA-CCSC in detail.

Factorial Cost: For a given CCSC task \(T_j (j=1,2)\) and a solution \(x^i\) in the evolving population of EMA, the factorial cost of \(x^i\), denoted by \(\varPsi _j^i\), is defined as the QoS value of \(x^i\) with respect to \(T_j\).

Factorial Rank: The factorial rank of \(x^i\) on task \(T_j\), denoted by \(r_j^i\), is the index of \(x^i\) in the population list sorted in ascending order with respect to the factorial costs \(\varPsi _j\).

Scalar Fitness: Given the list of factorial ranks \(\{r_1^i, r_2^i\}\) of the solution \(x^i\), its scalar fitness \(\varphi _i\) is defined as \(\varphi _i=1/\min \{r_1^i, r_2^i\}\).

Skill Factor: The skill factor \(\tau _i\) of the solution \(x^i\) is the one of the two CCSC tasks on which \(x^i\) performs best, i.e., the task on which it has its lowest factorial rank, \(\tau _i=\arg \min _{j} r_j^i\).
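The following Python sketch computes factorial ranks, scalar fitness and skill factors for a small population. It assumes that the factorial cost is a value to be minimized (here, the negative QoS value), so that sorting in ascending order gives rank 1 to the best individual, and, following the usual multifactorial convention, that the skill factor is the task with the lowest factorial rank. Tasks are indexed from 0, so \(T_1\) is task 0.

```python
from typing import List, Sequence, Tuple


def rank_population(
    costs: Sequence[Sequence[float]],
) -> Tuple[List[List[int]], List[float], List[int]]:
    """Factorial ranks, scalar fitness and skill factors from factorial costs.

    costs[i][j] is the factorial cost of individual i on task j (lower is better).
    Ties are broken by population index because Python's sort is stable.
    """
    n_ind, n_tasks = len(costs), len(costs[0])
    ranks = [[0] * n_tasks for _ in range(n_ind)]
    for j in range(n_tasks):
        order = sorted(range(n_ind), key=lambda i: costs[i][j])  # ascending cost
        for pos, i in enumerate(order, start=1):
            ranks[i][j] = pos
    fitness = [1.0 / min(r) for r in ranks]                           # phi_i = 1 / min_j r_j^i
    skill = [min(range(n_tasks), key=r.__getitem__) for r in ranks]   # tau_i = argmin_j r_j^i
    return ranks, fitness, skill


# Four individuals, two tasks; cost = -QoS so that rank 1 is the best QoS value.
qos = [[0.9, 0.1], [0.5, 0.8], [0.2, 0.6], [0.7, 0.3]]
ranks, fitness, skill = rank_population([[-v for v in row] for row in qos])
print(ranks)    # [[1, 4], [3, 1], [4, 2], [2, 3]]
print(fitness)  # [1.0, 1.0, 0.5, 0.5]
print(skill)    # [0, 1, 1, 0]
```

Using ranks rather than raw QoS values makes individuals comparable across the two tasks even when their utility scales differ.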

With the definitions above, the proposed EMA-CCSC, shown in Algorithm 1, selects two of the remaining CCSC tasks (randomly or according to a certain pattern) in each run and works as follows.

Algorithm 1. EMA-CCSC

In the pseudo-code of Algorithm 1, line 1 initializes a population of N individuals at random. Suppose that the two composite services \(cs_1\) and \(cs_2\) contain \(n_1\) and \(n_2\) atomic services, respectively; then every individual \(\mathbf x ^i=\{x_1^i,...,x_n^i\}\) in the initial population is an n-dimensional real vector, where \(n=\max \{n_1,n_2\}\). Each dimension of \(\mathbf x ^i\), denoted by \(x_j^i\) \((j=1,2,...,n)\), is a real value between 0 and 1. If the j-th atomic service \(s_{j}\) can be implemented by a candidate from the collection \(\mathcal {M}(s_{j})= \{c_{1},c_{2},...,c_{k}\}\) with k available services, then the r-th candidate service is chosen for execution, where \(r=\lceil k \times x_j^i \rceil \).
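The random-key encoding and its decoding rule \(r=\lceil k \times x_j^i \rceil \) can be sketched in Python as follows. Two details are assumptions not stated in the text: a key of exactly 0 (which Python's `random.random()` can return) is clamped to the first candidate, and a task with fewer than n atomic services reads only the first keys of the vector.

```python
import math
import random
from typing import List, Sequence


def init_population(size: int, n: int, rng: random.Random) -> List[List[float]]:
    """Line 1 of Algorithm 1: N individuals of n random keys in [0, 1), n = max(n1, n2)."""
    return [[rng.random() for _ in range(n)] for _ in range(size)]


def decode(individual: Sequence[float], set_sizes: Sequence[int]) -> List[int]:
    """Map keys to 1-based candidate indices r = ceil(k * x_j) for one task.

    set_sizes[j] = |M(s_j)| for the task being decoded; zip() uses only the first
    len(set_sizes) keys, so a task with fewer atomic services ignores the tail.
    """
    choices = []
    for x_j, k in zip(individual, set_sizes):
        r = math.ceil(k * x_j)
        choices.append(min(max(r, 1), k))  # x_j = 0 would give r = 0, so clamp into [1, k]
    return choices


rng = random.Random(42)
population = init_population(size=30, n=5, rng=rng)
print(decode([0.0, 0.26, 1.0, 0.5], [4, 4, 3]))  # [1, 2, 3]
```

Because decoding is task-specific, the same vector yields a valid selection scheme for both composite services, which is what lets one population serve two tasks.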

Line 2 of Algorithm 1 evaluates the individuals in the initial population by calculating their factorial costs, determining their factorial ranks, and assigning their scalar fitnesses \(\varphi _i\) and skill factors \(\tau _i\), in sequence. Figure 3 illustrates this evaluation process with two tasks \(T_{1}\) and \(T_{2}\) and four individuals \(p_{1}\) to \(p_{4}\). The algorithm first calculates the factorial ranks and skill factors of these individuals, and then, for each individual, sets the factorial cost on the task that is not its skill factor to “infinity”.

Fig. 3. Evaluate individuals by calculating their factorial costs

The reproduction step in line 4 of Algorithm 1 generates an offspring population \(\{\mathbf{y }^1,\cdots , \mathbf{y }^N\}\) according to the assortative mating rules given in Algorithm 2, where rand is a random number between 0 and 1 and rmp is a prescribed random mating probability. Here we employ single-point crossover in line 4 and random mutation with a mutation rate of \(1 \slash n\) in lines 6 and 7 of Algorithm 2.
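Since the pseudo-code of Algorithm 2 is not reproduced here, the Python sketch below shows one plausible reading of the assortative mating rules described in the text: single-point crossover when the two parents share a skill factor or rand < rmp, and otherwise random mutation of each parent with rate \(1 \slash n\). Returning, with each child, the skill factors it may later imitate follows the vertical cultural transmission commonly used in multifactorial evolutionary algorithms and is an assumption about EMA-CCSC.

```python
import random
from typing import List, Sequence, Tuple

Offspring = Tuple[List[float], Tuple[int, ...]]  # (genes, skill factors it may imitate)


def single_point_crossover(a: Sequence[float], b: Sequence[float],
                           rng: random.Random) -> Tuple[List[float], List[float]]:
    point = rng.randint(1, len(a) - 1)  # cut strictly inside the vector
    return list(a[:point]) + list(b[point:]), list(b[:point]) + list(a[point:])


def random_mutation(a: Sequence[float], rng: random.Random) -> List[float]:
    rate = 1.0 / len(a)  # mutation rate 1/n, as in the text
    return [rng.random() if rng.random() < rate else g for g in a]


def assortative_mating(pa: Sequence[float], pb: Sequence[float], tau_a: int, tau_b: int,
                       rmp: float, rng: random.Random) -> List[Offspring]:
    """Crossover if the parents share a skill factor or rand < rmp; otherwise mutate each."""
    if len(pa) < 2:
        raise ValueError("single-point crossover needs at least two genes")
    if tau_a == tau_b or rng.random() < rmp:
        ca, cb = single_point_crossover(pa, pb, rng)
        return [(ca, (tau_a, tau_b)), (cb, (tau_a, tau_b))]  # may imitate either parent
    return [(random_mutation(pa, rng), (tau_a,)), (random_mutation(pb, rng), (tau_b,))]


rng = random.Random(7)
kids = assortative_mating([0.1] * 10, [0.9] * 10, tau_a=0, tau_b=0, rmp=0.3, rng=rng)
mutated = assortative_mating([0.1] * 10, [0.9] * 10, tau_a=0, tau_b=1, rmp=0.0, rng=rng)
```

A larger rmp allows more crossover between parents specialized on different tasks, i.e., more implicit knowledge transfer between the two CCSC tasks.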

Algorithm 2. Assortative mating

In the selective evaluation step in line 5 of Algorithm 1, each offspring individual is evaluated on only one selected task, the one on which it is most likely to perform well. The selective evaluation step works as shown in Algorithm 3. Figure 4 shows the reproduction and selective evaluation steps for \(T_{1}\) and \(T_{2}\) and their four individuals \(p_{1}\) to \(p_{4}\): the algorithm performs random mutation on \(p_{1}\) and \(p_{4}\), generating \(c_{1}\) and \(c_{4}\), respectively, and applies the crossover operation to \(p_{2}\) and \(p_{3}\), producing \(c_{2}\) and \(c_{3}\).
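A matching Python sketch of selective evaluation, under stated assumptions: each child imitates the skill factor of one parent chosen at random (a mutated child has only one parent to imitate), is evaluated on that task only, and receives an infinite factorial cost on the other task. The task functions are hypothetical stand-ins for the negative utility \(-\mathcal {F}\) on \(T_1\) and \(T_2\).

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def selective_evaluation(
    offspring: Sequence[Tuple[List[float], Tuple[int, ...]]],
    tasks: Sequence[Callable[[List[float]], float]],
    rng: random.Random,
) -> Tuple[List[List[float]], List[int]]:
    """Evaluate each child on one task only; all other factorial costs become infinity.

    offspring holds (genes, parent skill factors) pairs; tasks[j] returns the
    factorial cost of genes on task j (lower is better, e.g. -F).
    """
    costs, skills = [], []
    for genes, parent_skills in offspring:
        tau = rng.choice(parent_skills)  # imitate one parent's skill factor
        row = [math.inf] * len(tasks)    # tasks not evaluated keep an infinite cost
        row[tau] = tasks[tau](genes)
        costs.append(row)
        skills.append(tau)
    return costs, skills


tasks = [lambda g: -sum(g), lambda g: -max(g)]  # hypothetical stand-ins for -F on T_1, T_2
costs, skills = selective_evaluation(
    [([0.2, 0.4], (1,)), ([0.5, 0.5], (0,))], tasks, random.Random(0)
)
print(costs)   # [[inf, -0.4], [-1.0, inf]]
print(skills)  # [1, 0]
```

Evaluating each child on a single task halves the number of utility evaluations per generation compared with evaluating it on both tasks.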

Fig. 4. Reproduction and selective evaluation

Algorithm 3. Selective evaluation

Fig. 5. Selection step

The selection step in line 7 of Algorithm 1 selects N individuals from the union of the parent population \(\{\mathbf{x }^1,\cdots ,\mathbf{x }^N\}\) and the offspring population \(\{\mathbf{y }^1,\cdots ,\mathbf{y }^N\}\), giving rise to a new evolving population for the next generation. This step follows an elitist strategy, which ensures that the best individuals survive through the generations. Figure 5 illustrates the selection process with \(p_{1}\) to \(p_{4}\) and their four offspring \(c_{1}\) to \(c_{4}\).
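The elitist selection step can be sketched in Python as below, assuming that factorial ranks are recomputed on the merged pool of 2N individuals and that scalar fitness \(1/\min _j r_j\) decides survival. The example uses eight hypothetical rows standing in for \(p_1\) to \(p_4\) and \(c_1\) to \(c_4\); each individual has a finite cost only on its skill-factor task.

```python
from typing import List, Sequence


def elitist_selection(costs: Sequence[Sequence[float]], n: int) -> List[int]:
    """Return the indices of the n fittest members of the merged parent + offspring pool.

    Factorial ranks are recomputed on the merged pool; keeping the n individuals
    with the highest scalar fitness 1/min_j r_j equals keeping the n smallest best ranks.
    """
    pool = range(len(costs))
    best_rank = [len(costs)] * len(costs)
    for j in range(len(costs[0])):
        order = sorted(pool, key=lambda i: costs[i][j])  # infinite costs sort last
        for pos, i in enumerate(order, start=1):
            best_rank[i] = min(best_rank[i], pos)
    return sorted(pool, key=best_rank.__getitem__)[:n]


inf = float("inf")
# Rows 0-3: parents p1-p4; rows 4-7: offspring c1-c4.
merged = [[1, inf], [inf, 2], [3, inf], [inf, 4], [0, inf], [inf, 5], [2, inf], [inf, 1]]
print(elitist_selection(merged, n=4))  # [4, 7, 0, 1]
```

Because parents compete directly with their offspring, the best individual found so far on each task can never be lost, which is the elitist property the text describes.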

5 Experiments

5.1 Experiment Setup

All experiments are conducted on a Lenovo ThinkServer RD530 server equipped with two Xeon E5-2609 CPUs, a 1 TB disk and 6 × 16 GB of RAM, running Windows Server 2008.

A sequence of 1188 CCSC tasks, with the number of atomic services ranging from 10 to 100, is investigated [4]. These CCSC tasks are generated by the synthetic workflow generator developed in [15]. The candidate services are provided by an updated QWS data set [1, 2], in which 2507 Web services in 233 categories are available.

5.2 Compared Algorithms and Control Parameters

Nine traditional CCSC solvers of different types are employed as the baseline algorithms: the genetic algorithms (GAs) TGA [30], GAHS [31] and GASA [31]; the particle swarm optimization (PSO) algorithms ConstrictionPSO [5], FrankensteinPSO [10] and OLPSO [35]; and the differential evolution (DE) algorithms DE [25], DEGL [8] and JADE [36].

The parameter settings of the proposed EMA-CCSC algorithm are as follows. The population size is set to 30, the same as in all the compared algorithms. The crossover probability and mutation probability are set to 0.3 and \(1 \slash n\) (n is the number of atomic services in the target CCSC task), respectively. For the compared algorithms, all parameters other than the population size are set exactly as suggested by their authors. All runs of the compared algorithms stop after 1000 iterations. When applying the EMA-CCSC algorithm, two CCSC tasks with the same number of atomic services are selected from the waiting queue and submitted to the EMA-CCSC algorithm for scheduling.

5.3 Results and Comparison

Figures 6, 7 and 8 compare EMA-CCSC with the baseline GAs, PSOs and DEs, respectively. In these figures, the symbol “++” indicates that EMA-CCSC, by optimizing the two target CCSC tasks simultaneously, improves the QoS values of both tasks compared with the baseline algorithm, which optimizes them one after the other. In the pie charts, “+=” means that one task is improved while the other is unchanged, “==” means that both tasks are unchanged, “+−” means that one is improved and the other degenerated, “−=” means that one is degenerated and the other unchanged, and “−−” means that both are degenerated. The bar charts in these figures show the average improvement rate achieved by EMA-CCSC versus the compared algorithms on all the investigated CCSC tasks. All the experimental results in these figures are statistics over five independent runs of the compared algorithms.
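To make the six pie-chart categories precise, the Python sketch below maps the QoS differences of EMA-CCSC over a baseline on the two tasks of a pair to one of the symbols. It assumes that a higher QoS value is better; the tolerance `tol` for counting a task as unchanged is a hypothetical parameter, since the text does not say how “unchanged” is decided, and ASCII '-' stands in for the minus sign.

```python
def outcome_symbol(delta_1: float, delta_2: float, tol: float = 0.0) -> str:
    """Classify QoS changes on two tasks (EMA-CCSC minus baseline) into the six symbols.

    A change within +/- tol counts as unchanged; ASCII '-' stands for the minus sign.
    """
    def sign(d: float) -> str:
        if d > tol:
            return "+"
        if d < -tol:
            return "-"
        return "="

    pair = "".join(sorted((sign(delta_1), sign(delta_2)), key="+=-".index))
    # The paper writes "one degenerated, one unchanged" as "-=" rather than "=-".
    return {"=-": "-="}.get(pair, pair)


print(outcome_symbol(0.1, 0.2), outcome_symbol(-0.1, 0.0))  # ++ -=
```

Counting these symbols over all task pairs gives the pie-chart percentages.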

As shown in Fig. 6, compared with GAHS, GASA and TGA, EMA-CCSC achieves “++” on 89.90%, 80.14% and 32.83% of the investigated CCSC tasks, respectively. Conversely, EMA-CCSC achieves “−−”, i.e., degenerates the performance on both target CCSC tasks, on only 0.17%, 0.84% and 14.65% of the investigated CCSC tasks. According to the bar charts, EMA-CCSC improves the QoS values on 1127, 1065 and 660 of the 1188 tasks, respectively. As a result, EMA-CCSC achieves average improvement rates of 36.6675%, 18.4813% and −1.7650% versus GAHS, GASA and TGA, respectively. Based on the total improvement rate, we conclude that EMA-CCSC performs better than GAHS and GASA at only half of their computational cost.

Fig. 6. Comparisons between EMA-CCSC and GAs

Similar conclusions can be drawn from Figs. 7 and 8 for the PSO and DE baseline algorithms. EMA-CCSC significantly outperforms the PSOs, achieving “++” on 92.76%, 90.74% and 88.55% of the investigated CCSC tasks against the three PSO variants. Compared with the DE baseline algorithms, EMA-CCSC outperforms two of the three in terms of the total improvement rate.

Fig. 7. Comparisons between EMA-CCSC and PSOs

Fig. 8. Comparisons between EMA-CCSC and DEs

In conclusion, EMA-CCSC outperforms 7 of the 9 compared algorithms in terms of the total improvement rate. It appears to be competitive against the baseline algorithms even though it uses only half of their computational cost.

6 Conclusion

In this paper, we present an approach based on an evolutionary multitasking algorithm to efficiently solve the CCSC problem. Unlike existing solvers, our EMA-CCSC algorithm can optimize two or more CCSC tasks concurrently. As a result, it achieves higher throughput and can handle more composite service requests within a fixed period of time, which is critical for improving the experience of cloud computing service users. Our extensive experiments indicate that the proposed EMA-CCSC is competitive in both solution quality and time efficiency.