Keywords

1 Introduction

With the rapid development of big data and cloud computing technologies, cloud applications spring up quickly in the past decade. The increasing complexity of modern cloud applications has changed the perspective of software designers who now have to consider large-scale applications consisting of a colossal number of services and featuring complex interaction mechanisms. Nowadays, cloud applications are usually distributed, heterogeneous, decentralized and safety-critical, having complex concurrent behavior and are operating in unpredictable environments, and it is notoriously difficult to guarantee their trustworthy. Therefore, formal verification of cloud applications becomes the focus of attention in both academic and industry.

Although current researches on cloud computing are mostly focused on technical problems such as resource allocation [8] and task scheduling [17], there are some attempts to the formalization of fundamental notions in cloud computing. For example, an abstract formal model of cloud workflows was proposed in [7] using the Z notation. In [6], the hierarchical colored Petri Net model was adopted to specify the security mechanism in cloud computing. The Petri Net model is also used in [4] to build the fault tolerant model of cloud computing, and as the basis for a dynamic fault tolerant strategy in cloud computing. The agent paradigm was adopted in [14] to manage cloud resources and support cloud service discovery, negotiation and composition. A Bigraph model was proposed in [3] to formally specify cloud services and customers and their interaction schemes. In [11], Event-B is integrated with discrete-event simulation to analyze the performance and reliability of resilience of data stored in the cloud.

To support rigorous development of cloud applications and enhance their trustworthy, only providing the formal specification is certainly not enough and we need to further investigate the formal verification techniques that help for understanding and reasoning about cloud applications and ensuring their trustworthy. Model checking and theorem proving are two most-widely used verification techniques. In [1], the model checker UPPAAL is used to synthesize an optimal infinite scheduler for a given specification of Mobile Cloud Computing systems. In [13] a holistic approach was proposed to verify the correctness of Hadoop cloud architectures using model checking techniques. Although the model checking approach is powerful and fully automatic, the state-space explosion is an inherent problem for all the model checking approaches which is serious for large-scale systems like cloud applications. On the other hand, theorem proving technique is used in [12] to verify properties of cloud services and their compositions in PVS, which is based on the relational UTP (Unifying Theories of Programming) [9] semantics for cloud services that has been proposed in [16]. However, automatic verification in interactive theorem provers like PVS is a hard problem due to the undecidable algorithms and proof methods. Furthermore, sometimes users may fail to prove that a property holds or not, and can not produce a counter example for it using theorem provers.

SMT-based techniques have been used extensively for program verification [2]. In this paper, we propose a formal framework for verification of cloud applications using the SMT solver Z3 [5], which is also based on the relational UTP semantics for cloud services [16]. UTP aims to formalize the similar features of different languages in a similar style. It has been proved to be appropriate for formal semantics of various programming languages and specification languages like rCOS [10] and Reo [15]. Z3 is a state-of-the-art SMT solver, which can be used to check the satisfiability of logical formulas over one or more theories. It provides bindings for various programming languages. In this paper, we use Z3 python-bindings to specify the models and develop the verification framework for cloud applications. Unlike other theorem provers, such as PVS, Coq, etc., Z3 can automatically generate counter examples or prove the validity of a specific goal.

The paper is organized as follows: Sect. 2 presents how simple cloud services/applications are specified in Z3 based on observations on their input and output ports. The models of a family of composition patterns, which are used to construct more complex cloud services are presented in Sect. 3. In Sect. 4, we use the refinement-relation-check function for complex cloud services as an example to show how to verify properties of cloud services and applications, and provide some case studies. Finally, Sect. 5 summarizes the paper.

2 Formalization of Cloud Services in Z3

Usually the computing services in a cloud application are distributed over the internet and far from its clients. Clients have no knowledge about the implementation details of the services and configuration of the application, and can access the cloud application regardless of their locations or what device being used. The only possible way that clients can know about a cloud application is via observations on the services provided by the application at corresponding input/output ports.

2.1 Observations as Timed Data Streams

A cloud service \(\mathbf C\) is interpreted as a relation between an initial observation on inputs to \(\mathbf C\) and a subsequent observation of the behavior of \(\mathbf C\). we use \(in_\mathbf{C}\) and \(out_\mathbf{C}\) to denote what happen as inputs and outputs of a cloud service \(\mathbf C\), respectively.

For every port of a cloud service \(\mathbf C\), the corresponding observation on it is given by a timed data stream (TDS), which is defined as follows:

Definition 1

Let D be a set of data elements and \({\mathbb R}_+\) be the set of non-negative real numbers which is used to represent time moments. Let \(DS=D^\omega \) be the set of data streams, that is, the set of all streams \(\alpha =(\alpha (0),\alpha (1),\alpha (2),\ldots )\) over D, and \({\mathbb R}_+^\omega \) be the set of all streams \(a=(a(0),a(1),a(2),\ldots )\) over \({\mathbb R}_+\). The set of time streams is defined by the following subset of \({\mathbb R}_+^\omega \):

$$\begin{aligned} TS=\{a\in {\mathbb R}_+^\omega \,|\,\forall n\ge 0.a(n)<a(n+1)\wedge \forall t\in {\mathbb R}_+.\exists k\in {\mathbb N}.a(k)>t\} \end{aligned}$$

For two time streams a and b, \(a<b\,\equiv \,\forall n\ge 0.a(n)<b(n)\). A timed data stream is defined as a pair \(\langle \alpha , a\rangle \) consisting of a data stream \(\alpha \in DS\) and a time stream \(a\in TS\). We use TDS to denote the set of timed data streams.

Let \(I_\mathbf{C}\) and \(O_\mathbf{C}\) be the set of input and output port names of \(\mathbf C\), the specification focuses on the constraints on the timed data streams of the corresponding ports:

$$\begin{aligned} in_\mathbf{C}:&I_\mathbf{C}\rightarrow TDS \\ out_\mathbf{C}:&O_\mathbf{C}\rightarrow TDS \end{aligned}$$

We use relations on timed data streams to model cloud services. Every cloud service \(\mathbf C\) can be represented by a pair of predicates P and as follows:

where \(\mathbf C\) is the name of the cloud service, \(P(in_\mathbf{C})\) is the condition that should be satisfied by inputs \(in_\mathbf{C}\) of the cloud service, and is the condition that should be satisfied by outputs \(out_\mathbf{C}\) of \(\mathbf C\).

2.2 Modeling of Simple Cloud Services

We first present the formal definitions of simple cloud services, which are used as the basis for the specification of compositions and development of more complex cloud services. The simple cloud services are dealt with as object instances of Class Service. An object instance is attached with four attributes: service type, pre-condition, nodes (represent timed data streams), and its corresponding method. We specify the time and data constraints of relations between inputs and outputs in the service definitions (methods). Basically, the behavior constraints of cloud services consist of two parts: data constraints and time constraints.

SyncSer has an input port whose corresponding timed data stream is specified as nodesParam[0] and a set of output ports (at least one, guaranteed by the assertion of the method). The behavior pattern of this cloud service follows like this: every item in both data stream and time stream at the output ports should be exactly equal to the items in the input data stream and time stream. All the conditions are specified as constraints. Finally, we take the conjunction of every constraint in the constraint list if the pre-condition of the service is True.

figure a

BufferSer has one input port and one output port. The time constraints of this service type is more complex than SyncSer, while the data-related constraints are the same as SyncSer. The time when the current data item gets transferred into the service through the input port is strictly earlier than the time when the data item gets taken/read through the output port. Besides, the time of the next request (input data item) should be strictly later than the previous returned result (last output data item).

figure b
Fig. 1.
figure 1

Remote printing service

Example 1

Consider a simple example where a remote printer offers its printing service to two clients, which compete for the use of this shared resource. Each client can send out multiple printing requests to the printer and the requests from different clients are placed in a queue to be processed by the printer in a first-come first-served manner. After a file is printed out it can be collected by the client later. In order to keep the example simple to expose without considering priority for scheduling different printing tasks, we assume that requests from different clients never arrive simultaneously.

The cloud service \(\mathbf M\) in Fig. 1 receives requests from different clients at ports A and B, and delivers a sequence of requests through port C to a queue on the printer side. Such a service is called MergeSer. It has two input ports and one output port. When there is only one timed data stream from one specifc input port, the MergeSer is reduced to the SyncSer with one input and one output ports. When there exists two different timed data streams from the input ports, the MergeSer is captured through the method Merge, which is defined recursively. Every time one of the timed data items from these two input ports is chosen to be fed into the service. The specification of this service is in the following:

figure c

where Merge is the method to deal with the case when the two input ports both have requests to provide for the MergeSer. It captures the behavior of merging two input timed data streams into one output stream and the detailed definition is omitted here due to the length limitation.

RouterSer has one input port and two output ports. The behavior of this service type is that the input timed data items are nondeterministically taken through the two output ports, which can be specified using the Merge method with the input timed data item handled as the output result in Merge and the two output timed data items dealt with as the two input requests in Merge.

figure d

3 Composition of Cloud Services

Different cloud services can be composed together to build more complex services/applications. Simple cloud services are defined as object instances of class Service, therefore their composition can be naturally modeled by composition on Service instances, which leads to a new class Composition capturing the behavior of the composed cloud service/application. In this section, we introduce a family of composition operations for two cloud services \(\mathbf{serv}_i (i=1,2)\):

Sequential Composition. Suppose one output port O of \(\mathbf{serv}_1\) and one input port I of \(\mathbf{serv}_2\) can be joined together and the timed data stream that happens on O thus can be taken as the input on I for \(\mathbf{serv}_2\). After joining these two ports, the extra constraints restricted on O (and I) are about the equality between the output timed data stream of \(\mathbf{serv}_1\) and the input timed data stream of \(\mathbf{serv}_2\). Since there can exist more than one output ports in \(\mathbf{serv}_1\), the attribute index is designed to indicate the correct output port being joined. The output timed data stream in \(\mathbf{serv}_1\) is thus represented by nodesParam1[index] and the input timed data stream in \(\mathbf{serv}_2\) is represented by nodesParam2[0]. Then the result specification of the cloud service by sequentially composing \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) is:

figure e

Note that when two cloud services \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) are sequentially composed, we can certainly join more than one pair of ports together and the definition of the resulting service is similar, but it is not necessary to join all the output ports of \(\mathbf{serv}_1\) to all the input ports of \(\mathbf{serv}_2\). Some ports in the services can be left as the input/output ports for the resulting service. The definition for the general situation is similar and can be easily obtained.

External, Internal and Conditional Choices. Cloud services can be aggregated in a number of different ways, besides the sequential composition. In the following we consider a few such combinators. A typical composition pattern being widely used is external choice. For the two cloud services \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\), when they are put together and interacting with the environment, clients from the environment are allowed to choose either to input on the input ports of \(\mathbf{serv}_1\), or on input ports of \(\mathbf{serv}_2\), which will trigger the corresponding cloud service \(\mathbf{serv}_1\) or \(\mathbf{serv}_2\), respectively, and produce the associated output on the corresponding output ports. Formally, the result specification of the cloud service as an external choice of \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) is defined as:

figure f

Sometimes it is possible that both cloud services might have input ports in common so that there is no clear prescription as to which route is followed when one of these common ports is chosen. In the implementation, either service can be chosen to be executed. This case is captured by the internal choice pattern, which is formally defined as follows:

figure g

Besides the external and internal choices, a further form of choice, the conditional choice which is based on the value of a boolean expression, is also needed for combination of cloud services. This case is formally defined by the following definition which means that if the boolean expression is satisfied then the cloud service \(\mathbf{serv}_1\) is executed, and otherwise, \(\mathbf{serv}_2\) is executed:

figure h

Parallel Composition. The simplest form of parallel combinator captures the case that both cloud services \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) are invoked and executed in parallel when triggered by a pair of inputs on the corresponding input ports of both \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\). Therefore, to make it possible to execute the parallel combination of \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\), both pre-conditions of the two services should be satisfied and the execution will lead to the result that the post-conditions of both \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) should be satisfied.

figure i

In the parallel composition defined above, when two cloud services are put into parallel, they may evolve completely autonomously, i.e., we have no restriction on the inputs for the two services and they can arrive at any time. Sometimes we may hope to have some inputs for \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) arrive only simultaneously. For simplicity, we assume that the data can only arrive at the input ports \(I_1\) and \(I_2\) simultaneously, where \(I_1\) and \(I_2\) belong to the input ports of \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) respectively. And the data arriving at all the other input ports except \(I_1\) and \(I_2\) are independent. In this case, extra constraints on the input timed data streams of the two services should be captured. Then we have the following specification:

figure j

In many cases, a family of cloud services may exist and behave in parallel in a pairwise fashion. To model this, the n-ary version of these two parallel combinators are very helpful. The corresponding specification can be easily generalized to the case for composing multiple services.

A similar situation we consider is the case of merging two input ports of cloud services \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) into one port. Let nodesParami for \(i=1,2\) be the timed data streams on the input port \(I_i\) in \(\mathbf{serv}_i\), respectively. By merging \(I_1\) and \(I_2\) into one port I, when the resulting service receives a request on I, it will behave in a broadcasting way. In other words, the request will be replicated on I and sent to both \(\mathbf{serv}_1\) and \(\mathbf{serv}_2\) to trigger their execution simultaneously. This operation can be realized by renaming without defining a specific method.

4 Verification of Cloud Applications in Z3

Based on the specification of simple cloud services and the family of composition patterns, we can develop more complex cloud services for different needs, which are defined as a new class Cloud. In this class, we provide two methods config and compose to develop complex cloud services out of the simple cloud services and the composition patterns. Meanwhile, we encode the refinement and equivalence relations between cloud services as propositions under analysis. Based on the Z3 SMT solver, we defined the main method Refine to check the validity of the refinement and equivalence relations or to generate counter examples for the dissatisfaction of relations.

figure k

The predicate capturing the refinement relation between two cloud services is an implication statement. Assume we have two complex cloud services \({cServ}_1\) and \({cServ}_2\), \({cServ}_1\) is a refinement of \({cServ}_2\) (or \({ cServ}_1\) refines \({cServ}_2\)) if and only if the specification of \({cServ}_2\) is further restricted by \({cServ}_1\), i.e., \({cServ}_1\) \(\rightarrow \) \({cServ}_2\). Furthermore, two cloud services are in equivalence relation, i.e. \({cServ}_1\) \(\leftrightarrow \) \({cServ}_2\), if and only if they refine each other. Such proposition is valid if it is always true whatever the assignment of variables is. It is satisfiable if there exists an evidence (an assignment to the variables) under which this proposition evaluates true. If the proposition \({cServ}_1\) \(\rightarrow \) \({cServ}_2\) is valid, i.e. always true under any assignment of values, then its negation will not have any witness (any satisfying assignment). In other words, the negation of the implication is unsatisfiable. On the other hand, if a solution is found for the negation of the proposition, then this solution is actually the counter example for the satisfaction of the relation.

Algorithm 1 provides the pseudocode for the definition of the Refine method, in which a solver is created and the constraints of the negation of the target implication are added into the solver. If the solver returns unsat (corresponding to True in Algorithm 1), then it acts as a proof for the validity of the refinement relation between the two cloud services under analysis. If the solver returns sat (corresponding to False in Algorithm 1), the model (witness) of the negation is the counter example we need for proving the dissatisfaction of the relation.

In the beginning, the command Solver() is used to create a general purpose Z3 Solver. The dictionary nodes stores the uninterpreted variables, which can be further used to generate node parameters for invocation of simple cloud services. The first double for loop generates the time constraints (for well-definedness) and constraints for each specific service in \(cServ_{1}\) and the next double for loop generates the time constraints and constraints for each specific composition in \(cServ_{1}\). The procedures up to now are used to handle \({cServ}_1\). The subsequent procedures are for the refined cloud service \({cServ}_2\). The generation of time constraints and constraints for each simple cloud service and composition is similar. Besides, the uninterpreted symbols that need to be under universal quantifier are stored in UniVar.

Example 2

Consider the cloud applications shown in Fig. 2, where users can check the information about flights and can order flight tickets. Different companies offer flights to the same location and the ticket availability and price for each flight varies and may change at any time. If a user wants to make a trip between two places and make a query for the flight information, he/she hopes to collect the information for all the available flights at the latest time.

Fig. 2.
figure 2

Flight query services

The cloud service QF1 in Fig. 2 has a buffer on the server side for each flight company. It accepts queries from users and store queries in every buffer. Based on the simple cloud services, the model of QF1 can be developed in Z3 in the following way:

figure l

In cloud service QF2, the location of the buffer is on the user’s side. QF2 can accept the queries from the user, place the queries in the buffer and simultaneously send a copy of the queries to each flight company. The model of QF2 is constructed like this:

figure m

Next we invoke the Refine function in two directions to check the refinement and equivalence relation between QF1 and QF2. The result of QF2.Refine(QF1, 10) is True and None, which means that QF2 is indeed a refinement of QF1 and therefore no counter example is provided.

figure n

On the other hand, the result of QF1.Refine(QF2, 10) is False, which indicates the equivalence relation doesn’t hold between QF1 and QF2. The counter example actually presents a solution which satisfies the constraints of QF1 but doesn’t satisfy the constraints of QF2. indicates the data item on the corresponding port indicated by the capitalized letter while indicates the time moment. Time-related constraints for well-definedness are clearly satisfied. The data streams in this specific counter example satisfy the data-related constraints in both QF1 and QF2. However, the time stream on the three output ports of QF2 should be exactly equal, which are not satisfied. Time constraints in QF1 are relaxed a bit. Only delay of the data item transfer in three BufferSer needs to be satisfied and this counter example provides a feasible and correct solution.

figure q

Example 3

Consider a travel service scenario which involves reserving hotel and booking transportation (flight or train). The service package processes clients’ requests in the following way: The service is initiated through a request from some client, then the request is first handled by the hotel service through a SyncSer. Next the request is further transferred to the transportation service, where the RouterSer operates and sends it to the flight service and train service. Meanwhile, flight and train booking are both monitored by a government service, which is aggregated through a MergeSer. The transportation reservation succeeds only if the government service accepts the reservation. Besides, the hotel service and the transportation service are composed through a sequential composition. This service package can be modeled in Z3 as follows.

figure r

Another service package can be simpler, which involves a hotel service and a government service in charge of transportation reservation. Moreover, these two services are also composed through a sequential composition. The model of this simplified service package is presented in the following.

figure s

Intuitively, these two service packages provide the clients with the same result: hotel and transportation reservation. Therefore, next we check if the equivalence relation exists between them. After invoking the Refine function in two directions, we get the returned results shown as follows:

figure t

The result shows that the TravelPack1 is indeed a refinement of TravelPack2 while the refinement relation holds also in the other direction. Finally, the equivalence relation between them gets proved.

5 Conclusion and Future Work

This paper extends our previous work on the design model for cloud services and proposes a framework on formal specification of cloud services and compositions in SMT solver Z3. The composition of cloud services is given by a family of composition operators which are specified in a class capturing the behavior of the composition. The framework naturally preserves the original choreography of cloud applications, and thus makes the description of cloud services and applications reasonably readable. This work also provides a complement of the verification by theorem proving approach in our previous work. In fact, sometimes users of theorem provers like Coq or PVS need to construct proofs or even build counter examples manually first to show that a property is not satisfiable. For such cases, using Z3 makes it possible to automatically search for possible bounded counter examples.

However, the proposed framework focuses on addressing the data- and time-related properties of the top level cloud services while failing to address the availability and failure rate of online services. The case studies are also set in a high level conceptual setting. In future work, we plan to incorporate the QoS aspects on cloud services into this model and will investigate the formalization and quantitative reasoning about low level programs of cloud applications in SMT solvers. On the other hand, we hope to develop the formal model for dynamic reconfiguration and adaptation of cloud services as well, which is quite useful in real world scenarios.