1 Introduction

Building software models and analyzing them play an important role in the development of more reliable systems. However, as the complexity of the modeled systems increases, both the cost of creating the models and the cost of analyzing them increase as well [24].

Our focus in this paper is to reduce the cost of analyzing models written in Alloy [5] – a relational, first-order logic with transitive closure. The Alloy analyzer provides automatic analysis of Alloy models. To analyze a model, the user writes Alloy paragraphs (e.g., signatures, predicates, functions, facts and assertions) and commands that define constraint-solving problems, and the analyzer executes these commands. The analyzer translates the commands and related Alloy paragraphs into propositional satisfiability (SAT) formulas and then solves them using off-the-shelf SAT solvers. We focus on successive runs of the analyzer as the model undergoes development and modification. The key insight is that during model development and validation, the user typically makes many changes that are relatively small, which enables incremental analysis to reduce the subsequent analysis cost [1].

We introduce a novel technique called iAlloy that incrementally computes the analysis results. iAlloy introduces a two-fold optimization for the Alloy analyzer. First, iAlloy comes with a static technique that computes the impact of a change on commands based on a lightweight dependency analysis, and selects for execution a subset of commands that may be impacted. We call this technique regression command selection (RCS), since it shares the spirit of regression test selection for imperative code [4] and adapts it to declarative models in Alloy. Second, iAlloy comes with a dynamic technique that uses memoization to enable solution reuse (SR) by efficiently checking whether an existing solution already works for a command that must be executed. SR uses a partial order based on the sets of parameters in predicate paragraphs to enable effective reuse of solutions across different commands.

To evaluate iAlloy we conduct experiments using two sets of Alloy models that have multiple versions. One set, termed mutant version set, uses simulated evolving Alloy models where different versions are created using the MuAlloy [21, 27] tool for generating mutants with small syntactic modifications of the given base Alloy models. This set includes 24 base Alloy models and 5 mutant versions for each base model. The other set, termed real version set, uses base Alloy models that had real faults and were repaired using the ARepair [25, 26] tool for fixing faulty Alloy models. For each faulty base model, its evolution is the corresponding fixed model. This set includes 36 base Alloy models and 2 versions for each model.

The experimental results show that iAlloy is effective at reducing the overall analysis cost for both sets of subject models. Overall, iAlloy provides more than 50% command execution reduction on average, and up to 7x speedup. In addition, SR performs surprisingly well on the real version set, where 58.3% of the selected commands reuse previous solutions, which indicates that our approach is promising for incrementally analyzing real-world evolving Alloy models.

This paper makes the following contributions:

  • Approach. We introduce a novel approach, iAlloy, based on static analysis (regression command selection) and dynamic analysis (solution reuse) for incrementally analyzing evolving Alloy models, and implement the approach in a prototype tool on top of the Alloy analyzer.

  • Evaluation. We conduct an extensive experimental evaluation of our approach using two sets of subject Alloy models, one based on syntactic mutation changes and the other based on fault fixing changes. The results show that iAlloy performs well on both sets.

  • Dataset. We publicly release our subject Alloy models and their versions at the following URL: https://github.com/wenxiwang/iAlloy-dataset. Given the scarcity of publicly available Alloy models with evolution histories, we believe that our dataset will be particularly useful for other researchers who want to evaluate their incremental analysis techniques for Alloy.

While our focus in this paper is the Alloy modeling language and tool-set, we believe our technique can generalize to optimize analysis for models in other declarative languages, e.g., Z [17] and OCL [2].

2 Background

In this section, we first introduce Alloy [5] based on an example which we use throughout the paper. Then, we describe MuAlloy [21, 27] – a mutation testing framework for Alloy, which we apply to create different versions of an Alloy model to simulate model evolution. Finally, we briefly describe regression test selection (RTS) for imperative code. Although our regression command selection (RCS) applies to declarative code, the two techniques share similar ideas.

2.1 Alloy

Alloy [5] is a declarative language for lightweight modeling and software analysis. The language is based on first-order logic with transitive closure. Alloy comes with an analyzer which is able to perform a bounded exhaustive analysis. The input of the Alloy analyzer is an Alloy model that describes the system properties. The analyzer translates the model into conjunctive normal form (CNF) and invokes an off-the-shelf SAT solver to search for solutions, i.e., boolean instances. The boolean instances are then mapped back to Alloy level instances and displayed to the end user.

Fig. 1. Dijkstra Alloy model from the standard Alloy distribution (version 4.2); the line written in red was absent from the faulty version

Figure 1 shows the Dijkstra Alloy model, which illustrates how mutexes are grabbed and released by processes, and how Dijkstra’s mutex ordering constraint can prevent deadlocks. This model comes with the standard Alloy distribution (version 4.2). An Alloy model consists of a set of relations (e.g., signatures, fields and variables) and constraints (e.g., predicates, facts and assertions), which we call paragraphs. A signature (sig) defines a set of atoms and is the main data type specified in Alloy. The running example defines 3 signatures (lines 3–6), namely Process, Mutex and State.

Facts (fact) are formulas that take no arguments and define constraints that every instance must satisfy. The formulas can be further structured using predicates (pred) and functions (fun), which are parameterized formulas that can be invoked. Users can use Alloy’s built-in run command to invoke a predicate, and the Alloy analyzer either returns an instance if the predicate is satisfiable or reports that the predicate is unsatisfiable. The predicate (lines 12–14) is invoked by the predicate (line 16) and the command (line 53). The parameters of the predicate are and with signature types and , respectively. An assertion (assert) is also a boolean formula; it can be invoked by the built-in check command to check whether any counterexample can refute the asserted formula. Assertions do not take any parameters. The assertion (lines 45–47) is invoked by the command (line 60) with a scope of up to atoms for each signature.

2.2 MuAlloy

MuAlloy [21, 27] automatically generates mutants and filters out those that are semantically equivalent to the original base model. Table 1 shows the mutation operators supported in MuAlloy. MOR mutates signature multiplicity, e.g., to . QOR mutates quantifiers, e.g., to . UOR, BOR and LOR define operator replacement for unary, binary and formula-list operators, respectively. For example, UOR mutates to ; BOR mutates to ; and LOR mutates to . UOI inserts a unary operator before an expression, e.g., to . UOD deletes a unary operator, e.g., to . LOD deletes an operand of a logical operator, e.g., to . PBD deletes the body of an Alloy paragraph. BOE exchanges the operands of a binary operator, e.g., to . IEOE exchanges the operands of an imply-else operation, e.g., to .

Table 1. Mutation Operators Supported in MuAlloy

2.3 Regression Test Selection for Imperative Code

Regression test selection (RTS) techniques select a subset of test cases from an initial test suite; the selected tests check whether the affected sources of a project continue to work correctly. RTS is safe if it guarantees that the subset of selected tests includes all tests whose behavior may be affected by the changes [4, 32]. RTS is precise if tests that are not affected are also not selected. Typical RTS techniques have three phases: the analysis phase selects tests to run, the execution phase runs the selected tests, and the collection phase collects information from the current version for future analysis. RTS techniques can work at different granularities. For example, FaultTracer [35] analyzes dependencies at the method level while Ekstazi [3] does so at the file level; both tools target projects written in Java.

During the analysis phase, RTS tools commonly compute a checksum, i.e., a unique identifier, of each code entity (e.g., method or file) on which a test depends. If the checksum changes, we view its source code as changed, in which case the test is selected and executed; otherwise it is not selected. The execution phase is tightly integrated with the analysis phase and simply executes selected tests. During the collection phase, RTS either dynamically monitors the test execution [3] or statically analyzes the test [7] to collect accessed/used entities, which are saved for the analysis phase in the next run.
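To make the checksum-based selection concrete, the following Python sketch illustrates the analysis phase under simple assumptions: `dependencies` maps each test to the entities it used in the previous run, and `read_entity` is a hypothetical loader returning an entity's current source text; real tools such as Ekstazi or FaultTracer obtain this information by monitoring or analyzing test executions.

```python
import hashlib

def checksum(source: str) -> str:
    """Unique identifier for the current content of a code entity."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()

def select_tests(dependencies, old_checksums, read_entity):
    """Analysis phase: select tests whose dependent entities changed.

    dependencies:  test name -> set of entity names used by the test
    old_checksums: entity name -> checksum recorded in the previous run
    read_entity:   entity name -> current source text (hypothetical loader)
    """
    new_checksums = {e: checksum(read_entity(e))
                     for deps in dependencies.values() for e in deps}
    selected = [t for t, deps in dependencies.items()
                if any(old_checksums.get(e) != new_checksums[e] for e in deps)]
    return selected, new_checksums   # new checksums are saved for the next run
```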

3 Motivating Example

This section describes how iAlloy works using two versions of the Dijkstra Alloy model. Line 18 (highlighted in red) in Fig. 1 was absent in a faulty version of the model which we denote as Version 1. The model in Fig. 1 is the correct version which we denote as Version 2.

First, we apply iAlloy to Version 1. iAlloy invokes commands (line 52), (line 53), (line 54) and (line 55) with the SAT solver. Before invoking command (line 56), iAlloy finds that the solution obtained from invoking can be reused as the solution of . Therefore, command is solved without invoking SAT. iAlloy continues to invoke the rest of the commands and finds that command (line 58) can reuse the solution of , and command can reuse the solution of . Next, we apply iAlloy again to Version 2. iAlloy performs dependency analysis between Version 1 and Version 2, and only selects the commands that are affected by the change (Line 18 in Fig. 1), namely commands , , and . iAlloy tries to reuse the solutions of previous runs when invoking the four selected commands and reuses the solution of command in Version 1.

In this scenario, the traditional Alloy analyzer executes 18 commands with expensive SAT solving, which takes a total of 103.01 seconds. In comparison, iAlloy invokes only 9 commands: 5 command executions are saved by regression command selection and 4 by solution reuse. In total, iAlloy takes 84.14 seconds. Overall, iAlloy achieves 1.22x speedup, saving 18.87 seconds. Section 5 evaluates more subjects and shows that iAlloy achieves 1.59x speedup on average and reduces unnecessary command invocations by more than 50%.
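These figures follow directly from the measured totals:

$$ \text{speedup} = \frac{103.01\,\text{s}}{84.14\,\text{s}} \approx 1.22\times, \qquad \text{saving} = 103.01\,\text{s} - 84.14\,\text{s} = 18.87\,\text{s} $$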

4 Techniques

In an evolving Alloy model scenario, we propose a two-step incremental analysis to reduce the time overhead of command execution. The first step is regression command selection (RCS) based on static dependency analysis (Sect. 4.1). The second step is solution reuse (SR) using fast instance evaluation (Sect. 4.2). Note that RCS handles paragraph-level dependency analysis, while SR covers more sophisticated expression-level dependency analysis.


Algorithm 1 shows the general algorithm of our incremental analysis. For each version (\(m_v\)) in a sequence of model evolutions (ModelVersionSeq), iAlloy first applies RCS (RCmdSelection) to select the commands (SelectCmdList) that may be affected since the last version. Then, for each command in SelectCmdList, iAlloy further checks whether a solution obtained for a previous command can be reused for it (CheckReuse). Note that solutions of commands in the same version can also be reused. However, if the signatures change in the current version, SR is not applicable and all selected commands are executed. If none of the old solutions can be reused for the current command c, iAlloy invokes the SAT solver (Execute) to find a new solution, which may be reused in subsequent runs.
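The following Python sketch restates Algorithm 1 under stated assumptions: `rcmd_selection`, `check_reuse`, and `execute` are placeholders for the RCS, SR, and command-execution components described in Sects. 4.1 and 4.2 (sketched there below), and the handling of signature changes, which disables SR for a version, is omitted for brevity.

```python
def incremental_analysis(model_version_seq, rcmd_selection, check_reuse, execute):
    solutions = {}   # command -> most recent solution, kept across runs for SR
    checksums = {}   # paragraph checksums recorded by the previous run for RCS
    for m_v in model_version_seq:
        for c in rcmd_selection(m_v, checksums):   # RCS, Sect. 4.1 (Algorithm 2)
            if not check_reuse(c, solutions):      # SR, Sect. 4.2 (Algorithm 3)
                execute(c, solutions)              # SAT solving (Algorithm 4)
    return solutions
```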

4.1 Regression Command Selection (RCS)

Algorithm 2 presents the algorithm for RCS. iAlloy first obtains the dependent paragraphs of each command (Cmd2DpdParagraphs) based on the dependency analysis (DpdAnalysis). For each command c in model version \(m_v\), iAlloy generates a unique identifier, as described in Sect. 2.3, for each dependent paragraph (CheckSum). If the checksum of any dependent paragraph changes, iAlloy selects the corresponding command as a command execution candidate (SelectCmdList) and updates the dependency information with the new checksum.
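A minimal Python sketch of this selection step is shown below. It assumes a hypothetical `dpd_analysis` helper that maps each command of a model version to its transitively dependent paragraphs, and a `paragraph_checksum` helper (sketched after the canonicalization discussion below); neither name is taken from the iAlloy implementation.

```python
def rcmd_selection(m_v, old_checksums):
    cmd2paragraphs = dpd_analysis(m_v)     # command -> dependent paragraphs
    new_checksums = {p.name: paragraph_checksum(p)     # p.name: paragraph name
                     for deps in cmd2paragraphs.values() for p in deps}
    changed = {name for name, cs in new_checksums.items()
               if old_checksums.get(name) != cs}
    selected = [c for c, deps in cmd2paragraphs.items()
                if any(p.name in changed for p in deps)]
    old_checksums.clear()
    old_checksums.update(new_checksums)    # dependency info for the next run
    return selected
```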

Fig. 2. Dependency graphs for the ShowDijkstra (left) and ReleaseMutex (right) commands in the Dijkstra model

The dependency information of each command is the key to RCS. Dependency analysis for Alloy models can be done either at the paragraph level or at the expression level. For the safety reasons mentioned in Sect. 2.3, RCS performs dependency analysis at the paragraph level; the finer-grained, expression-level analysis is addressed by SR to achieve better precision. To ignore changes in comments and whitespace, we traverse the AST of each paragraph and output a canonicalized string of the paragraph. The canonicalized string is hashed into a checksum which represents the unique version of the paragraph.
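The sketch below shows one way to realize this canonicalization, assuming a hypothetical AST node type with `kind` (a label such as the operator or declared name) and `children`; the actual iAlloy prototype traverses the AST produced by the Alloy parser.

```python
import hashlib

def canonicalize(node) -> str:
    """Print a paragraph's AST without comments or layout, so that
    formatting-only edits do not change its checksum."""
    if not node.children:
        return node.kind
    return node.kind + "(" + ",".join(canonicalize(ch) for ch in node.children) + ")"

def paragraph_checksum(paragraph_ast) -> str:
    return hashlib.md5(canonicalize(paragraph_ast).encode("utf-8")).hexdigest()
```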

We take the Dijkstra Alloy model in Fig. 1 as an example. The dependency graph of the ShowDijkstra command is shown in Fig. 2 (left), including the transitively dependent Alloy paragraphs and their corresponding checksums. Since the predicate changed at line 18 in Fig. 1 has a new checksum and is in the dependency graph of ShowDijkstra, that command is selected. In comparison, the dependency graph of the ReleaseMutex command is shown in Fig. 2 (right). Since the checksums of its dependent paragraphs do not change, ReleaseMutex is not selected.

4.2 Solution Reuse (SR)

Algorithm 3 illustrates how iAlloy checks whether a solution can be reused by the current command. The input to Algorithm 3 is a command c selected by RCS and a solution set containing all previous solutions (SolutionSet). If a solution s from SolutionSet includes valuations for the parameters of the Alloy paragraph invoked by c (Sect. 4.2.1), and the constraints of c (represented as CheckList, which includes the implicit Alloy facts) are satisfied under s (Sect. 4.2.2), then s can be reused as the Alloy instance of c, and c need not be invoked with expensive SAT solving (return true). Otherwise, SAT solving is used to generate a new solution (if one exists), which is stored for subsequent runs (Algorithm 4, Sect. 4.2.3).
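The following sketch restates this check in Python. `param_set` (sketched in Sect. 4.2.1) computes a command's parameter set, and `evaluate` stands in for the Alloy built-in evaluator used in Sect. 4.2.2; both are assumptions for illustration rather than the tool's actual API.

```python
def check_reuse(c, solutions):
    for prev_cmd, s in solutions.items():
        # Reuse condition (Sect. 4.2.1): c's parameter set is a subset of
        # prev_cmd's, so s records valuations for every parameter c needs.
        if param_set(c) <= param_set(prev_cmd):
            # Evaluation (Sect. 4.2.2): do all facts and the transitively
            # invoked predicate/assertion of c hold under s?
            if evaluate(c, s):
                return True   # s is reused; c is solved without SAT
    return False              # fall back to SAT solving (Algorithm 4)
```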

Note that SR not only filters out semantically equivalent regression changes, but also covers more sophisticated expression-level dependency analysis. For example, suppose the only change in an Alloy model is that a boolean expression e is weakened to the disjunction of e and another boolean expression e'; then the old solution of the corresponding command is still valid and can be reused. In addition, SR allows solutions from other commands to be reused for the current command, which further reduces the SAT solving overhead.


4.2.1 Solution Reuse Condition

As described in Sect. 2, each command invokes either a predicate or an assertion. Each predicate has a set of parameter types, which we denote as its parameter set for simplicity in the rest of the paper. The parameter set of any assertion is the empty set (\(\varnothing \)). As shown in the following equation, we define the parameter set of a command c (c.param) as the parameter set of the directly invoked predicate (ParamSet(c.pred)) or assertion (\(\varnothing \)).

$$ \begin{aligned} c.param = {\left\{ \begin{array}{ll} ParamSet(c.pred), &{} c \text { is a run command} \\ \varnothing , &{} c \text { is a check command} \\ \end{array}\right. } \end{aligned} $$

A command that invokes an Alloy paragraph with parameters implicitly checks if there exists a set of valuations of the corresponding parameters that satisfies the paragraph. We observe that command \(c_2\) can reuse the solution \(s_1\) obtained by invoking \(c_1\) if the parameter set of \(c_2\) is a subset of that of \(c_1\), namely \(c_2.param \subseteq c_1.param\). Solution reuse thus follows a partial order based on the subset relation of command parameter sets. On the other hand, solution \(s_1\) cannot be reused by \(c_2\) if \(c_2.param \not\subseteq c_1.param\), in which case we do not know the valuations of all of \(c_2\)’s parameters.
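A small Python sketch of the parameter-set computation and the subset condition, assuming a command object with hypothetical `kind` and `pred_param_types` fields:

```python
def param_set(c):
    # run commands: parameter types of the directly invoked predicate;
    # check commands (assertions): the empty set
    return frozenset(c.pred_param_types) if c.kind == "run" else frozenset()

def may_reuse(c2, c1):
    """A solution of c1 is a reuse candidate for c2 iff c2.param is a subset
    of c1.param; the valuations must still be checked (Sect. 4.2.2)."""
    return param_set(c2) <= param_set(c1)
```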

Figure 3 shows how solution reuse is conducted based on the subset relations of command parameter sets in the Dijkstra model. For instance, since the empty parameter set (\(\varnothing \)) is a subset of every parameter set above it, the commands with parameter set \(\varnothing \) can reuse the solutions of all commands whose parameter sets are supersets of \(\varnothing \). Since any parameter set is a subset of itself, a solution \(s_1\) of command \(c_1\) can also be reused by a command \(c_2\) that has the same parameter set as \(c_1\).

Fig. 3. Parameter relations of commands in the Dijkstra model


4.2.2 Solution Reuse Evaluation

Once a solution s is a reuse candidate for command c, we further check whether s actually satisfies the constraints of c. As described in Sect. 2, the constraints of a command come from all facts and the transitively invoked predicate or assertion. To reuse a solution s from a previous run, the constraints of c in the new version must hold under s. If c is unsatisfied under the valuations of s, this does not imply that c is unsatisfiable over the whole solution space, so c must be invoked with SAT solving. Whether c is satisfied under the valuations of s is determined by the Alloy built-in evaluator.
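The check itself amounts to evaluating every constraint of c under the fixed valuations recorded in s. The toy Python sketch below models constraints as predicates over a dictionary of relation valuations; iAlloy instead hands this job to the Alloy built-in evaluator, and the relation and atom names here are purely illustrative.

```python
def satisfied_under(valuation, constraints):
    """True iff every constraint of the command holds under the valuation."""
    return all(constraint(valuation) for constraint in constraints)

# toy usage: "each mutex is held by at most one process"
valuation = {"holds": {("P0", "M0"), ("P1", "M1")}}
constraints = [lambda v: all(
    len({p for (p, m) in v["holds"] if m == mutex}) <= 1
    for mutex in {m for (_, m) in v["holds"]})]
assert satisfied_under(valuation, constraints)
```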

4.2.3 Command Execution

If none of the solutions can be reused by command c, iAlloy executes the command as described in Algorithm 4. If a solution sol is found, it is saved together with the command c for subsequent runs. To avoid saving too many solutions as the model evolves (which may slow down SR and reduce the overall gain), we keep only the most recent solution for each command. In future work, we plan to evaluate how long a solution should be kept.
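A minimal sketch of this step, where `solve_with_sat` is a placeholder for translating the command and its constraints to CNF and calling the SAT solver through the Alloy analyzer:

```python
def execute(c, solutions):
    sol = solve_with_sat(c)    # expensive: CNF translation + SAT solving
    if sol is not None:        # satisfiable: an Alloy instance was found
        solutions[c] = sol     # keep only the most recent solution of c
    return sol
```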

Fig. 4. Speedup results on the Mutant Version Set

5 Experimental Evaluation

In this paper, we answer the following research questions to evaluate iAlloy:

  • RQ1: How does iAlloy perform compared to traditional Alloy Analyzer (which we treat as the baseline)?

  • RQ2: How much do Regression Command Selection and Solution Reuse each contribute to reducing the number of executed commands in the two subject sets?

  • RQ3: What is the time overhead of Regression Command Selection, Solution Reuse and command execution in iAlloy, respectively?

Fig. 5. Speedup results on the Real Version Set

5.1 Experimental Setup

Subjects: There are two subject sets in the experiment. The first is a set of simulated evolving Alloy model versions, which we call the Mutant Version Set. For this set, we take 24 Alloy models from the standard Alloy distribution (version 4.2) and use them as the first version. For each model in version 1, we use MuAlloy [27] to generate several mutants and randomly select one as version 2. This process continues until we get the fifth version. Thus, each subject in the Mutant Version Set includes five versions. The second subject set is called the Real Version Set. Each subject in this set consists of two model versions: a real faulty model (version 1) from the ARepair [26] distribution and the correct model after the fix (version 2). There are 36 subjects in this set.

Baseline: The baseline in this experiment is the traditional Alloy Analyzer, which executes each command for each version.

Platform: We conduct all experiments on Ubuntu Linux 16.04 with an Intel Core i7-6700 CPU (3.40 GHz) and 16 GB RAM, using Alloy version 4.2.

Table 2. RCS, SR and Command Execution Results in Mutant Version Set

5.2 RQ1: Speed-up Effectiveness

Figures 4 and 5 show the speedup of iAlloy compared to the baseline on the Mutant Version Set and the Real Version Set, respectively. The x-axis denotes the subjects and the y-axis denotes the speedup. On the Mutant Version Set, iAlloy achieves a speedup for 19 subjects (75% of the subject set), with up to 4.5x speedup and 1.79x on average. The reason iAlloy does not speed up the remaining 5 subjects is that either the change is in the signatures or many commands are unsatisfiable under the previous solutions, in which case the analysis overhead of iAlloy (RCS and SR) is larger than the savings. On the Real Version Set, iAlloy achieves a speedup of up to 7.66x and 1.59x on average over all subjects except one (97% of the subject set). iAlloy does not save any time on that one subject because it contains a single command and the command is unsatisfiable (in which case neither RCS nor SR can save any command executions).

Table 3. RCS, SR and Command Execution Results in Real Version Set

5.3 RQ2: Command Selection and Solution Reuse Effectiveness

Columns 2–5 in Tables 2 and 3 show, for the Mutant and Real Version Sets respectively, the total number of commands in each subject (cmd), the number of selected commands and their percentage of the total number of commands (select), the number of solution reuses and their percentage of the selected commands (reuse), and the number of actually executed commands and their percentage of the selected commands (execute). We can see that both RCS and SR help reduce command execution in both subject sets, but to different extents. A smaller portion of commands is selected in the Mutant Set (82.4%) than in the Real Set (98.7%). This is because there are more changes between versions in the Real Set than in the Mutant Set. However, a smaller portion (41.7% vs. 55.4%) of the selected commands is executed and a larger portion (58.3% vs. 44.6%) of the selected commands successfully reuses solutions in the Real Set compared with the Mutant Set. In addition, command executions are reduced (\(\dfrac{cmd-execute}{cmd}\)) by 54.3% in the Mutant Set and by 58.8% in the Real Set. These results show that iAlloy is promising for reducing command executions when analyzing real-world Alloy models as they evolve.

5.4 RQ3: Time Consumption

Columns 6–8 in Tables 2 and 3 present the percentage of time consumption in RCS (T_select), SR (T_reuse), and command execution (T_execute) in the Mutant Version Set and Real Version Set, respectively. We can see that in both subject sets, execution takes most of the time while RCS and SR are lightweight.

6 Related Work

Much work has been done to improve [20, 22, 24] and extend [10–13, 16, 19, 25, 28–31, 33] Alloy. We discuss the work that is most closely related to iAlloy.

Incremental Analysis for Alloy. Li et al. [9] first proposed incremental analysis for what they call consecutive Alloy models, which are similar to evolving models. They exploit incremental SAT solving to solve only the delta, i.e., the set of boolean formulas describing the changed part between two model versions. Solving only the delta results in much lower SAT solving time than solving the new model version from scratch. Titanium [1] is an incremental analysis tool for evolving Alloy models. It uses all the solutions of the previous model version to potentially calculate tighter bounds for certain relational variables in the new model version. By tightening the bounds, Titanium reduces the search space, enabling the SAT solver to find the new solutions in a fraction of the original solving time. These two approaches are the most closely related to our work; both focus on improving the solving efficiency of the translated formulas. In contrast, our incremental approach avoids the SAT solving phase entirely, which is fundamentally different from existing approaches. In addition, Titanium has to find all the solutions in order to tighten the bounds, which can be inefficient when only a certain number of solutions is needed.

Regression Symbolic Execution. Similar to SAT-solving applications such as the Alloy analyzer, symbolic execution tools also face scalability problems, and much work has been done to improve their performance [6, 14, 23, 34]. The work most closely related to ours is regression symbolic execution [14, 15, 34]. Similar to our RCS, symbolic execution of the new version is guided toward the parts changed with respect to previous versions. In addition, there is also work on verification techniques that reuse or cache results [8, 18].

7 Conclusion and Future Work

In this paper, we proposed a novel incremental analysis technique based on regression command selection and solution reuse, and implemented it in a tool called iAlloy. The experimental results show that iAlloy speeds up the analysis of 90% of our subjects. Furthermore, it performs surprisingly well on models with real faulty versions, with up to 7.66x speedup and more than 50% command execution reduction. This indicates that iAlloy is promising for reducing the time overhead of analyzing real-world Alloy models. In the future, we plan to extend iAlloy to support changes that involve Alloy signatures and to perform more fine-grained analysis to improve command selection.