Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

An array processing loop is a common occurrence in programs, and an assurance of reliability often requires the program developer to prove properties that are quantified over the elements of the array being processed. This is, in general, difficult because such programs have huge, at times infinite state space. So while static analysis techniques like array smashing and partitioning [4, 5, 11, 14, 16, 17] fail due to abstractions that are too coarse, attempts with bounded model checkers or theorem provers that are equipped with array theories [3, 8, 9, 15, 18, 22, 23] tend to fail for lack of scalability or their inability to synthesize the right quantified invariants.

In certain situations, the decidability of property checking of finite state programs can be used to prove properties of infinite state space programs. Consider a program P and a property \(\psi \) that can be transformed to an abstract finite state program \(P'\) and a property \(\psi '\), such that if the property \(\psi '\) holds in \(P'\) then the property \(\psi \) holds in P. Then \(P'\) can be analyzed for \(\psi '\) to show that \(\psi \) holds in P. In this paper we present such a transformation for programs which process arrays in loops. The property \(\psi \) is usually a \(\forall \) or a \(\exists \) property over elements of the array, but can also be a property over scalar variables modified in the loop. The transformation replaces the loop that manipulates an array of possibly large or even unknown size with a smaller loop that operates only on a few non-deterministically chosen elements of the array.

Fig. 1.
figure 1

Loop shrinking abstraction illustration

As an example, consider the program in Fig. 1(a). The loop in the program purportedly computes in a variable \(\texttt {m}\), the minimum element, denoted min, of an array \(\texttt {a}\). However, due to a programmer error at line 10 (\(\texttt {a}[\texttt {i}]\texttt {-1}\) instead of \(\texttt {a}[\texttt {i}]\)), the program actually computes the last value in the longest subsequence \(\texttt {a[}i_1\texttt {]}, \texttt {a[}i_2\texttt {]},\ldots ,\texttt {a[}i_p\texttt {]}\) of the array, such that \(\texttt {a}[i_1]\,=\,min\), and for any two consecutive elements \(\texttt {a}[i_k]\) and \(\texttt {a}[i_{k+1}]\) of the subsequence, \(\texttt {a}[i_{k+1}] \le \texttt {a}[i_k]+1\).Footnote 1 Notice that for ease of exposition, we have used a \(\forall \) to express universal quantification; in reality, a loop will be used instead. The property holds for the example because the longest subsequence of the array with the stated properties is {2, 2, 2}, and the last element happens to be the same as min. However, the assertion will fail if, for example, the last two elements of the array are changed to 3 and 5, so that the longest subsequence is now {2, 3}.

Abstraction based verifiers as well as bounded model checkers fail to verify this program when the array size is increased to 1000. For example, CBMC 5.8 [8] reports “out of memory”, when run with an unwinding count of 20. Abstraction based verifiers like SATABS 3.2 [9] and CPAchecker 1.6 [3] keep on iterating in their abstraction refinement cycle in search of an appropriate loop invariant, until they run out of memory. Therefore, it is worthwhile to look for an abstraction of the property checking problem for array processing loops that can be verified by a bounded model checker (BMC).

Observe that in this program, the assertion will hold if and only if, after the last index containing the minimum value min, no other index in \(\texttt {a}\) contains the value \(min+1\). This can be conservatively checked by examining for each pair of array indices, say k and \(k+j\), \(j>0\), whether \(\texttt {a}[k+j]\,=\,\texttt {a}[k]+1\). The computation is effected by selecting a pair of indices non-deterministically and executing in sequence the loop body with the loop index i first instantiated to k and then to \(k+j\). The resulting value of \(\texttt {m}\) can then be checked for the condition \(\texttt {m} \le \texttt {a}[k] \wedge \texttt {m} \le \texttt {a}[k+j]\). As we shall see later, it is helpful to think in terms of iteration numbers instead of array indices; the correspondence between the two for the present example is that the value at index i of the array is accessed at iteration number \(i+1\).

In other words, we compute \(\texttt {m}\) for every pair of iterations of the loop, and check if \(\texttt {m}\) satisfies the property for the chosen iterations. For example, the value of \(\texttt {m}\) computed for the iterations numbered 2 and 3 of the loop is 4, and the property restricted to these two iterations, \(\texttt {m} \le \texttt {a}[1] \wedge \texttt {m} \le \texttt {a}[2]\), is satisfied. On the other hand, if we change the last two elements to 3 and 5 then the property fails for the original program. However, we can now find a pair of iterations, namely 4 and 6, such that value of \(\texttt {m}\) calculated on the basis of just these two iterations will be 3, and it will not satisfy the corresponding property \(\texttt {m} \le \texttt {a}[3]\wedge \texttt {m} \le \texttt {a}[5]\), since \(\texttt {a}[3]\) is 2. In summary, if executing the loop for every sequence of two iterations [\(i_1\), \(i_2\)], \(i_2 > i_1\), establishes the property restricted to these iterations, then the property will also hold for the entire loop. Read contrapositively, if the given program does not satisfy the assertion, then there must be a sequence of two iterations for which the property will not hold. This is true irrespective of the size or the contents of the array in the program. Loops which exhibit this feature for iteration sequences of length k (k is 2 in this example) will be called shrinkable loops with a shrink-factor k.

We create a second program, shown in Fig. 1(b), that over-approximates the behaviour of the original with respect to the property being checked. The \(\texttt {while}\) loop is substituted with a loop that executes the non-deterministically chosen iteration sequence stored in the two-element array \(\texttt {it}\). The \(\texttt {while}\) loop in the original program, schematically denoted as \(\texttt {while}~(C)~B\), is replaced by a \(\texttt {for}\) loop that is equivalent to the unrolled program fragment \(\texttt {i=it[0]-1;if(}C\texttt {)\{} B\texttt {;i=it[1]-1;if(}C\texttt {)}B\texttt {\}}\). We call this \(\texttt {for}\) loop (or its unrolled equivalent) the residual loop for the iteration sequence \(\texttt {it}\). The \(\texttt {break}\) statement ensures that the chosen iteration numbers do not result in an out-of-bounds access of the array, and the \(\texttt {assume}\) statement ensures that exactly two iterations are chosen. Similarly, the given property is also substituted by a residual property quantified over array indexes corresponding to the same chosen iteration sequence. CBMC is able to verify the property on this transformed program, as the original loop, even with a changed bound of 1000, is now reduced to only two iterations. We call this method property checking by loop shrinking. Needless to say, the method can only be applied to a program if its shrinkability and shrink-factor are known. We develop a method to determine both using a BMC.

Thus the central idea, demonstrated in the rest of the paper, is that over-approximation using shrinkability is an effective technique to verify properties of programs that iterate over arrays of large or unknown size. Specifically, our contributions are:

  1. 1.

    We introduce and formalize a concept called shrinkability for loops that process arrays. We show formally that a shrinkable loop with shrink-factor k can be over-approximated by a loop that executes only k non-deterministically chosen iterations.

  2. 2.

    We provide an algorithm to find the shrink-factor k for which the loop is shrinkable.

  3. 3.

    We describe an implementation of the proposed abstraction.

  4. 4.

    We report experimental results showing the effectiveness of the technique on SV-COMP 2017 [2] benchmarks in the ArraysReach category.

2 Background

We shall present our ideas in the context of imperative programs that consist of assignment statements, conditional statements, while loops, and function calls. We assume that conditional expressions have no side effects. We restrict ourselves to goto-less programs with single-entry single-exit loops. This makes for an easier formal treatment of our method without losing expressibility.

Let \( Var \) be the set of variables in a program P and \( Val \) be the set of possible values which the variables in \( Var \) can take. A program state is a valuation of the variables in \( Var \) that is consistent with their declared types. It is represented by a map \(\sigma : Var \rightarrow Val \). \(\sigma (v)\) denotes the value of v in the program state \(\sigma \).

Property checking will be expressed in a formalism called a Hoare triple and denoted as \(\lbrace \varphi \rbrace P\lbrace \psi _{}\rbrace \). Here \(\varphi \) and \(\psi _{}\) are first order formulas representing sets of states, and P is a program. A Hoare triple is said to be valid if and only if starting from an initial state satisfying \(\varphi \), the execution of P terminates in a final state that satisfies \(\psi _{}\). In this paper we shall only consider programs that are deterministic and guaranteed to terminate. A fact that we shall make use of is that in the special case when \(\varphi \) represents a single program state \(\sigma _{}\). Since our programs are deterministic, \(\psi _{}\) also will be a unique single state. Therefore, the invalidity of \(\lbrace \sigma _{}\rbrace P\lbrace \psi _{}\rbrace \) is equivalent to the validity of \(\lbrace \sigma _{}\rbrace P\lbrace \lnot \psi _{}\rbrace \).

An iteration sequence is a strictly ascending sequence of numbers, representing iteration counts. Iterations of a loop are counted starting from 1. The notation \( i\!:\! T _{} \) will represent a sequence whose first element is i and the sequence comprising the rest of the elements is \( T _{}\). Given sequences \( U _{}\) and \( T _{}\), we shall use \( U _{} \sqsubset T _{}\) to mean that \( U _{}\) is a strict subsequence of \( T _{}\). Further, we shall write \(\mathcal {P}_{k}( T _{})\) to denote the set of all k-sized subsequences of a sequence \( T _{}\). For example, if \( T _{}=[1,2,5]\) then \(\mathcal {P}_{2}( T _{})=\{[1,2], [2,5], [1,5]\}\).

Loop acceleration [19] is a commonly used technique for finding loop invariants. It captures the effect of a loop through closed-form expressions that give the value of variables at the beginning of an iteration in terms of the initial state and the iteration count. Variables whose values can be expressed in this manner are called accelerable. For example, in the program of Fig. 1, the value of the variable \(\texttt {i}\) in the beginning of an iteration j is expressible as \(j\texttt {-}1\). We assume that we have available tools [12] to identify accelerable variables and their corresponding accelerating expressions. While our approach does not require us to identify all accelerable variables, the precision of the result does depend on the identification of as many accelerable variables as possible.

Our technique makes good use of bounded model checkers (BMCs). Industrial strength BMCs exist [8] and are widely used to detect property violations in safety critical software. Given a program P and a property \(\psi \), a BMC searches for a counterexample to \(\psi \) in executions of P whose length is bounded by some integer n. If it finds a counterexample to \(\psi \) within the bound, then it reports the program as being unsafe. However, if it does not find a counter example within the given bound, then the program cannot be regarded as either being safe or unsafe. BMCs are, therefore, very effective in finding bugs but not in proving properties.

3 Programs and Properties of Interest

We focus on programs that process arrays in loops that we assume always terminate. The property to be checked is encoded in a fragment of code that follows the array processing loop. If the property is expressed as a loop, we denote it in our discussion as a universally or existentially quantified formula over the elements of the array. As an illustration, the property checked in the motivating example is \(\forall j \;\!\!.\;\!\! 0 \le j<N \implies m \le a[j]\). Similarly, the program min2 of Fig. 3 checks for the property \(\exists j \;\!\!. \;\!\! 0\le j<S \wedge min\,=\,a[j]\). In particular, we consider program fragments \(R~;~Q~;~\psi _{}\), in which R is a simple loop possibly manipulating arrays, Q is a loop free (possibly empty) sequence of statements and \(\psi _{}\) is the property to be checked. We call R ;  Q as an array processing loop. In addition, we assume R has an upper bound on number of iterations which can be computed through static analysis [10]. The property \(\psi _{}\) is assumed to have at most one quantifier. We assume that the array-processing loop and the loop which checks the property have the same number of iterations. Finally, since the quantified variable ranges over a finite domain (iteration counts of a finite loop), it is useful to think of \(\psi _{}\) as a set of quantifier-free formulas, connected by conjunction in the case of \(\forall \) and disjunction in the case of \(\exists \).

Fig. 2.
figure 2

Residual loop for iteration sequence [2,4] for the program in Fig. 1

3.1 Residual Loop and Residual Property

Consider a program P consisting of an array processing loop \( L _{}\) \(\equiv \) \(\texttt {while(}C\texttt {)\{}B\texttt {\}}Q\) followed by code that checks the property \(\psi _{}\). Let \( T _{}\) = \([{j_1, j_2,..., j_n}]\) be an arbitrary iteration sequence of the loop. We define the residual loop for the iteration sequence \( T _{}\), denoted as \( L _{ T _{}}\), as the statements \(\lbrace S _{j_1}; S _{j_2}; \ldots S _{j_n}; \texttt {loop\_exit:}{\text {Q}} \rbrace \), where each \( S _{j_r}\) is \(\{A_{j_r}~;~\texttt {if(}\text {C}{} \texttt {)} \{B\}~;~\texttt {else~goto~loop\_exit};\}\). Here \(A_{j_r}\) is the sequence of statements assigning to each accelerable variable the corresponding expression defining its value at the beginning of iteration \(j_r\). Obviously, for \( T _{}=j: T _{}'\) with \( T _{}'\) being nonempty, \( L _{ T _{}}= S _{j}; L _{ T _{}'}\). As an illustration, the code fragment in Fig. 2 is the residual loop for the iteration sequence [2,4] for the program in Fig. 1(a). If the loop iterates for a maximum of N times, then \([{1,2,...,N}]\) will be called the complete iteration sequence of the loop. It is obvious that, the residual loop \( L _{[{1,2,...,N}]}\) represents an unrolling of \( L _{}\) and the two are semantically equivalent. Similarly, for the iteration sequence \( T _{}\) = \([{j_1,j_2,...,j_m}]\) and the property \(\psi _{}\), we define the residual property \(\psi _{ T _{}}\) as a conjunction or disjunction of a set of clauses \(\{\psi _{j_1}, \psi _{j_2}, \ldots , \psi _{j_m}\}\).

Let us represent the set of initial states at the beginning of the loop \( L _{}\) as \(\varphi \). Then the set of states at the beginning of an iteration numbered i would be given by \(sp(S_1;S_2;...S_{i-1}, \varphi )\), the strongest post-condition of \(S_1;S_2;...S_{i-1}\) wrt \(\varphi \). However, we sometimes have to estimate these set of states in the context of an arbitrary iteration sequence \( T _{}\) that contains iteration i and in which the sequence of iterations preceding i is not exactly known. Therefore, instead of the earlier exact calculation, we over-approximate the set of states at the beginning of iteration i, denoted \(\varphi _{i}\), through the recurrences \({\varphi }_{1}\,=\,\varphi \), and \({\varphi }_{i}\,=\,sp(S_{i-1}, {\varphi }_{i-1}) \cup {\varphi }_{i-1}\). The additional term \({\varphi }_{i-1}\) in the union accounts for the possibility that the iteration \({i-1}\) may not precede i in \( T _{}\), and therefore the set of states at the beginning of i should also include the states at the beginning of \(i-1\).

Fig. 3.
figure 3

Examples showing property loops

4 Shrinkability of Loops

We now characterize the conditions under which the behaviour of an array-processing loop \( L _{}\) with respect to a property \(\psi _{}\) can be over-approximated by a residual loop \( L _{ U _{}}\) with respect to the corresponding residual property \(\psi _{ U _{}}\), where the iteration sequence \( U _{}\) consists of fewer (non-deterministically chosen) iterations than the iterations in the original program, i.e. \( U _{} \sqsubset [{1,2,\ldots , N}]\).

Definition 1

(Shrinkable loops) Consider a program consisting of a loop \( L _{}\) and a property \(\psi _{}\) to be checked. Let \( T _{}\) represent the complete sequence of iterations of the loop. The loop is said to be shrinkable with respect to \(\psi _{}\) and with a shrink-factor k, \(0< k < | T _{}|\), if and only if, starting from any state \(\sigma _{} \in \varphi \), the loop \( L _{}\) satisfies \(\psi _{}\) whenever the residual loops \( L _{ U _{}}\) of each k-length subsequence \( U _{}\) of \( T _{}\) satisfy the corresponding residual property \(\psi _{ U _{}}\). Formally:

$$\begin{aligned} \forall \sigma _{} \in \varphi : ((\forall U _{} \in \mathcal {P}_k( T _{}): \{\sigma _{}\} L _{ U _{}} \{\psi _{ U _{}}\}) \implies \{\sigma _{}\} L _{} \{\psi _{}\}) \end{aligned}$$
(1)

It will often be useful to read the formal description above in a contrapositive manner, i.e. starting from a state in \(\varphi \), if the loop \( L _{}\) fails to satisfy \(\psi _{}\), then the failure is also witnessed by a k-length sequence \( U _{}\) whose residual loop \( L _{ U _{}}\) also fails to satisfy the corresponding residual property \(\psi _{ U _{}}\). Note that executions of both \( L _{}\) and \( L _{ U _{}}\) begin with the same state in \(\varphi \).

A shrinkable loop with a shrink-factor k will be called k-shrinkable. If we know that a loop is \(k\)-shrinkable, we can construct an abstract program that non-deterministically chooses an iteration sequence of size k, runs the residual loop and then checks the corresponding residual property. If the residual property holds, then shrinkability guarantees the correctness of the original program. However, a counter-example in the abstract program does not necessarily imply a violation of the property in the original program, except in situations described below.

In the absence of loop-carried dependences [1], the values assigned to variables that are not accelerable, in any iteration are independent of the values assigned in any other iteration. In addition, consider the case when the array elements accessed in ith iteration of the array-processing loop are also asserted in \(\psi _{i}\). In this situation, if the original program P violates the property \(\psi _{}\), in particular the clause, say \(\psi _{i}\), then the program consisting of the residual loop \( L _{[{\;i\;}]}\), constructed on the basis of the only iteration i, will also violate the residual clause \(\psi _{[{\;i\;}]}\). Thus a loop without loop-carried dependences is \(1\)-shrinkable. More significantly, if the property being tested for such programs is universal, the converse is also true, i.e. if the residual loop corresponding to a sequence consisting of a single iteration violates its residual property, then the original program will also not satisfy its specified property.

Note that according to Definition 1, if a program P satisfies its property \(\psi _{}\), then the loop constituting the program is \(k\)-shrinkable for any shrink-factor \(k>0\). Similarly, a loop with a bound of m iterations is trivially \(m\)-shrinkable. Obviously, if the shrink-factor is small, then the abstract program with a smaller length iteration sequence loads the verifier to a lesser extent and thus offers greater chances of verifier returning with an answer. Therefore, we are interested in finding shrink-factors that are much smaller than the loop bound.

However, finding out whether a loop is shrinkable is difficult as we illustrate through an example. Consider the two programs \(\texttt {min2}\) and \(\texttt {lmin}\) in Fig. 3 which are similar in structure and in the nature of what they compute. The program \(\texttt {min2}\) computes the minimum of the array and is correct with respect to the asserted property. Thus the loop in the program is \(k\)-shrinkable for all k from 1 to \(\texttt {S}\). The second program \(\texttt {lmin}\) is similar to our motivating example with a property that asserts that the final value of \(\texttt {m}\) does not exceed any array element by more than a value d. The reader can verify that this property does not hold for \(d < \texttt {S}-1\).Footnote 2 It turns out that the loop in \(\texttt {lmin}\) is shrinkable with a shrink-factor \(k\,=\,d+2\). This illustrates the difficulty of analytically finding whether a given loop is shrinkable, and based on the development in rest of this section, we shall suggest an empirical method in Sect. 5.

4.1 Identifying Shrinkable Loops

While Definition 1 lays down the consequences of a loop being shrinkable, it does not provide a convenient method to decide whether a loop is shrinkable and find the shrink-factor. To get around this problem, we first extend the notion of shrinkability from loops to arbitrary iteration sequences. We then identify the conditions under which the shrinkability of smaller iteration sequences (that are checked explicitly) would imply the shrinkability of larger iteration sequences and eventually of the entire loop.

Definition 2

(Shrinkable iteration sequence) Consider a program consisting of a loop \( L _{}\) and a property \(\psi _{}\) to be checked. Let \( T _{}\) be an iteration sequence, and let j be the first iteration in \( T _{}\). The sequence \( T _{}\) is \(k\)-shrinkable with respect to \(\psi _{}\), \(0< k < | T _{}|\), if and only if, starting from every state \(\sigma _{} \in {\varphi }_{j}\), the residual loop \( L _{ T _{}}\) satisfies the residual property \(\psi _{ T _{}}\) whenever the residual loops \( L _{ U _{}}\) of each k-length subsequences \( U _{}\) of \( T _{}\) satisfy the corresponding residual property \(\psi _{ U _{}}\). Formally:

$$\begin{aligned} \forall \sigma _{} \in {\varphi }_{j}: ((\forall U _{} \in \mathcal {P}_k( T _{}): \{\sigma _{}\} L _{ U _{}} \{\psi _{ U _{}}\}) \implies \{\sigma _{}\} L _{ T _{}} \{\psi _{ T _{}}\}) \end{aligned}$$
(2)

The only difference between the notion of shrinkability of a loop and an iteration sequence is the starting state \(\sigma _{}\), which, in this case, is from the set \({\varphi }_{j}\). Recall that \({\varphi }_{j}\) is an over-approximation of the set of states at the beginning of iteration j in the residual loop of any iteration sequence that contains j. As in the case of loops, by \(k\)-shrinkable sequence we shall mean a shrinkable sequence with shrink-factor k. It is obvious that, if the sequence consisting of all iterations of a loop is \(k\)-shrinkable then the loop itself is \(k\)-shrinkable.

As an illustration of an iteration sequence that is not shrinkable, consider the program \(\texttt {lmin}\) in Fig. 3(b) with \(d=0\). Consider the array \(\texttt {a}\) with its initial two elements as \(\{0,1\}\) and the iteration sequence \( T _{}=[1,2]\). The residual loop of \( T _{}\) computes \(\texttt {m}=1\) for which the residual property \(\psi _{ T _{}}\) does not hold (\(\texttt {m}>\texttt {a[0]}\)). However, the residual loop for every 1-length sequence satisfies its residual property, and thus \( T _{}\) is not \(1\)-shrinkable. Also notice that when \(d=0\), the program is the same as the motivating example in Fig. 1 except for array initialisation. Thus, from the observations in Sect. 1, every iteration sequence of length 3 is \(2\)-shrinkable.

4.2 Conditions Guaranteeing Shrinkability of Loops

We are interested in a method which guarantees that a loop is shrinkable by examining iteration sequences up to a given length. More specifically, we are interested in a pair of numbers n and a k, such that the \(k\)-shrinkability of all sequences of length between \(k+1\) and n would imply the \(k\)-shrinkability of any sequence longer than n—in particular, the complete sequence of iterations comprising the loop. If we can identify the conditions under which we can find such a pair, then our strategy would be to establish the \(k\)-shrinkability of sequences up to n empirically, and the \(k\)-shrinkability of all iteration sequences with lengths greater than n will follow.

Since empirical verification of \(k\)-shrinkability for all subsequences of length between \(k+1\) and n would be costly, we shall consider the case where \(n\,=\,k+1\), i.e. we shall empirically find a k such that all \(k+1\) length iteration sequences are \(k\)-shrinkable. The identified conditions will then ensure the \(k\)-shrinkability of sequences larger than \(k+1\). Notice that the generalization from \(k+1\) to larger sequences does not happen unconditionally. As an example, consider the program \(\texttt {lmin}\) in Fig. 3(b). For \(\texttt {d=2}\), all the iteration sequences of size 3 are \(2\)-shrinkable but not all sequences of size 4 are \(2\)-shrinkable.

To derive the required conditions, let us examine what it takes to ensure the \(k\)-shrinkability of a sequence of length \(k+2\), given the \(k\)-shrinkability of all sequences of length \(k+1\). For simplicity of exposition, we shall limit ourselves to conjunctive properties. The treatment for disjunctive properties is very similar, and we shall merely touch upon it later in this section.

Consider an iteration sequence \( T _{}\) of size \(k+2\), represented as \(j: T _{}'\). Obviously, \( T _{}'\) being of size \(k+1\), is \(k\)-shrinkable. Taking a contrapositive view of the condition for shrinkability, assume that starting from \(\sigma _{}\), the residual property \(\psi _{ T _{}}\) is violated for the program \( L _{ T _{}}\) i.e. \(\lbrace \sigma _{}\rbrace L _{ T _{}}\lbrace \lnot \psi _{ T _{}}\rbrace \) is true. Given that all sequences of length \(k+1\) are \(k\)-shrinkable, it suffices to find a subsequence \( T _{}'' \sqsubset T _{}\) of length \(k+1\) such that \(\lbrace \sigma _{}\rbrace L _{ T _{}''}\lbrace \lnot \psi _{ T _{}''}\rbrace \) is true. \(k\)-shrinkability will then ensure that there is a k-length subsequence \( U _{} \sqsubset T _{}'' \sqsubset T _{}\) such that \(\lbrace \sigma _{}\rbrace L _{ U _{}{}}\lbrace \lnot \psi _{ U _{}}\rbrace \). Let the state after the iteration j in the sequence be \(\sigma _{}'\). Clearly \(\lbrace \sigma _{}'\rbrace L _{ T _{}'}\lbrace \lnot \psi _{[j]} \vee \lnot \psi _{ T _{}'}\rbrace \) is true.

  1. 1.

    Consider the case when \(\psi _{ T _{}'}\) is violated. Since \( T _{}'\) is \(k\)-shrinkable, it is possible to find a k-length subsequence \( U _{}\) within \( T _{}'\) such that starting from \(\sigma _{}'\), \(\psi _{ U _{}}\) would be violated after \( L _{ U _{}}\). Now consider the iteration sequence \( T _{}''\,=\,j: U _{}\). Clearly, starting from \(\sigma _{}\), \(\psi _{ T _{}''}\) would be violated after executing \( L _{ T _{}''}\), and thus the \(k+1\)-length sequence that we want is \( T _{}''\).

  2. 2.

    Now suppose that \(\psi _{ T _{{}}}\) is violated only because the clause \(\psi _{ [\,j\,] }\) is violated. There are two subcases to be considered. In the first, assume that the violation of \(\psi _{ [\,j\,] }\) also shows up in the state after iteration \(k+1\). In this case the \( T _{}''\) that we want is the \((k+1)\)-length prefix of \( T _{}\).

  3. 3.

    The interesting case is when the violation of \(\psi _{ T _{{}}}\) is solely because of \(\psi _{ [\,j\,] }\), and this violation of \(\psi _{ [\,j\,] }\) does not show up in the state after iteration \(k+1\). In this case, the definition of shrinkability, in its current form, does not enable us to produce the required sequence \( T _{}''\). To remedy this, notice that for the subsequence \({ T _{}'}\), there is an iteration in the past, namely j, whose clause \(\psi _{ [\,j\,] }\) has been violated. If we revise the definition of \(k\)-shrinkability of iteration sequences (Definition 2) to ensure that this violation also shows up at the end of some k-length subsequence \( U _{}'\) of \({ T _{}'}\), then we are done. The required \(k+1\)-length subsequence \( T _{}''\) in this case would be \(j: U _{}'\) for which \(\lbrace \sigma _{}\rbrace L _{ T _{}''}\lbrace \lnot \psi _{ T _{}''}\rbrace \) would be satisfied.

We call the modification introduced above as past-preservation. The revised definition of shrinkability that includes past-preservation is presented below.

Definition 3

(Shrinkable iteration sequence, revised) Consider a program consisting of a loop \( L _{}\) and a property \(\psi _{}\) to be checked. Let \( T _{}\) be an iteration sequence, and let j be the first iteration in \( T _{}\). In addition, let i stand for any iteration before j. The sequence \( T _{}\) is \(k\)-shrinkable with respect to a property \(\psi _{}\), \(0< k< | T _{}|\), if and only if, starting from every state \(\sigma _{} \in {\varphi }_{j}\) the residual loop \( L _{ T _{}}\) satisfies \(\psi _{ T _{}} \wedge \psi _{ [\,i\,] }\) whenever the residual loops \( L _{ U _{}}\) of each k-length subsequences \( U _{}\) of \( T _{}\) satisfy the corresponding property \(\psi _{ U _{}} \wedge \psi _{ [\,i\,] }\). In other words:

$$\begin{aligned} \forall \sigma _{} \in {\varphi }_{j}, \forall 0 \le i < j: ((\forall U _{} \in \mathcal {P}_k( T _{}): \{\sigma _{}\} L _{ U _{}} \{\psi _{ U _{}} \wedge \psi _{ [\,i\,] }\}) \implies \{\sigma _{}\} L _{ T _{}} \{\psi _{ T _{}} \wedge \psi _{ [\,i\,] }\}) \end{aligned}$$
(3)

A contrapositive reading of the revised condition for shrinkability of \( T _{}\) says that if an execution of \( L _{ T _{}}\) with initial state \(\sigma _{}\) results in a violation of its residual property \(\psi _{ T _{}}\) or the clause \(\psi _{ [\,i\,] }\) corresponding to a past iteration i, then there exists a k-length subsequence \( U _{}\) of \( T _{}\) such that execution of \( L _{ U _{}}\) with the same initial state also violates \(\psi _{ U _{}}\) or \(\psi _{ [\,i\,] }\). Henceforth we will consider this to be the definition of shrinkability of iteration sequences.

As a technical point, notice that we include 0 as a possible value of a past iteration. Otherwise, any sequence that starts with iteration 1, would have an empty set of past iterations and the condition of \(k\)-shrinkability would be vacuously true for the sequence. We therefore include 0 as a past-iteration and define \(\psi _{ [\,0\,] }\) to be true. A pleasing consequence of this is when the iteration sequence consists of all the iterations of a loop, the revised definition that includes past-preservation also coincides with the definition of shrinkability of loops (Definition 1).

Consider the example \(\texttt {lmin}\) in Fig. 3(b) with \(\texttt {S}=5\) and \(d=1\). Not all sequences of length two are \(1\)-shrinkable by the revised definition. To see this, consider the case of an array \(\texttt {a}\) as \(\{2,1,2,3,4\}\). Let \( T _{}\) be [4,5] and take past iteration i as 1. Let \(\texttt {m}\) be 2 in a state \(\sigma _{}\). Then \(\psi _{ [\,1\,] }\) = \(\texttt {m}\le \texttt {a[0]}+1 \equiv \texttt {m} \le 3\). Clearly, starting from state \(\sigma _{}\), for the residual loops of size 1 subsequences \( U _{}\), i.e. [4] and [5], the resulting \(\texttt {m}\) will be 3 and 2 respectively and \(\psi _{ [\,1\,] } \wedge \psi _{ U _{}}\) is satisfied. But starting from the same state \(\sigma _{}\), the residual loop \( L _{ T _{}}\), will produce \(\texttt {m}=4\), and therefore \(\psi _{ [\,1\,] } \wedge \psi _{ T _{}}\) is not satisfied. On the other hand, it is easy to see that, for the same d, all the sequences of size 4 are \(3\)-shrinkable.

We now formally prove the result that we have been working towards: For a loop to be \(k\)-shrinkable, it is enough if every iteration sequence of size \(k+1\) is \(k\)-shrinkable. Our method of determining shrinkable and the shrink-factor will make use of this important result.

Theorem 1

An array processing loop is \(k\)-shrinkable with respect to a property \(\psi _{}\), if every iteration sequence of size \(k+1\) is \(k\)-shrinkable with respect to \(\psi _{}\).

Proof

To show that the loop is \(k\)-shrinkable, it is enough to show that the complete iteration sequence of the loop is \(k\)-shrinkable according to Definition 3. However, we shall show a stronger condition that all sequences of size greater than k are \(k\)-shrinkable. The proof is by induction on the length n of an iteration sequence \( T _{}\) of the loop. For the base case \(n\,=\,k+1\), the \(k\)-shrinkability of T is a given in the statement of the theorem. Now let n be greater than \(k+1\) and assume as the induction hypothesis that every sequence of length less than n is \(k\)-shrinkable. Let \( T _{}=j: T _{}'\). As usual, we take a contrapositive view of the shrinkability condition and assume that for some past iteration i of \( T _{}\), starting from a state \(\sigma _{} \in {\varphi }_{j}\), the property \(\psi _{ [\,i\,] } \wedge \psi _{ T _{}}\) fails after executing \( L _{ T _{}}\) i.e. \(\lbrace \sigma _{}\rbrace L _{ T _{}}\lbrace \lnot \psi _{ [\,i\,] } \vee \lnot \psi _{ T _{}}\rbrace \) is true. We show that there exists a k-sized subsequence \( U _{} \sqsubset T _{}\) such that \(\lbrace \sigma _{}\rbrace L _{ U _{}}\lbrace \lnot \psi _{ [\,i\,] } \vee \lnot \psi _{ U _{}}\rbrace \) is true.

Since \( L _{ T _{}}= S _{j}; L _{ T _{}'}\) and \(\psi _{ T _{}}=\psi _{[j]} \wedge \psi _{ T _{}'}\), we have \(\lbrace \sigma _{}\rbrace S _{j}; L _{ T _{}'}\lbrace \lnot \psi _{ [\,i\,] } \vee \lnot \psi _{[j]} \vee \lnot \psi _{ T _{}'}\rbrace \). Assume that starting with \(\sigma _{}\), the state reached after executing \( S _{j}\), the loop body for the iteration j, is \(\sigma _{1}\), i.e. \(\lbrace \sigma _{}\rbrace S _{j}\lbrace \sigma _{1}\rbrace \). We then have \(\lbrace \sigma _{1}\rbrace L _{ T _{}'}\lbrace \lnot \psi _{ [\,i\,] } \vee \lnot \psi _{ T _{}'}\rbrace \bigvee \lbrace \sigma _{1}\rbrace L _{ T _{}'}\lbrace \lnot \psi _{[j]} \vee \lnot \psi _{ T _{}'}\rbrace \). We show the existence of the desired \( U _{}\) by assuming that the first disjunct is true. Since i and j are both past iterations for \( T _{}'\), the proof in the case in which only the second disjunct is true is similar. Assume that the first iteration of \( T _{}'\) is \(j'\). Obviously \(\sigma _{1} \in {\varphi }_{j'}\). Since \( T _{}'\) is \(k\)-shrinkable, we must have a k-sized subsequence \( U _{}' \sqsubset T _{}'\) such that \(\lbrace \sigma _{1}\rbrace L _{ U _{}'}\lbrace \lnot \psi _{[i]} \vee \lnot \psi _{ U _{}'}\rbrace \) is true. It follows that and therefore are also true. Let \( T _{}''\) be \(j: U _{}'\). Obviously, \( T _{}'' \sqsubset T _{}\). Since the size of \( T _{}''\) is \(k+1\), \( T _{}''\) is \(k\)-shrinkable by the induction hypothesis and therefore there exists a k-sized subsequence such that \(\lbrace \sigma _{}\rbrace L _{ U _{}}\lbrace \lnot \psi _{[i]} \vee \lnot \psi _{ U _{}}\rbrace \) holds.      \(\blacksquare \)

For a disjunctive property \(\psi _{}\), the definition (3) of sequence shrinkability, changes as follows:

$$\begin{aligned} \forall \sigma _{} \in {\varphi }_{j}, \forall 0 \le i < j: ((\exists U _{} \in \mathcal {P}_k( T _{}): \{\sigma _{}\} L _{ U _{}} \{\psi _{ U _{}} \vee \psi _{ [\,i\,] }\}) \implies \{\sigma _{}\} L _{ T _{}} \{\psi _{ T _{}} \vee \psi _{ [\,i\,] }\}) \end{aligned}$$
(4)

Theorem 1 applies to disjunctive properties as well, and the proof is similar.

5 Determining Shrinkability and Property Checking

We now show how Theorem 1 can be used to empirically determine whether a given loop is shrinkable and also find the corresponding shrink-factor. Starting with 1, we repeatedly construct the program shown in Fig. 4 for successive values of \(\texttt {k}\), the candidate shrink-factor, and feed it to a bounded model checker for verification. If the program is verified to be correct for some value of \(\texttt {k}\), then Theorem 1 guarantees that the loop in the given program is \(\texttt {k}\)-shrinkable. The constructed program depends on \(\texttt {k}\), the loop \( L _{}\) and the property to be verified, \(\psi _{}\). The process stops when we either find a \(\texttt {k}\) for which the loop is shrinkable (success), or we reach a predefined limit l that is dependent on the available time and computing resources (failure). As we shall see in Sect. 6, the shrink-factors for shrinkable loops are usually small. This is a favourable situation, since programs with a smaller shrink-factors are relatively easier to verify than programs with larger shrink-factors.

5.1 Checking Shrinkability of an Iteration Sequence

Recall that according to Theorem 1, a loop is \(k\)-shrinkable, if every iteration sequence of length \(k+1\) is k shrinkable. In addition, with our assumption that the loop has a statically computable upper bound of number of iterations, the number of such iteration sequences will be finite. Given a candidate \(\texttt {k}\), the procedure \(\texttt {check\_loop}\) in Fig. 4 non-deterministically chooses an iteration sequence \( T _{}\) of length \(\texttt {k+1}\), and attempts to verify that \( T _{}\) is \(\texttt {k}\)-shrinkable. This is done in the procedure \(\texttt {check\_iter\_seq}\), which encodes the criterion for sequence shrinkability, as given by Definition 3. The construction shown applies to conjunctive properties; disjunctive properties can be handled in a similar manner.

Fig. 4.
figure 4

Program construction for determining shrinkability. Note that \(\texttt {X}\) and \(\texttt {X\_initial}\) are vectors of variables, and \(\texttt {nondet()}\), accordingly, generates a vector of values.

Assume that the given program consists of an array processing loop \( L _{}\) of the form \(\texttt {while(}C\texttt {)} \{ B \};Q\) followed by the assertion assert(\(\psi _{}\)). Let \(\texttt {X}\) denote the vector of variables which may be modified (by resolving dereferences, if any, using a safe points-to-analysis) in the loop body B. Recall that the implication in the criterion for shrinkability is required to hold for all states in \({\varphi }_{j}\), where j is the head of sequence \( T _{}\). The states in \({\varphi }_{j}\) are over-approximated by assigning non-deterministic values to \(\texttt {X}\) (through \(\texttt {X\_initial}\)). Thus our process of determining shrinkability is conservative and a future extension to this work would be a static analysis to obtain a better approximation of \({\varphi }_{j}\).

The loop in lines 6–9 checks the antecedent \((\forall U _{} \in \mathcal {P}_k( T _{}). \{\sigma _{}\} L _{ U _{}} \{\psi _{ U _{}} \wedge \psi _{ [\,i\,] }\})\) in the implication in the shrinkability condition (Definition 3), and stores the result in \(\texttt {c}\). This loop executes a maximum of \(k+1\) times, the number of subsequences of \( T _{}\) of size k. Line 10 checks the consequent \(\{\sigma _{}\} L _{ T _{}} \{\psi _{ T _{}} \wedge \psi _{ [\,i\,] }\}\) of the same implication, and stores it in \(\texttt {r}\). Finally line 11 checks the condition for shrinkability, given by the implication \(\texttt {c}\Rightarrow \texttt {r}\) itself. Observe that the residual loop for each subsequence \( U _{}\) and the residual loop for the sequence \( T _{}\) are all evaluated in the same state denoted by the values of the variables in X_initial. It is clear that the program shown in Fig. 4 can be automatically constructed for any given k, \( L _{}\), and \(\psi _{}\).

The fact that shrinkable loops usually have a low shrink-factor has two consequences for the procedure to determine shrinkability: (i) it allows us to keep the number l till which a program is tested for shrinkability at a low value without the fear of missing out many shrinkable programs, and (ii) since the \(\texttt {for}\) loop in lines 6–9 has a bound of \(k+1\), and k is smaller than l, the shrinkability testing procedure is fairly efficient.

5.2 Property Checking for Shrinkable Loops

Once we discover that the loop of a program is \(k\)-shrinkable, we construct an abstract program that consists of a program fragment to non-deterministically choose a k-sized iteration sequence \( T _{}\), a residual loop \( L _{ T _{}}\), and a residual property \(\psi _{ T _{}}\). The abstract program is submitted to a BMC for verification. The motivating example of Fig. 1 illustrates the nature of the abstract program, and it is easy to generalize and automate the process of abstraction to arbitrary programs that are within the scope of our method.

Since the quantified property is also encoded as a loop, the residual property can also be constructed as a residual of this loop. Consider a program with a loop \( L _{}\) for which the residual has to be constructed with respect to a k-length iteration sequence. Assume that the maximum iteration count of the loop is m. Let a[e] be an arbitrary expression involving an array a of size n. Also assume that the index expression e is accelerable and is of the form f(i), where \({i} \in [{1..m}]\) represents a particular loop iteration, and f is the acceleration function. The abstract program non-deterministically chooses a k-length iteration sequence, whose elements are in the range [1..m]. The iteration sequence is concretely represented as an array. A loop iterates over all the values of the iteration sequence. The expression a[e] in the loop body is replaced by the corresponding accelerable expression a[f(i)].

To make this clearer, consider the example in Fig. 5(a). Assume that the size \(\texttt {p}\) of the array is more than \((\texttt {n}+1)/2\). The loop initializes the array element \(\texttt {a[t]}\) with the value \(\texttt {2*t}\). Assume that the loop is \(\texttt {k}\)-shrinkable for some property. The maximum iteration count m for the loop is \((\texttt {n}+1)/2\). The code in Fig. 5(b), written in a C-like notation, is an abstract description of the residual loop. The call to \(\texttt {init}\) initializes the array \(\texttt {T}\) with a non-deterministically chosen \(\texttt {k}\)-length iteration sequence. The C-style comment indicates the constraints on the chosen iteration sequence \(\texttt {T}\). The conditions \(\texttt {1} \le \texttt {T[}l\texttt {-1]} < \texttt {T[}l\texttt {]}\) and \(\texttt {0} \le \texttt {2} * \texttt {(T[}l\texttt {]-1)} < \texttt {n}\) together ensure that the iteration sequence consists of increasing values in the range \([1\ldots m]\), and the condition \(\texttt {T[}l\texttt {]-1} < \texttt {p}\) ensures that the chosen values do not cause an out-of-bounds access of the array. The \(\texttt {for}\) loop covering lines 7 to 11 iterates over the elements in \(\texttt {T}\). Inside the loop body, \(\texttt {i}\) and \(\texttt {j}\) are computed through acceleration functions applied to the iteration numbers picked from \(\texttt {T}\).

In practice, the constraints on the values in \(\texttt {T}\) would be enforced programmatically, and this is shown in Fig. 5(c). Here an increasing sequence of values are chosen, and the constraint that the chosen values are in the range [1..m] is enforced through the the conditional \(\texttt {break}\). Similarly, the constraint that the index of \(\texttt {a}\) does not exceed its bound is enforced through the \(\texttt {assume}\) at line 8. Finally, \(\texttt {assume(l==k)}\) ensures that the residual indeed iterates \(\texttt {k}\) times and does not break out of the loop earlier.

Our method can also be used when the program consists of a cascaded series of simple loops that can be coalesced into one simple loop. To elaborate, let the program be \(\{Q_1; R_1; Q_2; R_2; Q_3; R_3\}\), where the \(Q_i\)s are loop-free statements and the \(R_i\)s are simple loops of the form \(\texttt {while}\; (C_i) \;\{ B_i\}\). Our method can handle such a program, if it can be transformed to a semantically equivalent program \(Q;\; \texttt {while}\; (C) \;\{ B_1 ; B_2; B_3 \}\) for some loop-free statements, Q, and condition C. Even this simple strategy enabled us to verify 50 of the 81 programs with non-nested multiple loops in the SV-COMP 2017 benchmark suite. However, our method, in its present form, cannot handle nested loops.

Fig. 5.
figure 5

Example illustrating the residual of a shrinkable loop. Program in (b) is an abstract description of the residual, presented for ease of explanation

6 Implementation and Measurements

The proposed abstraction has been implemented in a tool called VeriAbs [7]. Within the scope of our method, i.e. a single loop followed by the property to be checked, the tool supports most C constructs including pointers, structure, arrays, heaps and non-recursive function calls. It uses LABMC [12] to discover index expressions that can be accelerated and CBMC 5.8 as the bounded model checker to determine shrinkability of the loop and to check the residual property on the abstracted program. If a loop is not found shrinkable within a candidate shrink-factor of 5, we report the shrinkability of the loop to be unknown. Given a program with a shrinkable loop, if the verification of the corresponding abstract program succeeds on the residual property, the tool declares the original program to be correct with respect to the given property. On the other hand, if the verification of the abstract program fails and the loop in the program has no loop-carried dependencies, the original program is declared to be incorrect. Otherwise the tool indicates its inability to decide on the correctness of the program.

6.1 Experiments

An early version of the tool VeriAbs competed in the SV-COMP 2017 verification competition [2], where it ranked third amongst the 17 participating tools in the ArraysReach category. We have re-run the current version on the same benchmark. We ran the experiment on a machine with two i7-4600U cores @2.70 GHz and 8 GB RAM. ArraysReach consisted of 135 programs, of which 95 are correct and the remaining 40 incorrect with respect to their properties. Table (a) of Fig. 6 categorizes these programs. 42 of the 135 programs were beyond the scope of VeriAbs because they either contained nested loops (12 programs) or contained multiple loops which were not collapsible (30 programs). Out of the remaining 93 programs, 89 programs were \(1\)-shrinkable, 2 were \(2\)-shrinkable, and while our tool could not find the shrinkability of the remaining 2 programs, we manually found those to be non-shrinkable.

Figure 6(b) gives the verification results of the 91 shrinkable programs. All correct programs except one were verified successfully. Moreover, none of the 26 incorrect programs were declared to be correct, demonstrating the soundness of our tool. 23 of these 26 incorrect programs also had no loop carried dependency, and thus the tool could rightly declare these as being incorrect. For the remaining 3 programs our tool remained indecisive. The timing data shows the average time taken in verifying each program. As expected, the bulk of time is taken in determining shrinkability, as the BMC has to verify \(O(k^2)\) residual programs to determine that the shrink-factor is k, while property checking of the abstract program involves a loop with just k iterations. Given the limits of the machine configuration, the timings are reasonable.

Fig. 6.
figure 6

Experimental results for SV-COMP 2017 ArraysReach benchmarks

An interesting property of Veriabs is that, while it is limited by its ability to deal only with shrinkable loops, once a loop is discovered to be shrinkable, the method is impervious to either the existence or the size of loop bounds—increasing the loop bound does not cause an otherwise verifiable program to timeout. Comparison with the two tools that fared better than VeriAbs in the competition, namely ceagle [24] and smack [6], reveals interesting information. We selected four correct programs, one from each of the following categories, array copy, array initialisation, two index copying and finding minimum, of the test suite, and increased the array size considerably (from 100000 to 10000000). While both tools succeeded on the programs with the original array sizes, smack started timing out after the increase and ceagle either crashed or declared the programs to be incorrect. We surmise that the two tools are based on bounded model checking without any abstraction. In this respect, our tool performs better than these two tools that were placed ahead of ours in the competition.

7 Related Work

The various approaches to handle arrays have their roots in the types of static analyses used for property verification, namely: abstract interpretation, predicate abstraction, bounded model checking and theorem proving.

In abstract interpretation, arrays are handled using array smashing, array expansion and array slicing. In array smashing, all elements of an array are clubbed as a single anonymous element, with writes treated as weak updates. As a result it is imprecise. It cannot be used, for example, to verify the motivating example. In array expansion, array elements are explicated as a collection of scalar variables, and the resulting programs have fewer number of weak updates than array smashing. However, it works well only for small-sized arrays. A mix of smashing and expansion has been used in [4, 5] to prove that the program does not perform executions with undefined behaviours such as out-of-bounds array accesses. In array slicing, the idea is to track partitions of arrays based upon some criteria inferred from programs [11, 16, 17]. Each partition is treated as an independent smashed element. The approach is further refined in  [14] to introduce the notion of fluid updates, where a write operation may result in a strong update of one partition of the array and weak update of other partitions. In contrast to these approaches, our abstraction is based not only on the program but also on the associated property. By declaring an array-processing loop as k-shrinkable, we guarantee that an erroneous behaviour of the program with respect to the property can indeed be replayed on some k elements of the array.

Methods based on predicate abstraction go through several rounds of counterexample guided abstraction refinement (CEGAR). In each round a suitable invariant is searched based on the counter-example using Craig interpolants  [21]. Tools like SATABS [9] and CPAchecker [3] are based on this technique. To handle arrays, the approach relies on finding appropriate quantified loop invariants. However generating interpolants for scalar programs is by itself a hard problem. With the inclusion of arrays, which require universally quantified interpolants, the problem becomes even harder [20, 22]. Our method, in contrast, does not rely on the ability to find invariants. Instead, we find a bound on the number of loop iterations, and, in turn, the number of array elements that have to be accessed in a run of abstract program.

Theorem proving based methods generate a set of constraints, typically Horn clauses. The clauses relate invariants at various program points and the invariants are predicates over arrays. The constraints are then fed to a solver in order to find a model. However, these methods also face the same difficulty of synthesizing quantified invariants over arrays. A technique, called k-distinguished cell abstraction, addresses this problem by abstracting the array to only k elements. A 1-distinguished cell abstraction, for example, abstracts a predicate P(a) involving an array a by \(P'(i, a_i)\), where i and \(a_i\) are scalars. The relation between the two predicates is that \(P'(i, a_i)\) holds whenever P(a) holds and the value of a[i] is \(a_i\). The resulting constraints are easier to solve using a back end solver such as Z3 [13]. This technique and its variants appear in [22, 23] and in [15], where the term skolem constants is used instead of distinguished cells. We experimented with VAPHOR, a tool based on [23]. By way of comparison, we present two examples, one with a \(\exists \) property and the other with a \(\forall \) property. The first program computes the minimum of an array and asserts that the minimum is the same as some element in the array. The second program copies all but 1 elements from one large array to another. It then asserts that the copied elements are pairwise equal. While our tool could verify both examples, VAPHOR declared the first program to be incorrect with 1 and 2 distinguished cell abstraction and timed out on the second program.

A method that is properly subsumed by our method is [18]. This uses only one distinguished element called a witness element, and transforms a program to a loop free scalar program. This program is model-checked using a BMC. This approach works well on what authors call full array processing loops and such loops are a proper subset of our \(1\)-shrinkable loops.

8 Conclusion

We have proposed a fully automatic approach for property checking over array processing programs using loop shrinking. The approach enables us to verify properties over large or even unbounded loops by converting them to loops with a small finite bound. Towards this, we have defined a notion called shrinkability of a loop, and showed that arrays processed by \(k\)-shrinkable loops can be abstracted using only k elements. The abstracted program can then be checked using any bounded model checker as back-end. An important contribution of our method is an automated method to find out the required bound k. Although there are approaches that are based on abstracting an array by fewer elements, none of these provide a way to find out the number of elements that are sufficient to reason about the array. Our experiments have shown that the approach is powerful enough to handle a variety of array processing programs. As future work, we want to add a suitable refinement step to address false positives and extend our method to support nested loops and multi-dimensional arrays.