1 Introduction

In recent years, ic3  [6] has emerged as a leading algorithm for model checking hardware. It has been refined [10] and incorporated into state-of-the-art tools, and generalized to verify software [8, 12]. Our interest is that ic3 is amenable to parallelization [6], and promises new approaches to enhance the capability of model checking by harnessing the abundant computing power available today. Indeed, the original ic3 paper [6] described a parallel version of ic3 informally and reported on its positive performance. In this paper, we build on that work and make three contributions.

First, we formally present three variants – ic3sync, ic3async and ic3proof – of parallel ic3, and prove their correctness. All the variants have some common features: (i) they consist of a fixed number of threads that execute in parallel; (ii) each thread learns new lemmas and looks for counterexamples (CEXes) or proofs as in the original ic3; (iii) all lemmas learned by a thread are shared with the other threads to limit duplicated effort; and (iv) if any thread finds a CEX, the overall algorithm declares the problem unsafe and terminates.

However, the variants differ in the degree of inter-thread synchronization, and the frequency and technique for detecting proofs, making different trade-offs between the overhead and likelihood of proof-detection. Threads in ic3sync (cf. Sect. 4.1) synchronize after each round of new lemma generation and propagation, and check for proofs in a centralized manner. Threads in ic3async (cf. Sect. 4.2) are completely asynchronous. Proof-detection is decentralized and done by each thread periodically. Finally, threads in ic3proof are also asynchronous and perform their own proof detection, but more aggressively than ic3async. Specifically, each thread saves the most recent set of inductive lemmas constructed. When one of the threads finds a new set of inductive lemmas, it checks if the collection of inductive lemmas across all threads form an inductive invariant. Thus, in terms of increasing overhead (and likelihood) of proof-detection, the variants are ordered as follows: ic3sync, ic3async, and ic3proof. Collectively, we refer to all three variants as ic3par.

The runtime of ic3par is unpredictable (this is a known phenomenon [6]). In essence, the number of steps to arrive at a proof (or CEX) is sensitive to the thread interleaving. We propose to counteract this variance using a portfolio – run several ic3pars in parallel, and stop as soon as any one terminates with an answer. But how large should such a portfolio be? Our second contribution is a statistical analysis to answer this question. Our insight is that the runtime of ic3par should follow the Weibull distribution [20] closely. This is because it can be thought of as the minimum of the runtimes of the threads in ic3par, which are themselves independent and identically distributed (i.i.d.) random variables. According to the Extreme Value theorem [11], the minimum of i.i.d. variables converges to a Weibull. We empirically demonstrate the validity of this claim.

Next, we hoist the same idea to a portfolio of ic3pars. Again, the runtime of the portfolio should be approximated well by a Weibull, since it is the minimum of the runtime of each ic3par in the portfolio. Under this assumption, we derive a formula (cf. Theorem 5) to compute the portfolio size sufficient to solve any problem with a specific probability and speedup compared to a single ic3par. For example, this formula implies that a portfolio of 20 ic3pars has 0.99999 probability of solving a problem in time no more than the “expected time” for a single ic3par to solve it. We empirically show (cf. Sect. 6.3) that the predictions based on this formula have high accuracy. Note that each solver in the portfolio potentially searches for a different proof/CEX. The first one to succeed provides the solution. In this way, a portfolio utilizes the power of ic3par to search for a wide range of proofs/CEXes without sacrificing performance.

Finally, we implement all three ic3par variants, and evaluate them on benchmarks from the 2014 Hardware Model Checking Competition (HWMCC14) and the TIP suite. Using each variant individually, and in portfolios of size 20, we observe that ic3proof and ic3async outperform ic3sync. Moreover, compared to a purely sequential ic3, the variants are faster, providing an average speedup of over 6 and a maximum speedup of over 300. We also show that widening the proof search of ic3 by randomizing its SAT solver is not as effective as parallelization. In addition, we evaluate the performance of the parallel version of ic3 reported earlier [6], which we refer to as ic3par2010. Experimental results indicate that our parallelization approach is a good complement to ic3par2010, and overall outperforms it. Complete details are presented in Sects. 6.1, 6.2 and 6.3.

Next, we note that ic3par is parameterized by the number of threads and SAT solvers. We empirically show that these parameter values affect the performance of ic3par significantly, and that the best parameter choice varies unpredictably across the input space. Thus, for any input problem, the best parameter choice is difficult to compute. However, we show empirically that a “parameter sweeping” [2] solver that executes a randomly selected ic3par variant with random parameters is competitive with the best ic3par variant with fixed parameters over a range of problems. Complete details are presented in Sect. 6.4.

For brevity, we defer proofs and other supporting material to an extended version of the paper [7]. The rest of the paper is organized as follows. Sect. 2 surveys related work. Sect. 3 presents preliminary definitions. Sect. 4 presents the three variants of parallel ic3. Sect. 5 presents the statistical analysis of the runtime of an ic3par solver, as well as a portfolio of such solvers. Sect. 6 presents our experimental results, and Sect. 7 concludes.

2 Related Work

The original ic3 paper [6] presents a parallel version informally, which we call ic3par2010, and shows empirically that parallelism can improve verification time. Our ic3par solvers were inspired by this work, but are different. For example, the parallel ic3 in [6] implements clause propagation by first distributing learned clauses over all solvers and then propagating them one frame at a time, in lock step. It also introduces uncertainty in the proof search by randomizing the backend SAT solver. Our ic3par solvers perform clause propagation asynchronously, and use deterministic SAT solvers. We also present each ic3par variant formally with pseudo-code and prove their correctness. In addition, we evaluate the performance of ic3par2010 empirically, and show that our parallelization approach provides a good complement to (and overall outperforms) it in terms of speedup. Finally, we go beyond the earlier work on parallelizing ic3  [6] by performing a statistical analysis of the runtimes of both ic3par solvers and their portfolios. Experimental results (cf. Sect. 6.1) indicate that a portfolio of ic3par solvers is more efficient than a portfolio composed of ic3 solvers with randomized SAT solvers.

A number of projects focus on parallelizing model checking [1, 3–5, 13, 17]. Ditter et al. [9] have developed GPGPU algorithms for explicit-state model checking. They do not report on variance in runtime, analyze it statistically as we do, nor explore the use of portfolios. Lopes et al. [15] do address variance in the runtime of a parallel software model checker. However, their approach is to make the model checker’s runtime more predictable by ensuring that the counterexample generation procedure is deterministic. They also do not perform any statistical analysis or explore portfolios.

Portfolios have been used successfully in SAT solving [14, 16, 19, 22], SMT solving [21] and symbolic execution [18]. However, these portfolios are composed of a heterogeneous set of solvers. Our focus is on homogeneous portfolios of ic3par solvers and statistical analysis of their runtimes.

3 Preliminaries

Assume Boolean state variables V, and their primed versions \(V'\). A verification problem is a triple \((I, T, S)\) where I(V), \(T(V, V')\) and S(V) denote initial states, transition relation and safe states, respectively. We omit V when it is clear from the context, and write \(S'\) to mean \(S(V')\). Let \( Post (X)\) denote the image of X(V) under the transition relation T, i.e., \( Post (X) = (\exists V \centerdot X \wedge T)[V' \mapsto V]\). Let \( Post ^k(X)\) be the result of applying \( Post (\cdot )\) k times on X with \( Post ^0(X) = X\), and \( Post ^{k+}(X) = \bigcup \limits _{j \ge k} Post ^j(X)\). The verification problem \((I, T, S)\) is safe if \( Post ^{0+}(I) \subseteq S\), and unsafe (a.k.a. buggy) otherwise. A “lemma” is a clause (i.e., a disjunction of literals) over V, and a “frame” is a set of lemmas.
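As a finite, explicit-state illustration of these definitions (not part of any tool in this paper), the image operator and the safety check \( Post ^{0+}(I) \subseteq S\) can be sketched as:

```python
def post(states, transitions):
    # Image of X under T for an explicit (finite) transition relation:
    # a concrete stand-in for Post(X) = (exists V . X /\ T)[V' -> V].
    return {t for (s, t) in transitions if s in states}

def reachable(init, transitions):
    # Post^{0+}(I): least fixpoint of repeated image computation.
    seen, frontier = set(init), set(init)
    while frontier:
        frontier = post(frontier, transitions) - seen
        seen |= frontier
    return seen

def is_safe(init, transitions, safe):
    # The problem (I, T, S) is safe iff Post^{0+}(I) is contained in S.
    return reachable(init, transitions) <= set(safe)
```

For example, with transitions \(\{(0,1), (1,2), (2,2), (3,0)\}\) and \(I = \{0\}\), the reachable states are \(\{0, 1, 2\}\), so the problem is safe exactly when S contains these three states.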

A random variable X has a Weibull distribution with shape k and scale \(\lambda \), denoted \(X \sim \textsc {wei} (k, \lambda )\), iff its probability density function (pdf) \( pdf _X\) and cumulative distribution function (cdf) \( cdf _X\) are defined as follows:

$$\begin{aligned} pdf _X(x) = \left\{ \begin{array}{ll} \frac{k}{\lambda } \left(\frac{x}{\lambda }\right)^{k-1} e^{-(\frac{x}{\lambda })^k} &{} \text{ if } x\ge 0 \\ 0 &{} \text{ if } x < 0 \end{array} \right. \qquad \qquad cdf _X(x) = 1 - e^{-(\frac{x}{\lambda })^k} \end{aligned}$$

Let \(X_1, \dots , X_n\) be i.i.d. random variables (rvs) whose pdfs are lower bounded at zero, i.e., \(\forall x < 0 \centerdot pdf _{X_i}(x) = 0\). Then, by the Extreme Value theorem [11] (EVT), the pdf of the rv \(X = \min (X_1, \dots , X_n)\) converges to a Weibull as \(n \rightarrow \infty \). The “Gamma” function, \(\varGamma \), is an extension of the factorial function to real and complex numbers, with its argument decreased by 1 (i.e., \(\varGamma (n) = (n-1)!\) for positive integers n), and is defined as follows: \(\varGamma (t) = \int _{x=0}^{\infty } x^{t-1}e^{-x}dx\).
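To make the distribution and the role of \(\varGamma \) concrete, here is a small numerical sketch (stdlib Python; the function names are ours). The mean formula \(E[X] = \lambda \varGamma (1 + \frac{1}{k})\) is used later to define the expected solving time \(t^*\):

```python
import math

def weibull_pdf(x, k, lam):
    # pdf of wei(k, lambda); zero for x < 0, per the definition above.
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam) ** (k - 1) * math.exp(-(x / lam) ** k)

def weibull_cdf(x, k, lam):
    # cdf of wei(k, lambda).
    return 1 - math.exp(-(x / lam) ** k) if x >= 0 else 0.0

def weibull_mean(k, lam):
    # E[X] = lambda * Gamma(1 + 1/k).
    return lam * math.gamma(1 + 1.0 / k)
```

As a sanity check, with \(k = 1\) the Weibull degenerates to the exponential distribution, whose mean is exactly \(\lambda \), and the pdf matches the numerical derivative of the cdf.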

Fig. 1. Pseudo-Code for ic3. Variables are passed by reference, and arrays are indexed from 0. This holds for all the pseudo-code in this article.

4 Parallelizing IC3

We begin with a description of the sequential ic3 algorithm. Figure 1 shows its pseudo-code. ic3 works as follows: (i) it checks that no state in \(\lnot S\) is reachable in 0 or 1 steps from some state in I (lines 16–17); (ii) it iteratively constructs an array of frames, each consisting of a set of clauses, as follows: (a) initialize the frame array and flags (lines 18–19); (b) strengthen() the frames by adding new clauses (line 22); if a counterexample is found in this step (indicated by \( bug \) being set), ic3 terminates (line 24); (c) otherwise, propagate() clauses that are inductive to the next frame (line 26); if a proof of safety is found (indicated by an empty frame), ic3 again terminates (lines 27–28); (d) add a new empty frame to the end of the array (line 30) and repeat from step (b). In the rest of this paper we use the term “function” to mean a “procedure”, as opposed to a mathematical function. In particular, we use the terms “pdf” and “cdf” to mean the probability density and cumulative distribution functions of random variables, respectively.
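The control flow just described (steps i–ii.d) can be sketched schematically as follows. This is only a skeleton, not IC3ref’s actual code: the base-case check, strengthen() and propagate() are supplied by the caller, and \(\top /\bot \) are rendered as True/False:

```python
def ic3_main_loop(check_base, strengthen, propagate, max_frames=100):
    # Schematic top-level loop of IC3 as described in the text.
    # check_base(): 0- and 1-step reachability check of the unsafe states.
    # strengthen(frames): adds new clauses; returns True iff a CEX is found.
    # propagate(frames): pushes inductive clauses; returns True iff a frame
    #                    became empty (a proof of safety).
    if not check_base():
        return False                 # counterexample at depth 0 or 1
    frames = [set(), set()]          # (ii.a) initialize the frame array
    while len(frames) <= max_frames:
        if strengthen(frames):       # (ii.b) new lemmas; bug flag set?
            return False
        if propagate(frames):        # (ii.c) empty frame => proof found
            return True
        frames.append(set())         # (ii.d) open a new empty frame
    raise RuntimeError("frame bound exceeded")
```

The skeleton returns as soon as either a counterexample (False) or a proof (True) is detected, mirroring lines 24 and 27–28 of Fig. 1.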

Definition 1

(Frame Monotonicity). A function is frame monotonic if at each point during its execution, \(\forall i \in [0,K-1] \centerdot f(i) \implies \tilde{f}(i)\) where \(\tilde{f}(i)\) is the value of f(i) when the function was called.

Correctness. Figure 1 also shows the invariants (indicated by @INV) before and after strengthen() and propagate(). Since strengthen() always adds new lemmas to frames, it is frame monotonic, and hence it maintains \(A_1\) and \(A_3\). It also maintains \(A_2\) since a new lemma \(\lnot m\) is added to frame \(F[l+1]\) (line 42) only if \(f(l) \wedge T \implies \lnot m'\) (line 41). Finally, when strengthen() returns, then either \( bug = \top \) (line 45), or \(f(K-2) \wedge T \implies S'\) (line 36). Hence \(\mathcal {I}_2\) is a valid post-condition for strengthen(). Also, propagate() is frame monotonic since it always pushes inductive lemmas forward (the order of the two statements at lines 56–57 is crucial for this). Hence, propagate() maintains \(A_1\) and \(A_4\) at all times. It also maintains \(A_2\) since a new lemma \(\alpha \) is added to frame \(F[i+1]\) (line 56) only if \(f(i) \wedge T \implies \alpha '\) (line 55). Hence \(\mathcal {I}_3\) is a valid post-condition for propagate(). Finally, note that \(A_4 \equiv A_3 \wedge f(\mathbf {K}-2) \implies S\). Hence, after \(\mathbf {K}\) is incremented, \(A_4\) becomes \(A_3\). Also, since the last frame is initialized to \(\emptyset \), \(A_1\) and \(A_2\) are preserved. Hence: \(\{{\mathcal {I}_3}\} \mathbf {F}[\mathbf {K}] := \emptyset ; \mathbf {K}:= \mathbf {K}+1; \{{\mathcal {I}_1}\}\). The correctness of ic3 is summarized by Theorem 1. Its proof is in the appendix of [7].

Theorem 1

If IC3() returns \(\top \), then the problem is safe. If IC3() returns \(\bot \), then the problem is unsafe.

We now present the three versions of parallel ic3 and their correctness (their termination follows in the same way as ic3  [6] – see Theorem 5 in the appendix of [7]).

Fig. 2. Pseudo-Code for \(\textsc {ic3sync} (n)\). Functions strengthen() and propagate() are defined in Fig. 1.

4.1 Synchronous Parallel IC3

The first parallelized version of ic3, denoted ic3sync, runs a number of copies of the sequential ic3 “synchronously” in parallel. Let \(\textsc {ic3sync} (n)\) be the instance of ic3sync consisting of n copies of ic3 executing concurrently. The copies maintain separate frames. However, for any copy, the frames of other copies act as “background lemmas”. Specifically, the copies interact by: (i) using the frames of all other copies when computing f(i); (ii) declaring the problem unsafe if any copy finds a counterexample; (iii) declaring the problem safe if some frame becomes empty across all the copies; and (iv) “synchronizing” after each call to strengthen() and propagate().

The pseudo-code for \(\textsc {ic3sync} (n)\) is shown in Fig. 2. The main function is IC3Sync(). After checking the base cases (lines 73–74), it initializes flags and frames (lines 75–76), and then iteratively performs the following steps: (i) run n copies of \(\textsc {ic3} \) where each copy does a single step of strengthen() followed by propagate() (lines 79–81); (ii) check if any copy of ic3 found a counterexample, and if so, terminate (line 83); (iii) check if a proof of safety has been found, and if so, terminate (lines 85–86); and (iv) add a frame and repeat from step (i) above (line 88). Functions strengthen() and propagate() are syntactically identical to ic3 (cf. Fig. 1). However, the key semantic difference is that lemmas from all copies are used to define f(j) (lines 65–66). Global variables are shared, and accessed atomically. Note that even though all ic3 copies write to variable \( bug \), there is no race condition since they always write the same value (\(\top \)).
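The round-based synchronization structure (steps i–iv above) can be sketched as follows. This is a schematic skeleton only: step, check_bug and check_proof are placeholders for the real strengthen/propagate work and the centralized checks, and joining the per-round threads is the synchronization point:

```python
import threading

def ic3sync_skeleton(n, step, check_bug, check_proof, max_rounds=50):
    # Per round: n copies each run one strengthen+propagate step in
    # parallel, all copies join (the synchronization point), and only
    # then does the main thread perform the centralized CEX/proof checks.
    for rnd in range(max_rounds):
        threads = [threading.Thread(target=step, args=(i, rnd))
                   for i in range(n)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()                 # synchronize after each round
        if check_bug():
            return False             # some copy found a counterexample
        if check_proof():
            return True              # a frame became empty across copies
    raise RuntimeError("round bound exceeded")
```

Because the proof check runs only between rounds, a proof found mid-round is detected with a delay of at most one round, which is exactly the overhead/latency trade-off discussed in Sect. 1.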

Correctness. The correctness of ic3sync follows from the invariants specified in Fig. 2. To show these invariants are valid, the main challenge is to show that if \(\mathcal {I}_4\) holds at line 78, then \(\mathcal {I}_5\) holds at line 82. Note that since strengthen() and propagate() are frame monotonic, they preserve \(B_1\) and \(B_3\). This means that \(B_1 \wedge B_3\) holds at line 82. Now suppose that at line 82, we have \(\lnot bug \). This means that each strengthen() called at lines 79–81 returned from line 36. Thus, the condition \(f(\mathbf {K}-2) \wedge T \implies S'\) was established at some point, and once established, it continues to hold due to the frame monotonicity of strengthen() and propagate(). Since \(B_4 \equiv B_3 \wedge (f(\mathbf {K}-2) \wedge T \implies S')\), we therefore know that \(B_1 \wedge B_4\) holds at line 82. Also, \(B_2\) holds at line 82 since a new lemma \(\alpha \) is only added to frame \(F_i[j+1]\) by strengthen() (line 42) and propagate() (line 56) under the condition \(f(j) \wedge T \implies \alpha '\). Note that once \(f(j) \wedge T \implies \alpha '\) is true, it continues to hold even in the concurrent setting due to frame monotonicity. Finally, the statement at line 88 transforms \(\mathcal {I}_6\) to \(\mathcal {I}_4\). The correctness of ic3sync is summarized by Theorem 2. Its proof is in the appendix of [7].

Theorem 2

If IC3Sync() returns \(\top \), then the problem is safe. If IC3Sync() returns \(\bot \), then the problem is unsafe.

Fig. 3. Pseudo-Code for \(\textsc {ic3async} (n)\). Functions strengthen() and propagate() are defined in Fig. 1.

4.2 Asynchronous Parallel IC3

The next parallelized version of ic3, denoted ic3async, runs a number of copies of the sequential ic3 “asynchronously” in parallel. Let \(\textsc {ic3async} (n)\) be the instance of ic3async consisting of n copies of ic3 executing concurrently. Similar to ic3sync, the copies maintain separate frames, interact by sharing lemmas when computing f(i), and declare the problem unsafe if any copy finds a counterexample. However, due to the lack of synchronization, proof detection is distributed over all the copies instead of being centralized in the main thread.

Figure 3 shows the pseudo-code for \(\textsc {ic3async} (n)\). The main function is IC3Async(). After checking the base cases (lines 102–103), it initializes flags (line 104), launches n copies of ic3 in parallel (line 105) and waits for some copy to terminate (the \(\diamond \) operator), and checks the flag and returns with an appropriate result (line 106). Function IC3Copy() is similar to IC3() in Fig. 1. The key difference is that lemmas from all copies are used to compute f(j) (lines 90–91).
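The launch-and-wait-for-first pattern (the \(\diamond \) operator) can be sketched with Python’s concurrent.futures. In this illustrative sketch the remaining copies simply run to completion in the background before the function returns, whereas the real solver would stop them:

```python
import time
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def run_until_first(copies):
    # Launch every copy in its own thread and return the result of
    # whichever finishes first.  (Exiting the with-block waits for the
    # stragglers; the returned value is still the first one produced.)
    with ThreadPoolExecutor(max_workers=len(copies)) as pool:
        futures = [pool.submit(copy) for copy in copies]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return next(iter(done)).result()
```

In ic3async, the value returned by the first copy to terminate is the shared \( bug \) flag, which the main thread then translates into \(\top \) or \(\bot \) (line 106).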

Correctness. The correctness of ic3async follows from the invariants specified in Fig. 3. To see why these invariants are valid, note that \(C_1\) and \(C_3\) are always preserved due to frame monotonicity. If strengthen() returns with \( bug =\bot \), then it returned from line 36, and hence \(f(\mathbf {K}_i -2) \wedge T \implies S'\) was true at some point in the past and continues to hold due to frame monotonicity. Together with \(C_3\), this implies that \(C_4\) holds at line 119. Also, \(C_2\) holds at line 119 since a new lemma \(\alpha \) is only added to frame \(F_i[j+1]\) by strengthen() (line 42) and propagate() (line 56) under the condition \(f(j) \wedge T \implies \alpha '\). Note that once \(f(j) \wedge T \implies \alpha '\) is true, it continues to hold even under concurrency due to frame monotonicity. Hence, \(\mathcal {I}_8\) holds at line 119. Since \( bug \) is never set to \(\bot \) after line 104, this means that \(\mathcal {I}_9\) holds at line 121 even under concurrency. Finally, the statement at line 126 transforms \(\mathcal {I}_9\) to \(\mathcal {I}_7\). The correctness of ic3async is summarized by Theorem 3. Its proof is in the appendix of [7].

Theorem 3

If IC3Async() returns \(\top \), then the problem is safe. If IC3Async() returns \(\bot \), then the problem is unsafe.

Fig. 4. Pseudo-Code for \(\textsc {ic3proof} (n)\). Function strengthen() is defined in Fig. 1. Formulas \(f(j), \mathcal {I}_7, \mathcal {I}_8\), and \(\mathcal {I}_9\) are defined in Fig. 3.

4.3 Asynchronous Parallel IC3 with Proof-Checking

The final parallelized version of ic3, denoted ic3proof, is similar to ic3async, but performs more aggressive checking for proofs. Let \(\textsc {ic3proof} (n)\) be the instance of ic3proof consisting of n copies of ic3 executing concurrently. Similar to ic3async, the copies maintain separate frames, interact by sharing lemmas when computing f(i), and declare the problem unsafe if any copy finds a counterexample. However, whenever a copy finds an empty frame, it checks whether the set of lemmas over all the copies for that frame forms an inductive invariant.

The pseudo-code for \(\textsc {ic3proof} (n)\) is shown in Fig. 4. The main function is IC3Proof(). After checking the base cases (lines 148–149), it initializes flags (line 150), launches n copies of ic3 in parallel (line 151) and waits for at least one copy to terminate, and checks the flag and returns with an appropriate result (line 152). Function IC3PrCopy is similar to IC3 in Fig. 1, but calls propProof() instead of propagate() where, once an empty frame is detected (line 160), we check whether a proof has been found by collecting the lemmas for that frame (lines 161–162), and checking if these lemmas are inductive (line 163).
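The lemma-pooling step of propProof() can be sketched as follows, with is_inductive standing in as a placeholder for the SAT-based inductiveness check of line 163:

```python
def proof_check(frames_per_copy, idx, is_inductive):
    # Sketch of ic3proof's aggressive proof detection: when one copy finds
    # that its frame `idx` is empty, pool the lemmas that ALL copies hold
    # at that frame index and ask whether the pooled set is an inductive
    # invariant.  Returns the verdict and the pooled lemma set.
    pooled = set()
    for frames in frames_per_copy:
        pooled |= frames[idx]
    return is_inductive(pooled), pooled
```

Note that the pooled set may be inductive even when no single copy’s frame is, which is why this check succeeds more often (at higher cost) than the per-copy empty-frame test of ic3async.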

Correctness. The correctness of ic3proof follows from the invariants (whose validity is similar to those for ic3async) specified in Fig. 4. It is summarized by Theorem 4. The proof of the theorem is in the appendix of [7].

Theorem 4

If IC3Proof() returns \(\top \), then the problem is safe. If IC3Proof() returns \(\bot \), then the problem is unsafe.

5 Parallel ic3 Portfolios

In this section, we investigate how to select a good size for a portfolio of ic3pars. We begin with an argument about the pdf of the runtime of \(\textsc {ic3async} (n)\).

Conjecture 1

The runtime of \(\textsc {ic3async} (n)\) converges to a Weibull rv as \(n \rightarrow \infty \).

Argument: Recall that each execution of \(\textsc {ic3async} (n)\) consists of n copies of ic3 running in parallel, and that \(\textsc {ic3async} (n)\) stops as soon as one copy finds a solution. We can consider the runtime of each copy of ic3 to be a rv. Specifically, let rv \(X_i\) be the runtime of the i-th copy of ic3 under the environment provided by the other \(n-1\) copies. Recall that the pdf of \(X_i\) has a lower bound of 0, since no run of ic3 can take negative time. Also, for the sake of argument, assume that \((X_1, \dots , X_n)\) are i.i.d. since the interaction between the copies of ic3 is logical and symmetric. Finally, let X be the random variable denoting the runtime of \(\textsc {ic3async} (n)\). Note that \(X=\min (X_1, \dots , X_n)\). Hence, by the EVT, \(X \sim \textsc {wei} (k, \lambda )\) for large n.    \(\square \)
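A small simulation illustrates the EVT claim in a case where the limit is exact: if each \(X_i \sim \mathrm{Exp}(1)\) (a pdf lower bounded at 0), then \(\min (X_1, \dots , X_n)\) is exactly \(\textsc {wei} (1, 1/n)\), so the empirical cdf of the minimum should match the Weibull cdf for every n, not just in the limit. This sketch uses only the stdlib; the function name is ours:

```python
import math
import random

def min_of_iid_exponentials_matches_weibull(n=8, x=0.1, trials=20000, seed=0):
    # Empirical cdf of min(X_1..X_n) with X_i ~ Exp(1), versus the
    # Weibull cdf 1 - exp(-(x/lambda)^k) with k = 1 and lambda = 1/n,
    # which simplifies to 1 - exp(-n*x).
    rng = random.Random(seed)
    hits = sum(min(rng.expovariate(1.0) for _ in range(n)) <= x
               for _ in range(trials))
    empirical = hits / trials
    predicted = 1 - math.exp(-n * x)
    return empirical, predicted
```

With the defaults, the predicted probability is \(1 - e^{-0.8} \approx 0.55\), and the empirical estimate from 20,000 trials lands within sampling error of it.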

Fig. 5. Fitting \(\textsc {ic3par} (4)\) runtime to Weibull. The first 5 examples are safe, the next 5 are buggy; SAFE, BUG, ALL = average over safe, buggy, and all examples; \(\mu , \mu ^*\) = predicted, observed mean; \(\sigma , \sigma ^*\) = predicted, observed standard deviation.

A similar argument holds for ic3sync and ic3proof, and therefore their runtime should follow Weibull as well. Indeed, despite the assumption of \((X_1, \dots , X_n)\) being i.i.d., we empirically find that the runtime of \(\textsc {ic3par} (n)\) follows a Weibull distribution closely for even modest values of n. Specifically, we selected 10 examples (5 safe and 5 buggy) from HWMCC14, and for each example we:

  1. Executed \(\textsc {ic3async} (4)\) “around” 3000 times (we ran each example 3000 times, but some runs timed out; the exact number of timeouts varied across examples);

  2. Measured the runtimes;

  3. Estimated the k and \(\lambda \) values for the Weibull distribution that best fits these values (using maximum likelihood estimation and the R statistical tool); and

  4. Computed the observed mean and standard deviation from the data, and the predicted mean and standard deviation from the k and \(\lambda \) estimates.

We repeated these experiments with ic3sync and ic3proof. The results are shown in Fig. 5. We see that in all cases, the observed mean and standard deviation are quite close to the predicted ones, indicating that the estimated Weibull distribution is a good fit for the measured runtimes. ic3async and ic3proof have similar performance, and are slightly faster overall than ic3sync, indicating that additional synchronization is counter-productive. The estimated k and \(\lambda \) values vary widely over the examples, indicating their diversity. Note that smaller values of \(\lambda \) mean a smaller expected runtime.

Determining Portfolio Size. Consider a portfolio of ic3pars. In general, increasing the size of the portfolio reduces the expected time to solve a problem. However, there are diminishing returns to adding more solvers to a portfolio in terms of expected runtime. We now express this mathematically, and derive a formula for computing a portfolio size that achieves a target runtime with a given probability. Consider a portfolio of m ic3par solvers run on a specific problem. Let \(Y_i\) denote the runtime of the i-th ic3par. From the previous discussion, we know that \(Y_i \sim \textsc {wei} (k,\lambda )\) for some k and \(\lambda \). Therefore, the cdf of \(Y_i\) is: \( cdf _{Y_i}(x) = 1 - e^{-(\frac{x}{\lambda })^k}\). Note that \(Y_i\) refers to an instance of ic3par, whereas \(X_i\), used before, referred to a single thread (executing a copy of ic3) within an instance of ic3par.

Let Y be the rv denoting the runtime of the portfolio. Thus, we have \(Y = \min (Y_1, \dots , Y_m)\). More importantly, the cdf of Y is:

$$\begin{aligned} cdf _Y(x) &= 1 - (1 - cdf _{Y_1}(x)) \times \dots \times (1 - cdf _{Y_m}(x)) \\ &= 1 - \left(e^{-(\frac{x}{\lambda })^k}\right)^m = 1 - e^{-m(\frac{x}{\lambda })^k} = 1 - e^{-(\frac{x m^{\frac{1}{k}}}{\lambda })^k} \end{aligned}$$

Note that this means Y is also a Weibull rv, not just when \(m \rightarrow \infty \) (as per the EVT) but for all m. More specifically, \(Y \sim \textsc {wei} (k, \frac{\lambda }{m^\frac{1}{k}})\). Recall that if \(m=1\), then the expected time to solve the problem by the portfolio is \(E[Y_1]\). We refer to this time as \(t^*\), the expected solving time for a single ic3par. Recall the Gamma function \(\varGamma \). Since \(Y_1 \sim \textsc {wei} (k,\lambda )\), it is known that \(t^* = \lambda \varGamma (1 + \frac{1}{k})\). Now, we come to our result, which expresses the probability that a portfolio of m ic3pars will require no more than \(t^*\) to solve the problem.
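The identity \(Y \sim \textsc {wei} (k, \frac{\lambda }{m^{1/k}})\) is easy to confirm numerically; the parameter values below are illustrative only:

```python
import math

def min_of_m_weibulls_cdf(x, m, k, lam):
    # cdf of Y = min(Y_1..Y_m) with Y_i ~ wei(k, lam), per the derivation:
    # 1 - exp(-m * (x/lam)^k).
    return 1 - math.exp(-m * (x / lam) ** k)

def weibull_cdf(x, k, lam):
    # Ordinary Weibull cdf.
    return 1 - math.exp(-(x / lam) ** k)

# The derivation says these agree when the scale is rescaled to
# lam / m**(1/k), i.e., Y is itself a Weibull rv for every m.
```

So a portfolio of m solvers keeps the shape k of a single solver but shrinks the scale by a factor of \(m^{1/k}\): the smaller k is, the larger the benefit of adding solvers.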

Theorem 5

Let p(m) be the probability that \(Y \le t^*\). Then \(p(m) > 1 - e^{-\frac{m}{e^\gamma }}\) where \(\gamma \approx 0.57721\) is the Euler-Mascheroni constant.

Proof

We know that:

$$\begin{aligned} p(m) = cdf _Y(t^*) = 1 - e^{-m(\varGamma (1+\frac{1}{k}))^k} = 1 - (\alpha (k))^m, \text{ where } \alpha (k) = e^{-(\varGamma (1+\frac{1}{k}))^k} \end{aligned}$$

Next, observe that \(\alpha (k)\) increases monotonically with k and converges as \(k \rightarrow \infty \). For example, Fig. 11 in the appendix of [7] shows a plot of \(\alpha (k)\). Since we want to prove a lower bound on p(m), let us consider the limiting case \(k \rightarrow \infty \). It can be shown that (see Lemma 2 in the appendix of [7]): \(\lim _{k \rightarrow \infty } \alpha (k) = e^{-\frac{1}{e^\gamma }}\). In practice, as seen in Fig. 11 in the appendix of [7], the value of \(\alpha (k)\) converges quite rapidly to this limit as k increases. For example, \(\alpha (5) > 0.91 \cdot e^{-\frac{1}{e^\gamma }}\), and \(\alpha (10) > 0.95 \cdot e^{-\frac{1}{e^\gamma }}\). Since \(\forall k \centerdot \alpha (k) < e^{-\frac{1}{e^\gamma }}\), we have our result:

$$ {}p(m) > 1 - (e^{-\frac{1}{e^\gamma }})^m = 1 - e^{-\frac{m}{e^\gamma }} $$

   \(\square \)
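The behavior of \(\alpha (k)\) used in the proof (monotone increase, rapid convergence, and the bounds quoted for \(k=5\) and \(k=10\)) can be checked numerically with the stdlib:

```python
import math

GAMMA_EM = 0.5772156649015329            # Euler-Mascheroni constant

def alpha(k):
    # alpha(k) = exp(-(Gamma(1 + 1/k))^k), from the proof of Theorem 5.
    return math.exp(-math.gamma(1 + 1.0 / k) ** k)

# lim_{k -> inf} alpha(k) = exp(-1 / e^gamma)  (Lemma 2 of [7])
LIMIT = math.exp(-1 / math.exp(GAMMA_EM))
```

Evaluating a few points gives \(\alpha (1) \approx 0.368\), \(\alpha (5) \approx 0.521\) and \(\alpha (10) \approx 0.545\), all below the limit \(\approx 0.570\), consistent with the bounds claimed above.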

Achieving a Target Probability. Now suppose we want p(m) to be greater than some target probability p. Then, from Theorem 5, we have:

$$\begin{aligned} p = 1 - (e^{-\frac{1}{e^\gamma }})^m \iff 1 - p = e^{-\frac{m}{e^\gamma }} \iff \ln (1 - p) = -\frac{m}{e^\gamma } \iff \ln \left(\frac{1}{1-p}\right) = \frac{m}{e^\gamma } \iff m = e^\gamma \ln \left(\frac{1}{1-p}\right) \end{aligned}$$

For example, if we want \(p = 0.99999\), then \(m \approx 20\). Thus, a portfolio of 20 ic3pars has about 0.99999 probability of solving a problem at least as quickly as the expected time in which a single ic3par solves it. We validated the efficacy of Theorem 5 by comparing its predictions with empirically observed results on the HWMCC14 benchmarks. Overall, we find that the observed and predicted probabilities agree closely. Further details are presented in Sect. 6.3.
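The sizing rule above amounts to a one-line computation; the constant and formula are from Theorem 5, while the function name is ours:

```python
import math

GAMMA_EM = 0.5772156649015329   # Euler-Mascheroni constant

def portfolio_size(p):
    # m = e^gamma * ln(1 / (1 - p)): the (real-valued) portfolio size at
    # which the Theorem 5 lower bound on p(m) reaches the target p.
    return math.exp(GAMMA_EM) * math.log(1.0 / (1.0 - p))
```

For \(p = 0.99999\) this evaluates to about 20.5, which rounds to the portfolio size of 20 used in our experiments; in practice one would take the ceiling of the returned value.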

Fig. 6. (a) p(m, c, 4) for different values of c; (b) p(m, 0.5, k) for different values of k.

Speeding Up the Portfolio. To reduce the portfolio’s runtime below \(t^*\), we must increase m appropriately. In general, for any constant \(c \in [0,1]\), the probability that a portfolio of m ic3par solvers will have a runtime \(\le c \cdot t^*\) is given by:

$$ p(m,c,k) = 1 - e^{-m(c\cdot \varGamma (1+\frac{1}{k}))^k} $$

For \(c < 1\) we do not have a closed form for \(\lim \limits _{k\rightarrow \infty } p(m,c,k)\), unlike when \(c=1\). However, the value of p(m, c, k) is computable for fixed m, c and k. Figure 6(a) plots p(m, c, 4) for \(m \in \{{1, \dots , 100}\}\) and \(c \in \{{0.4, 0.5, 0.6}\}\). Figure 6(b) plots p(m, 0.5, k) for \(m \in \{{1, \dots , 100}\}\) and \(k \in \{{3, 4, 5}\}\). As expected, p(m, c, k) increases with: (i) increasing m; (ii) increasing c; and (iii) decreasing k. One challenge here is that we do not know how to estimate k for a problem without actually solving it. In general, a smaller value of k means that a smaller portfolio will reach the target probability. In our experiments – recall Fig. 5 – we observed k-values in a tight range (1–10) for problems from HWMCC14. These numbers can serve as guidelines, and be refined based on additional experimentation.
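The formula for p(m, c, k) and the three monotonicity trends can be checked directly (illustrative parameter values):

```python
import math

def p_portfolio(m, c, k):
    # Probability that a portfolio of m ic3par solvers finishes within
    # c * t_star, assuming runtimes ~ wei(k, lambda):
    # p(m, c, k) = 1 - exp(-m * (c * Gamma(1 + 1/k))^k).
    return 1 - math.exp(-m * (c * math.gamma(1 + 1.0 / k)) ** k)
```

For instance, at \(c = 0.5\) and \(k = 4\), doubling the portfolio from 20 to 40 solvers raises the probability noticeably, while the same portfolio at \(k = 5\) is markedly weaker, matching the trends plotted in Fig. 6.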

6 Experimental Results

We implemented ic3sync, ic3async and ic3proof by modifying a publicly available reference implementation of ic3 (https://github.com/arbrad/IC3ref), which we call ic3ref. All propositional queries in ic3 are implemented by calls to minisat. We refer to the variant of ic3ref that uses a randomized minisat as ic3rnd. We use ic3rnd to introduce uncertainty in ic3 purely by randomizing the backend SAT solver. We performed two experiments – one to evaluate the effectiveness of the ic3par variants, and another to validate our statistical analysis of their portfolios. All our tools and results are available at http://www.andrew.cmu.edu/~schaki/misc/paric3.tgz.

Benchmarks. We constructed four benchmarks. The first was constructed by pre-processing the safe examples from HWMCC14 (http://fmv.jku.at/hwmcc14cav) with iimc (http://ecee.colorado.edu/wpmu/iimc), and selecting the ones solved by ic3ref within 900 s on an 8-core 3.4 GHz machine with 8 GB of RAM. The remaining three benchmarks were constructed similarly from the buggy examples from HWMCC14, and the safe and buggy examples (without pre-processing) from the TIP benchmark suite (http://fmv.jku.at/aiger/tip-aig-20061215.zip). We refer to the four benchmarks as hwcsafe, hwcbug, tipsafe and tipbug, respectively.

SAT Solver Pool. The function f (cf. Fig. 1) is implemented in ic3ref by a SAT solver (minisat). A separate SAT solver \(S_i\) is used for each f(i). Whenever f(i) changes due to the addition of a new lemma to a frame, the corresponding solver \(S_i\) is also updated by asserting the lemma. To prevent a single SAT solver from becoming a bottleneck between competing threads in ic3par, we use a “pool” of minisat solvers to implement each \(S_i\). The solvers are maintained in a FIFO queue. When a thread requests a solver, the first available solver is given to it. When a lemma is added to the pool, it is added to all available solvers, and recorded as “pending” for the busy ones. When a busy solver is released by a thread, all pending lemmas are first asserted to it, and then it is inserted at the back of the queue. We refer to the number of solvers in each pool as \({SPSz}\).
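The pool’s queueing discipline can be sketched as follows. This is a single-threaded illustration with a fake solver class; the real implementation pools MiniSat instances and must additionally synchronize access to the queue:

```python
from collections import deque

class FakeSolver:
    # Stand-in for a MiniSat instance; merely records asserted lemmas.
    def __init__(self):
        self.lemmas = []
    def assert_lemma(self, lemma):
        self.lemmas.append(lemma)

class SolverPool:
    # FIFO pool per the description above: lemmas arriving while a solver
    # is checked out are recorded as pending and flushed on release.
    def __init__(self, solvers):
        self.free = deque(solvers)
        self.pending = {}                # busy solver id -> lemmas to flush

    def acquire(self):
        s = self.free.popleft()          # hand out the first available solver
        self.pending[id(s)] = []
        return s

    def release(self, s):
        for lemma in self.pending.pop(id(s)):
            s.assert_lemma(lemma)        # flush pending lemmas first
        self.free.append(s)              # then rejoin the back of the queue

    def add_lemma(self, lemma):
        for s in self.free:              # assert to all available solvers now
            s.assert_lemma(lemma)
        for pend in self.pending.values():
            pend.append(lemma)           # record for the busy ones
```

The pending-lemma bookkeeping guarantees that every solver eventually sees every lemma, so all solvers in a pool remain logically equivalent despite being updated at different times.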

6.1 Comparing Parallel ic3 Variants

Fig. 7. (a) Speedup of ic3sync, ic3async, ic3proof and ic3rnd compared to ic3ref; (b) Speedup of ic3par2010 compared to ic3ref2010.

These experiments were carried out on an Intel Xeon machine with 128 cores, each running at 2.67 GHz, and 1 TB of RAM. For each solver selected from \(\{{\textsc {ic3async} (4), \textsc {ic3sync} (4), \textsc {ic3proof} (4), \textsc {ic3rnd}}\}\) and each benchmark \(\mathcal {B}\), and with \({SPSz}=3\), we performed the following steps:

  1. extract all problems from \(\mathcal {B}\) that are solved by ic3ref in at least 10 s; call this set \(\mathcal {B}^*\). The 10 s cutoff trades off problem complexity against benchmark size: we wanted to avoid very simple examples (to limit measurement error) while retaining enough examples for statistically meaningful results;

  2. solve each problem in \(\mathcal {B}^*\) with ic3ref and also with a portfolio of 20 solvers, and compute the ratio of the two runtimes; this is the speedup for the specific problem;

  3. compute the mean and \(\max \) of the speedups over all problems in \(\mathcal {B}^*\).
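Steps 2 and 3 above amount to the following computation. This is a sketch under the assumption that runtimes are given as dictionaries mapping problem names to seconds; the names `ref_times` and `portfolio_times` are ours, not from the tool.

```python
from statistics import mean

def speedup_summary(ref_times, portfolio_times, cutoff=10.0):
    """Keep problems that ic3ref needs at least `cutoff` seconds on,
    compute the per-problem speedup (ic3ref time / portfolio time),
    and return the mean and max speedup.

    `ref_times` and `portfolio_times` are hypothetical dicts mapping
    problem name -> runtime in seconds."""
    speedups = [ref_times[p] / portfolio_times[p]
                for p in ref_times
                if ref_times[p] >= cutoff]
    return mean(speedups), max(speedups)
```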

Figure 7(a) shows the results obtained. In all cases, we see speedup. On this particular run, ic3proof performs best overall, with an average speedup of over 6 and a maximum speedup of over 300. As in the non-portfolio case (cf. Fig. 5), ic3proof and ic3async have similar performance, and both are better than ic3sync. This pattern holds for both safe and buggy examples. Finally, ic3rnd provides mediocre speedup (not just on the whole, but across all examples), indicating that parallelization enables a better search for shorter proofs/CEXes than merely randomizing the SAT solver.

Fig. 8. Validating Theorem 5: (a) mean and standard deviation of the ratios of predicted and observed probabilities; (b) scatter plot of predicted vs. observed probabilities.

6.2 Comparing with ic3par2010

We compared the parallel version of ic3 reported in the original paper [6], which we refer to as ic3par2010, with our ic3par variants. We first downloaded the source code of ic3par2010. It comes with its own version of ic3 implemented in OCaml, which we refer to as ic3ref2010. The parallelization in ic3par2010 is implemented via three Python scripts that invoke the ic3 binary. We modified these scripts to implement a solver with four copies of ic3 running in parallel. This was done for a fairer comparison with our ic3par results presented earlier, which also used four copies of ic3 per solver. In addition, we made other changes to the scripts to make the solver more robust (e.g., replacing hard-coded TCP/IP port numbers with dynamically selected ones). All experiments were done on the same machine as in Sect. 6.1.

While ic3ref was quite deterministic in its runtime, ic3ref2010 exhibited random behavior in this respect. One source of this randomness is that ic3ref2010 randomizes the backend SAT solver. However, there could be other sources of randomness due to the management of the priority queue during strengthen(). We were unable to eliminate the randomness satisfactorily via command-line options. Instead, we accounted for it by running experiments multiple times and computing the average. We computed the speedup of ic3par2010 using a process similar to that for the ic3par variants. Specifically, we performed the following steps:

  1. extract all problems from \(\mathcal {B}\) that are solved by ic3ref2010 in at least 10 s; call this set \(\mathcal {B}^+\); note that \(\mathcal {B}^+\) differs from \(\mathcal {B}^*\) since the underlying solvers – ic3ref2010 and ic3ref – have different solving capability;

  2. solve each problem in \(\mathcal {B}^+\) with ic3ref2010 twenty times and compute the average runtime (call this \(t_s\)), and also with ic3par2010 twenty times and compute the average runtime (call this \(t_p\)); compute the ratio \(\frac{t_s}{t_p}\); this is the speedup with ic3par2010 for that specific problem;

  3. compute the mean and \(\max \) of the speedups over all problems in \(\mathcal {B}^+\).
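Step 2 above, which averages over repeated runs to smooth out the randomness of ic3ref2010, can be sketched as:

```python
from statistics import mean

def averaged_speedup(serial_runs, parallel_runs):
    """Average the repeated serial and parallel runtimes of one problem and
    return t_s / t_p, the speedup of ic3par2010 on that problem.

    `serial_runs` and `parallel_runs` are hypothetical lists of the twenty
    measured runtimes (in seconds) for ic3ref2010 and ic3par2010."""
    t_s = mean(serial_runs)
    t_p = mean(parallel_runs)
    return t_s / t_p
```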

Figure 7(b) shows the results obtained. Comparing with Fig. 7(a), we see that all three of our ic3par variants provided considerably better speedups than ic3par2010 on the three benchmark groups hwcbug, tipsafe and tipbug. Indeed, for the tipsafe group as a whole, ic3par2010 does not provide a speedup. However, for the hwcsafe group, ic3par2010 provided better speedups. If we look at all the safe examples, then ic3par2010 edges out ic3par marginally. In contrast, for unsafe examples ic3par provides much better speedups. Overall, ic3proof performs best. In summary, portfolios of ic3par variants appear to be a good complement to ic3par2010, and a better option for unsafe examples.

6.3 Portfolio Size

These experiments were done on a cluster of 11 machines, each with 16 cores at 2.4 GHz, and between 48 GB and 190 GB of RAM. To validate Theorem 5, we compared its predictions to empirically observed results as follows (again using \({SPSz}=3\)):

  1. Process each problem from Fig. 5 as follows.

  2. Solve the problem b (\(\approx 3000\)) times using \(\textsc {ic3par} (4)\). This gives a set of runtimes \(t_1, \dots , t_b\). Fit these runtimes to a Weibull distribution to obtain the estimated value of k (the same as the second column of Fig. 5).

  3. Compute \(\tilde{t} = \mathrm {mean}(t_1, \dots , t_b)\). This is the estimated average time for \(\textsc {ic3par} (4)\) to solve the problem.

  4. Pick a portfolio size m, starting with \(m=1\).

  5. Divide \(t_1, \dots , t_b\) into blocks of size m. Let \(B = \lfloor \frac{b}{m}\rfloor \). We now have B blocks of runtimes \(T_1, \dots , T_B\), each consisting of m elements: \(T_1 = \{{t_1, \dots , t_m}\}\), \(T_2 = \{{t_{m+1}, \dots , t_{2m}}\}\), and so on. For \(i = 1, \dots , B\), compute \(\mu _i = \min (T_i)\). Note that each \(\mu _i\) is the runtime of a portfolio of m \(\textsc {ic3par} (4)\) solvers.

  6. Let n(m) be the number of blocks for which \(\mu _i \le \tilde{t}\), i.e., \(n(m) = |\{{i \in [1,B] \mid \mu _i \le \tilde{t}}\}|\). Compute \(p^*(m) = \frac{n(m)}{B}\); this is the estimate of p(m) based on our experiments. Compute \(p(m) = 1 - (\alpha (k))^m\) (using k from Step 2). Compute \(\rho (m) = \frac{p^*(m)}{p(m)}\). We expect \(\rho (m) \approx 1\).

  7. Repeat Steps 5 and 6 with \(m = 2, \dots , 100\) to obtain the sequence \(\rho = \langle {\rho (1), \dots , \rho (100)}\rangle \). Compute the mean and standard deviation of \(\rho \).
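Steps 3–7 above can be sketched as follows. The Weibull fit of Step 2 is outside the scope of this sketch; `alpha_k` stands for the quantity \(\alpha (k)\) from Theorem 5 (defined elsewhere in the paper) and is taken here as a plain float with \(0< \alpha (k) < 1\).

```python
from statistics import mean, stdev

def validation_ratios(times, alpha_k, max_m=100):
    """For portfolio sizes m = 1..max_m, estimate p*(m) from block minima of
    the observed runtimes and compare with the predicted p(m) = 1 - alpha(k)^m.
    Returns the mean and standard deviation of the ratios rho(m).

    `times` is the list of runtimes t_1..t_b; `alpha_k` is alpha(k), assumed
    to lie strictly between 0 and 1."""
    t_tilde = mean(times)  # estimated average solo runtime (Step 3)
    ratios = []
    for m in range(1, max_m + 1):
        B = len(times) // m
        # Block minima: each is the runtime of a simulated portfolio of m
        # solvers (Step 5).
        mins = [min(times[i * m:(i + 1) * m]) for i in range(B)]
        p_star = sum(1 for mu in mins if mu <= t_tilde) / B  # Step 6
        p_pred = 1 - alpha_k ** m
        ratios.append(p_star / p_pred)
    return mean(ratios), stdev(ratios)
```

On real data we would expect the returned mean to be close to 1 and the standard deviation close to 0, as reported in Fig. 8(a).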

Figure 8(a) shows the results of the above steps for each ic3par variant. We see that for each example, the mean of \(\rho \) is very close to 1 and its standard deviation is very close to 0, indicating that p(m) and \(p^*(m)\) agree closely. Figure 8(b) shows a scatter plot of all \(p^*(m)\) values computed against their corresponding p(m). Note that most values are very close to the (red) \(x=y\) line, as expected.

Fig. 9. Heatmap of ic3proof runtimes for three examples. A deeper color of cell \((i,s)\) indicates that \(\textsc {ic3proof} (i,s)\) solves the benchmark faster; n = total number of runs of \(\textsc {ic3proof} \) over all 64 values of \((i,s)\).

6.4 Parameter Sweeping

ic3par has two parameters: the number of ic3 threads and \({SPSz}\). We write \(\textsc {ic3par} (i,s)\) to mean an instance of ic3par with i ic3 threads and \({SPSz}= s\). Thus, \(\textsc {ic3par} (4,3)\) was used in all previous experiments. We observed in Sect. 6.1 that different ic3par variants perform best on different benchmarks. We now evaluate the performance of ic3proof by varying i and s. These experiments were also done on our cluster (cf. Sect. 6.3). We begin with a conjecture about the relationship between runtime and parameter values.

Conjecture 2

The parameter values significantly affect the performance of ic3par, and the best parameter choice is located unpredictably in the parameter space.

To investigate Conjecture 2, we measured the runtime of \(\textsc {ic3proof} (i,s)\) for each \((i,s) \in I \times S\) where \(I = S = \{{1, \dots , 8}\}\). We selected 16 examples from \(\mathcal {B}\). For each example \(\eta \), and each \((i,s) \in I \times S\), we executed \(\textsc {ic3proof} (i,s)\) on \(\eta \) “around” 3000 times (again, the exact number varied across examples due to timeouts) and computed the average runtime. This gives us the entry at \((i,s)\) in the “heatmap” for \(\eta \). The heatmaps in Fig. 9 summarize our results for three of the benchmarks that we found to be representative. They support Conjecture 2, as average runtimes (indicated by the color depth of cells in the heatmaps) vary across the parameter space. The depth of cells shows no discernible pattern (e.g., it does not increase with i or s), and the deepest cells are significantly deeper than the lightest ones. This implies that: (i) selecting the best parameters for ic3proof would be quite beneficial; but (ii) this is a non-trivial problem.
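The sweep behind the heatmaps can be sketched as follows. Here `run_example(i, s)` is a hypothetical callback that runs \(\textsc {ic3proof} (i,s)\) once on a fixed example and returns the runtime in seconds; the actual experiments were driven by the tool itself.

```python
from statistics import mean

def heatmap(run_example, repeats=3000, params=range(1, 9)):
    """Average the runtime of ic3proof(i, s) over `repeats` runs for every
    (i, s) in I x S = {1..8}^2, yielding one heatmap cell per pair.

    `run_example(i, s)` is a hypothetical callback returning one measured
    runtime in seconds."""
    return {(i, s): mean(run_example(i, s) for _ in range(repeats))
            for i in params for s in params}
```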

Fig. 10. Parameter sweeping; Sync, Async, Proof, Sweep = average speedups over ic3ref for portfolios of 20 ic3sync(4,3), ic3async(4,3), ic3proof(4,3), and ic3sweep, respectively.

As a preliminary step to address this challenge, we ran portfolios of a solver that uses parameter sweeping [2]. Specifically, the solver (denoted ic3sweep) executes a randomly selected ic3par variant with a random \((i,s)\) selected from \(I \times S\). We compared the average speedup (over 50 runs) of a portfolio of 20 ic3sweeps with the average speedup (over 50 runs) of portfolios of 20 of each of the three ic3par variants with fixed \((i,s) = (4,3)\). Figure 10 summarizes our results. We observe that in general ic3sweep is competitive with each of the ic3par variants (indeed, it performs best for the hardest examples from the safe and buggy categories). We believe that parameter sweeping shows promise as a strategy for real-life problems where good parameters would be difficult (if not impossible) to compute.
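The random selection made by each ic3sweep instance is simple enough to sketch directly; the variant names are those used in this paper, and the function name is ours.

```python
import random

def ic3sweep_config(rng=random):
    """Parameter sweeping as described above: each ic3sweep instance picks a
    random ic3par variant and a random (i, s) from I x S = {1..8}^2.

    `rng` is any object with `choice` and `randint` (default: the random
    module), so the selection can be seeded for reproducibility."""
    variant = rng.choice(["ic3sync", "ic3async", "ic3proof"])
    i = rng.randint(1, 8)   # number of ic3 threads
    s = rng.randint(1, 8)   # SAT solver pool size (SPSz)
    return variant, i, s
```

A portfolio of 20 ic3sweeps would call this once per member, so the portfolio as a whole covers many points of the parameter space.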

7 Conclusion

We present three ways to parallelize ic3. Each variant uses a number of threads to speed up the computation of an inductive invariant or a CEX, sharing lemmas to minimize duplicated effort. They differ in the degree of synchronization and the technique used to detect whether an inductive invariant has been found. The runtime of these solvers is unpredictable, and varies with thread interleaving. We explore the use of portfolios to counteract this runtime variance. Each solver in the portfolio potentially searches for a different proof/CEX; the first one to succeed provides the solution. Using extreme value theory and statistical analysis, we construct a formula that gives the portfolio size needed to solve a problem within a target time bound with a certain probability. Experiments on HWMCC14 benchmarks show that the combination of parallelization and portfolios yields an average speedup of over 6x over ic3, and in some cases speedups of over 300x. We show that parameter sweeping is a promising approach to implement a solver that performs well over a wide range of problems of unknown difficulty. An important area of future work is evaluating the effectiveness of parallelization and portfolios in the context of software verification via a generalization of ic3 [12].