figure a

1 Introduction

Model checking techniques are widely used in proving design correctness, and have received unprecedented attention in the hardware design community [9, 16]. Given a system model M and a property P, model checking proves whether or not P holds for M. A model checking algorithm exhaustively checks all behaviors of M, and returns a counterexample as evidence if any behavior violates the property P. The counterexample gives the execution of the system that leads to property failure, i.e., a bug. Particularly, if P is a safety property, model checking reduces to reachability analysis, and the provided counterexample has a finite length. Popular safety checking techniques include Bounded Model Checking (BMC) [10], Interpolation Model Checking (IMC) [21], and IC3/PDR [12, 14]. It is well known that there is no “universal” algorithm in model checking; different algorithms perform differently on different problem instances [7]. BMC outperforms IMC on checking unsafe instances, while IC3/PDR can solve instances that BMC cannot and vice-versa. [19]. Therefore, BMC and IC3/PDR are the most popular algorithms in the portfolio for unsafety checking, or bug-finding.

Complementary Approximate Reachability (CAR) [19] is a SAT-based model checking framework for reachability analysis. Contrary to reachability analysis via IC3/PDR, CAR maintains two sequences of over- and under- approximate reachable state-sets. The over-approximate sequence is used for safety checking, and the under-approximate sequence for unsafety checking. CAR does not require the over-approximate sequence to be monotone, unlike IC3/PDR. Both forward (Forward-CAR) and backward (Backward-CAR) reachability analysis are permissible in the CAR framework. Preliminary results show that Forward-CAR complements IC3/PDR on safe instances [19].

We present, SimpleCAR, a tool specifically developed for evaluating and extending the CAR framework. The new tool is a complete rewrite of CARChecker [19] with several improvements and added capabilities. SimpleCAR has a lighter and cleaner implementation than CARChecker. Several heuristics that aid Forward-CAR to complement IC3/PDR are integrated in CARChecker. Although useful, these heuristics make it difficult to understand and extend the core functionalities of CAR. Like IC3/PDR, the performance of CAR varies significantly by using heuristics [17]. Therefore, it is necessary to provide a basic implementation of CAR (without code-bloating heuristics) that serves as a “bottom-line” performance measure for all extensions in the future. To that end, SimpleCAR differs from CARChecker in the following aspects:

  • Eliminates all heuristics integrated in CARChecker except a configuration option to enable a IC3/PDR-like clause “propagation” heuristic.

  • Uses UNSAT cores from the SAT solver directly instead of the expensive minimal UNSAT core (MUC) computation in CARChecker.

  • Poses incremental queries to the SAT solver using assumptions;

  • While CARChecker contributes to safety checking [19], SimpleCAR shows a clear advantage on unsafety checking.

We apply SimpleCAR to 748 benchmarks from the Hardware Model Checking Competition (HWMCC) 2015 [2] and 2017 [3], and compare its performance to reachability analysis algorithms (BMC, IMC, 4 \(\times \) IC3/PDR, Avy [22], Quip [18]) in state-of-the-art model checking tools (ABC, nuXmv, IIMC, IC3Ref). Our extensive experiments reveal that Backward-CAR is particularly suited for unsafety checking: it can solve 8 instances within a 1-h time limit, and 7 instances within a 8-h time limit not solvable by BMC and IC3/PDR. We conclude that, along with BMC and IC3/PDR, CAR is an important candidate in the portfolio of unsafety checking algorithms, and SimpleCAR provides an easy and efficient way to evaluate, experiment with, and add enhancements to the CAR framework. We identify 1 major bug and 48 errors in counterexample generation in our evaluated tool set; all have been reported to the tool developers.

2 Algorithms and Implementation

We present a very high-level overview of the CAR framework (refer [19] for details). CAR is a SAT-based framework for reachability analysis. It maintains two over- and under- approximate reachable state sequences for safety and unsafety checking, respectively. CAR can be symmetrically implemented either in the forward (Forward-CAR) or backward (Backward-CAR) mode. In the forward mode, the F-sequence (\(F_0, F_1, \ldots , F_i\)) is the over-approximated sequence, while the B-sequence (\(B_0, B_1, \ldots , B_i\)) is under-approximated. The roles of the F- and B- sequence are reversed in the backward mode. We focus here on the backward mode of CAR, or Backward-CAR (refer [19] for Forward-CAR)

Table 1. Sequences in Backward-CAR.

2.1 High-Level Description of Backward-CAR

A frame \(F_i\) in the F-sequence denotes the set of states that are reachable from the initial states (I) in i steps. Similarly, a frame \(B_i\) in the B-sequence denotes the set of states that can reach the bad states (\(\lnot P\)) in i steps. Let \(R(F_i)\) represent the set of successor states of \(F_i\), and \(R^{-1}(B_i)\) represent the set of predecessor states of \(B_i\). Table 1 shows the constraints on the sequences and their usage in Backward-CAR for safety and unsafety checking.

figure b

Let \(S(F) = \bigcup F_i\) and \(S(B) = \bigcup B_i\). Algorithm 1 gives a description of Backward-CAR. The B-sequence is extended exactly once in every iteration of the loop in lines 2–8, but the F-sequence may be extended multiple times in each loop iteration in lines 3–5. As a result, CAR normally returns counterexamples with longer depth compared to the length of the B-sequence. Due to this inherent feature of the framework, CAR is able to complement BMC and IC3/PDR on unsafety checking.

2.2 Tool Implementation

SimpleCAR is publicly available [5, 6] under the GNU GPLv3 license. The tool implementation is as follows:

  • Language: C++11 compilable under gcc 4.4.7 or above.

  • Input: Hardware circuit models expressed as and-inverter graphs in the aiger 1.9 format [11] containing a single safety property.

  • Output: “1” (unsafe) to report the system violates the property, or “0” (safe) to confirm that the system satisfies the property. A counterexample in the aiger format is generated if run with the -e configuration flag.

  • Algorithms: Forward-CAR and Backward-CAR with and without the propagation heuristic (enabled using the -p configuration flag).

  • External Tools: Glucose 3.0 [8] (based on MiniSAT [15]) is used as the underlying SAT solver. Aiger tools [1] are used for parsing the input aiger files to extract the model and property information, and error checking.

  • Differences with CARChecker [19]: The Minimal Unsat Core (MUC) and Partial Assignment (PA) techniques are not utilized in SimpleCAR, which allows the implementation to harness the power of incremental SAT solving.

3 Experimental Analysis

3.1 Strategies

Tools. We consider five model checking tools in our evaluation: ABC 1.01 [13], IIMC 2.0Footnote 1, Simplic3 [17] (IC3 algorithms used by nuXmv for finite-state systemsFootnote 2), IC3Ref [4], CARChecker [19], and SimpleCAR. For ABC, we evaluate BMC (bmc2), IMC (int), and PDR (pdr). There are three different versions of BMC in ABC: bmc, bmc2, and bmc3. We choose bmc2 based on our preliminary analysis since it outperforms other versions. Simplic3 proposes different configuration options for IC3. We use the three best candidate configurations for IC3 reported in [17], and the Avy algorithm [22] in Simplic3. We consider CARChecker as the original implementation of the CAR framework and use it as a reference implementation for SimpleCAR. A summary of the tools and their arguments used for experiments is shown in Table 2. Overall, we consider four categories of algorithms implemented in the tools: BMC, IMC, IC3/PDR, and CAR.

Benchmarks. We evaluate all tools against 748 benchmarks in the aiger format [11] from the SINGLE safety property track of the HWMCC in 2015 and 2017.

Error Checking. We check correctness of results from the tools in two ways:

  1. 1.

    We use the aigsim [1] tool to check whether the counterexample generated for unsafe instances is a real counterexample by simulation.

  2. 2.

    For inconsistent results (safe and unsafe for the same benchmark by at least two different tools) we attempt to simulate the unsafe counterexample, and if successful, report an error for the tool that returns safe (surprisingly, we do not encounter cases when the simulation check fails).

Platform. Experiments were performed on Rice University’s DavinCI cluster, which comprises of 192 nodes running at 2.83 GHz, 48 GB of memory and running RedHat 6.0. We set the memory limit to 8 GB with a wall-time limit of an hour. Each model checking run has exclusive access to a node. A time penalty of one hour is set for benchmarks that cannot be solved within the time/memory limits.

Table 2. Tools and algorithms (with category) evaluated in the experiments.

3.2 Results

Error Report. We identify one bug in simplic3-best3: reports safe instead of unsafe, and 48 errors with respect to counterexample generation in iimc-quip algorithm (26) and all algorithms in the Simplic3 tool (22). At the time of writing, the bug report sent to the developers of Simplic3 has been confirmed. In our analysis, we assume the results from these tools to be correct.

Fig. 1.
figure 1

Number of benchmarks solved by each algorithm category (run as a portfolio). Uniquely solved benchmarks are not solved by any other category.

Coarse Analysis. We focus our analysis to unsafety checking. Figure 1 shows the total number of unsafe benchmarks solved by each category (assuming portfolio-run of all algorithms in a category). CAR complements BMC and IC3/PDR by solving 128 benchmarks of which 8 are not solved by any other category. Although CAR solves the least amount of total benchmarks, the count of the uniquely solved benchmarks is comparable to other categories. When the wall-time limit (memory limit does not change) is increased to 8 h, BMC and IC3/PDR can only solve one of the 8 uniquely solved benchmarks by CAR. The analysis supports our claim that CAR complements BMC/IC3/PDR on unsafety checking.

Fig. 2.
figure 2

Number of benchmarks solved by every algorithm in a category. Distinctly solved benchmarks by an algorithm are not solved by any algorithm in other categories. The set union of distinctly solved benchmarks for all algorithms in a category equals the count of uniquely solved for that category in Fig. 1.

Granular Analysis. Figure 2 shows how each algorithm in the IC3/PDR (Fig. 2a) and CAR (Fig. 2b) categories performs on the benchmarks. simpcar-bp distinctly solves all 8 benchmarks uniquely solved by the CAR category (Fig. 1), while no single IC3/PDR algorithm distinctly solves all uniquely solved benchmarks in the IC3/PDR category. In fact, a portfolio including at least abc-pdr, simplic3-best1, and simplic3-best2 solves all 8 instances uniquely solved by the IC3/PDR category. It is important to note that SimpleCAR is a very basic implementation of the CAR framework compared to the highly optimized implementations of IC3/PDR in other tools. Even then simpcar-b outperforms four IC3/PDR implementations. Our results show that Backward-CAR is a favorable algorithm for unsafety checking.

Analysis Conclusions. Backward-CAR presents a more promising research direction than Forward-CAR for unsafety checking. We conjecture that the performance of Forward- and Backward-CAR varies with the structure of the aiger model. Heuristics and performance-gain present a trade-off. simpcar-bp has a better performance compared to the heuristic-heavy carchk-b. On the other hand, simpcar-bp solves the most unsafe benchmarks in the CAR category, however, adding the “propagation” heuristic effects its performance: there are several benchmarks solved by simpcar-b but not by simpcar-bp.

4 Summary

We present SimpleCAR, a safety model checker based on the CAR framework for reachability analysis. Our tool is a lightweight and extensible implementation of CAR with comparable performance to other state-of-the-art tool implementations of highly-optimized unsafety checking algorithms, and complements existing algorithm portfolios. Our empirical evaluation reveals that adding heuristics does not always improve performance. We conclude that Backward-CAR is a more promising research direction than Forward-CAR for unsafety checking, and our tool serves as the “bottom-line” for all future extensions to the CAR framework.