1 Introduction

Computational higher-order logics are widely used to reason about purely functional programs and form the basis of proof assistants such as ACL2 [12], Coq [8], and Isabelle [15]. Searching for models in such logics is useful both for refuting wrong conjectures and for testing — it is often faster and easier to test a property than to prove it. In this work we focus on a logic with algebraic datatypes and terminating recursive functions. Once proven terminating, these functions have a natural interpretation in any model as least fixpoints.

In the typical use case, users specify a property they believe to hold for the program they wrote and let a solver search for a (counter-)example until some resource is exhausted — time, patience, etc. Our goal is to build a tool that can be used for finding counter-examples in proof assistants. Figure 1 presents such a problem in TIP syntax [6] that defines natural numbers, lists, and operations on lists, where the (unsatisfiable) goal is to find a list of natural numbers that is a palindrome of length 2 with sum 3.

Fig. 1. Looking for impossible palindromes

In the functional programming community, tools such as QuickCheck [5] and SmallCheck [18] have been used to test conjectures against random values or up to a certain depth. Feat [10] is similar to SmallCheck but enumerates inputs by increasing size, rather than depth. However, QuickCheck is limited when invariants have to be enforced (e.g. red-blackness of trees), forcing users to write custom random generators, and SmallCheck and Feat can get lost quickly in large search spaces. Lazy SmallCheck (LSC) is similar to SmallCheck but relies on the lazy semantics of Haskell to avoid enumerating inputs that are not needed to evaluate the property. LSC is close to narrowing [1, 14], a symbolic approach that has ties to functional logic programming [11] and builds a model incrementally. Nevertheless, LSC and narrowing-based tools explore the space of possible inputs quite naively, making many counter-examples very hard to find. All these approaches lack a way of analyzing why a given search path failed.

Modern SMT solvers are often efficient on difficult combinatorial problems. They rely on a SAT solver to analyze conflicts and interleave theory reasoning with propositional choices. However, their focus is first-order classical logic, where symbols are neatly partitioned between theory symbols that have a precise definition and user-provided symbols that are axiomatized. When users want to introduce their own parameterized operators, they must use quantifiers and full first-order logic, where solvers are usually incomplete. Some work has been done on handling datatypes [4, 16] and recursive functions in SMT solvers such as CVC4 [17], or on calling an SMT solver repeatedly while expanding function definitions as in Leon [19], but each reduction step (e.g. function call) is very expensive.

Bridging the gap between QuickCheck and SMT solvers is HBMC [7] (Haskell Bounded Model Checker — not published yet). HBMC progressively encodes the evaluation graph into propositional constraints (effectively “bit-blasting” recursive functions and datatypes), leveraging the powerful constraint propagations of modern SAT solvers. However, it suffers from the same weakness as SMT-based techniques: every evaluation step has to be encoded, then performed, inside the SAT solver, making computations slow.

We present a new technique, Satisfiability Modulo Bounded Checking (SMBC), that occupies a middle ground between narrowing and HBMC. On the one hand, it can evaluate terms more efficiently than pure bit-blasting, although not quite as fast as native code; on the other hand, it benefits from the propositional conflict-driven clause learning (CDCL) of modern SAT solvers to never make the same bad choice twice. Two main components are involved: (i) a symbolic evaluation engine (Sect. 3), and (ii) a SAT solver with incremental solving under assumptions (Sect. 4). These two components communicate following lazy SMT techniques [3]. Inputs are lazily and symbolically enumerated using iterative deepening (Sect. 5) to ensure fairness, but we use the ability of the SAT solver to solve under assumptions to avoid the costly re-computations usually associated with that technique. In addition, building on CDCL allows SMBC to sometimes prove the unsatisfiability of the problem, something evaluation-based tools are incapable of.

We can extend SMBC to support uninterpreted types and unspecified functions (Sect. 6). After presenting refinements to the calculus (Sect. 7) and an implementation (Sect. 8), we run some experiments (Sect. 9) to compare SMBC with some of the previously mentioned tools on various families of problems. Detailed proofs and additional content can be found in our report.

2 Logic

We consider a multi-sorted higher-order classical logic, without polymorphism. A finite set of mutually recursive datatypes \(d_1,\ldots ,d_k\) is defined by a system

$$ \left( d_i \mathrel {\overset{\tiny {\text {def}}}{=}} c_{i,1}(\overline{\alpha _{i,1}}) ~|~ \cdots ~|~ c_{i,n_i}(\overline{\alpha _{i,n_i}}) \right) _{i \in \{ 1, \ldots , k\} } $$

where the \(\overline{\alpha _{i,j}}\) are tuples of type arguments. We consider only standard models, in which the domain of a datatype is the set of terms freely generated from its constructors. Similarly, mutually recursive functions \(f_1,\ldots ,f_k\) are defined by a set of equations \(f_1(\overline{x_1}) \mathrel {\overset{\tiny {\text {def}}}{=}} t_1, \ldots , f_k(\overline{x_k}) \mathrel {\overset{\tiny {\text {def}}}{=}} t_k\) that we assume total and terminating. The term language comprises bound variables, datatype constructors, shallow pattern-matching over datatypes, \(\lambda \)-abstractions \(\lambda x:\tau .~ t\), and applications \(f~t_1~\ldots ~t_n\). Constructors are always fully applied. \(t\sigma \) is the application of a substitution \(\sigma \) over bound variables to t. \(\textsf {Bool} \mathrel {\overset{\tiny {\text {def}}}{=}} \{ \top , \bot \}\) is a special datatype, paired with tests \(\textsf {if}~a~b~c\) that are short for \(\textsf {case} ~a~\textsf {of}~\top \rightarrow b ~|~ \bot \rightarrow c~\textsf {end} \). A value is a \(\lambda \)-abstraction or constructor application. The operators \(\wedge : \textsf {Bool} \rightarrow \textsf {Bool} \rightarrow \textsf {Bool} \) and \(\lnot : \textsf {Bool} \rightarrow \textsf {Bool} \) have the usual logical semantics, except that evaluation of \(\wedge \) is parallel rather than sequential as in most programming languages: \(t \wedge \bot \) reduces to \(\bot \) even if t is not a value. We will speak of parallel conjunction. Other boolean connectives are encoded in terms of \(\wedge \) and \(\lnot \). We also define an ad hoc polymorphic equality operator \(=\) that has the classic structural semantics on datatypes and booleans; comparison of functions is forbidden. An unknown is simply an uninterpreted constant which must be given a value in the model. This logic corresponds to the monomorphic fragment of TIP [6], or to the extension of SMT-LIB [2] with recursive functions, with the additional assumption that these functions always terminate.
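For concreteness, this term language can be transcribed as a small AST. The following OCaml sketch is ours, not the actual implementation's data structure; it uses De Bruijn indices for bound variables, as the implementation described in Sect. 8 does.

```ocaml
(* A minimal sketch of the term language of Sect. 2; type and constructor
   names are illustrative, not those of the actual SMBC implementation. *)
type ty = Bool | Data of string | Arrow of ty * ty

type term =
  | Var of int                          (* bound variable (De Bruijn index) *)
  | Cstor of string * term list         (* fully applied datatype constructor *)
  | Unknown of string                   (* unknown, to be decided in the model *)
  | Lam of ty * term                    (* lambda-abstraction *)
  | App of term * term list             (* application f t1 ... tn *)
  | Case of term * (string * term) list (* shallow pattern matching (binders elided) *)
  | Def of string                       (* defined (recursive) function symbol *)
  | And of term * term                  (* parallel conjunction *)
  | Not of term
  | Eq of term * term                   (* structural equality (not on functions) *)

(* if a b c is sugar for: case a of true -> b | false -> c end *)
let ite a b c = Case (a, [ ("true", b); ("false", c) ])
```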

A data value is a term built only from constructor applications, bound variables, and \(\lambda \)-abstractions (without defined symbols, matching, or unknowns). The depth of a data value is recursively defined as 1 on constant constructors, \(1+\textsf {depth}(t)\) for \(\lambda x.~ t\), and \(1 + \max _{i=1\ldots n} \textsf {depth}(t_i)\) on constructor applications \(c(t_1,\ldots ,t_n)\). A goal set G is a set of boolean terms. A model of G is a mapping from unknowns of G to data values, such that \(\bigwedge _{t \in G} t\) evaluates to \(\top \). The depth of a model is the maximal depth of the data values in it.
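Over the AST sketched above, the depth of a data value can be computed directly; a minimal sketch follows (the text does not specify a depth for bound variables, so depth 1 for them is our assumption):

```ocaml
(* Depth of a data value, following the recursive definition above.
   Bound variables are given depth 1 (our convention). *)
let rec depth (t : term) : int =
  match t with
  | Var _ -> 1
  | Lam (_, body) -> 1 + depth body
  | Cstor (_, []) -> 1                              (* constant constructor *)
  | Cstor (_, args) ->
    1 + List.fold_left (fun m u -> max m (depth u)) 0 args
  | _ -> invalid_arg "depth: not a data value"      (* Def/Case/Unknown/... *)
```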

In the rest of this paper, t, u will represent terms, k will stand for unknowns, c, d for constructors, and e for explanations (conjunctions of literals). We will use an injective mapping from assignments \(k := t\) to propositional variables, denoted \(\lceil k := t \rceil \).

3 Evaluation with Explanations

The semantics of our logic relies on evaluating expressions that contain tests, pattern matching, and (recursive) functions. Because expressions can contain unknowns, their reduction is influenced by assignments to these unknowns. We need an evaluator that keeps track of which choices were used to reduce a term. In this way, when a goal term reduces to \(\bot \), we know that this combination of choices is wrong.

In Fig. 2, we show the evaluation rules for terms, given a substitution \(\rho \) on unknowns. The notation \(t \xrightarrow {e} _{\rho } u\) means that t reduces to u in one step, with explanations e (a set of boolean literals), under substitution \(\rho \). We denote by \(t \xrightarrow {e} {}^{*}_{\rho } u\) the transitive reflexive closure of the reduction. We write \(t \!\downarrow _{\rho } \) (the normal form of t under \(\rho \)) for the unique term u such that \(t \xrightarrow {e} {}^{*}_{\rho } u\) and no rule applies to u. In a first approximation, ignoring the explanations, the rules correspond to a normal call-by-need evaluation strategy for the typed \(\lambda \)-calculus. This matches the definition of values given earlier: a value is a weak head normal form. It is possible to use environments instead of substitutions, carrying bindings in every rule, but we chose this presentation for reasons related to hash-consing, as often used in SMT solvers. The choice of call-by-need rather than call-by-value is justified by the maximal amount of laziness it provides in presence of unknowns: instead of waiting for function call arguments, matched terms, or test conditions to be fully evaluated (and therefore, for their unknowns to be fully decided in the partial model), we can proceed with only a weak head normal form.

Fig. 2. Evaluation rules under substitution \(\rho \)

The rules id and trans specify how explanations are combined in the reflexive transitive closure. The rule case reduces a pattern matching once the matched term is a value (i.e. starts with a constructor, by typing). The rule app allows reducing the function term in an application (until it becomes a value, that is, a \(\lambda \)-abstraction); rule \(\beta \) is the regular \(\beta \)-reduction; rule def unfolds definitions (in particular, recursive definitions are unfolded on demand). The rule decision replaces an unknown with its value in the current substitution \(\rho \) (i.e. the partial model). The other rules define the semantics of boolean operators and equality. We forbid checking equality of functions as it is not computable.
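To make the bookkeeping concrete, here is a hypothetical skeleton of an explanation-tracking evaluator over the AST of Sect. 2: only rule decision introduces literals, and every other rule merely propagates unions of explanations. This is a sketch under our own names, not the actual implementation (which uses hash-consing and caches normal forms, cf. Sect. 8).

```ocaml
(* Sketch of explanation-carrying weak-head evaluation. A literal stands
   for a decision <<k := t>>; only rule "decision" creates literals. *)
module Expl = Set.Make (struct
  type t = string * term   (* unknown name, chosen value *)
  let compare = compare
end)

(* whnf rho t = (u, e): t reduces to a weak head normal form u under the
   partial substitution rho, and e records the decisions that were used. *)
let rec whnf (rho : (string * term) list) (t : term) : term * Expl.t =
  match t with
  | Unknown k ->
    (match List.assoc_opt k rho with
     | Some v ->                           (* rule "decision": record the choice *)
       let v', e = whnf rho v in
       (v', Expl.add (k, v) e)
     | None -> (t, Expl.empty))            (* blocked: k is not assigned yet *)
  | Case (u, branches) ->
    let u', e = whnf rho u in
    (match u' with
     | Cstor (c, _args) ->                 (* rule "case": matched term is a value *)
       (match List.assoc_opt c branches with
        | Some rhs ->
          let r, e' = whnf rho rhs in      (* substitution of args elided *)
          (r, Expl.union e e')
        | None -> (Case (u', branches), e))
     | _ -> (Case (u', branches), e))      (* still blocked by some unknown *)
  | _ -> (t, Expl.empty)                   (* beta, def, and boolean rules omitted *)
```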

Whether to use small-step or big-step semantics (i.e. reducing a term by one step if a subterm reduces, or waiting for the subterm to become a value) is of little importance in most cases. The only exception is the rules for conjunction, in which big-step semantics is required (i.e. \(a \wedge b\) does not always reduce when, e.g., a reduces). To see why, assume small-step semantics and consider \(a \xrightarrow {e_1} _{\rho } a' \xrightarrow {e_3} _{\rho } \bot \) and \(b \xrightarrow {e_2} _{\rho } b'\) where \(a, b : \textsf {Bool} \). The following reduction

$$ a \wedge b \xrightarrow {e_1} _{\rho } a' \wedge b \xrightarrow {e_2} _{\rho } a' \wedge b' \xrightarrow {e_3} _{\rho } \bot \wedge b' \xrightarrow {\emptyset } _{\rho } \bot $$

is imprecise because \(e_2\) is not actually needed for \(a \wedge b \xrightarrow {} {}^{*}_{\rho } \bot \); \( e_1 \cup e_3\) is sufficient. The resulting explanation is not as general as it could be, and a smaller part of the search space will be pruned as a result.

Evaluation of a normal form t that is not a value under a substitution \(\rho \) is blocked by a set of unknowns \(\textsf {block}_{\rho }(t)\):

$$\begin{aligned} \textsf {block}_{\rho }(\lambda x.~ t)&= \emptyset \\ \textsf {block}_{\rho }(c(u_1,\ldots ,u_n))&= \emptyset \text { if}~c~\text {is a constructor} \\ \textsf {block}_{\rho }(f~ t)&= \textsf {block}_{\rho }(f) \\ \textsf {block}_{\rho }(\textsf {case} ~t~\textsf {of}~\ldots ~\textsf {end} )&= \textsf {block}_{\rho }(t \!\downarrow _{\rho } ) \\ \textsf {block}_{\rho }(k)&= \{ k \} \text { if}~k~\text {is an unknown} \\ \textsf {block}_{\rho }(a=b)&= \textsf {block}_{\rho }(a) \cup \textsf {block}_{\rho }(b) \\ \textsf {block}_{\rho }(\lnot a)&= \textsf {block}_{\rho }(a \!\downarrow _{\rho } ) \\ \textsf {block}_{\rho }(a\wedge b)&= \textsf {block}_{\rho }(a \!\downarrow _{\rho } ) \cup \textsf {block}_{\rho }(b \!\downarrow _{\rho } ) \end{aligned}$$

In some cases, the blocking unknowns are found in the normal form of subterms of t. This corresponds to the evaluation rules that wait for the subterm to become a value before reducing.
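Transcribed over the same AST, the blocking-set computation mirrors the equations above; here `nf` is an assumed helper computing \(t \!\downarrow _{\rho } \) (e.g., by iterating `whnf`):

```ocaml
(* Sketch of block_rho following the equations above. *)
module S = Set.Make (String)

let rec block (nf : term -> term) (t : term) : S.t =
  match t with
  | Lam _ | Cstor _ -> S.empty             (* values block nothing *)
  | App (f, _) -> block nf f
  | Case (u, _) -> block nf (nf u)
  | Unknown k -> S.singleton k
  | Eq (a, b) -> S.union (block nf a) (block nf b)
  | Not a -> block nf (nf a)
  | And (a, b) -> S.union (block nf (nf a)) (block nf (nf b))
  | Var _ | Def _ -> S.empty               (* not blocked by an unknown *)
```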

Lemma 1

(Uniqueness of values for \(\xrightarrow {} {}^{*}_{\rho }\)). If \(t \xrightarrow {e_1} {}^{*}_{\rho } v_1\) and \(t \xrightarrow {e_2} {}^{*}_{\rho } v_2\) where \(v_1\) and \(v_2\) are values, then \(v_1 = v_2\).

Proof

The rules are deterministic, and values are always normal forms since no rule applies to them.    \(\square \)

Lemma 2

If \(t = t \!\downarrow _{\rho } \) is a normal form, then \(\textsf {block}_{\rho }(t) = \emptyset \) iff t is a value.

Proof

By induction on the shape of t.    \(\square \)

4 Delegating Choices and Conflict Analysis to SAT

We now have evaluation rules for reducing terms given a substitution on unknowns but have not yet explained how this substitution is built. As in narrowing [1, 14], it is constructed by refining unknowns incrementally, choosing their head constructor (or boolean value) and applying it to new unknowns that might need to be refined in turn if they block evaluation. However, in our case, the SAT solver performs the refinement of an unknown k once it has been expanded; the first time \(k:\tau \) blocks the evaluation of a goal g (i.e., \(k \in \textsf {block}_{\rho }(g)\)), some clauses are added to the SAT solver, forcing it to satisfy exactly one of the literals \(\lceil k := t_i \rceil \), where \(t_i = c_i(\overline{k_i})\) for each constructor \(c_i\) of \(\tau \) and fresh unknowns \(\overline{k_i}\). Once one of the literals \(\lceil k := t_i \rceil \) is true in the SAT solver’s partial model — implying that \(\rho (k) = t_i\), as we will see next — evaluation of the goal g can resume using rule decision (in Fig. 2) and k is no longer blocking.

The state of the SAT solver is represented below as a pair \(M \parallel F\) where M is the trail (a set of literals not containing both l and \(\lnot l\)), and F is a set of clauses. The operation \(\textsf {subst}(M)\) extracts a substitution on unknowns from positive literals in the trail:

$$ \textsf {subst}(M) \mathrel {\overset{\tiny {\text {def}}}{=}} \{ k \mapsto t ~|~ \lceil k := t \rceil \in M \} $$

The interactions between the SAT solver and the evaluation engine are bidirectional. When the SAT solver makes some decisions and propagations, yielding the new state \(M \parallel F\), the substitution \(\textsf {subst}(M)\) is used to evaluate the goals in G. If all the goals evaluate to \(\top \), we can report M as a model. Otherwise, if there is a goal \(t \in G\) such that \(t \xrightarrow {e} {}^{*}_{\textsf {subst}(M)} \bot \), M must be discarded. This is done by adding to F a conflict clause \(C \mathrel {\overset{\tiny {\text {def}}}{=}} \bigvee _{a \in e} \lnot a\) that blocks the set of choices in e. The SAT solver will backjump to explore models not containing e. Backjumping with clause C and state \(M \parallel F\) returns to a state \(M' \parallel F\) where \(M'\) is the longest prefix of M in which C is not absurd.
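One round of this interaction could be organized as follows; `eval_goal` and the `verdict` type are illustrative names of our own, not part of an actual SAT solver API:

```ocaml
(* One round of the evaluator/SAT interaction, as a sketch. `eval_goal rho g`
   is assumed to return the status of g under rho plus the explanation used. *)
type 'e verdict =
  | Model            (* all goals evaluate to true: report M as a model *)
  | Conflict of 'e   (* a goal evaluates to false under explanation e:
                        assert the conflict clause \/_{a in e} (not a) *)
  | Continue         (* some goal is blocked: expand an unknown and go on *)

let check_goals eval_goal rho goals =
  let rec go = function
    | [] -> Model
    | g :: rest ->
      (match eval_goal rho g with
       | `True, _ -> go rest
       | `False, e -> Conflict e
       | `Blocked, _ -> Continue)
  in
  go goals
```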

Lemma 3

(Monotonicity of Models). A model of G, expressed as a trail M, satisfies \(\bigwedge _{t \in G} t \xrightarrow {e} {}^{*}_{\textsf {subst}(M)} \top \) for some explanation e. No subset of M reduces \(\bigwedge _{t \in G} t\) to \(\bot \).

5 Enumeration of Inputs and Iterative Deepening

We have not specified precisely how to enumerate possible models. This section presents a fair enumeration strategy based on Iterative Deepening [13].

A major issue with a straightforward combination of our evaluation function and SAT solver is that there is a risk of non-termination. Indeed, a wrong branch might never be totally closed. Consider the goal \(p(b)\wedge a+b=\text {Z} \) with unknowns \(\{a,b\}\), where \(p(x) \mathrel {\overset{\tiny {\text {def}}}{=}} \textsf {case} ~x~\textsf {of}~\text {Z} \rightarrow \top ~|~ \text {S}(\_) \rightarrow \top ~\textsf {end} \) is trivial, and \(+\) is defined on Peano numbers by recursion on its left argument. Then making the initial choice \(b=\text {S}(b_2) \) (to unblock p(b)) and proceeding to refine a in order to unblock \(a+b=\text {Z} \) will lead to an infinite number of failures related to a, none of which will backjump past \(b=\text {S}(b_2) \).
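In ordinary functional syntax, the definitions of this example read as follows (a direct transcription, for illustration):

```ocaml
(* The example, transcribed literally: p is trivially true on any nat,
   while + recurses on its left argument. *)
type nat = Z | S of nat

let p (x : nat) : bool = match x with Z -> true | S _ -> true

let rec plus a b = match a with Z -> b | S a' -> S (plus a' b)

(* The goal "p b && plus a b = Z" blocks on b (for p) and on a (for plus):
   after deciding b = S b2, refining a fails infinitely often, and no
   conflict clause ever points back at the choice on b. *)
```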

To overcome this issue, we solve a series of problems where the depth of unknowns is limited to increasingly large values, a process inspired by iterative deepening. Because the SAT solver controls the shape of unknowns, we use special boolean literals \(\lceil \textsf {depth} \le n \rceil \) to forbid any choice that causes an unknown to be deeper than n; then we solve under the assumption \(\lceil \textsf {depth} \le n \rceil \). If a model is found, it is also valid without the assumption and can be returned immediately to the user. Otherwise, we need the SAT solver to be able to provide unsat cores — the subset of its clauses responsible for the problem being unsatisfiable — to make the following distinction: if \(\lceil \textsf {depth} \le n \rceil \) contributed to the unsat core, it means that there is no solution within the depth limit, and we start again with \(\lceil \textsf {depth} \le n' \rceil \) (where \(n' = n + \textsc {Step}\)). The last case occurs when the conflict does not involve the assumption \(\lceil \textsf {depth} \le n \rceil \): then the problem is truly unsatisfiable (e.g., in Fig. 1).

The iterative deepening algorithm is detailed below, in three parts: (i) the main loop, in Algorithm 1; (ii) solving within a depth limit, in Algorithm 2; (iii) expanding unknowns, in Algorithm 3. These functions assume that the SAT solver provides functions for adding clauses dynamically (AddSatClause), adding a conflict clause (Conflict), performing one round of decision then boolean propagation (MakeSatDecision and BoolPropagate), and extracting unsat cores (UnsatCore). These functions modify the SAT solver state \(M \parallel F\). In practice, it is also possible to avoid computing unsat cores at line 7 in Algorithm 1, by checking for pure boolean satisfiability again, but without the depth-limit assumption. Most computations (including the current normal form of \(\bigwedge _{t\in G} t\)) can be done incrementally and are backtracked in case of conflict.

[Algorithms 1–3: main loop; SolveUpTo (solving within a depth limit); Expand (expanding unknowns)]
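A schematic version of the main loop (Algorithm 1), under the stated assumptions on the SAT solver API; `solve_up_to` stands for Algorithm 2 and `in_unsat_core` for the unsat-core check, both hypothetical names:

```ocaml
(* Sketch of Algorithm 1 (iterative deepening main loop). `solve_up_to d`
   solves under the assumption <<depth <= d>> (Algorithm 2); `in_unsat_core`
   checks whether that assumption was blamed for unsatisfiability. *)
type result = Sat | Unsat | Resource_exhausted

let solve ~step ~max_depth solve_up_to in_unsat_core =
  let rec loop d =
    if d > max_depth then Resource_exhausted
    else
      match solve_up_to d with
      | `Sat -> Sat                  (* model is valid without the assumption *)
      | `Unsat core ->
        if in_unsat_core core d
        then loop (d + step)         (* depth limit blamed: deepen and retry *)
        else Unsat                   (* truly unsatisfiable *)
  in
  loop step
```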

Theorem 1

(Termination). The function SolveUpTo in Algorithm 2 terminates.

Theorem 2

(Soundness). The function SolveUpTo in Algorithm 2 returns either Sat or Unsat. If it returns Sat, then the substitution \(\textsf {subst}(M)\) from the boolean trail is a model. If it returns Unsat, then there are no solutions of depth at most d.

Theorem 3

(Bounded Completeness). If there exists a model of depth at most \(\textsc {Step}\left\lfloor {\small {\textsc {MaxDepth}}/{\textsc {Step}} }\right\rfloor \), then Algorithm 1 will return Sat.

Proof

The depth d is always a multiple of \(\textsc {Step}\). Let \(d_{\min } \le \textsc {MaxDepth}\) be the smallest multiple of \(\textsc {Step}\) such that there is a model of depth \(\le d_{\min }\). Iterations of the loop with \(d < d_{\min }\) return Unsat by soundness of SolveUpTo (Theorem 2); the iteration at depth \(d_{\min }\) returns Sat.    \(\square \)

5.1 Application to the Introductory Example

We illustrate our technique on an example. Pick the same definitions as in Fig. 1, but with the goal set \(G \mathrel {\overset{\tiny {\text {def}}}{=}} \{ \textsf {rev}(l) = l,~\textsf {length}(l) =2,~\textsf {sum}(l) =2 \}\) where the unknown is a list l. Unlike in Fig. 1, this problem is satisfiable. Assuming \(\textsc {Step}=1\), we start solving under the constraint \(\lceil \textsf {depth} \le 1 \rceil \). Under the empty substitution, G reduces to a set of terms all blocked by a pattern matching on l; expansion of l into \(\{ \text {Nil},~\text {Cons}(x_1,l_1) \}\) follows, where \(x_1:\textsf {Nat} \) and \(l_1:\textsf {List} \) are fresh unknowns. Suppose the SAT solver picks \(\lceil l := \text {Nil} \rceil \). G reduces to \(\{ \top , \bot , \bot \}\) with explanations \(\{ \lceil l := \text {Nil} \rceil \}\), so the conflict clause \(\lnot \lceil l := \text {Nil} \rceil \) is asserted, added to the partial model with no effect, and the solver backtracks.

The next boolean decision must be \(\lceil l := \text {Cons}(x_1,l_1) \rceil \). Subsequently, G reduces to

$$ \{ \textsf {append}(\textsf {rev}(l_1) ,\text {Cons}(x_1,\text {Nil}) ) = \text {Cons}(x_1,l_1) ,~ \textsf {length}(l_1) = 1,~ x_1+\textsf {sum}(l_1) =2 \} $$

(more precisely, to a less readable version of these terms where function definitions are unfolded into some pattern matching). The resulting set is blocked both by \(l_1\) (in the first two terms) and \(x_1\) (in \(x_1+\textsf {sum}(l_1) \)). Expansion of these unknowns yields \(\{ \text {Nil},~\text {Cons}(x_2,l_2) \}\) and \(\{ \text {Z} ,~\text {S}(y_1) \}\) respectively, but the choice \(\text {Cons}(x_2,l_2) \) is blocked by \(\lceil \textsf {depth} \le 1 \rceil \). The solver must choose \(\lceil l_1 := \text {Nil} \rceil \), which entails

$$ \textsf {length}(l_1) = 1 \xrightarrow {e} {}^{*}_{\rho } \bot \quad \text {where}~ e = \{ \lceil l := \text {Cons}(x_1,l_1) \rceil ,~ \lceil l_1 := \text {Nil} \rceil \} $$

The conflict clause \(\lnot \lceil l := \text {Cons}(x_1,l_1) \rceil \vee \lnot \lceil l_1 := \text {Nil} \rceil \) triggers an Unsat result, but only because of the assumption \(\lceil \textsf {depth} \le 1 \rceil \).

The main loop (Algorithm 1) then proceeds to depth 2, and tries to solve the problem again under the assumption \(\lceil \textsf {depth} \le 2 \rceil \). The SAT solver can now pick \(\lceil l_1 := \text {Cons}(x_2,l_2) \rceil \). At some point, it will pick \(\lceil l_2 := \text {Nil} \rceil \) (the other choice is too deep), \(\lceil x_1 := \text {S}(y_1) \rceil \), \(\lceil y_1 := \text {Z} \rceil \). Other choices would reduce one of the goals to \(\bot \): once the shape of \(l_1\) is fixed by the length constraint, \(\textsf {rev}(l) =l\) reduces to \(x_1 = x_2\) and \(\textsf {sum}(l) =2\) becomes \(x_1+x_2 = 2\), and wrong choices quickly reduce those to \(\bot \). At this point, \(\bigwedge _{t \in G} t \xrightarrow {e} {}^{*}_{\textsf {subst}(M)} \top \) and we obtain the model \(l = \text {Cons}(1, \text {Cons}(1, \text {Nil}) ) \).

6 Extensions of the Language

6.1 Uninterpreted Types

Finding counter-examples for programs and formalizations that use only recursive function definitions might still involve uninterpreted types, arising from type Skolemization or abstract types. To handle an uninterpreted type \(\tau \) in SMBC, we associate with it a finite set of domain elements denoted \(\textsf {elt}_{0}(\tau ) ,\textsf {elt}_{1}(\tau ) ,\ldots \) (domain elements behave like constructors for evaluation). We also introduce type slices \({\tau _{[0\ldots ]} , \tau _{[1\ldots ]} ,\ldots }\) where \( {\tau \mathrel {\overset{\tiny {\text {def}}}{=}} \tau _{[0\ldots ]} }\). Conceptually, a type slice \({ \tau _{[n\ldots ]} }\) corresponds to the subtype of \(\tau \) that excludes its n first elements: \( { \tau _{[n\ldots ]} } \mathrel {\overset{\tiny {\text {def}}}{=}} \{ \textsf {elt}_{n}(\tau ) , \ldots , \textsf {elt}_{\text {card}(\tau )-1}(\tau ) \} \). Then, we introduce propositional literals \(\lceil \tau _{[n\ldots ]} \equiv \emptyset \rceil \) that will be given a truth value by the SAT solver; if \(\lceil \tau _{[n\ldots ]} \equiv \emptyset \rceil \) is true, it means \(\tau _{[n\ldots ]} \equiv \emptyset \); otherwise, it means \(\tau _{[n\ldots ]} \equiv \{ \textsf {elt}_{n}(\tau ) \} \cup \tau _{[n+1\ldots ]} \). We assume \(\lnot \lceil \tau _{[0\ldots ]} \equiv \emptyset \rceil \), i.e. every type is inhabited. Expansion of some unknown \(k : \tau _{[n\ldots ]} \) yields the following boolean constraints:

$$ \lceil k := \textsf {elt}_{n}(\tau ) \rceil \vee \lceil k := k' \rceil \qquad \lceil \tau _{[n+1\ldots ]} \equiv \emptyset \rceil \Rightarrow \lnot \lceil k := k' \rceil $$

where \(k' : \tau _{[n+1\ldots ]} \) is a fresh unknown belonging to the next slice of \(\tau \). To express constraints on \(\tau \), the input language provides finite quantifiers \(\forall x:\tau .~ F\) and \(\exists x:\tau .~ F\) (which abbreviates \(\lnot (\forall x:\tau .~ \lnot F)\)). The quantifier is interpreted with the following rules:

[Evaluation rules for the finite quantifier \(\forall x:\tau .~ F\) over type slices]
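The slice semantics suggests the following evaluation scheme for the finite universal quantifier; this is our sketch, where `is_empty n` reads the emptiness literal for \(\tau _{[n\ldots ]} \) in the SAT model and `inst n` evaluates \(F[\textsf {elt}_{n}(\tau ) /x]\):

```ocaml
(* Sketch: a finite universal quantifier over slices of tau, following
   tau_[n..] = {elt_n(tau)} U tau_[n+1..] unless the slice is empty. *)
let rec forall_slice (is_empty : int -> bool) (inst : int -> bool) (n : int) : bool =
  if is_empty n then true                 (* empty slice: vacuously true *)
  else inst n && forall_slice is_empty inst (n + 1)
(* Termination relies on some slice eventually being empty, i.e. on the
   finite cardinality of the uninterpreted type. *)
```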

6.2 Functional Unknowns

With uninterpreted types often come functions taking arguments of uninterpreted types. We may also wish to synthesize (simple) functions taking booleans or datatypes as parameters. It is possible to build functions by refinement, using currying to consider only one argument at a time, branching on the argument’s type (a code sketch follows the list below). Expansion of a functional unknown \(f : a \rightarrow b\) depends on a:

  • If \(a = \textsf {Bool} \), \(f \in \{ \lambda x.~ \textsf {if}~x~t_1~t_2 \}\) where \(t_1, t_2 : b\) are fresh unknowns that are deeper than f.

  • If a is uninterpreted, f is \(\lambda x.~ \textsf {switch}(x, m)\) where m is a table mapping \(\left( \textsf {elt}_{i}(a) \right) _{i=0\ldots }\) to fresh unknowns of type b (built lazily, in practice) and switch is otherwise similar to \(\textsf {case} \).

  • If a is a datatype, f is either a constant function \(\lambda x.~ k_f\), where \(k_f\) is an unknown of type b, or \(\lambda x.~ \textsf {case} ~x~\textsf {of}~c_i(\overline{y}) \rightarrow k_{i}~\overline{y} ~|~ \cdots ~\textsf {end} \), where each \(k_i\) is a fresh unknown taking the corresponding constructor’s arguments as parameters. The constant case makes it possible to build functions that only peek superficially at their input — otherwise, all function descriptions in the model would be infinite. The choice between the two forms for f is performed by the SAT solver; the non-constant case might be blocked by depth constraints. If a is infinite, bounded completeness is lost immediately, as we cannot generate all functions \(a \rightarrow b\).

  • Otherwise, a is a function type and we should reject the initial problem.
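A sketch of this expansion over the AST of Sect. 2; `fresh` and `cstors` are assumed helpers, and both uninterpreted types (switch tables) and the binding of constructor arguments are elided:

```ocaml
(* Sketch of the expansion of a functional unknown f : a -> b, by cases on a.
   `fresh ty` creates a new unknown of the given type; `cstors d` lists the
   constructors of datatype d. *)
let expand_fun ~fresh ~cstors (a : ty) (b : ty) : term list =
  match a with
  | Bool ->
    (* single candidate shape: fun x -> if x then t1 else t2 *)
    [ Lam (Bool, ite (Var 0) (fresh b) (fresh b)) ]
  | Data d ->
    (* either a constant function, or one case per constructor *)
    let const_fun = Lam (a, fresh b) in
    let case_fun =
      Lam (a, Case (Var 0,
        List.map (fun (c, _arg_tys) -> (c, fresh b)) (cstors d)))
    in
    [ const_fun; case_fun ]
  | Arrow _ -> invalid_arg "functional argument: reject the problem"
```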

7 Refinements to the Calculus

7.1 Multiple Conflict Clauses

Sometimes, a partial model causes a failure for several reasons: in the presence of parallel conjunction, both conjuncts can reduce to \(\bot \). It would be wasteful to keep only one reason, because all of them might be useful to prune other branches. In this case, instead of just picking one explanation and discarding the others, as suggested in Fig. 2, we add a new explanation constructor, \( e_1 \mathrel \oplus e_2 \), that combines two unrelated explanations, such that \(\mathrel \oplus \) is associative and commutative and \( ( e_1 \mathrel \oplus e_2) \cup e_3 \equiv ( e_1 \cup e_3) \mathrel \oplus (e_2 \cup e_3) \). Intuitively, \(a \xrightarrow {e_1 \mathrel \oplus e_2} {}^{*}_{\rho } b\) means that a evaluates to b under substitution \(\rho \) assuming the choices in \(e_1\) or in \(e_2\) are made — those choices are never incompatible, but they might not be the same subset of \(\textsf {subst}(M)\). We add a new rule for \(\wedge \):

[Rule for \(\wedge \) combining explanations with \(\mathrel \oplus \)]

In case of a conflict \(t \xrightarrow {\mathop {\oplus }_{i \in I} e_i} {}^{*}_{\rho } \bot \), we obtain a set of conflict clauses \( \left\{ \bigvee _{a \in e_i} \lnot a ~|~ i \in I \right\} \) that will prune distinct parts of the partial model.
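One possible representation keeps explanations normalized as a non-empty list of alternatives, each alternative being a plain set of literals; union then distributes over \(\mathrel \oplus \) exactly as in the equation above (a sketch with our own names):

```ocaml
(* Sketch: explanations with the (+) combinator, kept normalized as a
   non-empty list of alternatives (each a set of literals). *)
type expl = Expl.t list   (* e_1 (+) ... (+) e_n, with n >= 1 *)

(* (e1 (+) e2) U e3 = (e1 U e3) (+) (e2 U e3): union distributes. *)
let union (e1 : expl) (e2 : expl) : expl =
  List.concat_map (fun a -> List.map (Expl.union a) e2) e1

let combine (e1 : expl) (e2 : expl) : expl = e1 @ e2   (* the (+) operator *)

(* On conflict, each alternative e_i yields its own conflict clause
   \/_{a in e_i} (not a). *)
let conflict_clauses (e : expl) = List.map Expl.elements e
```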

7.2 Unification Rules

Equality already has many rules, but we can optimize it further. In our implementation, relying on hash-consing, we simplify \(t = t\) into \(\top \) in constant time, even when t contains unassigned unknowns. We can also optimize equality in the special case where reduction leads to a term \(c(t_1,\ldots ,t_n) = k\) or \(k = c(t_1,\ldots ,t_n)\), where k is an unknown and c a constructor or domain element. This term reduces with no explanation to a new term construct that requires \(c(u_1,\ldots ,u_n)\) to be one of the cases resulting from the expansion of k. In the \(\bot \) case, the explanation forces the SAT solver to pick \(c(u_1,\ldots ,u_n)\) instead of merely ruling out the wrong choice \(d(u_1,\ldots ,u_m)\); if there are more than two constructors, this directly forces the right choice instead of trying every wrong choice.

[Reduction rules for the unification construct]

8 Implementation

We implemented SMBC in OCaml, using a modular SAT solver that is flexible enough to let us add clauses dynamically and parameterize it with a theory solver. It also supports incremental solving under assumptions, which is necessary for the efficiency of the iterative deepening exploration. The core solver is around 3,200 lines long, including the term data structures, the symbolic evaluation, and the main loop. This implementation is a usable prototype, but we believe it could be made much faster with more work, perhaps by using a lower-level language. The code is free software, under a permissive license.

Our description of evaluation rules in Fig. 2 is quite high-level and can be implemented in various ways. We chose to represent terms as perfectly shared directed acyclic graphs in which binders and bound variables rely on De Bruijn indices. The perfect sharing diminishes memory usage and makes let statements superfluous. We store in every term a pointer to a pair (explanation, term) that holds the current normal form of this term, effectively implementing a crude form of memoization. Any assignment of this pair must be undone upon backtracking, in a similar way as in congruence closure algorithms [3]. Similarly, unknowns are records with mutable pointers to a list of possible cases (once they have been expanded) and to their current assignment, which is reverted during backjumping thanks to a central backtracking stack controlled by the SAT solver. A good representation of explanations is required for efficiency, because unions are performed very often during the evaluation of terms and should be as fast as possible.
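Concretely, a term node in this style could look like the following sketch (field and function names are ours, not those of the implementation):

```ocaml
(* Sketch of the term representation: hash-consed nodes with a mutable
   cached normal form, undone on backjumping via a central undo stack. *)
type node = {
  id : int;                            (* unique id from hash-consing *)
  shape : term;                        (* structure (children would be nodes too) *)
  mutable nf : (Expl.t * node) option; (* cached (explanation, normal form) *)
}

let undo_stack : (unit -> unit) Stack.t = Stack.create ()

let set_nf (n : node) (e : Expl.t) (v : node) : unit =
  let old = n.nf in
  Stack.push (fun () -> n.nf <- old) undo_stack;  (* revert on backjump *)
  n.nf <- Some (e, v)

let backjump_to (level : int) : unit =
  while Stack.length undo_stack > level do
    (Stack.pop undo_stack) ()
  done
```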

In addition, the evaluation function performs acyclicity checks to prune impossible branches early, and aggressively caches the normal forms of terms, stashing their old value on the central backtracking stack. Since we follow the architecture proposed by Barrett et al. [3], SMBC can delegate all branching to the SAT solver. Every time a boolean decision is made by the SAT solver (followed by propagation), the evaluation engine is called so as to prune bad models early. It does so by re-evaluating the set of goals G, which must contain at least one term not reduced yet, and cannot contain \(\bot \) (see Algorithm 2). This re-evaluation is made faster by starting from the cached normal forms instead of the original goals. If all goals in G reduce to \(\top \), the model is valid; if one of them reduces to \(\bot \), the SAT solver immediately receives a conflict clause that will make it backtrack.

9 Experiments

We ran a few experiments to compare SMBC with other approaches, namely LSC, HBMC, CVC4 [17], and Inox, a standalone version of Leon [19]. We do not compare against QuickCheck, SmallCheck, or Feat, because they are not designed to solve such tightly constrained problems. All the data and the code of SMBC can be found at https://cedeela.fr/~simon/files/cade_17.tar.gz. For this experiment, we wrote some problems and borrowed some others from HBMC’s test suite. We tried to pick diverse benchmarks so as to expose the strengths and weaknesses of each tool. TIP does not yet come with an exhaustive set of satisfiable benchmarks that rely primarily on recursive functions. Benchmarks from our previous work on CVC4 [17] are expressed in SMT-LIB rather than TIP and use quantified axioms instead of recursive definitions, which makes them hard to use in our purely computational setting. The same holds for SMT-LIB and TPTP in general.

The solvers were run on a 4-core Intel i5 CPU with a 60 s timeout and a limit of 8 GB of RAM. Below, we give some numbers in Table 1 and then analyse the results on some categories of problems. The second column of the table gives the number of satisfiable and unsatisfiable problems. Categories out of scope are marked with “–”.

Table 1. Results of the Experiments


Expr.:

Given arithmetic expressions, an evaluation function and several flawed simplifications, the goal is to find an expression such that its simplification does not evaluate to the same term. Here HBMC and Inox shine, but SMBC and LSC have more trouble due to the large branching factor of the search tree.

Fold.:

Those examples are about synthesizing a function that distinguishes between lists by only looking at one element at a time (plus an accumulator). In other words, we fold a function f on all elements, and the goal is to pick f such that it can distinguish between close, but distinct, lists. This problem is outside the scope of all the tools the author knows about, simply because it combines an uninterpreted type with an unknown of function type, but SMBC has no problem synthesizing what is in essence a state machine transition function.

Palindromes.:

After defining unary natural numbers and lists, we look for lists that are palindromes (i.e., \(\textsf {rev}(l) = l\)) and satisfy some additional constraint on their sum or length. Some of those problems are more difficult variations of the problem from Sect. 5.1. Some of the problems are satisfiable and some are unsatisfiable. For example, the goal in long_rev_sum2.smt2 is to disprove the existence of a palindrome of length 200 with sum 1; HBMC times out because there are too many computations, and LSC cannot detect unsatisfiability. Those problems are easy, but the toplevel goal is a parallel conjunction that needs to be treated properly, which is why LSC fails to solve even the satisfiable instances.

Pigeon.:

A computational version of the classical pigeon hole problem, here with 4 holes for 5 pigeons. This requires handling uninterpreted types and unsatisfiable problems.

Regex.:

Basic regular expressions are represented by a datatype featuring constants, star, and disjunction. The goal is generally to find a regular expression that matches a given string. Here, LSC shines and HBMC is in trouble, because (comparatively) many computations are required to check each input. SMBC has a good success rate here, even though its relatively naive interpreter is much slower than LSC’s native compiled code.

Sorted.:

Some problems about finding sorted lists of natural numbers that have additional properties (on their length, reverse, sum, etc.). The problems are fairly easy, but some of them are unsatisfiable.

Sudoku.:

A sudoku is represented as a list of lists of a datatype with 9 constructors. Some functions to check whether the sudoku is valid (no duplicate in any line, column or block) are defined, an initial state is given, and the goal is simply to solve the sudoku. This is also a combinatorial problem on which HBMC takes only 2 s, SMBC takes 12 s, and LSC times out. Here, it pays to bit-blast because the SAT solver can propagate constraints among the sudoku cells.

Type Checking.:

This example comes from the HBMC draft report [7]. Terms of the simply typed \(\lambda \)-calculus are defined by a datatype (variables being mapped to De Bruijn indices), along with a type-checking function that takes a term t, a type \(\tau \) and an environment \(\varGamma \) (i.e. a list of types), and returns \(\top \) iff \(\varGamma \vdash t :\tau \) holds. The goal is to find a term that has type \((a \rightarrow b) \rightarrow (b \rightarrow c) \rightarrow (a \rightarrow c)\) in the empty environment: in other words, to synthesize the composition operator from its type. The task is difficult because of the fast growth of the search space, in which LSC drowns, but SMBC manages well.

Overall, SMBC appears to be well balanced and to have good results both on problems that require computations and on problems where pruning of impossible cases is critical. Given the simplicity of our implementation, we believe these results are promising, and that SMBC occupies a sweet spot between handling computations well and traversing the search space in a smart way.

10 Conclusion

After describing a new technique for finding models in a logic of computable functions and datatypes, we presented ways of extending the language and described a working implementation. By combining symbolic evaluation with SAT-based conflict analysis, the approach is aimed at difficult problems where the search space is large (e.g., because of parallel disjunction and independent sub-problems) and large amounts of computation must be performed before discovering failure. It can be described as a spiritual heir to evaluation-driven narrowing [14] that replaces the traditional exploration of the space of possible inputs by conflict-driven clause learning. We hope that this work will benefit model finders in proof assistants, in particular Nunchaku [9, 17].