1 Introduction

Given a relational specification of input-output behaviour, synthesizing outputs as functions of inputs is a key step in several applications, viz. program repair [14], program synthesis [28], adaptive control [25] etc. The synthesis problem is, in general, uncomputable. However, there are practically useful restrictions that render the problem solvable, e.g., if all inputs and outputs are Boolean, the problem is computable in principle. Nevertheless, functional synthesis may still require formidable computational effort, especially if there are a large number of variables and the overall specification is complex. This motivates us to investigate techniques for Boolean functional synthesis that work well in practice.

Formally, let X be a sequence of m input Boolean variables, and Y be a sequence of n output Boolean variables. A relational specification is a Boolean formula \(\varphi (X, Y)\) that expresses a desired input-output relation. The goal in Boolean functional synthesis is to synthesize a function \(F: \{0,1\}^m\rightarrow \{0,1\}^n\) that satisfies the specification. Thus, for every value of X, if there exists some value of Y such that \(\varphi (X,Y)=1\), we must also have \(\varphi (X,F(X)) = 1\). For values of X that do not admit any value of Y such that \(\varphi (X,Y) = 1\), the value of F(X) is inconsequential. Such a function F is also referred to as a Skolem function for Y in \(\varphi (X,Y)\) [15, 22].
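To make this concrete, here is a small brute-force check of the Skolem property, written as an illustrative Python sketch (the helper names and the toy XOR specification are ours, not part of the original development). It directly mirrors the definition: wherever some value of Y satisfies \(\varphi \), the candidate F must satisfy it too.

```python
# Illustrative sketch: exhaustively check whether F is a Skolem function
# for Y in phi(X, Y). The spec and F below are toy examples, not from the paper.
from itertools import product

def is_skolem(phi, F, m, n):
    """phi: callable(xs, ys) -> bool; F: callable(xs) -> tuple of n bits."""
    for xs in product([0, 1], repeat=m):
        # If some ys satisfies phi at xs, then F(xs) must satisfy phi too.
        if any(phi(xs, ys) for ys in product([0, 1], repeat=n)):
            if not phi(xs, F(xs)):
                return False
    return True

phi = lambda xs, ys: ys[0] == (xs[0] ^ xs[1])   # spec: y must equal x1 XOR x2
F = lambda xs: (xs[0] ^ xs[1],)                 # candidate Skolem function
assert is_skolem(phi, F, m=2, n=1)
```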

An interesting example of Boolean functional synthesis is the problem of integer factorization. Suppose \(Y_1\) and \(Y_2\) are n-bit unsigned integers, X is a 2n-bit unsigned integer and \(\times _{[n]}\) denotes n-bit unsigned multiplication. The relational specification \(\varphi _{\mathsf {fact}}(X, Y_1, Y_2) \equiv ((X=Y_1\times _{[n]} Y_2) \wedge (Y_1\ne 1) \wedge (Y_2\ne 1))\) specifies that \(Y_1\) and \(Y_2\) are non-trivial factors of X. This specification can be easily encoded as a Boolean relation. The corresponding synthesis problem requires us to synthesize the factors \(Y_1\) and \(Y_2\) as functions of X, whenever X is non-prime. Note that this problem is known to be hard, and the strength of several cryptographic systems relies on this hardness.
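As an illustration of what the specification demands, the sketch below (ours; tractable only for tiny bit-widths) encodes \(\varphi _{\mathsf {fact}}\) as a Python predicate and tabulates witness factors by brute force. A synthesized Skolem function must produce such witnesses without search.

```python
# Illustrative sketch of phi_fact for tiny bit-widths (names are ours).
from itertools import product

def phi_fact(x, y1, y2):
    """x = y1 * y2, with y1 and y2 both different from 1."""
    return (y1 * y2 == x) and y1 != 1 and y2 != 1

def factor_table(n):
    """Brute-force 'Skolem function' as a lookup table x -> (y1, y2)."""
    table = {}
    for x in range(1 << (2 * n)):
        for y1, y2 in product(range(1 << n), repeat=2):
            if phi_fact(x, y1, y2):
                table[x] = (y1, y2)   # any witness will do
                break
    return table

print(factor_table(3)[35])   # (5, 7); a prime x gets no entry in the table
```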

Existing approaches to Boolean functional synthesis vary widely in their emphasis, ranging from purely theoretical treatments (viz. [3, 6, 7, 10, 20, 23]) to those motivated by practical tool development (viz. [4, 11, 12, 15, 17, 18, 21, 22, 27,28,29]). A common aspect of these approaches is their focus on sequential algorithms for synthesis. In this paper, we present, to the best of our knowledge, the first parallel algorithm for Boolean functional synthesis. A key ingredient of our approach is a technique for solving the synthesis problem for a specification \(\varphi \) by composing solutions of synthesis problems corresponding to sub-formulas in \(\varphi \). Since Boolean functions are often represented using DAG-like structures (such as circuits, AIGs [16], ROBDDs [1, 8]), we assume w.l.o.g. that \(\varphi \) is given as a DAG. The DAG structure provides a natural decomposition of the original problem into sub-problems with a partial order of dependencies between them. We exploit this to design a parallel synthesis algorithm that has been implemented on a message passing cluster. Our initial experiments show that our algorithm significantly outperforms state-of-the-art techniques on several benchmarks.

Related Work: The earliest solutions to Boolean functional synthesis date back to Boole [6] and Lowenheim [20], who considered the problem in the context of Boolean unification. Subsequently, there have been several investigations into theoretical aspects of this problem (see e.g., [3, 7, 10, 23]). More recently, there have been attempts to design practically efficient synthesis algorithms that scale to much larger problem sizes. In [22], a technique to synthesize Y from a proof of validity of \(\forall X \exists Y \varphi (X,Y)\) was proposed. While this works well in several cases, not all specifications admit the validity of \(\forall X \exists Y \varphi (X,Y)\). For example, \(\forall X \exists Y \varphi _{\mathsf {fact}}(X,Y)\) is not valid in the factorization example. In [12, 29], a synthesis approach based on functional composition was proposed. Unfortunately, this does not scale beyond small problem instances [11, 15]. To address this drawback, a CEGAR based technique for synthesis from factored specifications was proposed in [15]. While this scales well if each factor in the specification depends on a small subset of variables, its performance degrades significantly if we have a few “large” factors, each involving many variables, or if there is significant sharing of variables across factors. In [21], Macii et al. implemented Boole’s and Lowenheim’s algorithms using ROBDDs and compared their performance on small to medium-sized benchmarks. Other algorithms for synthesis based on ROBDDs have been investigated in [4, 17]. A recent work [11] adapts the functional composition approach to work with ROBDDs, and shows that this scales well for a class of benchmarks with pre-determined variable orders. However, finding a good variable order for an arbitrary relational specification is hard, and our experiments show that without prior knowledge of benchmark classes and corresponding good variable orders, the performance of [11] can degrade significantly. Techniques using templates [28] or sketches [27] have been found to be effective for synthesis when we have partial information about the set of candidate solutions. A framework for functional synthesis, focused on unbounded domains such as integer arithmetic, was proposed in [18]. This relies heavily on tailor-made smart heuristics that exploit specific form/structure of the relational specification.

2 Preliminaries

Let \(X = (x_1, \ldots x_m)\) be the sequence of input variables, and \(Y = (y_1, \ldots y_n)\) be the sequence of output variables in the specification \(\varphi (X,Y)\). Abusing notation, we use X (resp. Y) to denote the set of elements in the sequence X (resp. Y), when there is no confusion. We use 1 and 0 to denote the Boolean constants true and false, respectively. A literal is either a variable or its complement. An assignment of values to variables satisfies a formula if it makes the formula true.

Fig. 1. DAG representing \(\varphi (X,Y)\)

We assume that the specification \(\varphi (X,Y)\) is represented as a rooted DAG, with internal nodes labeled by Boolean operators and leaves labeled by input/output literals and Boolean constants. If the operator labeling an internal node N has arity k, we assume that N has k ordered children. Figure 1 shows an example DAG, where the internal nodes are labeled by AND and OR operators of different arities. Each node N in such a DAG represents a Boolean formula \({\varPhi (N)}\), which is inductively defined as follows. If N is a leaf, \({\varPhi (N)}\) is the label of N. If N is an internal node labeled by \(\mathsf {op}\) with arity k, and if the ordered children of N are \(c_1, \ldots c_k\), then \({\varPhi (N)}\) is \(\mathsf {op}({\varPhi (c_1)}, \ldots {\varPhi (c_k)})\). A DAG with root R is said to represent the formula \({\varPhi (R)}\). Note that popular DAG representations of Boolean formulas, such as AIGs, ROBDDs and Boolean circuits, are special cases of this representation.
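The following is a minimal sketch (ours) of this DAG representation and of the inductive definition of \({\varPhi (N)}\): internal nodes carry an operator and ordered children, leaves carry literals or constants, and shared subformulas are simply shared node objects.

```python
# Illustrative sketch of the assumed DAG representation (names are ours).
class Node:
    def __init__(self, op=None, children=(), leaf=None):
        # leaf is a Boolean constant, or a pair (variable, negated?)
        self.op, self.children, self.leaf = op, children, leaf

def phi(node, assignment):
    """Evaluate Phi(node) under an assignment, by the inductive definition."""
    if node.leaf is not None:
        if isinstance(node.leaf, bool):              # Boolean constant
            return node.leaf
        var, negated = node.leaf                     # literal
        return assignment[var] != negated
    vals = [phi(c, assignment) for c in node.children]
    return any(vals) if node.op == 'or' else all(vals)   # arbitrary arity

# (x1 OR y1) AND (NOT x1 OR NOT y1); shared leaves make this a DAG, not a tree
x1, y1 = Node(leaf=('x1', False)), Node(leaf=('y1', False))
nx1, ny1 = Node(leaf=('x1', True)), Node(leaf=('y1', True))
root = Node(op='and', children=(Node(op='or', children=(x1, y1)),
                                Node(op='or', children=(nx1, ny1))))
print(phi(root, {'x1': True, 'y1': False}))   # True
```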

A k-ary Boolean function f is a mapping from \(\{0,1\}^k\) to \(\{0,1\}\), and can be viewed as the semantics of a Boolean formula with k variables. We use the terms “Boolean function” and “Boolean formula” interchangeably, using formulas mostly to refer to specifications. Given a Boolean formula \(\varphi \) and a Boolean function f, we use \(\varphi [y\mapsto f]\) to denote the formula obtained by substituting every occurrence of the variable y in \(\varphi \) with f. The set of variables appearing in \(\varphi \) is called the support of \(\varphi \). If f and g are Boolean functions, we say that f abstracts g and g refines f, if \(g \rightarrow f\), where \(\rightarrow \) denotes logical implication.
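For later reference, substitution and the refinement check can both be realized by brute force over the support, as in this small sketch (ours, illustrative):

```python
# Illustrative sketch: substitution phi[y -> f] and the check g -> f,
# with Boolean functions represented as callables over assignment dicts.
from itertools import product

def substitute(phi, y, f):
    """phi[y -> f]: replace every occurrence of variable y by function f."""
    return lambda a: phi({**a, y: f(a)})

def implies(g, f, variables):
    """Does g -> f hold over all assignments to the given variables?"""
    return all(not g(dict(zip(variables, v))) or f(dict(zip(variables, v)))
               for v in product([False, True], repeat=len(variables)))

phi = lambda a: a['x'] or a['y']
# x refines phi[y -> 0], i.e. x -> (x or 0):
print(implies(lambda a: a['x'], substitute(phi, 'y', lambda a: False), ['x', 'y']))
```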

Given the specification \(\varphi (X, Y)\), our goal is to synthesize the outputs \(y_1, \ldots y_n\) as functions of X. Unlike some earlier work [5, 13, 22], we do not assume the validity of \(\forall X \exists Y~ \varphi (X,Y)\). Thus, we allow the possibility that for some values of X, there may be no value of Y that satisfies \(\varphi (X, Y)\). This allows us to accommodate some important classes of synthesis problems, viz. integer factorization. If \(y_1=f_1(X), \ldots y_n=f_n(X)\) is a solution to the synthesis problem, we say that \((f_1(X), \ldots f_n(X))\) realizes Y in \(\varphi (X, Y)\). For notational clarity, we simply use \((f_1, \ldots f_n)\) instead of \((f_1(X), \ldots f_n(X))\) when X is clear from the context.

In general, an instance of the synthesis problem may not have a unique solution. The following proposition, stated in various forms in the literature, characterizes the space of all solutions, when we have one output variable y.

Proposition 1

A function f(X) realizes y in \(\varphi (X, y)\) iff the following holds:

\(\varphi [y\mapsto 1] \wedge \lnot \varphi [y\mapsto 0] ~\rightarrow ~ f(X)\) and \(f(X) ~\rightarrow ~ \varphi [y\mapsto 1] \vee \lnot \varphi [y\mapsto 0]\).

As a corollary, both \(\varphi [y\mapsto 1]\) and \(\lnot \varphi [y\mapsto 0]\) realize y in \(\varphi (X,y)\). Proposition 1 can be easily extended when we have multiple output variables in Y. Let \(\sqsubseteq \) be a total ordering of the variables in Y, and assume without loss of generality that \(y_1 \sqsubseteq y_2 \sqsubseteq \cdots \sqsubseteq y_n\). Let \(\overrightarrow{F}\) denote the vector of Boolean functions \((f_1(X), \ldots f_n(X))\). For \(i \in \{1, \ldots n\}\), define \(\varphi ^{(i)}\) to be \(\exists y_1 \ldots \exists y_{i-1}\, \varphi \), and \(\varphi ^{(i)}_{\overrightarrow{F}}\) to be \((\cdots (\varphi ^{(i)}[y_{i+1}\mapsto f_{i+1}]) \cdots )[y_n\mapsto f_n]\), with the obvious modifications for \(i = 1\) (no existential quantification) and \(i = n\) (no substitution). The following proposition, once again implicit in the literature, characterizes the space of all solutions \({\overrightarrow{F}}\) that realize Y in \(\varphi (X,Y)\).
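Proposition 1 can be checked by brute force on small instances; the sketch below (ours, illustrative) confirms that both \(\varphi [y\mapsto 1]\) and \(\lnot \varphi [y\mapsto 0]\) realize y for a toy specification.

```python
# Illustrative brute-force check of the corollary to Proposition 1.
from itertools import product

def realizes(phi, f, xs_vars):
    """Does y = f(X) satisfy phi wherever some value of y can?"""
    for v in product([False, True], repeat=len(xs_vars)):
        a = dict(zip(xs_vars, v))
        if any(phi({**a, 'y': b}) for b in (False, True)):
            if not phi({**a, 'y': f(a)}):
                return False
    return True

phi = lambda a: (a['x1'] and a['y']) or (a['x2'] and not a['y'])
g1 = lambda a: phi({**a, 'y': True})        # phi[y -> 1]
g2 = lambda a: not phi({**a, 'y': False})   # not phi[y -> 0]
assert realizes(phi, g1, ['x1', 'x2']) and realizes(phi, g2, ['x1', 'x2'])
```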

Proposition 2

The function vector \(\overrightarrow{F}= (f_1(X), \ldots f_n(X))\) realizes \(Y = (y_1, \ldots y_n)\) in \(\varphi (X, Y)\) iff the following holds for every \(i \in \{1, \ldots n\}\):

\(\varphi ^{(i)}_{\overrightarrow{F}}[y_i\mapsto 1] \wedge \lnot \varphi ^{(i)}_{\overrightarrow{F}}[y_i\mapsto 0] \rightarrow f_i(X)\), and \(f_i(X) \rightarrow \varphi ^{(i)}_{\overrightarrow{F}}[y_i\mapsto 1] \vee \lnot \varphi ^{(i)}_{\overrightarrow{F}}[y_i\mapsto 0]. \)

Propositions 1 and 2 are effectively used in [11, 12, 15, 29] to sequentially synthesize \(y_1, \ldots y_n\) as functions of X. Specifically, output \(y_1\) is first synthesized as a function \(g_1(X, y_2, \ldots y_n)\). This is done by treating \(y_1\) as the sole output and \(X \cup \{y_2,\ldots y_n\}\) as the inputs in \(\varphi (X, Y)\). By substituting \(g_1\) for \(y_1\) in \(\varphi \), we obtain \(\varphi ^{(2)} \equiv \exists y_1 \varphi (X, Y)\). Output \(y_2\) can then be synthesized as a function \(g_2(X, y_3, \ldots y_n)\) by treating \(y_2\) as the sole output and \(X \cup \{y_3, \ldots y_n\}\) as the inputs in \(\varphi ^{(2)}\). Substituting \(g_2\) for \(y_2\) in \(\varphi ^{(2)}\) gives \(\varphi ^{(3)} \equiv \exists y_1 \exists y_2\, \varphi (X, Y)\). This process is then repeated until we obtain \(y_n\) as a function \(g_n(X)\). The desired functions \(f_1(X), \ldots f_n(X)\) realizing \(y_1, \ldots y_n\) can now be obtained by letting \(f_n(X)\) be \(g_n(X)\), and \(f_i(X)\) be \((\cdots (g_i[y_{i+1}\mapsto f_{i+1}(X)]) \cdots )[y_n\mapsto f_n(X)]\), for all i from \(n-1\) down to 1. Thus, given \(\varphi (X,Y)\), it suffices to obtain \((g_1, \ldots g_n)\), where \(g_i\) has support \(X \cup \{y_{i+1}, \ldots y_n\}\), in order to solve the synthesis problem. We therefore say that \((g_1, \ldots g_n)\) effectively realizes Y in \(\varphi (X, Y)\), and focus on obtaining \((g_1, \ldots g_n)\).
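The sketch below (ours, brute force, illustrative only) implements this pipeline end to end for a toy specification: each \(g_i\) is chosen as \(\varphi ^{(i)}[y_i\mapsto 1]\), the quantifier \(\exists y_i\) is eliminated by substituting \(g_i\) for \(y_i\), and back-substitution then produces functions of X alone.

```python
# Illustrative sketch of the sequential synthesis pipeline (names are ours).
def sub(phi, y, f):                          # phi[y -> f]
    return lambda a: phi({**a, y: f(a)})

def synthesize(phi, ys):
    gs, cur = [], phi
    for y in ys:
        g = (lambda c, y: lambda a: c({**a, y: True}))(cur, y)  # g_i = cur[y_i -> 1]
        gs.append(g)
        cur = sub(cur, y, g)                 # exists y_i . cur, by substitution
    fs = [None] * len(ys)                    # back-substitute: f_n = g_n, downwards
    for i in reversed(range(len(ys))):
        f = gs[i]
        for j in range(i + 1, len(ys)):
            f = sub(f, ys[j], fs[j])
        fs[i] = f
    return fs

phi = lambda a: a['y1'] != (a['x'] and a['y2'])       # a toy specification
f1, f2 = synthesize(phi, ['y1', 'y2'])
for xv in (False, True):
    a = {'x': xv}
    assert phi({**a, 'y1': f1(a), 'y2': f2(a)})       # (f1, f2) realizes Y
```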

Proposition 1 implies that for every \(i \in \{1, \ldots n\}\), the function \(g_i \equiv \varphi ^{(i)}[y_i\mapsto 1]\) realizes \(y_{i}\) in \(\varphi ^{(i)}\). With this choice for \(g_i\), it is easy to see that \(\exists y_i\, \varphi ^{(i)}\) (or \(\varphi ^{(i+1)}\)) can be obtained as \(\varphi ^{(i)}[y_i\mapsto g_i] = \varphi ^{(i)}[y_i\mapsto \varphi ^{(i)}[y_i\mapsto 1]]\). While synthesis using quantifier elimination by such self-substitution [11] has been shown to scale for certain classes of specifications with pre-determined optimized variable orders, our experience shows that this incurs significant overheads for general specifications with unknown “good” variable orders. An alternative technique for synthesis from factored specification was proposed by John et al. [15], in which initial abstractions of \(g_1, \ldots g_n\) are first computed quickly, and then a CEGAR-style [9] loop is used to refine these abstractions to correct Skolem functions. We use John et al.’s refinement technique as a black-box module in our work; more on this is discussed in Sect. 3.1.

Definition 1

Given a specification \(\varphi (X, Y)\), we define \({\varDelta _{y_i}}(\varphi )\) to be the formula \(\left( \lnot \exists y_1 \ldots y_{i-1}\, \varphi \right) [y_i \mapsto 0]\), and \({\varGamma _{y_i}}(\varphi )\) to be the formula \(\left( \lnot \exists y_1 \ldots y_{i-1}\, \varphi \right) [y_i \mapsto 1]\), for all \(i \in \{1, \ldots n\}\). We also define \(\overrightarrow{\varDelta }(\varphi )\) and \(\overrightarrow{\varGamma }(\varphi )\) to be the vectors \(({\varDelta _{y_1}}(\varphi ), \ldots {\varDelta _{y_n}}(\varphi ))\) and \(({\varGamma _{y_1}}(\varphi ), \ldots {\varGamma _{y_n}}(\varphi ))\) respectively.

If N is a node in the DAG representation of the specification, we abuse notation and use \({\varDelta _{y_i}}(N)\) to denote \({\varDelta _{y_i}}({\varPhi (N)})\), and similarly for \({\varGamma _{y_i}}(N)\), \(\overrightarrow{\varDelta }(N)\) and \(\overrightarrow{\varGamma }(N)\). Furthermore, if both Y and N are clear from the context, we use \({\varDelta _{i}}\), \({\varGamma _{i}}\), \(\overrightarrow{\varDelta }\) and \(\overrightarrow{\varGamma }\) instead of \({\varDelta _{y_i}}(N)\), \({\varGamma _{y_i}}(N)\), \(\overrightarrow{\varDelta }(N)\) and \(\overrightarrow{\varGamma }(N)\), respectively. It is easy to see that the supports of both \({\varGamma _{i}}\) and \({\varDelta _{i}}\) are (subsets of) \(X \cup \{y_{i+1}, \ldots y_n\}\). Furthermore, it follows from Definition 1 that whenever \({\varGamma _{i}}\) (resp. \({\varDelta _{i}}\)) evaluates to 1, if the output \(y_i\) has the value 1 (resp. 0), then \(\varphi \) must evaluate to 0. Conversely, if \({\varGamma _{i}}\) (resp. \({\varDelta _{i}}\)) evaluates to 0, it doesn’t hurt (as far as satisfiability of \(\varphi (X,Y)\) is concerned) to assign the value 1 (resp. 0) to output \(y_i\). This suggests that both \(\lnot {\varGamma _{i}}\) and \({\varDelta _{i}}\) suffice to serve as the function \(g_i(X, y_{i+1}, \ldots y_n)\) when synthesizing functions for multiple output variables. The following proposition, adapted from [15], follows immediately, where we have abused notation and used \(\lnot \overrightarrow{\varGamma }\) to denote \((\lnot {\varGamma _{1}}, \ldots \lnot {\varGamma _{n}})\).

Proposition 3

Given a specification \(\varphi (X, Y)\), both \(\overrightarrow{\varDelta }\) and \(\lnot \overrightarrow{\varGamma }\) effectively realize Y in \(\varphi (X,Y)\).

Proposition 3 shows that it suffices to compute \(\overrightarrow{\varDelta }\) (or \(\overrightarrow{\varGamma }\)) from \(\varphi (X,Y)\) in order to solve the synthesis problem. In the remainder of the paper, we show how to achieve this compositionally and in parallel by first computing refinements of \({\varDelta _{i}}\) (resp. \({\varGamma _{i}}\)) for all \(i \in \{1,\ldots n\}\), and then using John et al.’s CEGAR-based technique [15] to abstract them to the desired \({\varDelta _{i}}\) (resp. \({\varGamma _{i}}\)). Throughout the paper, we use \(\delta _{i}\) and \(\gamma _{i}\) to denote refinements of \({\varDelta _{i}}\) and \({\varGamma _{i}}\) respectively.
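Definition 1 and Proposition 3 can again be exercised by brute force on a toy instance; in the sketch below (ours, illustrative), \({\varDelta _{y_i}}\) is computed directly from its definition and then used, with back-substitution, as the synthesized function.

```python
# Illustrative brute-force versions of Delta_{y_i} and Gamma_{y_i}.
from itertools import product

def dg(phi, ys, i, a, val):
    """(not exists y_1..y_{i-1} phi)[y_i -> val] at assignment a;
    val=False gives Delta_{y_i}, val=True gives Gamma_{y_i} (0-indexed i)."""
    a2 = {**a, ys[i]: val}
    return not any(phi({**a2, **dict(zip(ys[:i], v))})
                   for v in product([False, True], repeat=i))

phi = lambda a: a['y1'] != (a['x'] and a['y2'])
ys = ['y1', 'y2']
for xv in (False, True):                      # Proposition 3, via Delta
    a = {'x': xv}
    y2 = dg(phi, ys, 1, a, False)             # Delta_{y_2} has support {x}
    y1 = dg(phi, ys, 0, {**a, 'y2': y2}, False)
    assert phi({'x': xv, 'y1': y1, 'y2': y2})
```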

3 Exploiting Compositionality

Given a specification \(\varphi (X,Y)\), one way to synthesize \(y_1, \ldots y_n\) is to decompose \(\varphi (X,Y)\) into sub-specifications, solve the synthesis problems for the sub-specifications in parallel, and compose the solutions to the sub-problems to obtain the overall solution. A DAG representation of \(\varphi (X,Y)\) provides a natural recursive decomposition of the specification into sub-specifications. Hence, the key technical question relates to compositionality: how do we compose solutions to synthesis problems for sub-specifications to obtain a solution to the synthesis problem for the overall specification? This problem is not easy, and no state-of-the-art tool for Boolean functional synthesis uses such compositional reasoning.

Our compositional solution to the synthesis problem is best explained in three steps. First, for a simple, yet representationally complete, class of DAGs representing \(\varphi (X,Y)\), we present a lemma that allows us to do compositional synthesis at each node of such a DAG. Next, we show how to use this lemma to design a parallel synthesis algorithm. Finally, we extend our lemma, and hence the scope of our algorithm, to significantly more general classes of DAGs.

3.1 Compositional Synthesis in AND-OR DAGs

For simplicity of exposition, we first consider DAGs with internal nodes labeled by only AND and OR operators (of arbitrary arity). Figure 1 shows an example of such a DAG. Note that this class of DAGs is representationally complete for Boolean specifications, since every specification can be expressed in negation normal form (NNF). In the previous section, we saw that computing \({\varDelta _{i}}(\varphi )\) or \({\varGamma _{i}}(\varphi )\) for all i in \(\{1, \ldots n\}\) suffices for purposes of synthesis. The following lemma shows the relation between \({\varDelta _{i}}\) and \({\varGamma _{i}}\) at an internal node N in the DAG and the corresponding formulas at the node’s children, say \(c_1, \ldots c_k\).

Lemma 1

(Composition Lemma). Let \(\varPhi (N) = \mathsf {op}(\varPhi (c_1),\ldots , \varPhi (c_k))\), where \(\mathsf {op}=\vee \) or \(\mathsf {op}=\wedge \). Then, for each \(1\le i\le n\):

$$\begin{aligned} \left( \bigwedge _{j=1}^k {\varDelta _{i}}(c_j)\right) \leftrightarrow {\varDelta _{i}}(N) \quad \text {and}\quad \left( \bigwedge _{j=1}^k {\varGamma _{i}}(c_j)\right) \leftrightarrow {\varGamma _{i}}(N) \,\,\text {if}\,\, \mathsf {op}= \vee \end{aligned}$$
(1)
$$\begin{aligned} \left( \bigvee _{j=1}^k {\varDelta _{i}}(c_j)\right) \rightarrow {\varDelta _{i}}(N) \quad \text {and}\quad \left( \bigvee _{j=1}^k {\varGamma _{i}}(c_j)\right) \rightarrow {\varGamma _{i}}(N) \,\,\text {if}\,\, \mathsf {op}= \wedge \end{aligned}$$
(2)

The proof of this lemma can be found in [2]. Thus, if N is an OR-node, we obtain \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) directly by conjoining \({\varDelta _{i}}\) and \({\varGamma _{i}}\) at its children. However, if N is an AND-node, disjoining the \({\varDelta _{i}}\) and \({\varGamma _{i}}\) at its children only gives refinements of \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) (see Eq. (2)). Let us call these refinements \(\delta _{i}(N)\) and \(\gamma _{i}(N)\) respectively. To obtain \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) exactly at AND-nodes, we must use the CEGAR technique developed in [15] to iteratively abstract \(\delta _{i}(N)\) and \(\gamma _{i}(N)\) obtained above. More on this is discussed below.
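Lemma 1 can be checked exhaustively on small nodes; the sketch below (ours, illustrative) confirms the equivalence at an OR-node and the refinement at an AND-node for a toy pair of children.

```python
# Illustrative brute-force check of Lemma 1 (0-indexed i; names are ours).
from itertools import product

def delta(phi, ys, i, a):
    """Exact Delta_{y_i}(phi) at assignment a, by definition."""
    a2 = {**a, ys[i]: False}
    return not any(phi({**a2, **dict(zip(ys[:i], v))})
                   for v in product([False, True], repeat=i))

c1 = lambda a: a['x'] or a['y1']
c2 = lambda a: a['y1'] != a['y2']
n_or  = lambda a: c1(a) or c2(a)
n_and = lambda a: c1(a) and c2(a)
ys = ['y1', 'y2']
for xv, y2v, i in product([False, True], [False, True], [0, 1]):
    a = {'x': xv, 'y2': y2v}
    d1, d2 = delta(c1, ys, i, a), delta(c2, ys, i, a)
    assert (d1 and d2) == delta(n_or, ys, i, a)       # Eq. (1): exact at OR
    assert not (d1 or d2) or delta(n_and, ys, i, a)   # Eq. (2): refinement at AND
```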

A CEGAR step involves constructing, for each i from 1 to n, a Boolean error formula \(\mathsf {Err}_{\delta _{i}}\) (resp. \(\mathsf {Err}_{\gamma _{i}}\)) such that the error formula is unsatisfiable iff \(\delta _{i}(N) \leftrightarrow {\varDelta _{i}}(N)\) (resp. \(\gamma _{i}(N) \leftrightarrow {\varGamma _{i}}(N)\)). A SAT solver is then used to check the satisfiability of the error formula. If the formula is unsatisfiable, we are done; otherwise the satisfying assignment can be used to further abstract the respective refinement. This check-and-abstract step is then repeated in a loop until the error formulas become unsatisfiable. Following the approach outlined in [15], it can be shown that if we use \(\mathsf {Err}_{\delta _{i}} ~\equiv ~ \lnot \delta _{i} ~\wedge ~ \bigwedge _{j=1}^{i}\left( y_j \leftrightarrow \delta _{j}\right) ~\wedge ~ \lnot \varphi \) and \(\mathsf {Err}_{\gamma _{i}} ~\equiv ~ \lnot \gamma _{i} ~\wedge ~ \bigwedge _{j=1}^{i}\left( y_j \leftrightarrow \lnot \gamma _{j}\right) ~\wedge ~ \lnot \varphi \), and perform CEGAR in order from \(i = 1\) to \(i=n\), this suffices to give us \({\varDelta _{i}}\) and \({\varGamma _{i}}\). For details of the CEGAR implementation, the reader is referred to [15]. The above discussion leads to a straightforward algorithm Compute (shown as Algorithm 1) that computes \(\overrightarrow{\varDelta }(N)\) and \(\overrightarrow{\varGamma }(N)\) for a node N, using \(\overrightarrow{\varDelta }(c_j)\) and \(\overrightarrow{\varGamma }(c_j)\) for its children \(c_j\). Here, we have assumed access to a black-box function Perform_Cegar that implements the CEGAR step.
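The sketch below (ours) illustrates the check-and-abstract loop on a toy specification, with brute-force enumeration standing in for the SAT solver and with the abstraction step simply adding the counterexample point; the actual implementation generalizes counterexamples and uses a real SAT solver, as described in [15].

```python
# Illustrative CEGAR sketch: grow delta_i until Err_{delta_i} is unsatisfiable.
from itertools import product

X, Y = ['x'], ['y1', 'y2']
phi = lambda a: a['y1'] != (a['x'] and a['y2'])
delta = {y: set() for y in Y}                 # candidates, as sets of minterms

def d(i, a):                                  # delta_i over X u {y_{i+1}, ...}
    return tuple(a[v] for v in X + Y[i + 1:]) in delta[Y[i]]

def err(i, a):   # Err_{delta_i} = not delta_i & /\_{j<=i}(y_j <-> delta_j) & not phi
    return (not d(i, a)
            and all(a[Y[j]] == d(j, a) for j in range(i + 1))
            and not phi(a))

for i in range(len(Y)):                       # CEGAR in order i = 1..n
    while True:
        cex = next((dict(zip(X + Y, v))       # brute-force "SAT solver"
                    for v in product([False, True], repeat=len(X) + len(Y))
                    if err(i, dict(zip(X + Y, v)))), None)
        if cex is None:
            break                             # unsatisfiable: delta_i is done
        delta[Y[i]].add(tuple(cex[v] for v in X + Y[i + 1:]))   # abstract

for xv in (False, True):                      # the result realizes Y in phi
    y2 = d(1, {'x': xv}); y1 = d(0, {'x': xv, 'y2': y2})
    assert phi({'x': xv, 'y1': y1, 'y2': y2})
```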

Algorithm 1. Compute

3.2 A Parallel Synthesis Algorithm

The DAG representation of \(\varphi (X,Y)\) gives a natural, recursive decomposition of the specification, and also defines a partial order of dependencies between the corresponding synthesis sub-problems. Algorithm Compute can be invoked in parallel on nodes in the DAG that are not ordered w.r.t. this partial order, as long as Compute has already been invoked on their children. This suggests a simple parallel approach to Boolean functional synthesis. Algorithm ParSyn, shown below, implements this approach, and is designed for a message-passing architecture. We consider a standard manager-worker configuration, where one of the m available cores acts as the manager, and the remaining \(m-1\) cores act as workers. All communication between the manager and workers is assumed to happen through explicit send and receive primitives.

Algorithm 2. ParSyn

The manager uses a queue Q of ready-to-process nodes. Initially, Q is initialized with the leaf nodes in the DAG, and we maintain the invariant that all nodes in Q can be processed in parallel. If there is an idle worker W and if Q is not empty, the manager assigns the node N at the front of Q to worker W for processing. If N is an internal DAG node, the manager also sends \(\overrightarrow{\varDelta }(c_j)\) and \(\overrightarrow{\varGamma }(c_j)\) for every child \(c_j\) of N to W. If there are no idle workers or if Q is empty, the manager waits for a worker, say \(W'\), to finish processing its assigned node, say \(N'\). When this happens, the manager stores the result sent by \(W'\) as \(\overrightarrow{\varDelta }(N')\) and \(\overrightarrow{\varGamma }(N')\). It then inserts one or more parents \(N''\) of \(N'\) in the queue Q, if all children of \(N''\) have been processed. The above steps are repeatedly executed at the manager until all DAG nodes have been processed. The job of a worker W is relatively simple: on being assigned a node N, and on receiving \(\overrightarrow{\varDelta }(c_j)\) and \(\overrightarrow{\varGamma }(c_j)\) for all children \(c_j\) of N, it simply executes Algorithm Compute on N and returns \(\left( \overrightarrow{\varDelta }(N), \overrightarrow{\varGamma }(N)\right) \).

Note that Algorithm ParSyn is guaranteed to progress as long as all workers complete processing the nodes assigned to them in finite time. The partial order of dependencies between nodes ensures that when all workers are idle, either all nodes have already been processed, or at least one unprocessed node has \(\overrightarrow{\varDelta }\) and \(\overrightarrow{\varGamma }\) computed for all its children, if any.
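The sketch below (ours) illustrates this scheduling discipline with a thread pool standing in for the paper's message-passing cluster; compute plays the role of Algorithm 1 and is stubbed out, and the DAG is a toy example.

```python
# Illustrative sketch of the ParSyn manager loop (thread pool, not MPI).
import concurrent.futures, time

dag = {'n1': [], 'n2': [], 'n3': ['n1', 'n2'], 'n4': ['n2'], 'root': ['n3', 'n4']}
parents = {n: [p for p, cs in dag.items() if n in cs] for n in dag}
pending = {n: len(cs) for n, cs in dag.items()}   # unprocessed children per node
results = {}

def compute(node, child_results):                 # stand-in for Algorithm 1
    time.sleep(0.1)                               # pretend to compute (Delta, Gamma)
    return ('Delta(%s)' % node, 'Gamma(%s)' % node)

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    queue = [n for n, k in pending.items() if k == 0]   # leaves are ready
    running = {}
    while queue or running:
        while queue:                                    # assign ready nodes
            n = queue.pop(0)
            running[pool.submit(compute, n, [results[c] for c in dag[n]])] = n
        done, _ = concurrent.futures.wait(
            running, return_when=concurrent.futures.FIRST_COMPLETED)
        for fut in done:                                # collect; release parents
            n = running.pop(fut)
            results[n] = fut.result()
            for p in parents[n]:
                pending[p] -= 1
                if pending[p] == 0:
                    queue.append(p)
print(results['root'])
```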

3.3 Extending the Composition Lemma and Algorithms

So far, we have considered DAGs in which all internal nodes were either AND- or OR-nodes. We now extend our results to more general DAGs. We do this by generalizing the Composition Lemma to arbitrary Boolean operators. Specifically, given the refinements \(\delta _{i}(c_j)\) and \(\gamma _{i}(c_j)\) at all children \(c_j\) of a node N, we show how to compose these to obtain \(\delta _{i}(N)\) and \(\gamma _{i}(N)\), when N is labeled by an arbitrary Boolean operator. Note that the CEGAR technique discussed in Sect. 3.1 can be used to abstract the refinements \(\delta _{i}\) and \(\gamma _{i}\) to \({\varDelta _{i}}\) and \({\varGamma _{i}}\) respectively, at any node of interest. Therefore, with our generalized Composition Lemma, we can use compositional synthesis for specifications represented by general DAGs, even without computing \({\varDelta _{i}}\) and \({\varGamma _{i}}\) exactly at all DAG nodes. This gives an extremely powerful approach for parallel, compositional synthesis.

Let \({\varPhi (N)} = \mathsf {op}({\varPhi (c_1)}, \ldots {\varPhi (c_r)})\), where \(\mathsf {op}\) is an r-ary Boolean operator. For convenience of notation, we use \(\lnot N\) to denote \(\lnot {\varPhi (N)}\), and similarly for other nodes, in the subsequent discussion. Suppose we are given \(\delta _{i}(c_j)\), \(\gamma _{i}(c_j)\), \(\delta _{i}(\lnot c_j)\) and \(\gamma _{i}(\lnot c_j)\), for \(1 \le j \le r\) and for \(1 \le i \le n\). We wish to compose these appropriately to compute \(\delta _{i}(N)\), \(\gamma _{i}(N)\), \(\delta _{i}(\lnot N)\) and \(\gamma _{i}(\lnot N)\) for \(1 \le i \le n\). Once we have these refinements, we can adapt Algorithm 1 to work for node N, labeled by an arbitrary Boolean operator \(\mathsf {op}\).

To understand how composition works for \(\mathsf {op}\), consider the formula \(\mathsf {op}(z_1, \ldots z_r)\), where \(z_1, \ldots z_r\) are fresh Boolean variables. Clearly, \({\varPhi (N)}\) can be viewed as \((\cdots (\mathsf {op}(z_1, \ldots z_r)[z_1\mapsto {\varPhi (c_1)}]) \cdots )[z_r\mapsto {\varPhi (c_r)}]\). For simplicity of notation, we write \(\mathsf {op}\) instead of \(\mathsf {op}(z_1,\ldots ,z_r)\) in the following discussion. W.l.o.g., let \(z_1 \prec z_2 \prec \cdots \prec z_r\) be a total ordering of the variables \(\{z_1, \ldots z_r\}\). Given \(\prec \), suppose we compute the formulas \(\delta _{z_l}(\mathsf {op})\), \(\gamma _{z_l}(\mathsf {op})\), \(\delta _{z_l}(\lnot \mathsf {op})\) and \(\gamma _{z_l}(\lnot \mathsf {op})\) in negation normal form (NNF), for all \(l \in \{1, \ldots r\}\). Note that these formulas have support \(\{z_{l+1}, \ldots z_r\}\), and do not have variables in \(X \cup Y\) in their support. We wish to ask if we can compose these formulas with \(\delta _{i}(c_j)\), \(\gamma _{i}(c_j)\), \(\delta _{i}(\lnot c_j)\) and \(\gamma _{i}(\lnot c_j)\) for \(1 \le j \le r\) to compute \(\delta _{i}(N)\), \(\gamma _{i}(N)\), \(\delta _{i}(\lnot N)\) and \(\gamma _{i}(\lnot N)\), for all \(i \in \{1, \ldots n\}\). It turns out that we can do this.

Recall that in NNF, negations appear (if at all) only on literals. Let \(\varUpsilon _{l, \mathsf {op}}\) be the formula obtained by replacing every literal \(\lnot z_s\) in the NNF of \(\gamma _{z_l}(\mathsf {op})\) with a fresh variable \(\overline{z_s}\). Similarly, let \(\varOmega _{l, \mathsf {op}}\) be obtained by replacing every literal \(\lnot z_s\) in the NNF of \(\delta _{z_l}(\mathsf {op})\) with the fresh variable \(\overline{z_s}\). The definitions of \(\varUpsilon _{l, \lnot \mathsf {op}}\) and \(\varOmega _{l, \lnot \mathsf {op}}\) are similar. Replacing \(\lnot z_s\) by a fresh variable \(\overline{z_s}\) allows us to treat the literals \(z_s\) and \(\lnot z_s\) independently in the NNF of \(\gamma _{z_l}(\mathsf {op})\) and \(\delta _{z_l}(\mathsf {op})\). The ability to treat these independently turns out to be important when formulating the generalized Composition Lemma. Let \(\left( \varUpsilon _{l,\mathsf {op}}\left[ z_s\mapsto \delta _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \delta _{i}(c_{s})\right] \right) _{s=l+1}^r\) denote the formula obtained by substituting \(\delta _{i}(\lnot c_s)\) for \(z_s\) and \(\delta _{i}(c_s)\) for \(\overline{z_s}\), for every \(s \in \{l+1, \ldots r\}\), in \(\varUpsilon _{l,\mathsf {op}}\). The interpretation of \(\left( \varOmega _{l,\mathsf {op}}\left[ z_s\mapsto \delta _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \delta _{i}(c_{s})\right] \right) _{s=l+1}^r\) is analogous. Our generalized Composition Lemma can now be stated as follows.

Lemma 2

(Generalized Composition Lemma). Let \({\varPhi (N)} = \mathsf {op}({\varPhi (c_1)}, \ldots {\varPhi (c_r)})\), where \(\mathsf {op}\) is an r-ary Boolean operator. For each \(1\le i\le n\) and \(1\le l \le r\):

$$\begin{aligned} 1.~&\delta _{i}(c_l) \wedge \left( \varOmega _{l,\mathsf {op}}\left[ z_s\mapsto \delta _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \delta _{i}(c_{s})\right] \right) _{s=l+1}^r ~\rightarrow ~ {\varDelta _{i}}(N) \\ 2.~&\delta _{i}(\lnot c_l) \wedge \left( \varUpsilon _{l,\mathsf {op}}\left[ z_s\mapsto \delta _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \delta _{i}(c_{s})\right] \right) _{s=l+1}^r ~\rightarrow ~ {\varDelta _{i}}(N) \\ 3.~&\gamma _{i}(c_l) \wedge \left( \varOmega _{l,\mathsf {op}}\left[ z_s\mapsto \gamma _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \gamma _{i}(c_{s})\right] \right) _{s=l+1}^r ~\rightarrow ~ {\varGamma _{i}}(N) \\ 4.~&\gamma _{i}(\lnot c_l) \wedge \left( \varUpsilon _{l,\mathsf {op}}\left[ z_s\mapsto \gamma _{i}(\lnot c_{s})\right] \left[ \overline{z_s}\mapsto \gamma _{i}(c_{s})\right] \right) _{s=l+1}^r ~\rightarrow ~ {\varGamma _{i}}(N) \end{aligned}$$

If we replace \(\mathsf {op}\) by \(\lnot \mathsf {op}\) above, we get refinements of \({\varDelta _{i}}(\lnot N)\) and \({\varGamma _{i}}(\lnot N)\).

The reader is referred to [2] for a proof of Lemma 2. We simply illustrate the idea behind the lemma with an example here. Suppose \({\varPhi (N)} = {\varPhi (c_1)} \wedge \lnot {\varPhi (c_2)} \wedge (\lnot {\varPhi (c_3)} \vee {\varPhi (c_4)})\), where each \({\varPhi (c_j)}\) is a Boolean function with support \(X \cup \{y_1, \ldots y_n\}\). We wish to compute a refinement of \({\varDelta _{i}}(N)\), using refinements of \({\varDelta _{i}}(c_j)\) and \({\varDelta _{i}}(\lnot c_j)\) for \(j \in \{1,\ldots 4\}\). Representing N as \(\mathsf {op}(c_1, c_2, c_3, c_4)\), let \(z_1, \ldots z_4\) be fresh Boolean variables, not in \(X \cup \{y_1, \ldots y_n\}\); then \(\mathsf {op}(z_1, z_2, z_3, z_4) = z_1 \wedge \lnot z_2 \wedge (\lnot z_3 \vee z_4)\). For ease of exposition, assume the ordering \(z_1 \prec z_{2} \prec z_{3} \prec z_{4}\). By definition, \({\varDelta _{z_2}}(\mathsf {op}) = \left( \lnot \exists z_1\, (z_1 \wedge \lnot z_2 \wedge (\lnot z_3 \vee z_4))\right) [z_2\mapsto 0] = z_3 \wedge \lnot z_4\), and suppose \(\delta _{z_2}(\mathsf {op}) = {\varDelta _{z_2}}(\mathsf {op})\). Replacing \(\lnot z_{4}\) by \(\overline{z_{4}}\), we then get \(\varOmega _{2,\mathsf {op}}=z_{3}\wedge \overline{z_{4}}\).

Recalling the definition of \(\delta _{z_2}(\cdot )\), if we set \(z_{3}=1\), \(z_{4}=0\) and \(z_2=0\), then \(\mathsf {op}\) must evaluate to 0 regardless of the value of \(z_1\). By substituting \(\delta _{i}(\lnot c_{3})\) for \(z_{3}\) and \(\delta _{i}(c_{4})\) for \(\overline{z_{4}}\) in \(\varOmega _{2,\mathsf {op}}\), we get the formula \(\delta _{i}(\lnot c_{3}) \wedge \delta _{i}(c_{4})\). Denote this formula by \(\chi \) and note that its support is \(X \cup \{y_{i+1}, \ldots y_n\}\). Note also from the definition of \(\delta _{i}(\cdot )\) that if \(\chi \) evaluates to 1 for some assignment of values to \(X \cup \{y_{i+1}, \ldots y_n\}\) and if \(y_i = 0\), then \(\lnot {\varPhi (c_{3})}\) evaluates to 0 and \({\varPhi (c_{4})}\) evaluates to 0, regardless of the values of \(y_1, \ldots y_{i-1}\). This means that \(z_{3} = 1\) and \(z_{4} = 0\), and hence \(\delta _{z_2}(\mathsf {op}) = 1\). If \(z_2\) (or \({\varPhi (c_2)}\)) can also be made to evaluate to 0 for the same assignment of values to \(X \cup \{y_i, y_{i+1}, \ldots y_n\}\), then \(N = \mathsf {op}(c_1, \ldots c_r)\) must evaluate to 0, regardless of the values of \(\{y_1, \ldots y_{i-1}\}\). Since \(y_i = 0\), values assigned to \(X \cup \{y_{i+1}, \ldots y_n\}\) must therefore be a satisfying assignment of \({\varDelta _{i}}(N)\). One way of having \({\varPhi (c_{2})}\) evaluate to 0 is to ensure that \({\varDelta _{i}}(c_2)\) evaluates to 1 for the same assignment of values to \(X \cup \{ y_{i+1}, \ldots y_n\}\) that satisfies \(\chi \). Therefore, we require the assignment of values to \(X \cup \{ y_{i+1}, \ldots y_n\}\) to satisfy \(\chi \wedge {\varDelta _{i}}(c_2)\), or even \(\chi \wedge \delta _{i}(c_{2})\). Since \(\chi = \delta _{i}(\lnot c_{3}) \wedge \delta _{i}(c_{4})\), we get \(\delta _{i}(c_2) \wedge {\delta _{i}(\lnot c_{3})} \wedge \delta _{i}(c_{4})\) as a refinement of \({\varDelta _{i}}(N)\).
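This composition can be checked by brute force: for any concrete choice of \({\varPhi (c_1)}, \ldots {\varPhi (c_4)}\), the composed formula \(\delta _{i}(c_2) \wedge \delta _{i}(\lnot c_{3}) \wedge \delta _{i}(c_{4})\) implies \({\varDelta _{i}}(N)\). The sketch below (ours, illustrative; it uses the exact \({\varDelta _{i}}\) of each child as its refinement) verifies this on a toy instance.

```python
# Illustrative check of the worked example: Phi(N) = c1 & ~c2 & (~c3 | c4).
from itertools import product

def delta(phi, ys, i, a):
    """Exact Delta_{y_i}(phi) at assignment a (0-indexed i), by definition."""
    a2 = {**a, ys[i]: False}
    return not any(phi({**a2, **dict(zip(ys[:i], v))})
                   for v in product([False, True], repeat=i))

c = [lambda a: a['x'] or a['y1'],          # Phi(c1)
     lambda a: a['y1'] and a['y2'],        # Phi(c2)
     lambda a: a['x'] != a['y2'],          # Phi(c3)
     lambda a: a['y1'] or not a['y2']]     # Phi(c4)
n = lambda a: c[0](a) and not c[1](a) and (not c[2](a) or c[3](a))
not_c3 = lambda a: not c[2](a)
ys = ['y1', 'y2']
for xv, y2v, i in product([False, True], [False, True], [0, 1]):
    a = {'x': xv, 'y2': y2v}
    composed = (delta(c[1], ys, i, a) and delta(not_c3, ys, i, a)
                and delta(c[3], ys, i, a))
    assert not composed or delta(n, ys, i, a)   # composed -> Delta_i(N)
```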

Applying the Generalized Composition Lemma: Lemma 2 suggests a way of compositionally obtaining \(\delta _{i}(N)\), \(\gamma _{i}(N)\), \(\delta _{i}(\lnot N)\) and \(\gamma _{i}(\lnot N)\) for an arbitrary Boolean operator \(\mathsf {op}\). Specifically, the disjunction of the left-hand sides of implications (1) and (2) in Lemma 2, disjoined over all \(l \in \{1, \ldots r\}\) and over all total orders (\(\prec \)) of \(\{z_1, \ldots z_r\}\), gives a refinement of \({\varDelta _{i}}(N)\). A similar disjunction of the left-hand sides of implications (3) and (4) in Lemma 2 gives a refinement of \({\varGamma _{i}}(N)\). The cases of \({\varDelta _{i}}(\lnot N)\) and \({\varGamma _{i}}(\lnot N)\) are similar. This suggests that for each operator \(\mathsf {op}\) that appears as label of an internal DAG node, we can pre-compute a template of how to compose \(\delta _{i}\) and \(\gamma _{i}\) at the children of the node to obtain \(\delta _{i}\) and \(\gamma _{i}\) at the node itself. In fact, pre-computing this template for \(\mathsf {op}= \vee \) and \(\mathsf {op}= \wedge \) by disjoining as suggested above, gives us exactly the left-to-right implications, i.e., refinements of \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\), as given by Lemma 1. We present templates for some other common Boolean operators like if-then-else in [2].

Once we have pre-computed templates for composing \(\delta _{i}\) and \(\gamma _{i}\) at children of a node N to get \(\delta _{i}(N)\) and \(\gamma _{i}(N)\), we can use these pre-computed templates in Algorithm 1, just as we did for AND-nodes. This allows us to apply compositional synthesis on general DAG representations of Boolean relational specifications.

Optimizations Using Partial Computations: Given \(\delta _{i}\) and \(\gamma _{i}\) at children of a node N, we have shown above how to compute \(\delta _{i}(N)\) and \(\gamma _{i}(N)\). To compute \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) exactly, we can use the CEGAR technique outlined in Sect. 3.1. While this is necessary at the root of the DAG, we need not compute \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) exactly at each intermediate node. In fact, the generalized Composition Lemma allows us to proceed with \(\delta _{i}(N)\) and \(\gamma _{i}(N)\). This suggests some optimizations: (i) Instead of using the error formulas introduced in Sect. 3.1, which allow us to obtain \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\) exactly, we can use the error formula used in [15]. The error formula of [15] allows us to obtain some Skolem function for \(y_i\) (not necessarily \({\varDelta _{i}}(N)\) or \(\lnot {\varGamma _{i}}(N)\)) using the sub-specification \({\varPhi (N)}\) corresponding to node N. We have found CEGAR based on this error formula to be more efficient in practice, while still yielding refinements of \({\varDelta _{i}}(N)\) and \({\varGamma _{i}}(N)\). In fact, we use this error formula in our implementation. (ii) We can introduce a timeout parameter, such that \(\overrightarrow{\varDelta }(N),\overrightarrow{\varGamma }(N)\) are computed exactly at each internal node until the timeout expires. Subsequently, for the nodes still under process, we simply combine \(\delta _{i}\) and \(\gamma _{i}\) at their children using our pre-computed composition templates, and do not invoke CEGAR at all. The only exception is the root node of the DAG, where CEGAR must always be invoked.

4 Experimental Results

Experimental Methodology. We have implemented Algorithm 2 with the error formula from [15] used for CEGAR in Algorithm 1 (in function Perform_Cegar), as described at the end of Sect. 3.3. We call this implementation \({\mathsf {ParSyn}}\) in this section, and compare it with the following algorithms/tools: (i) \({\mathsf {CSk}}\): This is based on the sequential algorithm for conjunctive formulas, presented in [15]. For non-conjunctive formulas, the algorithm in [15], and hence \({\mathsf {CSk}}\), reduces to [12, 29]. (ii) \({\textsf {RSynth}}\): The RSynth tool as described in [11]. (iii) \({\textsf {Bloqqer}}\): As prescribed in [22], we first generate special QRAT proofs using the preprocessing tool bloqqer, and then generate Boolean function vectors from the proofs using the \({\textsf {qrat-trim}}\) tool.

Our implementation of \({\mathsf {ParSyn}}\), available online at [26], makes extensive use of the ABC [19] library to represent and manipulate Boolean functions as AIGs. We also use the default SAT solver provided by ABC, which is a variant of MiniSAT. We present our evaluation on three different kinds of benchmarks.

1. Disjunctive Decomposition Benchmarks: Similar to [15], these benchmarks were generated by considering some of the larger sequential circuits in the HWMCC10 benchmark suite, and formulating the problem of disjunctively decomposing each circuit into components as a problem of synthesizing a vector of Boolean functions. Each generated benchmark is of the form \(\exists Y \varphi (X,Y)\) where \(\exists X (\exists Y \varphi (X,Y))\) is \(\textsf {true}\). However, unlike [15], where each benchmark (if not already a conjunction of factors) had to be converted into factored form using Tseitin encoding (which introduced additional variables), we have used these benchmarks without Tseitin encoding.

2. Arithmetic Benchmarks: These benchmarks were taken from the work described in [11]. Specifically, the benchmarks considered are floor, ceiling, decomposition, equalization and intermediate (see [11] for details).

3. Factorization Benchmarks: We considered the integer factorization problem for different bit-widths, as discussed in Sect. 1.

For each arithmetic and factorization benchmark, we first specified the problem instance as an SMT formula and then used Boolector [24] to generate the Boolean version of the benchmark. For each arithmetic benchmark, three variants were generated by varying the bit-width of the arguments of arithmetic operators; specifically, we considered bit-widths of 32, 128 and 512. Similarly, for the factorization benchmark, we generated four variants, using 8, 10, 12 and 16 for the bit-width of the product. Further, as \({\textsf {Bloqqer}}\) requires the input to be in qdimacs format and \({\textsf {RSynth}}\) in cnf format, we converted each benchmark into qdimacs and cnf formats using Tseitin encoding [30]. All benchmarks and the procedure by which we generated them are detailed in [26].

Variable Ordering: We used the same ordering of variables for all algorithms. For each benchmark, the variables are ordered such that the variable which occurs in the transitive fan-in of the least number of nodes in the AIG representation of the specification, appears at the top. For \({\textsf {RSynth}}\) this translated to an interleaving of most of the input and output variables.

Machine Details: All experiments were performed on a message-passing cluster, where each node had 20 cores and 64 GB main memory, each core being a 2.20 GHz Intel Xeon processor. The operating system was CentOS 6.5. For \({\mathsf {CSk}}\), \({\textsf {Bloqqer}}\), and \({\textsf {RSynth}}\), a single core on the cluster was used. For all comparisons, \({\mathsf {ParSyn}}\) was executed on 4 nodes using 5 cores each, so that we had both intra-node and inter-node communication. The maximum time given for execution was 3600 s, i.e., 1 h. We also restricted the total amount of main memory (across all cores) to be 16 GB. The metric used to compare the different algorithms was the time taken to synthesize Boolean functions.

Results. Our benchmark suite consisted of 27 disjunctive decomposition benchmarks, 15 arithmetic benchmarks and 4 factorization benchmarks. These benchmarks span a wide range of sizes, measured by the number of AIG nodes (|SZ|) in the benchmark and the number of variables (|Y|) for which Boolean functions are to be synthesized. Amongst the disjunctive decomposition benchmarks, |SZ| varied from 1390 to 58752 and |Y| varied from 21 to 205. Amongst the arithmetic benchmarks, |SZ| varied from 442 to 11253 and |Y| varied from 31 to 1024. The factorization benchmarks are the smallest and the most complex of the benchmarks, with |SZ| varying from 122 to 502 and |Y| varying from 8 to 16.

We now present the performance of the various algorithms. On 4 of the 46 benchmarks, none of the tools succeeded. Of these, 3 belonged to the intermediate problem type in the arithmetic benchmarks, and the fourth one was the 16 bit factorization benchmark.

Fig. 2. Legend: Ar: arithmetic, Fa: factorization, Dd: disjunctive decomposition. FL: benchmarks for which the corresponding algorithm was unsuccessful.

Effect of the Number of Cores. For this experiment, we chose 5 of the larger benchmarks. Of these, two benchmarks belonged to the disjunctive decomposition category, two belonged to the arithmetic benchmark category and one was the 12 bit factorization benchmark. The number of cores was varied from 2 to 25. With 2 cores, \({\mathsf {ParSyn}}\) behaves like a sequential algorithm, with one core acting as the manager and the other as the worker, and all computation happening at the worker core. Hence, with 2 cores, we see the effect of compositionality without parallelism. With more than 2 cores, the number of worker cores increases, and the computational load is shared across the worker cores.

Figure 2a shows the results of our evaluation. The topmost points indicated by FL are instances for which \({\mathsf {ParSyn}}\) timed out. We can see that, for all 5 benchmarks, the time taken to synthesize Boolean function vectors when the number of cores is 2 is considerable; in fact, \({\mathsf {ParSyn}}\) times out on three of the benchmarks. When we increase the number of cores we observe that (a) by synthesizing in parallel, we can now solve benchmarks for which we had timed out earlier, and (b) speedups of about 4–5 can be obtained with 5–15 cores. From 15 cores to 25 cores, the performance of the algorithm, however, is largely invariant and any further increase in cores does not result in further speed up.

To understand this, we examined the benchmarks and found that their AIG representations have more nodes close to the leaves than to the root (similar to the DAG in Fig. 1). The time taken to process a leaf or a node close to a leaf is typically much less than that for a node near the root. Furthermore, the dependencies between the nodes close to the root are such that at most one or two nodes can be processed in parallel leaving most of the cores unutilized. When the number of cores is increased from 2 to 5–15, the leaves and the nodes close to the leaves get processed in parallel, reducing the overall time taken by the algorithm. However, the time taken to process the nodes close to the root remains more or less the same and starts to dominate the total time taken. At this point, even if the number of cores is further increased, it does not significantly reduce the total time taken. This behaviour limits the speed-ups of our algorithm. For the remaining experiments, the number of cores used for \({\mathsf {ParSyn}}\) was 20.

\({\mathsf {\mathbf {ParSyn}}}\) vs \({\mathsf {\mathbf {CSk}}}\) : As can be seen from Fig. 2b, \({\mathsf {CSk}}\) ran successfully on only 12 of the 46 benchmarks, whereas \({\mathsf {ParSyn}}\) was successful on 39 benchmarks, timing out on 6 benchmarks and running out of memory on 1 benchmark. Of the benchmarks that \({\mathsf {CSk}}\) was successful on, 9 belonged to the arithmetic category, 2 to the factorization and 1 to the disjunctive decomposition category. On further examination, we found that the factorization and arithmetic benchmarks (except the intermediate problems) were conjunctive formulas, whereas the disjunctive decomposition benchmarks were arbitrary Boolean formulas. Since \({\mathsf {CSk}}\) has been specially designed to handle conjunctive formulas, it is successful on some of these benchmarks. On the other hand, since the disjunctive decomposition benchmarks are not conjunctive, \({\mathsf {CSk}}\) treats the entire formula as one factor, and the algorithm reduces to [12, 29]. The performance hit is therefore not surprising; it has been shown in [15] and [11] that the algorithms of [12, 29] do not scale to large benchmarks that are not conjunctions of small factors. In fact, among the disjunctive decomposition benchmarks, \({\mathsf {CSk}}\) was successful on only the smallest one.

Fig. 3. Legend: Ar: arithmetic, Fa: factorization, Dd: disjunctive decomposition. FL: benchmarks for which the corresponding algorithm was unsuccessful.

\({\mathsf {\mathbf {ParSyn}}}\) vs \({{\textsf {\textit{RSynth}}}}\) : As seen in Fig. 3a, \({\textsf {RSynth}}\) was successful only on 3 of the 46 benchmarks; it timed out on 37 and ran out of memory on 6 benchmarks. The 3 benchmarks that \({\textsf {RSynth}}\) was successful on were the smaller factorization benchmarks. Note that the arithmetic benchmarks used in [11] are semantically the same as the ones used in our experiments. In [11], custom variable orders were used to construct the ROBDDs, which resulted in compact ROBDDs. In our case, we use the variable ordering heuristic mentioned earlier, and include the considerable time taken to build BDDs from the cnf representation. As mentioned in Sect. 1, if we know a better variable ordering, then the time taken can potentially reduce. However, we may not know the optimal variable order for an arbitrary specification in general. We also found the memory footprint of \({\textsf {RSynth}}\) to be higher, as indicated by the memory-outs. This is not surprising, as \({\textsf {RSynth}}\) uses BDDs to represent Boolean formulas, and it is well-known that BDDs can have large memory requirements.

\({\mathsf {\mathbf {ParSyn}}}\) vs \({{\textsf {\textit{Bloqqer}}}}\) : Since \({\textsf {Bloqqer}}\) cannot synthesize Boolean functions for formulas wherein \(\forall X \exists Y \varphi (X,Y)\) is not valid, we restricted our comparison to the disjunctive decomposition and arithmetic benchmarks, 42 in all. From Fig. 3b, we can see that \({\textsf {Bloqqer}}\) successfully synthesizes Boolean functions for 25 of the 42 benchmarks. On several benchmarks for which it is successful, it outperforms \({\mathsf {ParSyn}}\). In line 14 of Algorithm 1, Perform_Cegar makes extensive use of the SAT solver, and this is reflected in the time taken by \({\mathsf {ParSyn}}\). However, for the remaining 17 benchmarks, \({\textsf {Bloqqer}}\) gave a Not Verified message, indicating that it could not synthesize Boolean functions for these benchmarks. In comparison, \({\mathsf {ParSyn}}\) was successful on most of these benchmarks.

Effect of Timeouts on \({\mathsf {\mathbf {ParSyn}}}\) . Finally, we discuss the effect of the timeout optimization discussed in Sect. 3.3. Specifically, for 60 s (value set through a timeout parameter), starting from the leaves of the AIG representation of a specification, we synthesize exact Boolean functions for DAG nodes. After timeout, on the remaining intermediate nodes, we do not invoke the CEGAR step at all, except at the root node of the AIG.

This optimization enabled us to handle 3 more benchmarks, i.e., \({\mathsf {ParSyn}}\) with this optimization synthesized Boolean function vectors for all the equalization benchmarks (in <340 s). Interestingly, \({\mathsf {ParSyn}}\) without timeouts was unable to solve these problems. This can be explained by the fact that in these benchmarks many internal nodes required multiple iterations of the CEGAR loop to compute exact Boolean functions, which were, however, not needed to compute the solution at the root node.

5 Conclusion and Future Work

In this paper, we have presented the first parallel and compositional algorithm for complete Boolean functional synthesis from a relational specification. A key feature of our approach is that it is agnostic to the semantic class of the input specification, and hence applies to a wide variety of benchmarks. In addition to the disjunctive decomposition and arithmetic benchmarks, we considered the combinatorially hard problem of integer factorization and attempted to generate a functional characterization of it. We found that our implementation outperforms existing tools on a variety of benchmarks.

There are many avenues to extend our work. First, the ideas for compositional synthesis that we develop in this paper could potentially lead to parallel implementations of other synthesis tools, such as that described in [11]. Next, the factorization problem can be generalized to synthesis of inverse functions for classically hard one-way functions, as long as the function can be described efficiently by a circuit/AIG. Finally, we would like to explore improved ways of parallelizing our algorithm, perhaps exploiting features of specific classes of problems.