1 Introduction

Boolean satisfiability (SAT) has gathered applications in bounded model checking of hardware and software systems [5, 7, 51], classical planning [35] and scheduling [27]. Despite the worst-case hardness of SAT, the past few decades have witnessed a significant improvement in the runtime performance of the state-of-the-art SAT solvers [41]. This improvement has led to the usage of SAT solvers as oracles to handle problems whose complexity lies beyond NP. Among these problems, constrained sampling, that concerns with sampling from the space of solutions of a set of constraints F, subject to a user-defined weight function W, has witnessed a surge of interest owing to the wide range of applications ranging from machine learning, probabilistic reasoning, software and hardware verification to statistical physics [3, 32, 39, 45].

Not surprisingly, the problem of sampling is known to be computationally intractable. When the weight function W is fixed to a uniform distribution, the problem of constrained sampling is also known as uniform sampling. Uniform sampling has witnessed a long-standing interest from theoreticians and practitioners alike [4, 33, 38, 45]. The past few years, however, have witnessed a significant improvement in the runtime performance of the sampling tools when the weight function W is fixed to a uniform distribution owing to the rise of hashing-based paradigm [2, 11, 13, 22]. While the significant progress for uniform sampling has paved the way for its usage in constrained random simulation [45], the restriction of uniform distribution is limiting, and several applications of constrained sampling require the underlying techniques to be able to handle a wide variety of distributions and related problem formulations as listed below:

  • Literal-Weighted Sampling. In case of literal-weighted sampling, we consider the weight function over assignments defined as the product of the weight of literals, which is specified using a weight function \(W \! \left( \cdot \right) \) that assigns a non-negative weight to each literal l in a boolean formula F. As argued in [12], literal-weighted weight function suffices for most of the practical applications ranging from constrained random simulation, probabilistic reasoning, and reliability of power-grids [10, 14, 21, 45].

  • Projected Sampling. Typically, users define constraints in high-level modeling languages such as Verilog [1], Bayesian networks [14] and configuration of grids [21] and then \(\mathsf {CNF}\) encodings are employed to convert them into a \(\mathsf {CNF}\) [6]. Commonly used schemes like Tseitin encoding [50] introduce auxiliary variables during encoding; though the encoded formulas are equisatisfiable, they typically do not preserve the number of satisfying assignments. In particular, given an initial set of constraints G expressed over a set of variables X, we obtain another formula F such that \(G(X) = \exists Y F(X, Y)\). Therefore, we are concerned with sampling over solutions of F projected over a subset of variables (such as X in this case). In other words, we are concerned with sampling over \(\varSigma _1^1\) formulas.

  • Conditioned Sampling. Given a boolean formula \(\varPhi \) and a partial assignment \(\sigma \), conditioned sampling refers to sampling from the models of \(\varPhi \) that satisfy \(\sigma \). Conditioning has interesting applications in testing where one is interested in fuzzing the system with inputs that satisfy certain patterns (preconditions). Conditioning has been applied in the past for fault diagnosis [23], conformant planning [46] and databases [15].

Typically, practical applications require sampling techniques that can handle all the above formulations. While techniques based on interval propagation, binary decision diagrams and random perturbation of solution space [22, 25, 44] cannot handle projection, conditioned, and weighted sampling efficiently, the hashing-based techniques have significantly improved the scalability of sampling techniques and are capable of handling projection and literal-weighted scheme [11, 42]. However, the performance of hashing-based techniques is extremely limited in their ability to handle literal-weighted sampling, and one observes a drastic drop in their performance as the weight distribution shifts away from uniform. In this context, one wonders: whether it is possible to design techniques which can handle projection, conditioned, and literal-weighted sampling without degradation in their performance?

In this work, we answer the above question in affirmative: we extend our previously proposed knowledge compilation framework in the context of uniform sampling to handle all the three variants. We have implemented a prototype of our framework, named \(\mathsf {WAPS}\), and demonstrate that within a time limit of 1800 s, \(\mathsf {WAPS}\) performs better than the current state-of-the-art weighted and projected sampler \(\mathsf {WeightGen}\) [10], by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of \(296{\times }\). Out of the 773 benchmarks available, \(\mathsf {WAPS}\) was able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks. Furthermore, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.

A significant advantage of our framework is its simplicity: we show that our previously proposed framework in the context of uniform sampling, \(\mathsf {KUS}\) [49], can be lifted to handle literal-weighted, projection and conditioned sampling. We demonstrate that unlike hashing-based techniques, the runtime performance of \(\mathsf {WAPS}\) is not dependent on the underlying weight distribution. We want to assert that the simplicity of our framework, combined with its runtime performance and its ability to be agnostic to the underlying distribution is a significant novel contribution to the area of constrained sampling. Besides, an important contribution of our work is the theoretical analysis of sampling techniques that employ knowledge compilation.

The rest of the paper is organized as follows. We first discuss the related work in Sect. 2. We then introduce notations and preliminaries in Sect. 3. In Sect. 4 we present \(\mathsf {WAPS}\) and do theoretical analysis of \(\mathsf {WAPS}\) in Sect. 5. We then describe the experimental methodology and discuss results in Sect. 6. Finally, we conclude in Sect. 7.

2 Related Work

Weighted sampling is extensively studied in the literature with the objective of providing scalability while ensuring strong theoretical guarantees. Markov Chain Monte Carlo (\(\mathsf {MCMC}\)) sampling [32, 40] is the most popular technique for weighted sampling; several algorithms like Metropolis-Hastings and simulated annealing have been extensively studied in the literature [36, 40]. While \(\mathsf {MCMC}\) based sampling is guaranteed to converge to a target distribution under mild requirements, convergence is often impractically slow [31]. The practical adaptations for \(\mathsf {MCMC}\)-based sampling in the context of constrained-random verification has been proposed in [37]. Unfortunately, practical \(\mathsf {MCMC}\) based sampling tools use heuristics that destroy the theoretical guarantees. Interval-propagation and belief networks have also been employed for sampling [20, 26, 29], but, though these techniques are scalable, the generated distributions can deviate significantly from the uniform distribution, as shown in [38].

To bridge the wide gap between scalable algorithms and those that give strong guarantees of uniformity several hashing-based techniques have been proposed [10, 11, 24, 28] for weighted sampling. The key idea behind hashing-based techniques is to employ random parity constraints as pairwise independent hash functions to partition the set of satisfying assignments of \(\mathsf {CNF}\) formula into cells. The hashing-based techniques have achieved significant runtime performance improvement in case of uniform sampling but their scalability suffers for weight distribution and depends strongly on parameters such as tilt, which are unlikely to be small for most practical distributions [42].

In recent past, a significant amount of work has been done to compile propositional theory, often represented as a propositional formula in \(\mathsf {CNF}\) into tractable knowledge representations. One of the prominent and earliest representations is Ordered Binary Decision Diagrams (OBDDs), which have been effectively used for circuit analysis and synthesis [9]. Another family of representations known as Deterministic Decomposable Negation Normal Form (d-DNNF) [19] have proved to be influential in many probabilistic reasoning applications [14, 17, 18]. Recently, another representation called as Sentential Decision Diagram (SDD) [16] was proposed which maintains canonicity and polytime support for boolean combinations and bridged the gap of succinctness between OBDDs and d-DNNFs. In our recent work [49], we were able to tackle the problem of uniform sampling by exploiting the properties of d-DNNF. Specifically, we were able to take advantage of recent advancements made in the field of knowledge compilation and use the compiled structure to generate uniform samples while competing with the state-of-the-art tools for uniform sampling.

3 Notations and Preliminaries

A literal is a boolean variable or its negation. A clause is a disjunction of a set of literals. A propositional formula F in conjunctive normal form (\(\mathsf {CNF}\)) is a conjunction of clauses. Let Vars(F) be the set of variables appearing in F. The set Vars(F) is called support of F. A satisfying assignment or witness of F, denoted by \(\sigma \), is an assignment of truth values to variables in its support such that F evaluates to true. We denote the set of all witnesses of F as \(R_{F}\). Let var(l) denote the variable of literal l, i.e., \(var(l) = var(\lnot l)\) and \(F_{|l}\) denotes the formula obtained when literal l is set to true in F. Given an assignment \(\sigma \) over Vars(F) and a set of variables \(P \subseteq Vars(F)\), define \(\sigma _P = \{l\) | \(l \in \sigma \), \(var(l) \in P\}\) and \(R_{F\downarrow P}\) to be the projection of \(R_{F}\) onto P, i.e., \(R_{F\downarrow P} = \{ \sigma _P | \sigma \in R_{F}\}\).

Given a propositional formula F and a weight function \(W(\cdot )\) that assigns a non-negative weight to every literal, the weight of assignment \(\sigma \) denoted as \(W(\sigma )\) is the product of weights of all the literals appearing in \(\sigma \), i.e., \(W(\sigma ) = \prod _{l \in \sigma } W(l)\). The weight of a set of assignments Y is given by \(W(Y) = \sum _{\sigma \in Y} W(\sigma )\). Note that, we have overloaded the definition of weight function \(W(\cdot )\) to support different arguments – a literal, an assignment and a set of assignments. We want to highlight that the assumption about weight distribution being generated solely by a literal-weighted function stands well, as many real-world applications like probabilistic inference can be efficiently reduced to literal-weighted sampling [14]. Also, for notational convenience, whenever the formula F, weight function W and sampling set P is clear from the context, we omit mentioning it.

3.1 Weighted and Projected Generators

A weighted and projected probabilistic generator is a probabilistic algorithm that generates a witness from \(R_{F\downarrow P}\) with respect to weight distribution generated by weight function W. A weighted and projected generator \(\mathcal {G}^{wp}(\cdot , \cdot , \cdot )\) is a probabilistic generator that guarantees

$$\begin{aligned} \forall y \in R_{F\downarrow P}, \mathsf {Pr}\left[ \mathcal {G}^{wp}(F, P, W) = y\right] = \frac{W(y)}{W(R_{F\downarrow P})}, \end{aligned}$$

An almost weighted and projected generator \(\mathcal {G}^{awp}(\cdot , \cdot , \cdot )\) relaxes this requirement, ensuring that: given a tolerance \( \varepsilon > 0\), \(\forall y \in R_{F\downarrow P}\) we have

$$\begin{aligned} \frac{W(y)}{(1 + \varepsilon )W(R_{F\downarrow P})} \le \mathsf {Pr}\left[ \mathcal {G}^{awp}(F, P, W) = y\right] \le \frac{(1 + \varepsilon )W(y)}{W(R_{F\downarrow P})}, \end{aligned}$$
Fig. 1.
figure 1

Example of d-DNNF

Fig. 2.
figure 2

The projected d-DNNF of Example 1

Probabilistic generators are allowed to occasionally “fail” in the sense that no witness may be returned even if \(R_{F\downarrow P}\) is non-empty. The failure probability for such generators must be bounded by a constant strictly less than 1.

3.2 Deterministic Decomposable Negation Normal Form (d-DNNF)

To formally define d-DNNF, we first define the Negation Normal Form (NNF):

Definition 1

[19]. Let X be the set of propositional variables. A sentence in NNF is a rooted, directed acyclic graph (DAG) where each leaf node i is labeled with true, false, x or \(\lnot x\), \(x \in X\); and each internal node is labeled with \(\vee \) or \(\wedge \) and can have arbitrarily many children.

d-DNNF further imposes that the representation is:

  • Deterministic: An NNF is deterministic if the operands of \(\vee \) in all well-formed boolean formula in the NNF are mutually inconsistent.

  • Decomposable: An NNF is decomposable if the operands of \(\wedge \) in all well-formed boolean formula in the NNF are expressed on a mutually disjoint set of variables.

The deterministic and decomposable properties are conveniently expressed by AND-OR graphs (DAGs) where a node is either an AND node, an OR node or a literal. The operands of AND/OR nodes appear as children of the node. Figure 1 shows an example of d-DNNF representation. For every node t, the subformula corresponding to t is the formula corresponding to d-DNNF obtained by removing all the nodes u such that there does not exist a path from t to u. T(t) represents the set of all partial satisfying assignments for the subformula corresponding to t. The siblings of a node t are the children of the parent of t excluding t itself and the set of such children is given by Siblings(t).

Decision-DNNF is a subset of d-DNNF where the deterministic OR nodes are decision nodes [18]. The state-of-the-art d-DNNF construction tools like C2D [18], \(\textsc {Dsharp}\) [43] and D4 [30], construct the Decision-DNNF representation where each OR node has exactly two children while an AND node may have multiple children. Since our framework \(\mathsf {WAPS}\) employs modern d-DNNF compilers, we assume that the OR node has exactly two children. This assumption is only for the simplicity of exposition as our algorithms can be trivially adopted to the general d-DNNF representations.

4 Algorithm

In this section, we discuss our primary technical contribution: \(\mathsf {WAPS}\), weighted and projected sampler that samples from \(R_{F\downarrow P}\) with respect to weight function W by employing the knowledge compilation techniques.

\(\mathsf {WAPS}\) takes a \(\mathsf {CNF}\) formula F, a set of sampling variables P, a function assigning weights to literals W and required number of samples s as inputs and returns \(\mathsf {SampleList}\), a list of size s which contain samples such that each sample is independently drawn from the weighted distribution generated by W over \(R_{F\downarrow P}\).

Similar to \(\mathsf {KUS}\), \(\mathsf {WAPS}\) (Algorithm 1) mainly comprises of three phases: Compilation, Annotation and Sampling. For d-DNNF compilation, \(\mathsf {WAPS}\) invokes a specialized compilation routine \(\mathsf {PCompile}\) over the formula F and the sampling set P (line 1). This is followed by the normalization of weights such that for any literal l, \(W'(l) + W'(\lnot l) = 1\), where \(W'\) is the normalized weight returned in line 2. Then for annotation, \(\mathsf {WAnnotate}\) is invoked in line 3 which uses the weight function \(W'\) to annotate weights to all the nodes of the d-DNNF tree. Finally, subroutine \(\mathsf {Sampler}\) (line 4) is invoked which returns s independently drawn samples over P following the weighted distribution generated by W over \(R_{F\downarrow P}\). We now describe these three phases in detail.

4.1 Compilation

The compilation phase is performed using the subroutine \(\mathsf {PCompile}\). \(\mathsf {PCompile}\) is a modified procedure over the component caching and clause learning based algorithm of the d-DNNF compiler \(\textsc {Dsharp}\) [43, 47]. It is presented in Algorithm 2. The changes from the existing algorithm are . The rest of the procedure which is similar to \(\textsc {Dsharp}\) is mentioned here for completeness. The description of \(\mathsf {PCompile}\) is as follows:

\(\mathsf {PCompile}\) takes in a \(\mathsf {CNF}\) formula F in the clausal form and a set of sampling variables P as input and returns a d-DNNF over P. If the formula does not contain any variable from P, \(\mathsf {PCompile}\) invokes \(\mathsf {SAT}\) (line 2) which returns a True node if the formula is satisfiable, else it returns a False node. Otherwise, \(\mathsf {DecideLiteral}\) is invoked to choose a literal appearing in F such that \(var(l) \in P\) (line 3). This decision is then recursively propagated by invoking \(\mathsf {CompileBranch}\) to create \(t_1\), the d-DNNF of \(F_{|l}\) (line 4) and \(t_2\), the d-DNNF of \(F_{|\lnot l}\) (line 5). \(\mathsf {Disjoin}\) is invoked in line 6 which takes \(t_1\) and \(t_2\) as input and returns \(t_2\) if \(t_1\) is False node, \(t_1\) if \(t_2\) is False node otherwise a new tree composed by an OR node as the parent of \(t_1\) and \(t_2\). The result of \(\mathsf {Disjoin}\) is then stored in the cache (line 7) and returned as an output of \(\mathsf {PCompile}\) (line 8).

We now discuss the subroutine \(\mathsf {CompileBranch}\). It is presented in Algorithm 3. It takes in a \(\mathsf {CNF}\) formula F, set of sampling variables P and literal l as input and returns a d-DNNF tree of \(F_{|l}\) on P. It first invokes \(\mathsf {BCP}\) (Binary Constraint Propagation), with F, l and P which performs unit-propagation to return a tuple of reduced formula \(F'\) and a set of implied literals (term) projected over variables in P (line 1). Then \(\mathsf {CompileBranch}\) checks if \(F'\) contains an empty clause and returns False node to indicate that \(F_{|l}\) is not satisfiable, else the formula is solved using component decomposition as described below.

At line 6 it breaks the formula \(F'\) into separate components formed by disjoint set of clauses such that no two components share any variables. Then each component is solved independently (lines 8–15). For each component, it first examines the cache to see if this component has been solved earlier and is present in the cache (line 9). If cache lookup fails, it solves the component with a recursive call to \(\mathsf {PCompile}\) (line 11). If any component is found to be unsatisfiable, False node is returned implying that the overall formula is unsatisfiable too, else \(\mathsf {CompileBranch}\) simply conjoins the components’ d-DNNFs together with the decided l and implied literals (term) and returns this after storing it in the cache for the formula \(F_{|l}\) (lines 16–18).

We illustrate \(\mathsf {PCompile}\) procedure on the following example formula F:

Example 1

\(F = \{\{x_1,x_2\}, \{\lnot x_3, \lnot x_5, x_6\}, \{\lnot x_2, x_4, \lnot x_1\}, \{x_3, \lnot x_6, \lnot x_1\}, \{x_6, x_5, \lnot x_1, x_3\}, \{x_3, x_6, \lnot x_5, \lnot x_1\}\}\)

Figure  2 represents the d-DNNF of Example 1 on \(P = \{x_1,x_2,x_3\}\). For detailed discussion about applying \(\mathsf {PCompile}\) on F please refer to Appendix.

figure b

4.2 Annotation

The subroutine \(\mathsf {WAnnotate}\) is presented in Algorithm 4. \(\mathsf {WAnnotate}\) takes in a d-DNNF \(\mathsf {dag}\) and a weight function W as inputs and returns an annotated d-DNNF \(\mathsf {dag}\) whose each node t is annotated with a weight, given by the sum of weights of all the partial assignments represented by subtree rooted at t. The weights subsequently annotated on children of an OR node indicate the probability with which it would be selected in the sampling phase.

\(\mathsf {WAnnotate}\) performs a bottom up traversal on d-DNNF \(\mathsf {dag}\) in a reverse topological order. For each node in the d-DNNF \(\mathsf {dag}\), \(\mathsf {WAnnotate}\) maintains an attribute, weight, as per the label of the node given as follows:

  • Literal (lines 23): The weight of a literal node is taken as the weight of the literal given by the weight function W.

  • OR (lines 48): The weight of OR node is made equal to the sum of weights of both of its children.

  • AND (lines 913): The weight of AND node is made equal to the product of weights of all its children.

figure c

4.3 Sampling

Algorithm \(\mathsf {Sampler}\) takes the annotated d-DNNF \(\mathsf {dag}\) and the required number of samples s and returns \(\mathsf {SampleList}\), a list of s samples conforming to the distribution of their weights as governed by weight function W given to \(\mathsf {WAnnotate}\). The subroutine \(\mathsf {Sampler}\) is very similar to the sampling procedure in our previous work [49] except that we take the annotated weight of the node instead of the annotated count in the previous work as the probability for Bernoulli trials. We refer the readers to Appendix for a detailed discussion.

4.4 Conditioned Sampling

The usage of samplers in testing environment necessitates sampling from F conditioned on fixing assignment to a subset of variables. The state of the art techniques, such as those based on universal hashing, treat every query as independent and are unable to reuse computation across different queries. In contrast, the compilation to d-DNNF allows \(\mathsf {WAPS}\) to reuse the same d-DNNF. In particular, for a given conditioning expressed as conjunction of different literals, i.e., \( \hat{C} = \bigwedge _{i} l_i\).

In particular, instead of modifying computationally expensive d-DNNF, we modify the weight function as follows:

$$\begin{aligned} \hat{W}(l) = {\left\{ \begin{array}{ll} 0, &{} l \notin \hat{C}\\ W(l) &{} otherwise \end{array}\right. } \end{aligned}$$

5 Theoretical Analysis

We now present theoretical analysis of \(\mathsf {WAPS}\), which consists of two components: correctness of \(\mathsf {WAPS}\) and analysis of behavior of \(\mathsf {Sampler}\) on the underlying d-DNNF graph. First, we prove that \(\mathsf {WAPS}\) is an exact weighted and projected sampler in Theorem 1. To this end, we prove the correctness of our projected d-DNNF \(\mathsf {dag}\) compilation procedure \(\mathsf {PCompile}\) in Lemma 1. In Lemma 2, we show that \(\mathsf {WAnnotate}\) annotates each node of the d-DNNF with weights that represent the weight of assignments represented by subtree rooted at that node. This enables us to sample as per the weight distribution in the sampling phase which is proved in Theorem 1 using Lemmas 1 and 2. Secondly, further probing into the behavior of subroutine \(\mathsf {Sampler}\), we provide an analysis of the probability of visiting any node in the d-DNNF \(\mathsf {dag}\) while sampling. For this, we first find a probability of visiting a node by following a particular path in Lemma 3 and then we use this result to prove an upper bound for the general case of visiting a node from all possible paths in Theorem 2. We believe that this analysis will motivate the researchers to find new ways to speed up or device new methods to find exact or approximate sampling techniques over a given compiled representation.

The proofs of Theorem 1 and Lemmas 12 and 3 can be found in Appendix.

Lemma 1

Given a formula F and set of sampling variables P, the tree returned by \(\mathsf {PCompile}\)(FP) is a d-DNNF \(\mathsf {dag}\) which represents the set of satisfying assignments of the formula F projected over the set of sampling variables P.

Lemma 2

Every node t in the d-DNNF \(\mathsf {dag}\) returned by \(\mathsf {WAnnotate}\) is annotated by W(T(t)), where T(t) is the set of all the partial assignments corresponding to the subtree rooted at t.

Theorem 1

For a given F, P and s, \(\mathsf {SampleList}\) is the list of samples generated by \(\mathsf {WAPS}\). Let \(\mathsf {SampleList}\)[i] indicate the sample at the \(i^{th}\) index of the list. Then for each \(y \in R_{F\downarrow P}\), \(\forall i \in [s]\), we have \(\mathsf {Pr}\left[ y = \mathsf {SampleList}[i]\right] = \frac{W(y)}{W(R_{F\downarrow P})}.\)

Lemma 3

For a given F and P, let \(\mathrm {fol}(\rho )\) be the event of following a path \(\rho \), which start at root and ends at node t, then \(\mathsf {Pr}\left[ \mathrm {fol}(\rho )\right] = \frac{W(T(t))\times c_\rho }{W(R_{F\downarrow P})}\) where \(c_\rho \) is the product of weight of all the OR nodes’ siblings encountered in the path \(\rho \) from root to t and T(t) is the set of all the partial satisfying assignments represented by subtree rooted at t.

Theorem 2

For a given F and P, let \(\mathrm {visit}(t)\) be the event of visiting a node t to fetch one sample as per subroutine \(\mathsf {Sampler}\), then \(\mathsf {Pr}\left[ \mathrm {visit}(t)\right] \le \frac{W(\varGamma (t))}{W(R_{F\downarrow P})}\) where \(\varGamma (t) = \{\sigma ~|~\sigma \in R_{F\downarrow P},~\sigma _{\downarrow Vars(t)} \in T(t)\}\) and T(t) is a set of all the partial satisfying assignments represented by subtree rooted at t.

Proof

In Lemma 3 we have calculated the probability of visiting a node t by taking a particular path from root to node t. So the probability of visiting a node t will be the sum of probability of visiting t by all possible paths. Let \(\mathcal {P} = \{\rho _1, \rho _2, \cdots , \rho _m\}\) be the set of all paths from root to node t and \(\mathrm {visit}(t)\) be the event of visiting a node t in subroutine \(\mathsf {Sampler}\) then,

$$\begin{aligned} \mathsf {Pr}\left[ \mathrm {visit}(t)\right] = \sum _{\rho \in \mathcal {P}} \mathsf {Pr}\left[ \mathrm {visit}(t)~|~\mathrm {fol}(\rho )\right] \times \mathsf {Pr}\left[ \mathrm {fol}(\rho )\right] = \sum _{\rho \in \mathcal {P}}1\times \mathsf {Pr}\left[ \mathrm {fol}(\rho )\right] \end{aligned}$$

From Lemma 3, \( \mathsf {Pr}\left[ \mathrm {visit}(t)\right] = \sum _{\rho \in \mathcal {P}} \frac{W(T(t))\times c_\rho }{W(R_{F\downarrow P})} \) where \(c_\rho \) is the product of the weight of all the OR nodes’ siblings encountered in a path \(\rho \) from root to t. For any such path, we call \(\{t^{1}_{\rho }, t^2_\rho , \cdots , t^n_\rho \}\) as the set of all the OR node siblings encountered on the path \(\rho \). Now, let \(\sigma ^{ext}_\rho \) be the set of assignments over P represented by path \(\rho \). Therefore,

$$\begin{aligned} \sigma ^{ext}_\rho = T(t^1_\rho ) \times T(t^2_\rho ) \cdots \times T(t^n_\rho ) \times T(t) \end{aligned}$$

where, \(T(\cdot )\) are set of assignments and \(\times \) is a cross product. Now, any tuple from \(\sigma ^{ext}_\rho \) represents a satisfying assignment in the d-DNNF. Therefore, \(\sigma ^{ext}_\rho \subseteq R_{F\downarrow P}\). Note that, from Lemma 2, it follows that weight annotated by \(\mathsf {WAnnotate}\) at t is equal to W(T(t)). Therefore,

$$\begin{aligned} c_\rho = W(T(t^1_\rho )) \times W(T(t^2_\rho )) \cdots \times W(T(t^n_\rho )) \end{aligned}$$

And, \( W(\sigma ^{ext}_\rho ) = W(T(t))\times c_\rho \). Notice that, \(\sigma ^{ext}_\rho \subseteq \varGamma (t)\) as \(\sigma ^{ext}_\rho \) represents satisfying extensions of partial assignments contained in T(t) itself. This is true \(\forall \) \(\rho \in \mathcal {P}\). Therefore as W(.) is an increasing function,

$$\begin{aligned} \bigcup \limits _{\rho \in \mathcal {P}} \sigma ^{ext}_\rho \quad \subseteq \quad \varGamma (t) \quad \quad \Longrightarrow \quad \quad \sum _{\rho \in \mathcal {P}} W(T(t))\times c_\rho \quad \le \quad W(\varGamma (t)) \end{aligned}$$

Note that, the inequality indeed holds as the intersection of sets of partial assignments represented by t and any other node not lying on the path from root to t may not be \(\phi \) (empty). Therefore,

$$\begin{aligned} \sum _{\rho \in \mathcal {P}} \frac{W(T(t))\times c_\rho }{W(R_{F\downarrow P})} \quad \le \quad \frac{W(\varGamma (t))}{W(R_{F\downarrow P})} \quad \Longrightarrow \quad \mathsf {Pr}\left[ \mathrm {visit}(t)\right] \quad \le \quad \frac{W(\varGamma (t))}{W(R_{F\downarrow P})} \end{aligned}$$

6 Evaluation

In order to evaluate the runtime performance and analyze the quality of samples generated by \(\mathsf {WAPS}\), we implemented a prototype in Python. For d-DNNF compilation, our prototype makes use of \(\textsc {Dsharp}\) [43] when sampling set is available else we use \(\mathsf {D4}\) [30]. We would have preferred to use state-of-the-art d-DNNF compiler \(\mathsf {D4}\) but owing to its closed source implementation, we could not modify it as per our customized compilation procedure \(\mathsf {PCompile}\). Therefore, for projected compilation, we have modified \(\textsc {Dsharp}\) which has an open-source implementation. We have conducted our experiments on a wide range of publicly available benchmarks. In all, our benchmark suite consisted of 773 benchmarks arising from a wide range of real-world applications. Specifically, we used constraints arising from DQMR networks, bit-blasted versions of SMT-LIB (SMT) benchmarks, and ISCAS89 circuits [8] with parity conditions on randomly chosen subsets of outputs and nextstate variables [34, 48]. We assigned random weights to literals wherever weights were not already available in our benchmarks. All our experiments were conducted on a high performance compute cluster whose each node consists of E5-2690 v3 CPU with 24 cores and 96 GB of RAM. We utilized single core per instance of benchmark with a timeout of 1800 s.

The objective of our evaluation was to answer the following questions:

  1. 1.

    How does \(\mathsf {WAPS}\) perform in terms of runtime in comparison to \(\mathsf {WeightGen}\), the current state-of-the-art weighted and projected sampler?

  2. 2.

    How does \(\mathsf {WAPS}\) perform for incremental sampling and scales when asked for different number of samples?

  3. 3.

    How does the distribution of samples generated by \(\mathsf {WAPS}\) compare with the distribution generated by an ideal weighted and projected sampler?

  4. 4.

    How does \(\mathsf {WAPS}\) perform for conditioning on arbitrary variables?

  5. 5.

    How does our knowledge compilation based sampling techniques perform in comparison to hashing based sampling techniques for the task of generalizing to arbitrary weight distributions?

Our experiment demonstrated that within a time limit of 1800 s, \(\mathsf {WAPS}\) is able to significantly outperform existing state-of-the-art weighted and projected sampler \(\mathsf {WeightGen}\), by up to 3 orders of magnitude in terms of runtime while achieving a geometric speedup of \(296{\times }\). Out of the 773 benchmarks available \(\mathsf {WAPS}\) was able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks. For incremental sampling, \(\mathsf {WAPS}\) achieves a geometric speedup of 3.69. Also, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested. Empirically, the distribution generated by \(\mathsf {WAPS}\) is statistically indistinguishable from that generated by an ideal weighted and projected sampler. Also, while performing conditioned sampling in \(\mathsf {WAPS}\), we incur no extra cost in terms of runtime in most of the cases. Moreover, the performance of our knowledge compilation based sampling technique is found to be oblivious to weight distribution. We present results for only a subset of representative benchmarks here. Detailed data along with the expanded versions of all the tables presented here is available at https://github.com/meelgroup/waps.

Table 1. Run time (in seconds) for 1000 samples

Number of Instances Solved. We compared the runtime performance of \(\mathsf {WAPS}\) with \(\mathsf {WeightGen}\) [10] (state-of-the-art weighted and projected sampler) by generating 1000 samples from each tool with a timeout of 1800 s. Figure 3 shows the cactus plot for \(\mathsf {WeightGen}\) and \(\mathsf {WAPS}\). We present the number of benchmarks on the \(x-\)axis and the time taken on \(y-\)axis. A point (xy) implies that x benchmarks took less than or equal to y seconds to sample. All our runtime statistics for \(\mathsf {WAPS}\) include the time for the knowledge compilation phase (via \(\mathsf {D4}\) or \(\textsc {Dsharp}\)). From all the 773 available benchmarks \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks while \(\mathsf {WAPS}\) was able to sample from 588 benchmarks. Table 1 shows the runtimes of some of the benchmarks on the two tools. The columns in the table give the benchmark name, number of variables, number of clauses, size of sampling set, time taken in seconds by \(\mathsf {WeightGen}\) and \(\mathsf {WAPS}\) divided into time taken by Compilation and A+S: Annotation and Sampling followed by speedup of \(\mathsf {WAPS}\) with respect to \(\mathsf {WeightGen}\). Table 1 clearly shows that \(\mathsf {WAPS}\) outperforms \(\mathsf {WeightGen}\) by upto 3 orders of magnitude. For all the 24 benchmarks that \(\mathsf {WeightGen}\) was able to solve \(\mathsf {WAPS}\) outperformed \(\mathsf {WeightGen}\) with a geometric speedup of \(296{\times }\).

Fig. 3.
figure 3

Cactus Plot comparing \(\mathsf {WeightGen}\) and \(\mathsf {WAPS}\).

Table 2. Runtimes (in sec.) of \(\mathsf {WAPS}\) for incremental sampling

Incremental Sampling. Incremental sampling involves fetching multiple, relatively small-sized samples until the objective (such as desired coverage or violation of property) is achieved. We benefit from pre-compiled knowledge representations in this scenario, as they allow us to perform repeated sampling as per varied distributions. If weights are changed, we simply Annotate the tree again followed by sampling, else, we directly move to the sampling phase, thus saving a significant amount of time by bypassing the compilation phase.

In our experiments, we have evaluated the time taken by \(\mathsf {WAPS}\) for 1000 samples in 10 successive calls with same weights. The results are presented in Table 2 for a subset of benchmarks. The first column mentions the benchmark name with the number of variables, clauses and size of sampling set in subsequent columns. The time taken by \(\mathsf {WAPS}\) for first run to fetch 1000 samples is given in the fifth column while the overall time taken for first run together with the subsequent 9 incremental runs is presented in sixth column. The final column shows the average gain in terms of speedup calculated by taking the ratio of time taken by \(\mathsf {WAPS}\) for first run with the average time taken by \(\mathsf {WAPS}\) for subsequent 9 incremental runs thus resulting in a total of 10000 samples. Overall, \(\mathsf {WAPS}\) achieves a geometric speedup of \(3.69{\times }\) on our set of benchmarks.

Table 3. Runtime (in sec.) of \(\mathsf {WAPS}\) to generate different size samples

Effect of Number of Samples. To check how \(\mathsf {WAPS}\) scales with different number of samples, we invoked \(\mathsf {WAPS}\) for fetching different number of samples: 1000, 2000, 4000, 8000, 10000 with a timeout of 1800 s. Table 3 presents the runtime of \(\mathsf {WAPS}\) for different samples on some benchmarks. The first column represents the benchmark name. Second, third and fourth columns represent the number of variables, clauses and size of sampling set. The next five columns represent the time taken by \(\mathsf {WAPS}\) for 1000, 2000, 4000, 8000 and 10000 samples. Table 3 clearly demonstrates that \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.

Uniform Sampling Generalized for Weighted Sampling. To explore the trend in performance between uniform and weighted sampling on the dimension of hashing based techniques pitched against our newly proposed sampling techniques based on knowledge compilation, we compared \(\mathsf {WAPS}\) to \(\mathsf {KUS}\) in a parallel comparison between \(\mathsf {WeightGen}\) and \(\mathsf {UniGen2}\). Specifically, we ran \(\mathsf {WAPS}\) for weighted sampling and \(\mathsf {KUS}\) for uniform sampling without utilizing the sampling set as \(\mathsf {KUS}\) does not support the sampling set. On the other hand, for hashing based sampling techniques, we compared \(\mathsf {WeightGen}\) to \(\mathsf {UniGen2}\) while using the sampling set. Figure 4 shows the cactus plot for \(\mathsf {WeightGen}\) and \(\mathsf {UniGen2}\) and Fig. 5 shows a cactus plot for \(\mathsf {WAPS}\) and \(\mathsf {KUS}\). From all the 773 benchmarks, \(\mathsf {WeightGen}\) was able to sample from only 24 benchmarks while \(\mathsf {UniGen2}\) was able to sample from 208 benchmarks. In comparison, \(\mathsf {WAPS}\) was able to sample from 606 benchmarks while \(\mathsf {KUS}\) was able to sample from 602 benchmarks. Our experiments demonstrated that the performance of hashing-based techniques is extremely limited in their ability to handle literal-weighted sampling and there is a drastic drop in their performance as the weight distribution shifts away from uniform. While for our knowledge compilation based sampling techniques we observe that their performance is oblivious to the weight distribution.

Distribution Comparison. We measure the distribution of \(\mathsf {WAPS}\) vis-a-vis an ideal weighted and projected sampler (\(\mathsf {IS}\)) and observed that \(\mathsf {WAPS}\) is statistically indistinguishable from \(\mathsf {IS}\). Please refer to Appendix for more detailed discussion.

Effect of Conditioning on Variables. We evaluated the performance of \(\mathsf {WAPS}\) in the context of conditioned sampling. We observed a slight improvement in average runtime as more and more variables get constrained. For detailed results, please refer to Appendix.

Fig. 4.
figure 4

Cactus Plot comparing \(\mathsf {WeightGen}\) and \(\mathsf {UniGen2}\).

Fig. 5.
figure 5

Cactus Plot comparing \(\mathsf {WAPS}\) and \(\mathsf {KUS}\).

7 Conclusion

In this paper, we designed a knowledge compilation-based framework, called \(\mathsf {WAPS}\), for literal-weighted, projected and conditional sampling. \(\mathsf {WAPS}\) provides strong theoretical guarantees and its runtime performance upon the existing state-of-the-art weighted and projected sampler WeightGen, by up to 3 orders of magnitude in terms of runtime. Out of the 773 benchmarks available, \(\mathsf {WAPS}\) is able to sample from 588 benchmarks while \(\mathsf {WeightGen}\) is only able to sample from 24 benchmarks. \(\mathsf {WAPS}\) achieves a geometric speedup of 3.69 for incremental sampling. It is worth noting that \(\mathsf {WeightGen}\) has weaker guarantees than \(\mathsf {WAPS}\). Furthermore, \(\mathsf {WAPS}\) is almost oblivious to the number of samples requested.