1 Background and Introduction

Given a Boolean formula \(\varphi \), the problem of model counting, also referred to as #SAT, is to compute the number of solutions of \(\varphi \). Model counting is a fundamental problem in computer science with a wide range of applications, including quantified information flow, network reliability, probabilistic programming, Bayesian networks, and others [4, 5, 10, 16, 21, 22, 23].

Given the computational intractability of #SAT, attention has been focused on the approximation of #SAT [28, 30]. In a breakthrough result, Stockmeyer provided a hashing-based randomized approximation scheme for counting that makes polynomially many invocations of an \(\mathsf {NP}\) oracle [27]. The procedure, however, was computationally prohibitive in practice at that time, and no practical tools existed based on Stockmeyer’s proposed algorithmic framework until the early 2000s [16]. Motivated by the success of \(\mathsf {SAT}\) solvers, there has been a surge of interest in the design of hashing-based techniques for approximate model counting in the past decade [8, 9, 13, 15, 24, 25].

The core idea of the hashing-based framework is to employ pairwise independent hash functions to partition the solution space into roughly equal-sized small cells, wherein a cell is called small if it has at most \(thresh\) solutions, for a pre-computed threshold denoted by \(thresh\). A \(\mathsf {SAT}\) solver is employed to check whether a cell is small by enumerating solutions one by one until either there are no more solutions or \(thresh+1\) solutions have been enumerated. The current state of the art techniques can be broadly classified into two categories:

  • The first category of techniques, henceforth called Cat1, consists of techniques that compute a constant factor approximation by setting \(thresh\) to a constant and using Stockmeyer’s technique of constructing multiple copies of the input formula [1, 2, 12, 29, 31].

  • The second category of techniques, henceforth called Cat2, consists of techniques that directly compute an \((\varepsilon , \delta )\)-estimate by setting \(thresh\) to \( \mathcal {O}(\frac{1}{\varepsilon ^2})\), and hence invoke the underlying \(\mathsf {NP}\) oracle \(\mathcal {O}(\frac{1}{\varepsilon ^2})\) times [7, 8, 9, 20, 21, 24, 25].

The current state of the art technique, measured by runtime performance, is \(\mathsf {ApproxMC3}\), which falls into the class of Cat2 techniques [25]. The proofs of correctness for all the hashing-based techniques rely on concentration bounds that follow from pairwise independent hash functions.

The standard construction of pairwise independent hash functions employed in these techniques can be expressed as a conjunction of XOR constraints in which every variable is chosen with probability \(f = 1/2\) for each XOR. As such, each XOR contains, on average, n/2 variables. A \(\mathsf {SAT}\) solver is invoked to enumerate solutions of the formula \(\varphi \) in conjunction with these XOR constraints. The performance of \(\mathsf {SAT}\) solvers, however, degrades with an increase in the size of XORs [15]. Therefore, recent efforts have focused on the design of hash functions where each variable is chosen with probability \(f < 1/2\) [1, 2, 11, 14, 17]. We refer to XORs constructed with \(f=1/2\) as dense XORs and to those constructed with \(f< 1/2\) as sparse XORs. In particular, given a hash function h and a cell \(\alpha \), the random variable of interest, denoted by \(\mathsf {Cnt}_{\langle \varphi ,h,\alpha \rangle }\), is the number of solutions of \(\varphi \) that h maps to cell \(\alpha \). The pairwise independence of dense XORs is known to bound the variance of \(\mathsf {Cnt}_{\langle \varphi ,h,\alpha \rangle }\) by its expectation, which is sufficient for their usage in both Cat1 and Cat2 techniques.
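To make the dependence of XOR size on f explicit (a direct calculation, not spelled out in the original text): each of the n variables enters a given XOR independently with probability f, so the expected number of variables per XOR is nf, i.e.,

$$\begin{aligned} f = \tfrac{1}{2} \implies \mathsf {E}[\text {XOR size}] = \tfrac{n}{2}, \qquad f = \mathcal {O}\Big (\tfrac{\log n}{n}\Big ) \implies \mathsf {E}[\text {XOR size}] = \mathcal {O}(\log n). \end{aligned}$$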

In a significant result, Asteris and Dimakis [3], and Zhao et al. [31] showed that \(f = \mathcal {O}(\log n/n)\) asymptotically suffices for Cat1 techniques. It is worth pointing out that \(f = \mathcal {O}(\log n/n)\) provides weaker guarantees on the variance of \(\mathsf {Cnt}_{\langle \varphi ,h,\alpha \rangle }\) than the case \(f=1/2\). However, Zhao et al. showed that the weaker guarantees are sufficient for Cat1 techniques with only polynomial overhead in time complexity. Furthermore, Zhao et al. provided necessary and sufficient conditions on the required asymptotic value of f and proposed a new algorithm, \(\mathsf {SparseCount}\), that uses the proposed family of hash functions. One would expect that the result of Zhao et al. would settle the quest for efficient hash functions. However, upon closer examination, a few questions have been left unanswered in Zhao et al.’s work and subsequent follow-up studies [1, 9, 21]:

  1. Can the hash function constructed by Zhao et al. be used for Cat2 techniques, in particular for state of the art hashing-based techniques like \(\mathsf {ApproxMC3}\)?

  2. In practice, can the overhead due to the weaker theoretical guarantees of the sparse XORs proposed by Zhao et al. be compensated by the performance gains that sparse XORs afford in the runtime of \(\mathsf {SparseCount}\)?

  3. Is the runtime performance of \(\mathsf {SparseCount}\) competitive with that of \(\mathsf {ApproxMC3}\)? The reader may observe that Zhao et al.’s paper does not compare their proposed algorithm for \((\varepsilon ,\delta )\)-guarantees, called \(\mathsf {SparseCount}\), with state of the art algorithms at that time such as \(\mathsf {ApproxMC2}\), which is now in its third version, \(\mathsf {ApproxMC3}\) [25]. Therefore, the question of whether the proposed sparse XORs are efficient in practice was not settled. It is perhaps worth remarking that another line of work, based on the construction of sparse XORs using low-density parity-check codes, is known to introduce significant slowdown [1, 2] (see Section 9 of [1]).

The primary contribution of this paper is a rigorous theoretical and empirical analysis to understand the effect of sparse XORs for approximate model counters. In particular, we make the following key contributions:

  1. We prove that the bounds obtained by Zhao et al., which are the strongest bounds known at this point for the variance of \({\mathsf {Cnt}_{\langle \varphi ,h,\alpha \rangle }}\), are still too weak for the analysis of \(\mathsf {ApproxMC3}\). To the best of our knowledge, this is the first time the need for stronger bounds in the context of Cat2 techniques has been identified.

  2. Since the weakness of the bounds prevents the usage of sparse hash functions in \(\mathsf {ApproxMC3}\), we design the most efficient algorithm, to the best of our knowledge, that uses sparse hash functions. To this end, we propose an improvement of \(\mathsf {SparseCount}\), called \(\mathsf {SparseCount2}\), that reduces the number of \(\mathsf {SAT}\) calls from linear to logarithmic and significantly improves the runtime performance of \(\mathsf {SparseCount}\). The improvement from linear to logarithmic uses the idea of prefix-slicing introduced by Chakraborty, Meel, and Vardi [9] for \(\mathsf {ApproxMC2}\).

  3. We next present a rigorous empirical study, involving a benchmark suite of over 1800 instances, of the runtime performance of \(\mathsf {SparseCount2}\) vis-a-vis the state of the art approximate counting technique, \(\mathsf {ApproxMC3}\). Surprisingly, and contrary to current beliefs, we discover that \(\mathsf {ApproxMC3}\), which uses dense XORs, significantly outperforms \(\mathsf {SparseCount2}\) on every benchmark. It is worth remarking that \(\mathsf {ApproxMC3}\) and \(\mathsf {SparseCount2}\) use an identical \(\mathsf {SAT}\) solver for the underlying \(\mathsf {SAT}\) calls and, as with other hashing-based techniques, over 99% of the runtime of each algorithm is consumed by the underlying \(\mathsf {SAT}\) solver.

Given the surprising nature of our results, a few words are in order. First of all, our work identifies the tradeoffs involved in the usage of sparse hash functions and demonstrates that the variance bounds offered by sparse hash functions are too weak to be employed in the state of the art techniques. Secondly, our work demonstrates that the weakness of variance bounds leads to such a large overhead that algorithms using sparse hash functions scale much worse than algorithms without sparse XORs. Thirdly and finally, we believe these negative results showcase that the question of using sparse XORs to achieve scalability while providing strong theoretical guarantees is still wide open. In an upcoming work, Meel and Akshay [20] define a new family of hash functions, called concentrated hashing, provide a new construction of sparse hash functions belonging to this family, and design a new algorithmic framework on top of \(\mathsf {ApproxMC}\), which is shown to achieve runtime improvements.

The rest of the paper is organized as follows. We discuss notation and preliminaries in Sect. 2. We then discuss the weakness of the guarantees offered by sparse XORs in Sect. 3. In Sect. 4, we seek to design an efficient algorithm that utilizes, to the best of our knowledge, all the advancements in the approximate model counting community. We present a rigorous empirical study comparing the performance of \(\mathsf {SparseCount}\), \(\mathsf {SparseCount2}\), and \(\mathsf {ApproxMC3}\) in Sect. 5 and conclude in Sect. 6.

2 Preliminaries and Notations

Let \(\varphi \) be a Boolean formula in conjunctive normal form (CNF), and let \(\mathsf {Vars }(\varphi )\) be the set of variables appearing in \(\varphi \). The set \(\mathsf {Vars }(\varphi )\) is also called the support of \(\varphi \). Unless otherwise stated, we will use n to denote the number of variables in \(\varphi \), i.e., \(|\mathsf {Vars }(\varphi )|\). An assignment of truth values to the variables in \(\mathsf {Vars }(\varphi )\) is called a satisfying assignment or witness of \(\varphi \) if it makes \(\varphi \) evaluate to true. We denote the set of all witnesses of \(\varphi \) by \(R_{\varphi }\).

We write \(\mathsf {Pr}\left[ \mathcal {Z} \right] \) to denote the probability of outcome \(\mathcal {Z}\). The expected value of \(\mathcal {Z}\) is denoted \(\mathsf {E}\left[ \mathcal {Z}\right] \) and its variance is denoted \(\sigma ^2\left[ \mathcal {Z}\right] \).

The propositional model counting problem is to compute \(|R_{\varphi }|\) for a given CNF formula \(\varphi \). A probably approximately correct (\(\mathsf {PAC}\)) counter is a probabilistic algorithm \({\mathsf {ApproxCount}}(\cdot , \cdot ,\cdot )\) that takes as input a formula \(\varphi \), a tolerance \(\varepsilon >0\), and a confidence parameter \(\delta \in (0, 1]\), and returns a count c with \((\varepsilon ,\delta )\)-guarantees, i.e., \(\mathsf {Pr}\Big [|R_{\varphi }|/(1+\varepsilon ) \le c \le (1+\varepsilon )|R_{\varphi }|\Big ] \ge 1-\delta \).

In this work, we employ a family of universal hash functions. Let \(H(n,m) \triangleq \{ h:\{0,1\}^{n} \rightarrow \{0,1\}^m \}\) be a family of hash functions mapping \(\{0,1\}^n\) to \(\{0,1\}^m\). We use \(h \xleftarrow {R} H\) to denote the probability space obtained by choosing a function h uniformly at random from H.

In this work, we will use the concept of prefix-slicing introduced by Chakraborty et al. [9]. Formally, for \(h \in H(n,m)\) and every \(j \in \{1, \ldots ,m\}\), the \(j^{th}\) prefix-slice of h, denoted \(h^{(j)}\), is a map from \(\{0,1\}^{n}\) to \(\{0,1\}^j\) such that \(h^{(j)}(y)[i] = h(y)[i]\) for all \(y \in \{0,1\}^{n}\) and all \(i \in \{1, \ldots, j\}\). Similarly, the \(j^{th}\) prefix-slice of \(\alpha \), denoted \(\alpha ^{(j)}\), is an element of \(\{0,1\}^{j}\) such that \(\alpha ^{(j)}[i] = \alpha [i]\) for all \(i \in \{1, \ldots, j\}\). The randomness in the choices of h and \(\alpha \) induces randomness in the choices of \(h^{(j)}\) and \(\alpha ^{(j)}\). However, the \((h^{(j)}, \alpha ^{(j)})\) pairs chosen for different values of j are no longer independent. Specifically, \(h^{(k)}(y)[i] = h^{(\ell )}(y)[i]\) and \(\alpha ^{(k)}[i] = \alpha ^{(\ell )}[i]\) for \(1 \le k \le \ell \le m\) and for all \(i \in \{1, \ldots, k\}\).
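To make prefix-slicing concrete, the following Python sketch (our illustration; the brute-force enumeration and the toy choice of \(R_{\varphi }\) are ours, not code from the paper) represents a hash function by a random matrix-offset pair (the XOR form made precise in Definition 2 below) and obtains \(h^{(j)}\) by keeping the first j rows; the printed counts are non-increasing in j because each additional row only adds a constraint:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4

# A hash function h(y) = Ay + b over GF(2); dense case: entries of A are 1 w.p. 1/2.
A = rng.integers(0, 2, size=(m, n))
b = rng.integers(0, 2, size=m)
alpha = rng.integers(0, 2, size=m)  # a random cell

def h_prefix(y, j):
    """The j-th prefix-slice h^(j): first j rows of A, first j bits of b."""
    return (A[:j] @ y + b[:j]) % 2

# Toy solution set: pretend every assignment satisfies phi.
R_phi = [np.array(list(np.binary_repr(v, n)), dtype=int) for v in range(2 ** n)]

for j in range(1, m + 1):
    cnt_j = sum(np.array_equal(h_prefix(y, j), alpha[:j]) for y in R_phi)
    print(f"Cnt_<{j}> = {cnt_j}")  # non-increasing in j
```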

For a formula \(\varphi \), \(h \in H(n,m)\), and \(\alpha \in \{0,1\}^{m}\), we define \(\mathsf {Cnt}_{\langle \varphi ,h^{(i)},\alpha ^{(i)} \rangle } := |\{y\in R_{\varphi } \mid h^{(i)}(y) = \alpha ^{(i)} \}|\), i.e., the number of solutions of \(\varphi \) mapped to \(\alpha ^{(i)}\) by \(h^{(i)}\). For the sake of notational clarity, whenever \(h^{(i)}\) and \(\alpha ^{(i)}\) are clear from the context, we will use \(\mathsf {Cnt}_{\langle i \rangle }\) as a shorthand for \(\mathsf {Cnt}_{\langle \varphi ,h^{(i)},\alpha ^{(i)} \rangle }\).

Definition 1

 [6] A family of hash functions H(n, m) is pairwise independent (also known as strongly 2-universal) if \(\forall \) \(\alpha _1, \alpha _2 \in \{0,1\}^m\), \(\forall \) distinct \(y_1, y_2 \in \{0,1\}^n\), \(h \xleftarrow {R} H\), we have \(\mathsf {Pr}[h(y_1) = \alpha _1 \wedge h(y_2) = \alpha _2] = \frac{1}{2^{2m}}\).

Definition 2

Let \(A \in \{0,1\}^{m \times n}\) be a random matrix whose entries are independent Bernoulli random variables with \(f_i = \mathsf {Pr}\left[ A[i,j] = 1\right] \) for all \(j \in [n]\). Let \(b \in \{0,1\}^{m}\) be chosen uniformly at random, independently of A. Let \(h_{A,b}(y) = Ay+b\) and \(H^{\{f_i\}}(n,m) = \{h_{A,b} : \{0,1\}^{n} \rightarrow \{0,1\}^{m}\}\), where \( h_{A,b} \xleftarrow {R} H^{\{f_i\}}(n,m)\) is chosen randomly according to this process. Then, \(H^{\{f_i\}}(n,m)\) is called a hash family with \(\{f_i\}\)-sparsity.

Since we can represent hash functions in \(H^{\{f_i\}}(n,m)\) as a set of XOR constraints, we will use dense XORs to refer to hash functions with \(f_i=\frac{1}{2} \) for all i, and sparse XORs to refer to hash functions with \(f_i < \frac{1}{2}\) for some i. Note that \({H}^{\{f_i = \frac{1}{2}\}}(n,m)\) is the standard pairwise independent hash family, also denoted as \(H_{xor}(n,m)\) in earlier works [21].
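As a sanity check of Definition 1 for \(H_{xor}(n,m)\), one can exhaustively enumerate the whole family for tiny n and m and verify that any two distinct points land on any fixed pair of cells with probability exactly \(1/2^{2m}\). A minimal sketch (ours; the specific points and cells are arbitrary):

```python
import itertools
import numpy as np

n, m = 3, 2  # small enough to enumerate the entire family

def cell(A, b, y):
    return tuple((np.array(A) @ np.array(y) + np.array(b)) % 2)

y1, y2 = (0, 0, 1), (0, 1, 0)  # distinct points
a1, a2 = (0, 1), (1, 0)        # arbitrary target cells

hits = total = 0
# Every h_{A,b} in H^{1/2}(n, m): all 2^(mn) matrices and all 2^m offsets.
for A_flat in itertools.product([0, 1], repeat=m * n):
    A = [A_flat[i * n:(i + 1) * n] for i in range(m)]
    for b in itertools.product([0, 1], repeat=m):
        total += 1
        hits += cell(A, b, y1) == a1 and cell(A, b, y2) == a2

print(hits / total, 1 / 2 ** (2 * m))  # both print 0.0625 = 1/2^(2m)
```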

Definition 3

 [11] Let \(k \ge 0\) and \(\delta > 2\). Let Z be a random variable with \(\mu = \mathsf {E}[Z]\). Then Z is strongly \((k, \delta )\)-concentrated if \(\mathsf {Pr}[|Z - \mu | \ge \sqrt{k}] \le 1/\delta \) and weakly \((k, \delta )\)-concentrated if both \(\mathsf {Pr}[Z \le \mu - \sqrt{k}] \le 1/\delta \) and \(\mathsf {Pr}[Z \ge \mu + \sqrt{k}] \le 1/\delta \).
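For example (a one-line consequence of the definition that is used in Sect. 4): if Z is nonnegative integer-valued and weakly \((\mu ^2, 9/4)\)-concentrated, then

$$\begin{aligned} \mathsf {Pr}[Z \le 0] = \mathsf {Pr}\left[ Z \le \mu - \sqrt{\mu ^{2}}\right] \le \frac{4}{9}, \quad \text {and hence } \mathsf {Pr}[Z \ge 1] \ge \frac{5}{9}. \end{aligned}$$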

2.1 Related Work

Gomes et al. [14] first identified the improvements in solving time due to the usage of sparse XORs in approximate model counting algorithms. The question of whether sparse XORs can provide the required theoretical guarantees was left open. Significant progress in this direction was achieved by Ermon et al. [11], who provided the first rigorous analysis of the usage of sparse XOR constraints. Building on Ermon et al., Zhao et al. [31] and Asteris and Dimakis [3] independently improved the analysis further and showed that probability \(f = \mathcal {O}(\frac{\log n}{n}) \) suffices to provide a constant factor approximation, which can be amplified to a \((1+\varepsilon )\) approximation.

While the above-mentioned efforts focused on the setting where the entries of A are i.i.d., Achlioptas and Theodoropoulos [2] and Achlioptas, Hammoudeh, and Theodoropoulos [1] investigated the design of hash functions where A is a structured matrix, drawing on connections to error-correcting codes. While their techniques provide a construction of sparse constraints, the constants involved in the asymptotics lead to impractical algorithms for \((\varepsilon ,\delta )\) guarantees (see Sect. 9 of [1]). The work of Achlioptas et al. demonstrates the promise and limitations of structured random matrices in the design of hashing-based algorithms; however, there is no such study for the case when all the entries are i.i.d. In this paper, we theoretically improve the construction proposed by Asteris and Dimakis [3], and Zhao et al. [31], and perform a rigorous empirical study to understand the tradeoffs of sparsity.

3 Weakness of Guarantees Offered by Sparse XORs

In this section, we present the first contribution of this paper: demonstration of the weakness of theoretical guarantees obtained in prior work [3, 11, 31] for sparse XORs. To this end, we investigate whether the bounds offered by Zhao et al. on the variance of \(\mathsf {Cnt}_{\langle i \rangle }\), which are the strongest bounds known on sparse XORs, can be employed in the analysis of Cat2 techniques. For clarity of exposition, we focus on the usage of sparse XOR bounds in \(\mathsf {ApproxMC3}\), but our conclusions extend to other Cat2 techniques, as pointed out below.

The analysis of \(\mathsf {ApproxMC3}\) employs bounds on the variance of \(\mathsf {Cnt}_{\langle i \rangle }\) via the following standard concentration inequalities.

Lemma 1

For every \(\beta > 0\), \(0 < \varepsilon \le 1\), and \(0 \le i \le n\), we have:

  1. Chebyshev Inequality:

    $$\begin{aligned} \mathsf {Pr}\left[ \left| \mathsf {Cnt}_{\langle i \rangle }-\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\right| \ge \dfrac{\varepsilon }{1+\varepsilon }\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] \right] \le \dfrac{(1+\varepsilon )^{2}\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]}{\varepsilon ^{2}\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}} \end{aligned}$$
  2. Paley-Zygmund Inequality:

    $$\begin{aligned} \mathsf {Pr}[\mathsf {Cnt}_{\langle i \rangle } \le \beta \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]] \le \dfrac{1}{1+\dfrac{(1-\beta )^{2}\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}}{\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]}} \end{aligned}$$

The analysis of Cat2 techniques (and of \(\mathsf {ApproxMC3}\) in particular) bounds the failure probability of the underlying algorithm by upper bounding the above expressions for appropriately chosen values of i. To obtain meaningful upper bounds, these techniques employ the inequality \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\), obtained via the usage of 2-universal hash functions.

Recall that the core idea of the hashing-based framework is to employ 2-universal hash functions to partition the solution space into roughly equal-sized small cells, wherein a cell is called small if it has at most \(thresh\) solutions, where \(thresh\) is chosen as \(\mathcal {O}(1/\varepsilon ^2)\). To this end, the analysis lower bounds \(\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\) by \(\frac{thresh}{2}\), which allows the denominator to be lower bounded by a constant. Given that \(thresh\) can be set to \(\mathcal {O}(\frac{1}{\varepsilon ^2})^{1/c}\) for some \(c > 0\), we can relax the requirement on the chosen hash family to ensuring \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2-c}\) for some \(c > 0\). Note that pairwise independent hash functions based on dense XORs ensure \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\) (i.e., \(c= 1\)).
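To see concretely why \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\) suffices (a standard calculation, spelled out here for the reader), substitute this bound and \(\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] \ge \frac{thresh}{2}\) into the Paley-Zygmund inequality:

$$\begin{aligned} \mathsf {Pr}[\mathsf {Cnt}_{\langle i \rangle } \le \beta \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]] \le \dfrac{1}{1+(1-\beta )^{2}\dfrac{\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}}{\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]}} \le \dfrac{1}{1+(1-\beta )^{2}\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]} \le \dfrac{1}{1+(1-\beta )^{2}\frac{thresh}{2}}, \end{aligned}$$

which can be driven down to any desired constant by increasing \(thresh\). By contrast, if \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]\) can only be bounded by \(\varOmega (\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^2)\), the ratio \(\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}/\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]\) is \(\mathcal {O}(1)\) and the right-hand side no longer improves with \(thresh\); this is precisely the failure mode established for sparse XORs below.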

We now investigate the guarantees provided by sparse XORs. To this end, we first recall the following result, which follows from combining Theorem 1 and Theorem 3 of [11].

Lemma 2

 [11] For \(2\le |R_{F}| \le 2^{n}\), let

$$\begin{aligned} w^{*}&= max \left\{ w \bigm | \sum ^{w}_{j=1} \left( {\begin{array}{c}n\\ j\end{array}}\right) \le |R_{F}| -1 \right\} \\ q^{*}&= |R_{F}| -1 - \sum _{w=1}^{w^{*}} \left( {\begin{array}{c}n\\ w\end{array}}\right) \\ \eta&= \frac{1}{|R_{F}| -1} \left( q^{*}\left( \dfrac{1}{2} + \dfrac{1}{2}(1 -2f)^{w^{*}+1}\right) ^{m} +\sum _{w=1}^{w^{*}} \left( {\begin{array}{c}n\\ w\end{array}}\right) \left( \dfrac{1}{2} + \dfrac{1}{2}(1 -2f)^{w}\right) ^{m} \right) \end{aligned}$$

For \(h \xleftarrow {R}\) \(H^{\{f_j\} }(n,m)\), we have:

$$\begin{aligned} \sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] + \eta \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] (|R_{F}| -1 ) - \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}. \end{aligned}$$
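For concreteness, the quantities in Lemma 2 can be evaluated numerically; the following Python sketch (our direct transcription of the formulas above, not code from the paper's artifact) computes \(w^{*}\), \(q^{*}\), and \(\eta \) for given n, \(|R_{F}|\), f, and m:

```python
from math import comb

def eta_bound(n, r, f, m):
    """Compute w*, q*, and eta from Lemma 2.

    n: number of variables; r: |R_F| (number of solutions);
    f: probability that a variable appears in an XOR; m: number of XORs.
    """
    # w* = largest w with sum_{j=1}^{w} C(n, j) <= r - 1
    total, w_star = 0, 0
    while w_star < n and total + comb(n, w_star + 1) <= r - 1:
        w_star += 1
        total += comb(n, w_star)
    q_star = (r - 1) - total
    # eta averages (1/2 + 1/2 * (1 - 2f)^w)^m over Hamming weights w
    eta = q_star * (0.5 + 0.5 * (1 - 2 * f) ** (w_star + 1)) ** m
    eta += sum(comb(n, w) * (0.5 + 0.5 * (1 - 2 * f) ** w) ** m
               for w in range(1, w_star + 1))
    return w_star, q_star, eta / (r - 1)

# Example: n = 50 variables, 2^20 solutions, sparse f = 0.1, m = 20 XORs.
print(eta_bound(50, 2 ** 20, 0.1, 20))
```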

Zhao et al. [31], building on Ermon et al. [11], obtain the following bound (see Lemma 8 and Lemma 10 of [31]).

Lemma 3

 [31] Define \(k = 2^{m} \eta (1 -\frac{1}{|R_{F}|})\). Then \(k \le \gamma \) for some constant \(\gamma > 1\).

The bound on \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]\) from Zhao et al. can be stated as:

Theorem 1

\(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \zeta \) where \( \zeta \in \varOmega (\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^2)\).

Proof

$$\begin{aligned} \sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] + \eta \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] (|R_{F}|-1) - \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}. \end{aligned}$$

Substituting \(|R_{F}| = \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] \times 2^m\), we have:

$$\begin{aligned} \sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] + 2^{m}\eta \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2}(1 -1/|R_{F}|) -\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2} \end{aligned}$$

Substituting \(k = 2^{m} \eta (1- \frac{1}{|R_{F}|})\), we have:

$$\begin{aligned} \sigma ^2[\mathsf {Cnt}_{\langle i \rangle }] \le \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }] + (k-1) \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^2 = \zeta . \end{aligned}$$

Using Lemma 3, we have \(\zeta \in \varOmega (\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^2)\).

Recall that the analysis of \(\mathsf {ApproxMC3}\) requires us to upper bound \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]\) by \(\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^{2-c}\) for some \(c > 0\). Since the best-known upper bound on \(\sigma ^2[\mathsf {Cnt}_{\langle i \rangle }]\) is itself in \(\varOmega (\mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]^2)\), these bounds are not sufficient for \(\mathsf {ApproxMC3}\). At this point, one may wonder what key algorithmic difference between Cat1 and Cat2 necessitates the use of stronger bounds: Cat1 techniques compute a constant factor approximation and then make use of Stockmeyer’s argument to lift the constant factor approximation to a \((1+\varepsilon )\)-approximation, whereas Cat2 techniques directly compute a \((1+\varepsilon )\)-approximation, which necessitates the usage of stronger concentration bounds.

4 \(\mathsf {SparseCount2}\): An Efficient Algorithm for Sparse XORs

The inability of sparse XORs to provide good enough bounds on the variance for usage in Cat2 techniques, in particular \(\mathsf {ApproxMC3}\), leads us to ask: how do we design the most efficient algorithm for approximate model counting making use of the existing gadgets in the model counting literature? We recall that Zhao et al. [31] provided matching necessary and sufficient conditions on the required asymptotic density of the matrix A. Furthermore, they proposed a hashing-based algorithm, \(\mathsf {SparseCount}\), that utilizes sparser constraints.

As mentioned earlier, Chakraborty et al. [9] proposed the technique of prefix-slicing of hash functions in the context of hashing-based techniques, which led to significant theoretical and empirical improvements. In this work, we first show a dramatic reduction in the complexity of \(\mathsf {SparseCount}\) by utilizing the concept of prefix-slicing, thereby improving the number of \(\mathsf {SAT}\) calls from \(\mathcal {O}(n \log n)\) to \(\mathcal {O}((\log n)^2)\) for fixed \(\varepsilon \) and \(\delta \). The modified algorithm, called \(\mathsf {SparseCount2}\), significantly outperforms \(\mathsf {SparseCount}\), as demonstrated in Sect. 5.

Algorithm 1 shows the pseudo-code for \(\mathsf {SparseCount2}\). \(\mathsf {SparseCount2}\) assumes access to a \(\mathsf {SAT}\) oracle that takes a formula \(\varphi \) and returns YES if \(\varphi \) is satisfiable and NO otherwise. Furthermore, \(\mathsf {SparseCount2}\) assumes access to the subroutine \(\mathsf {MakeCopies}\), which creates multiple copies of a given formula over disjoint sets of fresh variables, a standard technique first proposed by Stockmeyer [27] to lift a constant factor approximation to a \((1+\varepsilon )\)-factor approximation for arbitrary \(\varepsilon \). Similar to Algorithm 1 of [11], we choose \(\{f_j\}\) in line 5 such that the resulting hash functions guarantee weak \((\mu _{i}^{2}, 9/4)\)-concentration for the random variable \(\mathsf {Cnt}_{\langle i \rangle }\) for all i, where \(\mu _{i} = \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }]\). \(\mathsf {SparseCount2}\) is similar to \(\mathsf {SparseCount}\), with the core difference being the replacement of the linear search in \(\mathsf {SparseCount}\) with the procedure \(\mathsf {CoreSearch}\), which in turn is similar to the procedure \(\mathsf {ApproxMC2Core}\) of Chakraborty et al. [9]. The subroutine \(\mathsf {CoreSearch}\) employs prefix search, which ensures that for all i, \(\mathsf {Cnt}_{\langle i \rangle } \ge \mathsf {Cnt}_{\langle i+1 \rangle }\). The monotonicity of \(\mathsf {Cnt}_{\langle i \rangle }\) allows us to perform a binary search to find the value of i for which \(\mathsf {Cnt}_{\langle i \rangle } \ge 1\) and \(\mathsf {Cnt}_{\langle i+1 \rangle } = 0\), as sketched after the algorithm figures below. Consequently, we make \(\mathcal {O}(\log n)\) calls to the underlying \(\mathsf {NP}\) oracle during each invocation of \(\mathsf {CoreSearch}\), instead of the \(\mathcal {O}(n)\) calls made by \(\mathsf {SparseCount}\). \(\mathsf {CoreSearch}\) is invoked T times (with \(\hat{n}, \delta , \varDelta \) as defined in the algorithm), and each returned value is added to the list \(\mathcal {C}\). We then return the median of \(\mathcal {C}\).

It is worth noting that \(\mathsf {SparseCount2}\) and \(\mathsf {ApproxMC3}\) differ only in the usage of \(thresh\), which is set to 1 for \(\mathsf {SparseCount2}\) and to a function of \(\varepsilon \) for \(\mathsf {ApproxMC3}\), as observed in the discussion following Lemma 1. A \(thresh\) dependent on \(\varepsilon \) requires stronger bounds on the variance, which cannot be provided by sparse XORs, as discussed in the previous section.

Algorithm 1: \(\mathsf {SparseCount2}\) and its subroutines (pseudo-code presented as figures in the original)
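The following sketch conveys the binary-search core of \(\mathsf {CoreSearch}\) (our reconstruction from the description above; sat_oracle and sample_prefix_hash are hypothetical interfaces, and the bookkeeping of the actual pseudo-code is omitted). Because \(\mathsf {Cnt}_{\langle i \rangle }\) is non-increasing in i under prefix-slicing, \(\mathcal {O}(\log n)\) oracle calls suffice to locate the crossover point where the cell becomes empty:

```python
def core_search(phi, n, sat_oracle, sample_prefix_hash):
    """One CoreSearch-style invocation (illustrative sketch only).

    sat_oracle(phi, h, alpha, i) -> True iff some solution y of phi
        satisfies h^(i)(y) = alpha^(i), i.e., Cnt_<i> >= 1.
    sample_prefix_hash(n) -> (h, alpha) drawn from the sparse family.
    """
    h, alpha = sample_prefix_hash(n)
    if not sat_oracle(phi, h, alpha, 1):
        return 0
    lo, hi = 1, n  # invariant: Cnt_<lo> >= 1; indices above hi are ruled out
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if sat_oracle(phi, h, alpha, mid):  # Cnt_<mid> >= 1
            lo = mid
        else:
            hi = mid - 1
    return lo  # largest i with Cnt_<i> >= 1; the estimate is 2^lo

# SparseCount2 invokes core_search T times on the amplified formula psi
# and returns the median of the resulting estimates.
```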

4.1 Analysis of Correctness of \(\mathsf {SparseCount2}\)

We now present the theoretical analysis of \(\mathsf {SparseCount2}\). It is worth noting that the proof structure and technique for \(\mathsf {SparseCount2}\) and \(\mathsf {ApproxMC3}\) are significantly different, as is evident from the inability of \(\mathsf {ApproxMC3}\) to use sparse XORs. Therefore, while the algorithmic change might look minor, the proof of correctness requires a different analysis.

Theorem 2

Let \(\mathsf {SparseCount2}\) employ hash families \(H^{\{f_{j}\}_{j=0}^{n}}\), where \(\{f_{j}\}_{j=0}^{n}\) is chosen such that it guarantees weak \((\mu _i^{2}, 9/4)\)-concentration for the random variable \(\mathsf {Cnt}_{\langle i \rangle }\) for all i. Then \(\mathsf {SparseCount2}\) returns a count c such that

$$\begin{aligned} \mathsf {Pr}\left[ \frac{|R_{\varphi }|}{1+\varepsilon } \le c \le (1+\varepsilon )\times |R_{\varphi }|\right] \ge 1-\delta \end{aligned}$$

Proof

Similar to [31], we assume that \(|R_{\varphi }|\) is a power of 2; relaxing this assumption simply introduces a constant factor in the approximation. Let \(|R_{\psi }| = 2^{i^*}\), and let the variable \(\mathsf {Cnt}_{\langle i \rangle }^{t}\) denote the value of \(\mathsf {Cnt}_{\langle i \rangle }\) when \(iter = t\). Let \(\mu _{i}^{t} = \mathsf {E}[\mathsf {Cnt}_{\langle i \rangle }^{t}] = \frac{2^{i^*}}{2^i}\). Note that the choice of \(f_{i}\) ensures that \(\mathsf {Cnt}_{\langle i \rangle }^{t}\) is weakly \(((\mu _{i}^{t})^{2}, 9/4)\)-concentrated.

Let \(\mathcal {E}\) denote the event that \(\hat{c} > 4\times |R_{\psi }|\) or \(\hat{c} < \frac{|R_{\psi }|}{4}\). We denote the event \(\hat{c} > 4 \times |R_{\psi }|\) as \(\mathcal {E}_{H}\) and the event \(\hat{c} < \frac{|R_{\psi }|}{4}\) as \(\mathcal {E}_L\). Note that \(\mathsf {Pr}[\mathcal {E}] = \mathsf {Pr}[\mathcal {E}_L] + \mathsf {Pr}[\mathcal {E}_H]\). We now compute \(\mathsf {Pr}[\mathcal {E}_L]\) and \(\mathsf {Pr}[\mathcal {E}_H]\) as follows:

  1. From Algorithm 1, we have \(\hat{c} =\) Median\((\mathcal {C})\). Hence, for \(\hat{c} < \frac{|R_{\psi }|}{4}\), at least \(\frac{T}{2}\) invocations of \(\mathsf {CoreSearch}\) must return \(m < i^*-2\). For the t-th invocation of \(\mathsf {CoreSearch}\) (i.e., \(iter = t\)) to return \(m-1\), it is necessarily the case that \({\mathsf {Cnt}_{\langle m \rangle }^{t}} =0\). Since \(\{f_{j}\}_{j=0}^{n}\) is chosen such that the resulting hash function guarantees \(((\mu _{m}^{t})^{2}, 9/4)\)-concentration for the random variable \(\mathsf {Cnt}_{\langle m \rangle }^{t}\), we have \(\mathsf {Pr}[\mathsf {Cnt}_{\langle m \rangle }^{t} \ge 1] \ge 5/9\) for \(m \le i^*-2\). Let us denote by \(\mathcal {E}^{i}_L\) the event that for at least \(\frac{T}{2}\) of \(\{\mathsf {Cnt}_{\langle i \rangle }^{t}\}_{t=0}^{T}\) we have \(\mathsf {Cnt}_{\langle i \rangle }^{t} =0\). Therefore, by the Chernoff bound, we have \(\mathsf {Pr}[\mathcal {E}^{i}_L] \le e^{-\nu ^{(1)}T}\) where \(\nu ^{(1)} = 2(4/9 - 1/2)^{2}\). By applying the union bound, we have \(\mathsf {Pr}[\mathcal {E}_L] \le ne^{-\nu ^{(1)}T}\).

  2. Again, from Algorithm 1, we have \(\hat{c} =\) Median\((\mathcal {C})\). Therefore, for \(\hat{c} > 4\times |R_{\psi }|\), at least \(\frac{T}{2}\) invocations of \(\mathsf {CoreSearch}\) must return \(m > i^*+2\). For the t-th invocation of \(\mathsf {CoreSearch}\) (i.e., \(iter = t\)) to return m, it is necessarily the case that \({\mathsf {Cnt}_{\langle m-1 \rangle }^{t}} \ge 1\). Noting that \(\mathsf {E}[\mathsf {Cnt}_{\langle m \rangle }^{t}] = 2^{i^* - m}\), for \(m \ge i^*+2\) we have (by Markov's inequality)

    $$\begin{aligned} \mathsf {Pr}[\mathsf {Cnt}_{\langle m \rangle }^{t} \ge 1] \le 1/4. \end{aligned}$$

    Let us denote by \(\mathcal {E}^{i}_H\) the event that for at least \(\frac{T}{2}\) of the \(\{\mathsf {Cnt}_{\langle i \rangle }^{t}\}_{t=0}^{T}\) values, we have \(\mathsf {Cnt}_{\langle i \rangle }^{t} \ge 1\). By the Chernoff bound, for \(m \ge i^*+2\), we have \(\mathsf {Pr}[\mathcal {E}^{i}_H] \le e^{-\nu ^{(2)}T}\) where \(\nu ^{(2)} = 2(1/4 - 1/2)^{2}\). By applying the union bound, we have \(\mathsf {Pr}[\mathcal {E}_H] \le ne^{-\nu ^{(2)}T}\).

Therefore, we have \(\mathsf {Pr}[\mathcal {E}] = \mathsf {Pr}[\mathcal {E}_L ] + \mathsf {Pr}[\mathcal {E}_H ] \le ne^{-\nu ^{(1)}T} + ne^{-\nu ^{(2)}T} \). Substituting T, we have

$$\begin{aligned} \mathsf {Pr}\left[ \frac{|R_{\psi }|}{4} \le \hat{c} \le 4 \times |R_{\psi }| \right] \ge 1-\delta . \end{aligned}$$

Now notice that \(|R_{\psi }| = |R_{\varphi }|^{\frac{1}{\log _4 (1+\varepsilon )}}\); therefore, \(\frac{|R_{\psi }|}{4} \le \hat{c} \le 4 \times |R_{\psi }|\) ensures that \(\frac{|R_{\varphi }|}{1+\varepsilon } \le c \le (1+\varepsilon ) \times |R_{\varphi }| \). Therefore,

$$\begin{aligned} \mathsf {Pr}\left[ \frac{|R_{\varphi }|}{1+\varepsilon } \le c \le (1+\varepsilon ) \times |R_{\varphi }| \right] \ge 1-\delta . \end{aligned}$$
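To unwind the final amplification step explicitly (a routine calculation added for clarity; here t denotes the number of copies created by \(\mathsf {MakeCopies}\) and \(c = \hat{c}^{1/t}\) the count returned for \(\varphi \)): with \(t = \frac{1}{\log _4 (1+\varepsilon )} = \log _{1+\varepsilon } 4\) we have \(|R_{\psi }| = |R_{\varphi }|^{t}\), and

$$\begin{aligned} \frac{|R_{\psi }|}{4} \le \hat{c} \le 4 \times |R_{\psi }| \;\implies \; 4^{-1/t}\,|R_{\varphi }| \le \hat{c}^{1/t} \le 4^{1/t}\,|R_{\varphi }|, \qquad 4^{1/t} = 1+\varepsilon . \end{aligned}$$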

5 Empirical Studies

We focus on an empirical comparison of the runtime performance of \(\mathsf {SparseCount}\), \(\mathsf {SparseCount2}\), and \(\mathsf {ApproxMC3}\). All three algorithms are implemented in C++ and use the same underlying \(\mathsf {SAT}\) solver, \(\mathsf {CryptoMiniSat}\) [26], augmented with the BIRD framework introduced in [24, 25]. \(\mathsf {CryptoMiniSat}\) augmented with BIRD is a state of the art \(\mathsf {SAT}\) solver equipped to handle XOR constraints natively. It is worth noting that for hashing-based techniques, over 99% of the runtime is consumed by the underlying \(\mathsf {SAT}\) solver [25]. Therefore, the difference in the performance of the algorithms is primarily due to the number of \(\mathsf {SAT}\) calls and the formulas over which the \(\mathsf {SAT}\) solver is invoked. Furthermore, our empirical conclusions do not change even when using older versions of \(\mathsf {CryptoMiniSat}\).

Table 1. Runtime comparison of \(\mathsf {SparseCount}\), \(\mathsf {SparseCount2}\), and \(\mathsf {ApproxMC3}\)

We conducted experiments on a wide variety of publicly available benchmarks. Our benchmark suite consists of 1896 formulas arising from probabilistic inference in grid networks, synthetic grid-structured random-interaction Ising models, plan recognition, DQMR networks, bit-blasted versions of SMTLIB benchmarks, ISCAS89 combinational circuits, and program synthesis examples. Every experiment consisted of running a counting algorithm on a particular instance with a timeout of 4500 s. The experiments were conducted on a high-performance cluster, where each node consists of an E5-2690 v3 CPU with 24 cores and 96 GB of RAM. We set \(\varepsilon = 0.8\) and \(\delta = 0.2\) for all the tools.

The objective of our empirical study was to seek answers to the following questions:

  1. How does \(\mathsf {SparseCount}\) compare against \(\mathsf {SparseCount2}\) in terms of runtime performance?

  2. How does \(\mathsf {SparseCount2}\) perform against \(\mathsf {ApproxMC3}\) in terms of runtime?

Overall, we observe that \(\mathsf {SparseCount2}\) significantly outperforms \(\mathsf {SparseCount}\). On the other hand, \(\mathsf {ApproxMC3}\) outperforms \(\mathsf {SparseCount2}\) with a mean speedup of \(568.53\times \).

Our conclusions are surprising and stand in stark contrast to the widely held belief that the current constructions of sparse XORs by Zhao et al. [31] and Ermon et al. [11] lead to runtime improvements [1, 18, 19].

Figure 1 shows the cactus plot for \(\mathsf {SparseCount}\), \(\mathsf {SparseCount2}\), and \(\mathsf {ApproxMC3}\). We present the number of benchmarks on the x-axis and the time taken on the y-axis. A point (x, y) implies that x benchmarks took less than or equal to y seconds for the corresponding tool to execute. We present a runtime comparison of \(\mathsf {SparseCount2}\) vis-a-vis \(\mathsf {SparseCount}\) and \(\mathsf {ApproxMC3}\) in Table 1. Column 1 of this table gives the benchmark name, while columns 2 and 3 list the number of variables and clauses, respectively. Columns 4, 5, and 6 list the runtimes (in seconds) of \(\mathsf {SparseCount}\), \(\mathsf {SparseCount2}\), and \(\mathsf {ApproxMC3}\), respectively. Note that “TO” stands for timeout. For lack of space, we present results only on a subset of benchmarks. The detailed logs, along with the list of benchmarks and the binaries employed to run the experiments, are available at http://doi.org/10.5281/zenodo.3792748.

We present relative comparisons separately for ease of exposition and clarity.

5.1 \(\mathsf {SparseCount}\) vis-a-vis \(\mathsf {SparseCount2}\)

As shown in Fig. 1, with a timeout of 4500 s, \(\mathsf {SparseCount}\) could finish execution on only 90 benchmarks, while \(\mathsf {SparseCount2}\) completed on 379 benchmarks. Note that \(\mathsf {SparseCount2}\) retains the same theoretical guarantees as \(\mathsf {SparseCount}\).

Fig. 1. Cactus plot of runtime performance (best viewed in color)

For a clearer picture of the performance gain achieved by \(\mathsf {SparseCount2}\) over \(\mathsf {SparseCount}\), we turn to Table 1, which clearly demonstrates that \(\mathsf {SparseCount2}\) significantly outperforms \(\mathsf {SparseCount}\). In particular, over all the benchmarks where neither \(\mathsf {SparseCount}\) nor \(\mathsf {SparseCount2}\) timed out, the mean speedup is 10.94\(\times \).

Explanation. The stark difference in the performance of \(\mathsf {SparseCount}\) and \(\mathsf {SparseCount2}\) is primarily due to a significant reduction in the number of \(\mathsf {SAT}\) calls in \(\mathsf {SparseCount2}\). Recall that \(\mathsf {SparseCount}\) invokes the underlying \(\mathsf {SAT}\) solver \(\mathcal {O}(n \log n)\) times, while \(\mathsf {SparseCount2}\) invokes the underlying \(\mathsf {SAT}\) solver only \(\mathcal {O}(\log ^2 n)\) times. As discussed above, this difference is achieved through the usage of prefix-slices.
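As a back-of-the-envelope illustration (our arithmetic, for a hypothetical instance with \(n = 1024\) variables, ignoring constants and the dependence on \(\varepsilon \) and \(\delta \)):

$$\begin{aligned} n\log _2 n = 1024 \times 10 = 10{,}240 \text { calls} \qquad \text {vs.} \qquad (\log _2 n)^2 = 10^2 = 100 \text { calls,} \end{aligned}$$

i.e., roughly two orders of magnitude fewer \(\mathsf {SAT}\) calls per run.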

5.2 \(\mathsf {ApproxMC3}\) vis-a-vis \(\mathsf {SparseCount2}\)

With a timeout of 4500 s, \(\mathsf {SparseCount2}\) could finish execution on only 379 benchmarks, while \(\mathsf {ApproxMC3}\) finished execution on 1169 benchmarks. Furthermore, Table 1 clearly demonstrates that \(\mathsf {ApproxMC3}\) significantly outperforms \(\mathsf {SparseCount2}\). In particular, over all the formulas where neither \(\mathsf {SparseCount2}\) nor \(\mathsf {ApproxMC3}\) timed out, the mean speedup is 568.53\(\times \). In light of recent improvements in \(\mathsf {CryptoMiniSat}\), one may wonder whether the observations reported in this paper are mere artifacts of how \(\mathsf {SAT}\) solvers have changed in the past few years, and whether such a study on an earlier version of \(\mathsf {CryptoMiniSat}\) may have led to a different conclusion. To account for this threat to validity, we conducted preliminary experiments using older versions of \(\mathsf {CryptoMiniSat}\) and observed that similar conclusions hold. In particular, the latest improvements in \(\mathsf {CryptoMiniSat}\), such as the BIRD framework [24, 25], favor \(\mathsf {SparseCount}\) and \(\mathsf {SparseCount2}\) relative to \(\mathsf {ApproxMC3}\).

Explanation. The primary contributing factor to the difference in the runtime performance of \(\mathsf {SparseCount2}\) and \(\mathsf {ApproxMC3}\) is that the weaker guarantees on the variance of \(\mathsf {Cnt}_{\langle i \rangle }\) necessitate the usage of Stockmeyer’s amplification technique, wherein the underlying routines are invoked over \({\psi }\) instead of \({\varphi }\). Furthermore, the weak theoretical guarantees also lead to a larger value of T compared to its analogous parameter in \(\mathsf {ApproxMC3}\). It is worth noticing that prior work on the design of sparse hash functions has claimed that their usage leads to runtime improvements of the underlying techniques. Such an inference may perhaps be drawn from observing only the time taken by a \(\mathsf {SAT}\) solver on CNF formulas with a fixed number of XORs, varying only the density of the XORs. While such an observation does indeed highlight that sparse XORs are easier for \(\mathsf {SAT}\) solvers, it fails, as has been the case in prior work, to take into account the tradeoffs due to the weakness of the theoretical guarantees of sparse hash functions. To emphasize this further, the best known theoretical guarantees offered by sparse XORs are so weak that one cannot merely replace dense XORs with sparse XORs. State of the art counters such as \(\mathsf {ApproxMC3}\) require stronger guarantees than those known today.

6 Conclusion

Hashing-based techniques have emerged as a promising paradigm for attaining scalability and rigorous guarantees in the context of approximate model counting. Since the performance of \(\mathsf {SAT}\) solvers was observed to degrade with an increase in the size of XORs, efforts have focused on the design of sparse XORs. In this paper, we performed the first rigorous analysis to understand the theoretical and empirical effect of sparse XORs. Our conclusions are surprising and stand in stark contrast to the widely held belief that the current constructions of sparse XORs by Zhao et al. [31] and Ermon et al. [11] lead to runtime improvements. We demonstrate that the theoretical guarantees offered by the aforementioned constructions are still too weak to be employed in state of the art approximate counters such as \(\mathsf {ApproxMC3}\). Furthermore, the most efficient algorithm using sparse XORs, to the best of our knowledge, still falls significantly short of \(\mathsf {ApproxMC3}\) in runtime performance. While our analysis leads to negative results for the current state of the art sparse constructions of hash functions, we hope our work will motivate other researchers in the community to rigorously investigate the construction of efficient hash functions. In this spirit, the concurrent work of Meel and Akshay [20] proposes a new family of hash functions, called concentrated hash functions, and designs a new family of sparse hash functions of the form \({A}y+b\) wherein every entry of the i-th row of A is chosen with probability \(p_i \in \mathcal {O}(\frac{\log n}{n})\). Meel and Akshay propose an adaptation of \(\mathsf {ApproxMC3}\) that can make use of the newly designed sparse hash functions and, in turn, obtain promising speedups on a subset of benchmarks.