
1 Introduction

The Learning Parity with Noise (LPN) problem [4] and its generalization to arbitrary moduli, the Learning with Errors (LWE) problem [29], lie at the heart of our most promising coding-based and lattice-based post-quantum cryptographic constructions [2, 26, 28]. With the NIST standardization [1], there is urgent pressure to identify LPN/LWE instantiations that allow for efficient constructions, yet give us the desired level of classical and quantum security.

Hence, we have to run cryptanalytic algorithms on medium-scale parameter sets in order to properly extrapolate towards cryptographic instantiations.

In LPN of dimension k and error-rate , one has to recover a secret from samples for uniformly random and inner products with Bernoulli distributed error \(e_i\), i.e., \(\Pr [e_i=1] = p\).

It is not hard to see that for constant error rate p, any candidate solution can be checked for correctness in time polynomial in k. This gives a simple brute-force LPN algorithm with time complexity \(2^k\) and polynomial memory.

The algorithm of Blum, Kalai and Wasserman [5] solves the faulty system of linear equations from LPN/LWE by a block-wise Gaussian elimination. In a nutshell, BKW takes sums of two vectors whenever they cancel a block of \({\varTheta }(\frac{k}{\log k})\) bits to keep the accumulating error under control. In cryptographic constructions we usually have constant p, for which the BKW algorithm runs in time \(2^{\frac{k}{\log k}(1+o(1))}\), albeit using the same amount of memory.

These two algorithms, brute-force and BKW, mark the two endpoints of our time-memory trade-offs. Hence, for gaining a \({\log (k)}\)-factor in the run time exponent one has to invest memory up to the point where memory equals run time. Interestingly, this behavior also occurs in lattice reduction, the second method of choice for assessing LWE security.

Whereas on lattices of dimension n, lattice enumeration such as Kannan’s algorithm [21] takes time \(2^{\mathcal {O}(n \log n)}\) with polynomial memory only, lattice sieving methods [12, 22, 23, 24] require \(2^{\mathcal {O}(n)}\) time and memory. Due to its large memory requirements, lattice sieving is currently outperformed in practice by enumeration, and there is an increasing interest in constructing time-memory trade-offs, e.g., by lattice tuple sieving [3, 18, 19].

For BKW, the research so far mainly concentrated on run time, where many optimizations have been proposed in the cryptographic literature, e.g. [7, 16, 17, 25, 32]. While these improvements may significantly reduce the running time for breaking concrete LPN instances such as \((k,p) = (512, \nicefrac 1 4)\) or \((512, \nicefrac 1 8)\), to the best of our knowledge for the running time exponent \(\frac{k}{\log k}(1+o(1))\) all improvements only affect the o(1)-term. Moreover, all proposals share the same huge memory requirements as original BKW, making it impossible to run them even in moderately large dimensions.

As a consequence, state-of-the-art BKW implementations are currently only possible in dimension k up to around 100. For instance, [14] reported a break of \((k, p) = (135, \nicefrac 1 4)\). However, of the total 6 days of running time, the authors spent 2.5 days on an exponential preprocessing, followed by less than 2 hours of BKW in dimension 99, and another 3.5 days of exponential decoding. The reason for this run-time imbalance is that BKW in dimension 99 had already consumed all available memory, namely \(2^{40}\) bits.

Hence, if we really want to break larger LPN instances in practice, we must study time-memory trade-offs that sacrifice a bit of running time, but stay in the sub-exponential time regime at the benefit of manageable memory.

Our contribution. We provide the first time-memory trade-offs for the BKW algorithm. These trade-offs give us a smooth interpolation for the complexities of solving LPN between the known endpoints \(2^{k}\) time for brute-force and \(2^{\frac{k}{\log k}}\) for BKW.

Since our algorithms’ running times remain subexponential even for given memory below the requirement \(2^{\frac{k}{\log k}}\) of classic BKW, we (asymptotically) outperform all previous algorithms (e.g. [14]) that solved LPN in exponential time when classic BKW was not applicable due to memory restrictions.

As a starting point, we consider—instead of 2-sums as in the original BKW algorithm—c-sums for (constant) \(c>2\) that cancel some blocks of bits. The use of sums of more than 2 vectors has already been discussed in the literature for improving the running time of BKW, e.g. by Zhang et al. [32] as an extension LF(k) of the BKW variants LF1 and LF2 by Levieil and Fouque [25], using Wagner’s k-list algorithm [31].

Since the number of c-sums grows exponentially in c, so does the number of c-sums whose sum cancels some block of bits. In turn, we systematically use c-sums to significantly lower the number of samples N that need to be stored, at the slightly increased cost of finding such c-sums.

We show that the complexities of any c-sum BKW algorithm are dominated by the cost of computing c-sums. As a consequence we abstract the c-sum-problem from the BKW algorithm and study various memory-friendly algorithms to solve it. We ensure that our c-sum algorithms do not require more memory than already consumed by c-sum BKW for storing its samples. In fact, our BKW algorithms have sample and (thus) memory complexity as little as \(N_c:={M_{\mathrm {BKW}}}^{\frac{\log c}{c-1}}\) for any constant \(c\in \mathbb {N}\), where \(M_\mathrm {BKW}:=2^{\frac{k}{\log k}}\) denotes the memory (and sample) requirement of classic BKW.

In Table 1, we give a brief overview of our c-sum algorithms' complexities, and therefore also of our c-sum BKW complexities. We stress that all c-sum algorithms from Table 1, including those that use the Dissection technique [10, 30], may be studied for arbitrary list sizes outside of the BKW context.

Table 1. c-sum-algorithms: Given a list L and some target t, the algorithms output |L| sets each containing c entries from L adding to t. Memory consumption of the algorithms coincides with the size of list L. Let \(N_c:={(M_{\mathrm {BKW}})}^{\frac{\log c}{c-1}}=2^{\frac{\log c}{c-1}\frac{k}{\log k}}\).

Naive. We first consider a naive approach that computes all \((c-1)\)-sums of list entries and looks for some matching \(c^\mathrm {th}\) vector. This naive approach already gives us a smooth first time-memory trade-off, informally captured in the following theorem.

Theorem 1.1

(Naive BKW Trade-Off, informal). Let \(c\in \mathbb {N}\). The LPN problem in dimension k can be solved in Time T and space M where

$$\begin{aligned} \log T = \log c \cdot \frac{k}{\log k},\quad \log M = \frac{\log c}{c-1}\cdot \frac{k}{\log k}. \end{aligned}$$

Observe that the trade-off behaves quite nicely as a function of c. While we can reduce the memory exponent by a factor of \(\frac{\log c}{c-1}\) (almost a factor of \(\frac{1}{c}\) for large c), this comes at the cost of only a \((\log c)\)-factor in the run time exponent.

Note that for \(c=2\) Theorem 1.1 yields the well-known BKW complexities. While we consider constant c throughout this work, we point out that our results hold up to a choice of \(c(k)=k^{1-\frac{\log \log k}{\log k}}\) for which the formulas in Theorem 1.1 (as well as for the upcoming trade-offs) result in exponential running time in k with polynomial memory, matching the endpoint of the LPN brute-force algorithm. See Fig. 1 (stars) for an illustration of this time-memory trade-off.

QBKW. Using a standard Grover-search in our naive c-sum algorithm to identify \((c-1)\)-sums for which there exists a matching \(c^\mathrm {th}\) vector, we halve the running time exponent from \(\log c \cdot \frac{k}{\log k}\) down to \(\frac{\log c}{2} \cdot \frac{k}{\log k}\). See Fig. 1 (triangles) for the resulting trade-off curve.

Dissection. We replace our naive c-sum algorithm by more advanced time-memory techniques like Schroeppel-Shamir [30] and its generalization, Dissection [10], to reduce the classical running time. We call the resulting algorithm Dissection-BKW. To give some illustrative results, with the Schroeppel-Shamir technique Dissection-BKW achieves exponents

$$\begin{aligned} \log T = \frac{4}{3} \frac{k}{\log k},\quad \log M = \frac{2}{3} \frac{k}{\log k}\end{aligned}$$

(see the diamond at \(\log M = \frac{2}{3} \frac{k}{\log k}\) in Fig. 1). Using 7-Dissection, Dissection-BKW achieves exponents

$$\begin{aligned} \log T = 1.87 \frac{k}{\log k},\quad \log M = 0.47 \frac{k}{\log k}\end{aligned}$$

(see the diamond at \(\log M \approx 0.47 \frac{k}{\log k}\) in Fig. 1).

Theorem 1.2

(Dissection BKW Trade-Off, informal). Let \(c\in \mathbb {N}\) be sufficiently large. The LPN problem in dimension k can be solved in Time T and space M where

$$\begin{aligned} \log T = \left( 1-\sqrt{\frac{2}{c}}\right) \cdot \log c \cdot \frac{k}{\log k},\quad \log M = \frac{\log c}{c-1}\cdot \frac{k}{\log k}. \end{aligned}$$

Hence, in comparison to Theorem 1.1, Dissection mitigates the price we pay for saving a factor of \(\frac{\log c}{c-1}\) in memory: the coefficient of \(\log c\) in the time exponent drops from 1 down to \(1-\sqrt{2/c}\).

The trade-off is depicted by the diamonds in Fig. 1. Interestingly, when classically employing Schroeppel-Shamir we are on par with the quantum trade-off (see the point \((\frac{2}{3},\frac{4}{3})\) in Fig. 1), since Schroeppel-Shamir allows for a square-root gain in the running time, the same gain a Grover-search yields in a quantum algorithm.
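For concreteness, the illustrative exponents above can be reproduced with a few lines of Python (our own snippet, not from the paper), using the per-\(c_i\) formula of Theorem 5.1, \(\vartheta =(1-\frac{i}{c_i-1})\log c_i\) and \(\mu =\frac{\log c_i}{c_i-1}\), together with the closed form \(c_i=\frac{1}{2}(i^2+3i+4)\) of the magic sequence from Definition 5.3:

```python
from math import log2

def magic_c(i):
    """Closed form of the magic sequence: c_i = (i^2 + 3i + 4) / 2."""
    return (i * i + 3 * i + 4) // 2

def dissection_exponents(i):
    """Exponent coefficients (in units of k/log k) of c_i-Dissection-BKW
    according to the precise formula of Theorem 5.1."""
    c = magic_c(i)
    theta = (1 - i / (c - 1)) * log2(c)   # time exponent coefficient
    mu = log2(c) / (c - 1)                # memory exponent coefficient
    return c, theta, mu

for i in (1, 2):   # c_1 = 4 (Schroeppel-Shamir), c_2 = 7 (7-Dissection)
    c, theta, mu = dissection_exponents(i)
    print(f"c = {c}: log T ~ {theta:.2f} * k/log k, log M ~ {mu:.2f} * k/log k")
# c = 4: log T ~ 1.33, log M ~ 0.67   (the point (2/3, 4/3) in Fig. 1)
# c = 7: log T ~ 1.87, log M ~ 0.47
```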

Tailored Dissection. Finally, we introduce a slight generalization of the Dissection technique that we call tailored Dissection. It allows us to achieve a piece-wise continuous trade-off (line segments depicted in Fig. 1) covering the sub-exponential memory regime entirely.

Fig. 1. Illustration of our BKW trade-offs. Instantiations exist exactly for marks as well as everywhere on solid lines. Naive BKW trade-off in stars (see Theorem 1.1), QBKW trade-off in triangles, and Dissection-BKW trade-offs in diamonds and solid line segments (see Theorem 1.2).

The full version of this work [13] also contains a discussion of how our results translate from the LPN setting to LWE.

2 Preliminaries

2.1 Notation

For \(a\le b\in \mathbb {N}\) let \([a,b]:=\{a,a+1,\ldots ,b\}\) and \([a]:=[1,a]\). For a set S and \(s\le |S|\) let \(\left( {\begin{array}{c}S\\ s\end{array}}\right) \) denote the set of all size-s subsets of S. A list \(L=(l_1,\ldots ,l_i)\) is an element \(L\in {S}^i\) and is of length \(|L|=i\). We let \(\emptyset \) denote the empty list. For two lists \(L_1\), \(L_2\) we write \(L_1\subseteq L_2\) if all elements from \(L_1\) are contained in \(L_2\) at least as often as in \(L_1\). We write shortly \(l\in L_2\) for \((l)\subseteq L_2\). For lists \(L_1=(l_1,\ldots ,l_i)\) and \(L_2=(l_{i+1},\ldots ,l_j)\) we let \(L_1\cup L_2:=(l_1,\ldots ,l_i,l_{i+1},\ldots ,l_j)\). Logarithms are always base 2.

For \(v\in \mathbb {F}_2^a\) and \(b\le a\) we denote the last b coordinates of v by \(\mathrm {low}_b(v)\). We let \(\mathrm {u}_i\) denote the \(i^\mathrm {th}\) unit vector. \(0^b\) denotes the zero vector of dimension b.

By \(\mathcal {U}(M)\) we denote the uniform distribution on a finite set M, by \(\mathrm {Ber}_p\) we denote the Bernoulli distribution, i.e., \(X\sim \mathrm {Ber}_p\) means that \(\Pr \left[ X=1\right] = 1-\Pr \left[ X=0\right] =p\). For n independent random variables \(X_1,\ldots ,X_n{\mathop {\sim }\limits ^{iid}}\mathrm {Ber}_p\) their sum X is binomially distributed with parameters n and p, denoted \(X\sim \mathrm {Bin}_{n,p}\). A probability p(k) is called overwhelming in k, if \(1-p(k)\) is negligible in k. We denote deterministic assignments in algorithms by \(\leftarrow \).

Theorem 2.1

(Chernoff Bound, [27]). Let \(X\sim \mathrm {Bin}_{n,p}\). Then

$$\begin{aligned} \Pr \left[ X\le (1-r)np\right]&\le \exp \left( -\frac{1}{2}r^2np\right) \hbox { for any }r\in [0,1]. \end{aligned}$$
(1)

2.2 The LPN Problem

Definition 2.1

((Search) LPN Problem). Let \(k\in \mathbb {N}\), let \(p\in [0,\frac{1}{2})\) be a constant and let \(\mathbf {s}\sim \mathcal {U}(\mathbb {F}_2^k)\) be a secret. Let \(\mathtt {Sample}\) denote an oracle that, when queried, samples \(\mathbf {a}\sim \mathcal {U}(\mathbb {F}_2^k)\), \(e \sim \mathrm {Ber}_p\) and outputs a sample of the form \((\mathbf {a},b):=(\mathbf {a},\langle \mathbf {a},\mathbf {s}\rangle \oplus e)\). The \(\mathtt {LPN}_k\) problem consists of recovering \(\mathbf {s}\) given access to \(\mathtt {Sample}\). In the following we call k the dimension, \(\mathbf {s}\) the secret, p the error rate, b the label of \(\mathbf {a}\) and e the noise.

Brute-Force Sampling Algorithm. A straight-forward way to recover the first bit \(s_1\) of the secret is to query \(\mathtt {Sample}\) until we obtain a sample of the form \((\mathrm {u}_1, b)\). Then \(b = s_1\oplus e\). Hence, \(\Pr \left[ s_1=b\right] =1-p>\frac{1}{2}\). However, as \(\mathtt {Sample}\) draws \(\mathbf {a}\) uniformly from \(\mathbb {F}_2^k\), we expect to have to query the oracle \(2^k\) times to have \(\mathbf {a}=\mathrm {u}_1\).

Further, to boost the confidence in recovering \(s_1\) correctly from merely \(1-p\) to being overwhelming (in k) one may collect many samples and decide on \(s_1\) by majority vote, whose error is bounded in the following lemma.

Lemma 2.1

(Majority Vote). Let \(q>\frac{1}{2}\) and \(X_1,\ldots ,X_n\sim \mathrm {Ber}_q\) independently. Then

$$\begin{aligned} \Pr \left[ \sum _{i=1}^{n}X_i>\frac{n}{2}\right] \ge 1-\exp \left( -\frac{n}{2q}\left( q-\frac{1}{2}\right) ^2\right) . \end{aligned}$$

Proof

Since \(X:=\sum _{i=1}^nX_i\sim \mathrm {Bin}_{n,q}\), using Theorem 2.1 with \(r=1-\frac{1}{2q}\) gives

$$\begin{aligned} \Pr \left[ X>\frac{n}{2}\right] \ge 1-\exp \left( -\frac{1}{2}\left( 1-\frac{1}{2q}\right) ^2nq\right) =1-\exp \left( -\frac{n}{2q}\left( q-\frac{1}{2}\right) ^2\right) . \end{aligned}$$

   \(\square \)

Corollary 2.1

For \( n:=\frac{2(1-p)k}{(\frac{1}{2}-p)^2} \) many samples, a majority vote recovers \(s_1\) correctly with probability at least \(1-\exp (-k)\).

Proof

Let us define \(X_i=1\) iff \(b_i=s_1\), which implies \(q=1-p\). Lemma 2.1 yields the desired result.    \(\square \)

Therefore, for any constant error rate p, the majority vote requires only a linear number \(n = \mathcal {O}(k)\) of labels of the form \((\mathrm {u}_1, b_i)\). Clearly, we can recover the remaining bits \(s_j\), \(j=2,\ldots ,k\) of the secret by querying \(\mathtt {Sample}\) until we have obtained sufficiently many samples with \(\mathbf {a}=\mathrm {u}_j\). Overall, the probability that we recover all bits of \(\mathbf {s}\) correctly is at least \((1-\exp (-k))^k\ge 1-k\cdot \exp (-k)=1-\mathrm {negl}(k)\).
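As an illustration of this brute-force sampling approach, the following Python sketch recovers \(s_1\) by majority vote from a toy LPN oracle; the oracle, the toy parameters and all names (make_lpn_oracle, recover_first_bit) are our own illustrative choices and not part of the paper.

```python
import random

def make_lpn_oracle(k, p, secret):
    """Toy LPN Sample oracle: returns (a, <a,s> xor e) with a uniform in F_2^k
    and e ~ Ber_p.  Illustrative only."""
    def sample():
        a = [random.randrange(2) for _ in range(k)]
        e = 1 if random.random() < p else 0
        b = (sum(ai * si for ai, si in zip(a, secret)) + e) % 2
        return a, b
    return sample

def recover_first_bit(sample, k, p):
    """Recover s_1 by waiting for samples of the form (u_1, b) and taking a
    majority vote over n = 2(1-p)k / (1/2 - p)^2 labels (Corollary 2.1)."""
    u1 = [1] + [0] * (k - 1)
    n = int(2 * (1 - p) * k / (0.5 - p) ** 2) + 1
    votes, collected = 0, 0
    while collected < n:                  # expect roughly n * 2^k oracle queries
        a, b = sample()
        if a == u1:
            votes += b
            collected += 1
    return 1 if votes > n / 2 else 0

if __name__ == "__main__":
    k, p = 10, 0.125                      # tiny toy parameters
    secret = [random.randrange(2) for _ in range(k)]
    oracle = make_lpn_oracle(k, p, secret)
    print(recover_first_bit(oracle, k, p) == secret[0])
```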

2.3 Combining Samples

In [5], Blum, Kalai and Wasserman introduced the idea to construct samples of the form \((\mathrm {u}_j, b)\) from a collection of N arbitrary samples rather than merely waiting for a sample where \(\mathbf {a}=\mathrm {u}_j\). Their core idea is based on synthesizing a new sample from two existing ones \((\mathbf {a}_1, b_1)\), \((\mathbf {a}_2,b_2)\) via addition

$$\begin{aligned} (\mathbf {a}_1\oplus \mathbf {a}_2,b_1\oplus b_2)=(\mathbf {a}_1\oplus \mathbf {a}_2, \langle \mathbf {a}_1\oplus \mathbf {a}_2,s \rangle \oplus e_1\oplus e_2). \end{aligned}$$

For such a synthesized sample, which we call a 2-sum of samples, we have error \(e_1\oplus e_2\), and \(e_1\oplus e_2\sim \mathrm {Ber}_{p'}\) where \(p'=\frac{1}{2}-\frac{1}{2}(1-2p)^2>p\) according to the following Piling-Up lemma.

Lemma 2.2

(Piling-Up Lemma [14]). Let \(p\in [0,1]\) and let \(e_i\sim \mathrm {Ber}_p\), \(i\in [n]\), be independent and identically distributed. Then

$$\begin{aligned} \bigoplus _{i=1}^{n}e_i\sim \mathrm {Ber}_{\frac{1}{2}-\frac{1}{2}(1-2p)^n}. \end{aligned}$$

Summing up two or more samples enables us to synthesize samples at the expense of an error rate approaching \(\frac{1}{2}\) exponentially fast in the number of summands.
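The statement of Lemma 2.2 is easy to check numerically. The following small Python sketch (illustrative, with assumed toy parameters) compares the closed-form error rate of an n-fold XOR of \(\mathrm {Ber}_p\) bits with a Monte-Carlo estimate:

```python
import random

def piling_up(p, n):
    """Closed-form error rate of an XOR of n independent Ber_p bits (Lemma 2.2)."""
    return 0.5 - 0.5 * (1 - 2 * p) ** n

def empirical(p, n, trials=50_000):
    """Monte-Carlo estimate of Pr[ e_1 xor ... xor e_n = 1 ]."""
    ones = 0
    for _ in range(trials):
        ones += sum(1 for _ in range(n) if random.random() < p) % 2
    return ones / trials

for n in (2, 4, 8):
    print(n, round(piling_up(0.125, n), 4), round(empirical(0.125, n), 4))
# the error rate of sums approaches 1/2 exponentially fast in n
```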

3 The c-Sum-Problem and its Application to BKW

3.1 A Generalized BKW Algorithm

While BKW repeatedly adds pairs of samples to zero out chunks of coordinates, we add \(c>2\) samples to accomplish the same. Beneficially, as the number of size-c subsets grows exponentially in c, this allows for a drastic reduction in the number of initially required samples (thus, memory as well) while still finding sufficiently many sums of c vectors adding up to zero on a block of coordinates.

We give our \(c\)-sum BKW algorithm in Algorithm 1. For a block-size b and \(j\in [a]\) we refer to the coordinates \([k-jb+1,k-(j-1)b]\) as the \(j^\mathrm {th}\) stripe. Essentially, \(c\)-sum BKW consists of a for-loop (line 4) that generates zeros (resp. the first unit vector) on the \(j^\mathrm {th}\) stripe for \(j\in [a-1]\) (resp. the \(a^\mathrm {th}\) stripe). This step is repeated multiple times (see line 2) to obtain sufficiently many labels of samples in order to let a majority vote determine the first bit \(s_1\) of the secret with overwhelming probability. This process is repeated k times to recover all bits of \(\mathbf {s}\).

For a list L as constructed in line 3, \(j\in [a]\) and \(t\in \mathbb {F}_2^b\) we let \(\mathtt {c\text {-}sum}(L,j,t)\) denote an algorithm that outputs a new list L where the coordinates of each entry match t on the \(j^\mathrm {th}\) stripe (see lines 5, 6). If L should shrink to size 0 throughout an execution, we abort and return a failure symbol (see lines 7, 8).

[Algorithm 1: the \(c\)-sum BKW algorithm]

We point out that (essentially) the original BKW algorithm may be obtained by letting \(\mathtt {c\text {-}sum}\) add pairs of vectors whose sum matches t on the \(j^\mathrm {th}\) stripe to \(L'\).
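Since the pseudocode of Algorithm 1 is only summarized above, the following Python sketch outlines its loop structure under simplifying assumptions; fresh_samples and c_sum_solve are hypothetical placeholders (the latter standing for any \(c\)-sum-problem algorithm from Sects. 4 and 5), and the majority-vote and repeat-for-every-bit bookkeeping is omitted.

```python
# Structural sketch of the c-sum BKW loop; not the paper's pseudocode.

def stripe(v, j, b, k):
    """The j-th stripe of v, i.e. coordinates [k - j*b + 1, k - (j-1)*b]
    (1-indexed in the paper; a slice of length b here)."""
    return tuple(v[k - j * b : k - (j - 1) * b])

def one_bkw_label(fresh_samples, c_sum_solve, k, a, b):
    """Synthesize one sample whose vector part is the unit vector u_1 and
    return its label (= s_1 xor accumulated noise), as done inside the
    for-loop of Algorithm 1."""
    L = fresh_samples()                          # list of (vector, label) pairs
    u1 = (1,) + (0,) * (k - 1)
    for j in range(1, a + 1):
        # target 0^b on stripes 1, ..., a-1 and u_1's a-th stripe on stripe a
        t = (0,) * b if j < a else stripe(u1, a, b, k)
        L = c_sum_solve(L, j, t)                 # entries now match t on stripe j
        if not L:
            return None                          # failure symbol, caller retries
    vec, label = L[0]                            # vec equals u_1 by construction
    return label
```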

Let us introduce the c-sum-problem lying at the heart of any algorithm that shall be used to instantiate \(\mathtt {c\text {-}sum}\). In short, given a list of vectors \(L \in (\mathbb {F}_2^b)^*\), i.e., a stripe from \(c\)-sum BKW's point of view, the \(c\)-sum-problem asks to collect sums of c vectors that add up to some target \(t \in \mathbb {F}_2^b\).

Definition 3.1

(The c-Sum-Problem). Let \(b,c,N \in \mathbb {N}\) with \(c\ge 2\). Let \(L:=(l_1,\ldots ,l_{N})\) be a list where \(l_i\in \mathbb {F}_2^b\) for all i, and let \(t\in \mathbb {F}_2^b\) be a target. A single-solution of the \(c\)-sum-problem is a size-c set \(\mathcal {L}\in \left( {\begin{array}{c}[N]\\ c\end{array}}\right) \) such that

$$\begin{aligned} \bigoplus _{j\in \mathcal {L}}l_j=t. \end{aligned}$$

A solution is a set of at least N distinct single-solutions. The c-sum-problem consists of finding a solution when given (L, t), while c is usually fixed in advance. We refer to (L, t, c) as an instance of the \(c\)-sum-problem, concisely (L, t) if c is clear from the context.

Note that by definition a solution to the \(c\)-sum-problem consists of at least N single-solutions. These may again be combined into a new list of size (at least) N. Thus, we may apply a solving algorithm on different b-bit stripes of a list. Further, the list does not shrink if a solution exists in each iteration.

Obviously, a solution should exist whenever the list size N is large enough, since then sufficiently many c-sums add up to some given target t. In the following, we show that the lower bound for the list-size N from Algorithm 1 guarantees the existence of such a solution with overwhelming probability under the following heuristic assumption that is regularly used in the analysis of BKW-type algorithms [6, 7, 32].

Independence Heuristic. Obviously, c-sums are stochastically dependent for \(\mathcal {L}_1\ne \mathcal {L}_2\) with \(\mathcal {L}_1\cap \mathcal {L}_2\ne \emptyset \). However, their dependency should only mildly affect the runtime of an iterative collision search as in BKW-type algorithms. For instance, it has been shown in [9] that the dependence between 2-sums merely affects the overall runtime exponent by an o(1)-term. We heuristically assume that this also holds for \(c>2\), and therefore treat (iterative) c-sums as independent in our run time analyses.

We provide various experiments in Sect. 6.1 that support the Independence Heuristic.

For Algorithm 1, we need the following lemma only for the special case \(\alpha =1\). However, our Tailored Dissection approach in Sect. 5.3 also requires \(\alpha >1\).

Lemma 3.1

Let (L, t) be a \(c\)-sum-problem instance with

$$\begin{aligned} |L| = N^{\alpha } \text {, where } N=2^{\frac{b+c \log c +1 }{c-1}} \text { and } \alpha \ge 1. \end{aligned}$$

Then, under the Independence Heuristic, (Lt) has at least \(N^{\alpha }\) single-solutions with probability \(1-\exp (-N/4)\).

Proof

For every \(\mathcal {L}\subseteq [N^{\alpha }]\) with \(|\mathcal {L}|=c\) define an indicator variable \(X_{\mathcal {L}}\) that takes value 1 iff \(\bigoplus _{j\in \mathcal {L}}l_j=t\).

Let \(X:=\sum _{\mathcal {L}}X_{\mathcal {L}}\) be the number of single-solutions to the \(c\)-sum-problem. Under the Independence Heuristic, the \(X_{\mathcal {L}}\) can be analyzed as if independent, thus \(X\sim \mathrm {Bin}_{\left( {\begin{array}{c}N^{\alpha }\\ c\end{array}}\right) ,{2^{-b}}}\) is binomially distributed. Hence,

$$\begin{aligned} \mathbb {E}[X] = \left( {\begin{array}{c}N^{\alpha }\\ c\end{array}}\right) \cdot 2^{-b} \ge \left( \frac{N^{\alpha }}{c} \right) ^c \cdot 2^{-b} . \end{aligned}$$
(2)

Since \(\log N = \frac{b+c \log c + 1}{c-1}\), we obtain

$$\begin{aligned} \log \mathbb {E}[X]&\ge c(\alpha \log N - \log c) - b \ge \alpha \left( c(\log N - \log c) - b\right) \nonumber \\&= \alpha \cdot \Big ( c \left( \frac{b+c \log c + 1 - (c-1)\log c}{c-1} \right) -b \Big )\nonumber \\&= \alpha \cdot \frac{cb + c\log c + c - (c-1)b}{c-1} = \alpha \cdot \frac{b + c \log c + 1 + c-1}{c-1} \nonumber \\&= \alpha \cdot (\log N + 1) \ge \alpha \log N +1. \end{aligned}$$
(3)

Thus, our choice of N guarantees \(\mathbb {E}[X] \ge 2N^{\alpha }\). We can upper-bound the probability that the \(c\)-sum-problem has less than \(N^{\alpha }\) single-solutions using our Chernoff bound from Theorem 2.1 as

$$\begin{aligned} \Pr \left[ X<N^{\alpha }\right] \le \Pr \left[ X < \frac{1}{2} \mathbb {E}[X]\right] \le \exp \left( -\frac{\mathbb {E}[X]}{8} \right) \le \exp \left( -\frac{N}{4}\right) . \end{aligned}$$

   \(\square \)

Observation 3.1

Clearly, any algorithm solving the \(c\)-sum-problem may be used to replace \(\mathtt {c\text {-}sum}\) in lines 5 and 6 of \(c\)-sum BKW (Algorithm 1). As a consequence we do not distinguish between \(\mathtt {c\text {-}sum}\) and an algorithm solving the \(c\)-sum-problem.

We now give Theorem 3.2 stating that \(c\)-sum BKW inherits its complexities from \(\mathtt {c\text {-}sum}\). As a result, we may focus on solving the \(c\)-sum-problem in the remainder of this work.

Theorem 3.2

(Correctness and Complexities of \(c\)-sum BKW). Let \(\mathtt {c\text {-}sum}\) denote an algorithm solving the \(c\)-sum-problem in expected time \({T_{c,N}}\) and expected memory \(M_{c,N}\) with overwhelming success probability, where \(N \ge 2^{\frac{b+c\log c +1}{c-1}}\). Under the Independence Heuristic, \(c\)-sum BKW solves the \(\mathtt {LPN}_k\) problem with overwhelming probability in time T, memory M, where

$$\begin{aligned} T = (T_{c,N})^{1+o(1)},\quad M = (M_{c,N})^{1+o(1)}, \end{aligned}$$

using \(N^{1+o(1)}\) samples.

Proof

Let \(a=\frac{\log k}{(1+\varepsilon _a)\log c}\) and \(b=\frac{k}{a}\) as in Algorithm 1. Consider one iteration of the for-loop in line 2. As stated in Lemma 3.1, there exists a solution to the \(c\)-sum-problem instance (implicitly defined via the \(j^\mathrm {th}\) stripe of L and target \(0^b\) (resp. the \(a^\mathrm {th}\) stripe of \(\mathrm {u}_1\))) with probability at least \(1-\exp (-\frac{N}{4})\). Hence, the probability that there is a solution to the instance in each iteration is greater than

$$\begin{aligned} {\left( 1-\exp \left( -\frac{N}{4}\right) \right) }^{an}\ge 1-an \cdot \exp \left( -\frac{N}{4}\right) . \end{aligned}$$

We now analyze the probability that a single bit of the secret gets recovered correctly. Since we build c-sums iteratively a times in lines 4–6, eventually we obtain vectors being a sum of at most \(c^{a}\) samples, having an error of \(\frac{1}{2}-\frac{1}{2}(1-2p)^{c^{a}}\) according to Lemma 2.2.

Note that the labels \(b_1,\ldots ,b_n\) collected in line 10 are stochastically independent as each \(b_i\) is obtained from freshly drawn samples. Now Corollary 2.1 yields that \(n:=\frac{8(1-p) k}{(1-2p)^{2c^a}}\) samples are sufficient for the majority vote to determine a bit of \(\mathbf {s}\) correctly with error probability at most \(\exp (-k)\). Using the Union Bound, one bit of the secret gets successfully recovered with probability at least

$$\begin{aligned} 1-an\cdot \exp \left( -\frac{N}{4}\right) -\exp (-k). \end{aligned}$$

As we have to correctly recover all, i.e., k bits of \(\mathbf {s}\), the overall success probability of \(c\)-sum BKW is at least

$$\begin{aligned} {\left( 1-2an \cdot \exp \left( -\frac{N}{4} \right) -\exp (-k)\right) }^k\ge 1-2ank \cdot \exp \left( -\frac{N}{4}\right) -k\exp (-k). \end{aligned}$$

Let us look at the term \(2ank \cdot \exp \left( -\frac{N}{4}\right) \). Since \(n = \tilde{\mathcal {O}}(2^{\kappa _p c^a})\) for some constant \(\kappa _p\), we obtain for constant c

$$\begin{aligned} n=\tilde{\mathcal {O}}\left( 2^{\kappa _p c^{a}}\right) =\tilde{\mathcal {O}}\left( 2^{\kappa _p k^{\frac{1}{1+\varepsilon _a}}}\right) =2^{o\left( \frac{k}{\log k}\right) }=N^{o(1)}. \end{aligned}$$
(4)

Thus, the factor \(\exp \left( -\frac{N}{4}\right) \) clearly dominates and makes the overall success probability overwhelming in k.

Let us now analyze the time and memory complexity of \(c\)-sum BKW. Since \(\mathtt {c\text {-}sum}\) has only expected time/memory complexity, this expectation is inherited by \(c\)-sum BKW. We later show how to remove the expectation from \(c\)-sum BKW's complexities by a standard technique.

Let us start with the running time of \(c\)-sum BKW, where we ignore the negligible overhead of repeated iterations of line 2 caused by failures of the \(\mathtt {c\text {-}sum}\) algorithm. Clearly, \(T_{c,N}\ge N\). Hence, one iteration of the for-loop can be carried out in time \(\tilde{\mathcal {O}}(\max \{N,a\cdot T_{c,N}\}) = \tilde{\mathcal {O}}(T_{c,N})\). Thus, for recovering the whole secret we get a running time of \(\tilde{\mathcal {O}}(n\cdot T_{c,N})\). From Equation (4) we already know that N dominates n, which implies that \(T_{c,N}\) dominates n. More precisely, we have \(n\cdot T_{c,N}= (T_{c,N})^{1+o(1)}\). Hence, we can also express the overall expected running time as \((T_{c,N})^{1+o(1)}\).

The memory consumption is dominated by \(\mathtt {c\text {-}sum}\), which gives us in total expected \(\tilde{\mathcal {O}}(M_{c,N})=(M_{c,N})^{1+o(1)}\). The sample complexity of \(c\)-sum BKW is \( \tilde{\mathcal {O}}(knN)=\tilde{\mathcal {O}}(nN)=N^{1+o(1)}.\)

It remains to remove the expectations from the complexity statements for time and memory, while keeping the success probability overwhelming. We run \(c\)-sum BKW k times, aborting each run if it exceeds its expected running time or memory by a factor of 4. A standard Markov bound then shows that this modified algorithm fails to provide a solution with probability at most \(2^{-k}\). For details see the full version [13].    \(\square \)

In the following section, we discuss an algorithm to naively solve the \(c\)-sum-problem leading to a first trade-off in the subexponential time/memory regime that also allows for a quantum speedup.

4 First Time-Memory Trade-Offs for BKW

4.1 A Classic Time-Memory Trade-Off

A straight-forward algorithm to solve an instance of the \(c\)-sum-problem is to compute all sums of c elements from L. For \(N=|L|\) this approach takes time \(\mathcal {O}(N^c)\) while it requires memory \(\mathcal {O}(N)\).

With little effort, we can do slightly better: Let L be sorted. Now let us brute-force all \((c-1)\)-sums from L, add t, and check whether the result appears in L. Whenever it does, we have by construction a c-sum that matches t.

The details of this approach are given in Algorithm 2. Notice that whenever we call Algorithm 2, we should first sort the input list L, which can be done in additional time \({\tilde{\mathcal {O}}}(N)\).

[Algorithm 2: the naive \(c\)-sum algorithm]
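As the pseudocode is not reproduced above, a possible rendering of this naive approach in Python is sketched below (vectors represented as b-bit integers, single-solutions as index sets); it is an illustration under our own naming, not the paper's Algorithm 2.

```python
from bisect import bisect_left
from itertools import combinations

def naive_c_sum(L, t, c):
    """Naive c-sum search on a list L of b-bit integers: enumerate all
    (c-1)-sums, add t, and binary-search the sorted list for a matching
    c-th entry.  Returns up to |L| single-solutions as index sets."""
    order = sorted(range(len(L)), key=lambda i: L[i])   # indices sorted by value
    vals = [L[i] for i in order]
    solutions = set()
    for idxs in combinations(range(len(L)), c - 1):     # time ~ N^(c-1)
        target = t
        for i in idxs:
            target ^= L[i]
        j = bisect_left(vals, target)
        while j < len(vals) and vals[j] == target:       # all matching c-th entries
            if order[j] not in idxs:
                solutions.add(frozenset(idxs) | {order[j]})
                if len(solutions) >= len(L):
                    return solutions
            j += 1
    return solutions

# toy example: all size-3 index sets whose entries XOR to 0b101
print(naive_c_sum([0b001, 0b010, 0b100, 0b110, 0b011], t=0b101, c=3))
```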

The following lemma shows correctness and the complexities of Algorithm 2.

Lemma 4.1

Algorithm 2 finds a solution of the \(c\)-sum-problem in time \(\tilde{\mathcal {O}}\left( N^{c-1}\right) \) and memory \(\tilde{\mathcal {O}}(N)\).

Proof

See full version [13].

Let us now replace the subroutine \(\mathtt {c\text {-}sum}\) in Algorithm 1 with Algorithm 2, and call the resulting algorithm naive BKW.

Theorem 4.1

(Naive Trade-Off). Let \(c\in \mathbb {N}\). For all \(\varepsilon >0\) and sufficiently large k, under the Independence Heuristic naive BKW solves the \(\mathtt {LPN}_{k}\) problem with overwhelming success probability in time \(T=2^{\vartheta (1+\varepsilon )}\), using \(M=2^{\mu (1+\varepsilon )}\) memory and samples, where

$$\begin{aligned} \vartheta =\log c\cdot \frac{k}{\log k},\quad \mu =\frac{ \log c}{c-1}\cdot \frac{k}{\log k }. \end{aligned}$$

Proof

Let \(N:=2^{\frac{b+c\cdot \log c + 1}{c-1}}\). According to Lemma 4.1, Algorithm 2 is correct, and we can apply Theorem 3.2 to conclude that naive BKW solves LPN with overwhelming success probability.

Further, Lemma 4.1 shows that Algorithm 2 runs in time

$$\begin{aligned} T_{c,N}=\tilde{\mathcal {O}}(N^{c-1}) = \tilde{\mathcal {O}}(2^{b+c \log c + 1}) = \tilde{\mathcal {O}}(2^b) \text { for constant } c. \end{aligned}$$

Thus, by Theorem 3.2, naive BKW runs in time \(T=2^{b(1+o(1))}\). Since naive BKW operates on stripes of width \(b=\log c \cdot \frac{k(1+\varepsilon _{a})}{\log k}\) (see the definition in Algorithm 1), we obtain the claimed complexity \(T=2^{\vartheta (1+\varepsilon _a+o(1))}=2^{\vartheta (1+\varepsilon )}\) for every \(\varepsilon >\varepsilon _a\) and sufficiently large k.

Since Algorithm 2 requires memory \(M_{c,N}=\tilde{\mathcal {O}}(N)\), by Theorem 3.2 the memory complexity of naive BKW is (for constant c)

$$\begin{aligned} M=(M_{c,N})^{1+o(1)} = N^{1+o(1)} = (2^{\frac{b}{c-1}})^{1+o(1)} = \big (2^{\frac{\log c}{c-1}\cdot \frac{k}{\log k}}\big )^{1+o(1)}. \end{aligned}$$

The sample complexity of naive BKW is \(N^{1+o(1)}\) (see Theorem 3.2) and therefore identical to M.    \(\square \)

Figure 1 shows the time-memory trade-off from Theorem 4.1 depicted by stars.

4.2 A Quantum Time-Memory Trade-Off

Grover’s algorithm [15] identifies a marked element in an unsorted database D in time \(\mathcal {O}(\sqrt{|D|})\) with overwhelming probability. A matching lower bound \({\varOmega }(\sqrt{|D|})\) by Dohotaru and Høyer [11] shows that Grover’s algorithm is optimal. We use a modification of Grover’s algorithm due to [8], denoted Grover, that applies even if the number of marked elements is unknown.

Theorem 4.2

(Grover Algorithm [8]). Let \(f:D\rightarrow \{0,1\}\) be a function with non-empty support. Then Grover outputs with overwhelming probability a uniformly random preimage of 1, making q queries to f, where

$$\begin{aligned} q=\tilde{\mathcal {O}}\left( {\sqrt{\frac{|D|}{\left| f^{-1}(1)\right| }}}\right) . \end{aligned}$$

We use Grover’s algorithm to speed up our naive approach to solving the \(c\)-sum-problem. While we previously brute-forced all \((c-1)\)-sums in list L and checked if there is a fitting \(c^\mathrm {th}\) entry in L, we now employ a Grover-search to immediately obtain \((c-1)\)-sums for which there exists a suitable \(c^\mathrm {th}\) element in L. Let us define the Grover function \(f_t\) as

$$\begin{aligned} f_t(i_1,\ldots ,i_{c-1})=1 \quad \text {iff}\quad t\oplus l_{i_1}\oplus \ldots \oplus l_{i_{c-1}}\in L. \end{aligned}$$

Given some \((i_1,\ldots ,i_{c-1})\in f_t^{-1}(1)\), we can recover all \(i_c\) such that \(\{i_1,\ldots ,i_c\}\) is a single-solution of instance (L, t) in time \(\tilde{\mathcal {O}}(\log |L|)\) if L is sorted.

[Algorithm 3: the quantum \(c\)-sum algorithm]

Lines 3–7 of Algorithm 2 and Algorithm 3 are identical. We merely replaced the brute-force search (line 1 in Algorithm 2) by a Grover-search (lines 1, 2 in Algorithm 3).

Lemma 4.2

Algorithm 3 solves the \(c\)-sum-problem with overwhelming probability in time \(\tilde{\mathcal {O}}\left( N^{c/2}\right) \) and memory \(\tilde{\mathcal {O}}(N)\).

Proof

See full version [13].

Let \(\mathtt {QBKW}\) denote the \(c\)-sum BKW algorithm (Algorithm 1) where \(\mathtt {c\text {-}sum}\) is instantiated using Algorithm 3.

Theorem 4.3

Let \(c\in \mathbb {N}\). For all \(\varepsilon >0\) and sufficiently large k, under the Independence Heuristic \(\mathtt {QBKW}\) solves the \(\mathtt {LPN}_{k}\) problem with overwhelming success probability in time \(T=2^{\vartheta (1+\varepsilon )}\), using \(M=2^{\mu (1+\varepsilon )}\) memory and samples, where

$$\begin{aligned} \vartheta =\frac{c}{2\cdot (c-1)}\cdot \log c\cdot \frac{k}{\log k},\quad \mu =\frac{ \log c}{c-1}\cdot \frac{k}{\log k }. \end{aligned}$$

Proof

The proof proceeds along the lines of the proof of Theorem 4.1.

The trade-off from Theorem 4.3 is depicted in Fig. 1 on Page 5 by triangles.

5 Time-Memory Trade-Offs for BKW via Dissection

While a naive approach already led to a first time-memory trade-off, a meet-in-the-middle approach for solving the \(c\)-sum-problem prohibits a trade-off, as we explain in Sect. 5.1. As a consequence, we resort to the more advanced Dissection technique [10, 30] in Sect. 5.2. However, as Dissection merely leads to instantiations for a rather sparse choice of available memory, we give a slight generalization of Dissection tailored to any given amount of memory. This allows for a trade-off covering the whole range of possible memory (see Sect. 5.3).

Let us define the specific structure of single-solutions that are recovered by all \(c\)-sum-problem algorithms in this section.

Definition 5.1

(Equally-split (Single-)Solution). Let (L, t) be a \(c\)-sum-problem instance. Partition L into lists \(L_1,\ldots ,L_c\) of size \(\frac{|L|}{c}\) each. A single-solution is equally-split (wrt. \(L_1,\ldots ,L_c\)) if it corresponds to list elements \(l_1,\ldots ,l_c\) whereby \(l_i \in L_i\) for all \(i\in [c]\). An equally-split solution is a collection of N equally-split single-solutions.

Lemma 5.1

Let (L, t) be a \(c\)-sum-problem instance with

$$\begin{aligned} |L| = N^{\alpha } \text {, where } N=2^{\frac{b+c \log c +1 }{c-1}} \text { and } \alpha \ge 1. \end{aligned}$$

Then, under the Independence Heuristic, (Lt) has at least \(N^{\alpha }\) equally-split single-solutions with probability \(1-\exp (-N/4)\).

Proof

Let X be a random variable for the number of equally-split single-solutions. Then \(\mathbb {E}[X] = \left( \frac{N^{\alpha }}{c}\right) ^c \cdot 2^{-b}\). Hence, Eq. (2) is satisfied, and the rest follows as in the proof of Lemma 3.1.    \(\square \)

5.1 Meet-in-the-Middle and Beyond

Let (L, t) be an instance of the \(c\)-sum-problem. In a meet-in-the-middle approach one splits a c-sum \(t=v_{i_1}\oplus \ldots \oplus v_{i_c}\) into two parts \(t=\big (v_{i_1}\oplus \ldots \oplus v_{i_\frac{c}{2}}\big )\oplus \big (v_{i_{\frac{c}{2}+1}}\oplus \ldots \oplus v_{i_c}\big )\). Let us define \(L_1:=(v_1,\ldots ,v_{\frac{\left| L\right| }{2}})\), \(L_2:=(v_{\frac{\left| L\right| }{2}+1},\ldots ,v_{\left| L\right| })\) and consider a single-solution corresponding to \(\frac{c}{2}\) elements from \(L_1\) and \(L_2\) each:

$$\begin{aligned} (v_{i_1},\ldots , v_{i_\frac{c}{2}})\subseteq L_1\quad \text {and}\quad (v_{i_{\frac{c}{2}+1}},\ldots , v_{i_c})\subseteq L_2. \end{aligned}$$

If we compute all \(\frac{c}{2}\)-sums of elements from \(L_1\) and save them in a new list \(L_1^{\frac{c}{2}}\), then for each \(\frac{c}{2}\)-sum v of \(L_2\) we can check if \(w:=v\oplus t \in L_1^{\frac{c}{2}}\). If so, v and w form a single-solution since \(v\oplus w=t\). Obviously, this approach has expected time and memory complexity \(\tilde{\mathcal {O}}(\max (|L|,|L|^\frac{c}{2}))=\tilde{\mathcal {O}}(|L|^\frac{c}{2})\) for \(c\ge 2\). Yet, choosing \(c>2\) only leads to worse complexities, time and memory-wise, while for \(c=2\) the complexities of the meet-in-the-middle approach are as bad as those of the naive approach from Sect. 4.1.

Schroeppel and Shamir’s Meet-in-the-Middle. We present a heuristic simplification of the Schroeppel-Shamir algorithm [30] due to Howgrave-Graham and Joux [20].

In a nutshell, the idea of Schroeppel and Shamir is to run a meet-in-the-middle attack but impose an artificial constraint \(\tau \) on the \(\frac{c}{2}\)-sums. This results in lists \(L_1^{\frac{c}{2},\tau }\) that are significantly smaller than \(L_1^{\frac{c}{2}}\) in the original meet-in-the-middle approach. In order to find all single-solutions, one iterates \(\tau \) over its whole domain. \(L_1^{\frac{c}{2},\tau }\) is in turn built from smaller lists as follows. Let \(t=v_{i_1}\oplus \ldots \oplus v_{i_c}\) and write

$$\begin{aligned} t= \underbrace{v_{i_1}\oplus \ldots \oplus v_{i_\frac{c}{4}}}_{\ell _{11}}\oplus \underbrace{v_{i_{\frac{c}{4}+1}}\oplus \ldots \oplus v_{i_{\frac{c}{2}}}}_{\ell _{12}}\oplus \underbrace{v_{i_{\frac{c}{2}+1}}\oplus \ldots \oplus v_{i_{\frac{3c}{4}}}}_{\ell _{21}}\oplus \underbrace{v_{i_{\frac{3c}{4}+1}}\oplus \ldots \oplus v_{i_c}}_{\ell _{22}}. \end{aligned}$$

Create four lists \(L_{11}^{\frac{c}{4}}\), \(L_{12}^{\frac{c}{4}}\), \(L_{21}^{\frac{c}{4}}\), \(L_{22}^{\frac{c}{4}}\) containing all \(\frac{c}{4}\)-sums of elements from the first, second, third, and fourth quarter of L. Let \(\ell _{i,j}\) denote elements from \(L_{ij}^{\frac{c}{4}}\) for \(i,j=1,2\). As a constraint we choose \(\mathrm {low}_{\frac{c}{4}\log |L|}(\ell _{11}\oplus \ell _{12})=\tau \) for some fixed \(\tau \). Now we construct \(L_1^{\frac{c}{2},\tau }\) from \(L_{11}^{\frac{c}{4}}\) and \(L_{12}^{\frac{c}{4}}\) using a meet-in-the-middle approach requiring expected time \(\tilde{\mathcal {O}}(|L|^{\frac{c}{4}})\). Similarly, we can compute all elements of list \(L_{2}^{\frac{c}{2},t\oplus \tau }\) to obtain sums of \(\frac{c}{2}\) vectors (\(\frac{c}{4}\) from \(L_{2,1}^{\frac{c}{4}}\) and \(L_{2,2}^{\frac{c}{4}}\) each) matching \(t\oplus \tau \) on the last \(|\tau |=\frac{c}{4}\log |L|\) bits. Finally, given the elements from \(L_{2}^{\frac{c}{2},t\oplus \tau }\) and list \(L_1^{\frac{c}{2},\tau }\) one can recover single-solutions as before. Thereby list \(L_1^{\frac{c}{2},\tau }\) is of expected size

$$\begin{aligned} \mathbb {E}\left[ \left| L_1^{\frac{c}{2},\tau }\right| \right] = \left| L_{11}^{\frac{c}{4}}\right| \cdot \left| L_{12}^{\frac{c}{4}}\right| \cdot {2^{-\left| \tau \right| }} = |L|^{2\cdot \frac{c}{4}-\frac{c}{4}} = |L|^{\frac{c}{4}}. \end{aligned}$$

We conclude that the expected memory consumption of the algorithm is given by \(\tilde{\mathcal {O}}(\max \left\{ |L|,|L|^{\frac{c}{4}}\right\} )\).

As we iterate over \(2^{|\tau |}=|L|^{\frac{c}{4}}\) choices of \(\tau \) requiring expected time \(\tilde{\mathcal {O}}(|L|^{\frac{c}{4}})\) per iteration, the overall expected running time is given by \(\tilde{\mathcal {O}}(|L|^{\frac{c}{2}})\) for all \(c\ge 4\). Hence, for \(c=4\) we obtain an algorithm as fast as the meet-in-the-middle approach, while consuming only expected memory \(\tilde{\mathcal {O}}(|L|)\).

Note that it is sufficient to store list \(L_1^{\frac{c}{2},\tau }\) while all elements from \(L_2^{\frac{c}{2},t\oplus \tau }\) may be computed and checked on-the-fly, i.e., without storing its elements.
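The following Python sketch illustrates this simplified Schroeppel-Shamir approach for \(c=4\) on b-bit integers. The constraint iteration and the on-the-fly treatment of the right-hand side follow the description above; the function names, the integer representation and the exact constraint width are our own illustrative choices.

```python
from collections import defaultdict

def low(x, ell):
    """The low ell bits of a b-bit integer x (the paper's low_ell)."""
    return x & ((1 << ell) - 1)

def schroeppel_shamir_4sum(L, t):
    """Sketch of the simplified Schroeppel-Shamir 4-sum search: iterate over a
    constraint tau on ell ~ log2|L| low bits, store only the constrained left
    list, and stream the right side on the fly."""
    n = len(L)
    q = n // 4
    L11, L12, L21, L22 = L[:q], L[q:2 * q], L[2 * q:3 * q], L[3 * q:]
    ell = max(1, (n - 1).bit_length())                # |tau| ~ log2 |L|
    by_low_12 = defaultdict(list)
    for w in L12:
        by_low_12[low(w, ell)].append(w)
    by_low_22 = defaultdict(list)
    for w in L22:
        by_low_22[low(w, ell)].append(w)
    solutions = []
    for tau in range(1 << ell):                       # outer loop over tau
        left = set()                                  # L_1^{2,tau}, kept in memory
        for v in L11:
            for w in by_low_12[tau ^ low(v, ell)]:    # low(v ^ w) == tau
                left.add(v ^ w)
        for v in L21:                                 # right side, on the fly
            for w in by_low_22[low(t, ell) ^ tau ^ low(v, ell)]:
                s = v ^ w                             # low(s) == low(t) ^ tau
                if (t ^ s) in left:
                    solutions.append((t ^ s, s))      # two 2-sums XORing to t
    return solutions

import random
random.seed(1)
L = [random.randrange(1 << 12) for _ in range(32)]
print(len(schroeppel_shamir_4sum(L, t=0)))            # number of 4-sums hitting 0
```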

The Schroeppel-Shamir algorithm is a special case of Dissection run on 4 lists in the subsequent section, which further exploits the asymmetry between storing lists and computing (memory-free) lists on the fly.

5.2 Dissection

Dissection can be considered as a memory-friendly member of the class of k-list algorithms [31]. A Dissection algorithm is given c lists \(L_c,\ldots ,L_1\) and a target t. For simplicity the algorithm merely sums list elements to obtain target t rather than outputting a single-solution explicitly. One could keep track of indices of those elements that sum to t, but for ease of notation we omit this. Instead, we employ some abstract “index recovering” procedure later.

High-level Structure of Dissection. A Dissection identifying sums of vectors adding up to some target t consists of the following steps:

(1) A loop iterates over an artificially introduced constraint \(\tau \). For each \(\tau \):

(1.1) A meet-in-the-middle approach is run to obtain a list L of sums from the first few lists that add up to \(\tau \) on some bits. List L is kept in memory.

(1.2) Dissection is called recursively to find sums from the remaining lists that sum to \(t \oplus \tau \) on these bits. Found sums are passed to the parent call on-the-fly.

(1.3) For all sums passed from the recursive call in step (1.2), list L is searched to construct sums adding up to t on some more bits.

Before giving the fully-fledged Dissection algorithm, let us give an 11-Dissection example.

Fig. 2. High-level structure of an 11-Dissection on input lists \(L_{11},\ldots ,L_1\). Recursively called 7- and 4-Dissection enclosed in dashed boxes. Arrows entering a list from the right side indicate a check on-the-fly. Arrows leaving a list on the left side indicate that a found match is returned on-the-fly. Arrows entering a list from the top indicate that the list below is populated with entries from above and stored entirely.

Example 5.1

(11-Dissection). An 11-Dissection is run on lists \(L_{11},\ldots ,L_1\) and some target t. It loops through an artificial constraint \(\tau _3\). Within each iteration: A list \(L_{(11,8)}\) containing sums \(l_{11} \oplus \ldots \oplus l_8\) consistent with constraint \(\tau _3\) where \(l_{11}\in L_{11}\), ..., \(l_8\in L_8\) is computed and stored. Then, still within the first iteration of the loop, a 7-Dissection is run (see Fig. 2) on lists \(L_7,\ldots ,L_1\) and a modified target \(t\oplus \tau _3\). The 7-Dissection itself introduces a constraint \(\tau _2\) and stores a list \(L_{(7,5)}\) containing sums of elements from \(L_7, L_6, L_5\) fulfilling \(\tau _2\). It recursively calls a 4-Dissection, i.e. Schroeppel-Shamir (Sect. 5.1), on target \(t \oplus \tau _3 \oplus \tau _2\). Internally, the 4-Dissection introduces another constraint \(\tau _1\).

Whenever a partial sum is identified by the 4-Dissection, it is passed to the 7-Dissection on-the-fly while the 4-Dissection carries on. See the “chain” at the bottom in Fig. 2 traversing lists from right to left. Once the 7-Dissection receives a sum, it immediately checks for a match in list \(L_{(7,5)}\); it either discards the sum or, in case of a match, returns it to the 11-Dissection enriched by the matching element from \(L_{(7,5)}\), and continues.

Whenever the 11-Dissection receives a sum from the 7-Dissection, it instantly checks for a match in list \(L_{(11,8)}\) to detect if a single-solution has been found.

Definition 5.2

(Join Operator). Let \(d\in \mathbb {N}\) and \(L_1\), \(L_2\in {(\mathbb {F}_2^d)}^*\) be lists. The join of \(L_1\) and \(L_2\) is defined as

$$\begin{aligned} L_1\bowtie L_2:=(l_1\oplus l_2:l_1\in L_1, l_2\in L_2). \end{aligned}$$

For \(t\in \mathbb {F}_2^{\le d}\) the join of \(L_1\) and \(L_2\) on t is defined as

$$\begin{aligned} L_1\bowtie _t L_2:=(l_1\oplus l_2:l_1\in L_1, l_2\in L_2 \wedge \mathrm {low}_{|t|}(l_1\oplus l_2)=t). \end{aligned}$$

If \(L_2=(l_2)\) we write \(L_1\bowtie _t l_2\) instead of \(L_1\bowtie _t(l_2)\).

Clearly, computing elements contained in a sequence of joins can be implemented memory-friendly without having to explicitly compute intermediate lists.
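For concreteness, such a join on lists of b-bit integers could be realized as follows (an illustrative helper, not taken from the paper):

```python
from collections import defaultdict

def join(L1, L2, t=None, t_len=0):
    """The join of Definition 5.2: all pairwise XORs of L1 and L2, or only
    those pairs whose XOR equals t on the low t_len bits when t is given.
    List entries are b-bit integers."""
    if t is None:
        return [x ^ y for x in L1 for y in L2]
    mask = (1 << t_len) - 1
    buckets = defaultdict(list)              # bucket L2 by its low t_len bits
    for y in L2:
        buckets[y & mask].append(y)
    return [x ^ y for x in L1 for y in buckets[(x & mask) ^ t]]

print(join([0b0110, 0b1010], [0b0101, 0b0011], t=0b11, t_len=2))   # [3, 15]
```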

Definition 5.3

(The Magic Sequence [10]). Let \(c_{-1}:=1\) and

$$\begin{aligned} c_i:=c_{i-1}+i+1 \end{aligned}$$
(5)

for all \(i \in \mathbb {N}\cup \{0\}\). The magic sequence is defined as \(\mathsf {magic}:=(c_i)_{i\in \mathbb {N}^{\ge 1}}\).

One easily verifies that, alternatively,

$$\begin{aligned} \mathsf {magic}=\left( \frac{1}{2}\cdot \left( i^2+3i+4\right) \right) _{i\in \mathbb {N}^{\ge 1}}. \end{aligned}$$
(6)
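A two-line sanity check (our own) of the equivalence of the recursion (5) and the closed form (6):

```python
def magic(i):
    """c_i via the recursion c_i = c_{i-1} + i + 1 with c_{-1} = 1 (Eq. (5))."""
    c = 1                           # c_{-1}
    for j in range(i + 1):          # apply the recursion for 0, 1, ..., i
        c += j + 1
    return c

assert all(magic(i) == (i * i + 3 * i + 4) // 2 for i in range(1, 50))
print([magic(i) for i in range(1, 6)])   # [4, 7, 11, 16, 22]
```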

As Dissection formulated in the language of k-list algorithms might be of independent interest, we refrain from adding \(c\)-sum-problem-specific details to the Dissection algorithm presented next, but rather introduce a simple wrapper algorithm (see Algorithm 5) to solve the \(c\)-sum-problem afterwards.

We define the class of Dissection algorithms recursively. Let \(L_2,L_1\in {(\mathbb {F}_2^d)}^*\), \(d\in \mathbb {N}\) and \(t\in \mathbb {F}_2^{\le d}\). Then

$$\begin{aligned} c_0\text {-}\mathtt {Dissect}(L_2,L_1,t,\texttt {inner}):=L_1\bowtie _t L_2. \end{aligned}$$
(7)

Here, “\(\texttt {inner}\)” indicates that \(c_0\text {-}\mathtt {Dissect}\) was called recursively by another Dissection algorithm, in contrast to an initial, explicit call to run Dissection, which is made with parameter “\(\texttt {outer}\)”.

We proceed to define \(c_i\text {-}\mathtt {Dissect}\) for \(i \ge 1\). Lists are denoted by L and input-lists are numbered \(L_{c_i}\) down to \(L_1\) for \(c_i\in \mathsf {magic}\). As a reading aid, list and element indices keep track of input-lists they originated from: A list \(L_{(j,i)}\) is the output of a join of lists \(L_j \bowtie \dots \bowtie L_i\) for \(j>i\). List elements are denoted l. We write \(l_{(j,i)}\) to indicate that \(l_{(j,i)}\) is a sum of elements \(l_\kappa \in L_\kappa \) for \(\kappa =j,\ldots ,i\).

[Algorithm 4: the \(c_i\text {-}\mathtt {Dissect}\) algorithm]
[Algorithm 5: solving the \(c_i\)-sum-problem via \(c_i\text {-}\mathtt {Dissect}\)]

The optimality of this recursive structure is shown in [10]. We now establish the required properties of Algorithm 4 in a series of lemmata. We give detailed proofs in the full version [13].

Lemma 5.2

(Correctness of \(c_i\text {-}\mathtt {Dissect}\)). For some fixed \(j_a\) let \(l_a:=L_a(j_a)\) denote the \(j_a^\mathrm {th}\) element of list \(L_a\). When \(c_i\text {-}\mathtt {Dissect}\) halts, the set S contains \((j_{c_i},\ldots ,j_1)\in {[2^\lambda ]}^{c_i}\) if and only if \(\bigoplus _{a=1}^{c_i} l_a=t\).

Proof

See full version [13].

Lemma 5.3

(Memory Consumption of \(c_i\text {-}\mathtt {Dissect}\)). For all \(i\ge 1\), algorithm \(c_i\text {-}\mathtt {Dissect}\) requires expected memory \(\tilde{\mathcal {O}}(\max \{2^\lambda ,\mathbb {E}[|S|]\})\).

Proof

See full version [13].

Lemma 5.4

Let \(i\ge 1\) and consider one iteration of the for-loop in line 1 within a call of \(c_i\text {-}\mathtt {Dissect}\). Then, in total, expected \(\tilde{\mathcal {O}}(2^{c_{i-2}\cdot \lambda })\) elements are returned in line 5.

Proof

See full version [13].

Lemma 5.5

Let \(i\ge 1\). Algorithm \(c_i\text {-}\mathtt {Dissect}\), called with parameter \(\texttt {inner}\), runs in expected time \(\tilde{\mathcal {O}}(2^{c_{i-1}\cdot \lambda })\).

Proof

See full version [13].

Lemma 5.6

(Running Time of \(c_i\text {-}\mathtt {Dissect}\)). Let \(i\ge 1\). A call of algorithm \(c_i\text {-}\mathtt {Dissect}\) runs in expected time \(\tilde{\mathcal {O}}(\max \{2^{c_{i-1}\cdot \lambda },\mathbb {E}[|S|]\})\).

Proof

See full version [13].

Lemma 5.7

Let \(b\in \mathbb {N}\) and \(c_i \in \mathsf {magic}\). For \(t\in \mathbb {F}_2^b\) let (L, t) be an instance of the \(c_i\)-sum-problem where \(|L|=N:= 2^{\frac{b+c_i\cdot \log c_i +1}{c_i-1}}\). Under the Independence Heuristic, Algorithm 5 solves the \(c_i\)-sum-problem with overwhelming probability in expected time \(T=\tilde{\mathcal {O}}(N^{c_{i-1}})\) and expected memory \(M=\tilde{\mathcal {O}}(N)\).

Proof

From Lemma 5.1 we know that at least N equally-split single-solutions exist with overwhelming probability. Lemma 5.2 ensures that \(c_i\text {-}\mathtt {Dissect}\) recovers all of them. Note that the lists defined in line 1 are of length

$$\begin{aligned} 2^\lambda = \frac{N}{c_i} = 2^{\frac{b+c_i\cdot \log c_i +1}{c_i-1} - \log c_i} = 2^{\frac{b+\log c_i +1}{c_i-1}}. \end{aligned}$$

One easily verifies that \(\lambda = \frac{b+\log c_i +1}{c_i-1} \le \frac{b}{i}\) as syntactically required by \(c_i\text {-}\mathtt {Dissect}\). Hence, Algorithm 5 solves the \(c_i\)-sum-problem with overwhelming probability.

From Lemma 5.3 we have

$$\begin{aligned} M= \tilde{\mathcal {O}}(\max \{2^\lambda ,\mathbb {E}[|S|]\})= \tilde{\mathcal {O}}(\max \{N,\mathbb {E}[|S|]\}). \end{aligned}$$

Under the Independence Heuristic we obtain \( \mathbb {E}\big [|S|\big ]=\left( \frac{N}{c_i}\right) ^{c_i}\cdot 2^{-b}=2N \), where the last equality follows from Eq. (3). Therefore, \(M= \tilde{\mathcal {O}}(N)\).

From Lemma 5.6 we conclude

$$\begin{aligned} T= \tilde{\mathcal {O}}(\max \{2^{c_{i-1}\cdot \lambda },\mathbb {E}[|S|]\}) = \tilde{\mathcal {O}}(N^{c_{i-1}}). \end{aligned}$$

   \(\square \)

Since \(c_{i-1}=c_i-i-1\) by Eq. (5) and \(i\approx \sqrt{2c_i}\) by Eq. (6), we see that Dissection reduces the time complexity of solving the \(c\)-sum-problem from \(N^{c-1}\) (Algorithm 2) down to roughly \(N^{c-\sqrt{2c}}\) (see also Table 1). Let Dissection-BKW denote the variant of \(c\)-sum BKW where \(\mathtt {c\text {-}sum}\) is instantiated using Algorithm 5.

Theorem 5.1

(Dissection Trade-Off). Let \(c_i\in \mathsf {magic}\), \(\varepsilon >0\) and \(k\in \mathbb {N}\) sufficiently large. Under the Independence Heuristic, Dissection-BKW solves the \(\mathtt {LPN}_{k}\) problem with overwhelming success probability in time \(T=2^{\vartheta (1+\varepsilon )}\), using \(M=2^{\mu (1+\varepsilon )}\) memory and samples, where

$$\begin{aligned} \vartheta = \left( 1-\frac{i}{c_i-1}\right) \cdot \log c_i\cdot \frac{k}{\log k},\quad \mu = \frac{\log c_i}{c_i-1}\cdot \frac{k}{\log k}. \end{aligned}$$

Proof

Let \(N:= 2^{\frac{b+c_i\cdot \log c_i + 1}{c_i-1}}\). It follows from Lemma 5.7 and Theorem 3.2 that Dissection-BKW solves LPN with overwhelming probability. We now combine Lemma 5.7 and Theorem 3.2 to compute the running time T and memory complexity M of Dissection-BKW. We have

$$\begin{aligned} \log T = c_{i-1} \cdot \frac{b+c_i\cdot \log c_i+1}{c_i-1} \cdot (1+o(1)), \end{aligned}$$

whereby Dissection-BKW operates on stripes of width \(b=\log c_i \cdot \frac{k \cdot (1+\varepsilon _a)}{\log k}\). Hence

$$\begin{aligned} \log T= & {} {\left( \frac{c_{i-1}\log c_i}{c_i-1}\cdot \frac{k \cdot (1+\varepsilon _a)}{\log k}+\frac{c_{i-1} \cdot (c_i\log c_i+1)}{c_i-1}\right) \cdot (1+o(1))}\\= & {} {\frac{c_{i-1}\log c_i}{c_i-1}\cdot \frac{k}{\log k}\cdot (1+\varepsilon _a+o(1))}\\= & {} \frac{c_{i-1}}{c_i-1}\cdot \log c_i\cdot \frac{k}{\log k}\cdot (1+\varepsilon ) \end{aligned}$$

for every \(\varepsilon >\varepsilon _a\) and sufficiently large k. Finally

$$\begin{aligned} \log T {\mathop {=}\limits ^{(5)}}{\left( 1-\frac{i}{c_i-1}\right) \cdot \log c_i\cdot \frac{k}{\log k}\cdot (1+\varepsilon )}. \end{aligned}$$

Analogously we have for M:

$$\begin{aligned} \log M ={\frac{\log c_i}{c_i-1}\cdot \frac{k}{\log k}\cdot (1+\varepsilon )}, \end{aligned}$$

for every \(\varepsilon >\varepsilon _a\) and sufficiently large k. The sample complexity of Dissection-BKW is \(N^{1+o(1)}=M\).    \(\square \)

The trade-off of Theorem 5.1 clearly improves over the naive trade-off initially obtained in Theorem 4.1. While the memory consumption of the Dissection approach remains the same as for the naive trade-off, we can reduce the price we have to pay in time from \(\log c_i\) down to \((1-\frac{i}{c_i-1})\cdot \log c_i\).

The trade-off from Theorem 5.1 is given in Fig. 1 on Page 5 as diamonds. Although it improves over the naive trade-off from Sect. 4.1, this improvement comes at the price of covering only \(c_i\)-sums with \(c_i\in \mathsf {magic}\). Given some available memory M, one would choose the minimal \(c_i\in \mathsf {magic}\) such that Dissection-BKW consumes at most memory M. However, such a choice of \(c_i\) is unlikely to fully use M. In the following section, we show how to further speed up the algorithm by using M entirely.

5.3 Tailored Dissection

Assume we run Dissection-BKW and our available memory is not fully used. Recall that \(c_i\text {-}\mathtt {Dissect}\) collects single-solutions while iterating an outermost loop over some constraint. In a nutshell, access to additional memory allows us to increase the size of lists \(L_{c_i},\ldots ,L_1\), where \(N=|L_{c_i}|+ \ldots + |L_1|\). Thereby, the number of equally-split single-solutions of a \(c\)-sum-problem instance increases significantly beyond N. As it suffices to identify (roughly) N single-solutions, we may prune the outermost loop of \(c_i\text {-}\mathtt {Dissect}\) to recover N (rather than all) equally-split single-solutions.

Yet, examining only a fraction of the solution space leads to recovering a corresponding fraction of the single-solutions only if the latter are distributed sufficiently uniformly.

Let us briefly recall that the total number of existing single-solutions, taken over the initial choice of input lists, is close to a binomial distribution under the Independence Heuristic. This allowed us to show that \(c_i\text {-}\mathtt {Dissect}\) succeeds with high probability, as sufficiently many single-solutions exist with high probability.

Now, let us denote by \(Z_j\) the random variable counting the number of single-solutions gained in the \(j^\mathrm {th}\) iteration of the outermost loop of \(c_i\text {-}\mathtt {Dissect}\). In order to prove that a certain number of iterations is already sufficient to collect enough single-solutions with high probability, we require information on the distribution of sums of the \(Z_j\). However, as we show shortly, already \(Z_j\) itself is distributed rather awkwardly, and it seems to be a challenging task to obtain Chernoff-style results ensuring that a sum of the \(Z_j\) does not fall too far short of its expectation with high probability. In turn, we resort to the following heuristic.

Tailoring Heuristic. Let \(c_i\in \mathsf {magic}\). Let the random variable \(Z_j\) denote the number of single-solutions gained in the \(j^\mathrm {th}\) iteration of the outermost for-loop of \(c_i\text {-}\mathtt {Dissect}\), taken over the initial choice of input lists. We heuristically assume that there exists a polynomial function \(\mathsf {poly}(\lambda )\) such that the following holds:

(8)

In particular, it follows from Equation (8) that for all \(\iota \le 2^{i\lambda }\) we have

$$\begin{aligned} \Pr \left[ \sum _{j=1}^{\iota \cdot \mathsf {poly}(\lambda )} Z_j \ge \mathbb {E}\bigg [\sum _{j=1}^\iota Z_j\bigg ]\right] \ge 1-\mathsf {negl}(\lambda ). \end{aligned}$$

That is, we can compensate for a deviation of the sum of the \(Z_j\) below its expectation by iterating a \(\mathsf {poly}(\lambda )\)-factor more often.

As for the Independence Heuristic we ran experiments to verify the Tailoring Heuristic (Sect. 6.2).

[Algorithm 6: tailored Dissection for the \(c_i\)-sum-problem]

We stress that the only syntactical difference of Algorithm 6 compared to Algorithm 5 is the increase of the size of list L from N to \(N^\alpha \) for \(\alpha \ge 1\), and the halt condition added as a comment in line 2.

Lemma 5.8

(Tailored Dissection). Let \(b\in \mathbb {N}\), \(c_i \in \mathsf {magic}\) and \(\alpha \in [1,\frac{c_{i}-1}{c_{i-1}}]\). For \(t\in \mathbb {F}_2^b\) let (L, t) be an instance of the \(c_i\)-sum-problem with \(|L|=N^\alpha \), where \(N:=2^{\frac{b+c_i\cdot \log c_i+1}{c_i-1}}\). Under the Independence and Tailoring Heuristics, tailored \(c_i\)-Dissection (Algorithm 6) solves the \(c_i\)-sum-problem with overwhelming probability in expected time \(T=\tilde{\mathcal {O}}\left( N^{c_{i-1}-i\cdot (\alpha -1)}\right) \) and expected memory \(M=\tilde{\mathcal {O}}\left( N^{\alpha }\right) \).

Proof

See full version [13].

The complexities of tailored \(c_i\)-Dissection are given in Table 1, where we simplified \(\alpha \)’s upper bound using \(c_{i-1}\approx c_i-\sqrt{2c_i}\). Let tailored Dissection-BKW denote the variant of \(c\)-sum BKW where the \(c\)-sum-problem is solved via Algorithm 6.

Theorem 5.2

(Tailored-Dissection Trade-Off). Let \(c_i\in \mathsf {magic}\), \(\alpha \in [1,\frac{c_{i}-1}{c_{i-1}}]\), \(\varepsilon >0\) and \(k\in \mathbb {N}\) sufficiently large. Under the Independence and Tailoring Heuristics, tailored Dissection-BKW solves the \(\mathtt {LPN}_{k}\) problem with overwhelming success probability in time \(T=2^{\vartheta (1+\varepsilon )}\), using \(M=2^{\mu (1+\varepsilon )}\) memory and samples, where

$$\begin{aligned} \vartheta =\left( 1-\frac{\alpha \cdot i}{c_i-1}\right) \cdot \log c_{i} \cdot \frac{k}{\log k},\quad \mu = \frac{\alpha \cdot \log c_i}{c_i-1}\cdot \frac{k}{\log k}. \end{aligned}$$

Proof

See full version [13].

Intuitively, it is clear how to choose optimal parameters for tailored Dissection-BKW for any given amount of memory M: Find the minimal \(c_i\in \mathsf {magic}\) such that plain Dissection-BKW uses at most memory M. Then resort to tailoring, using memory M entirely by choosing the right \(\alpha \). The trade-off achieved via this approach is given in Fig. 1 on page 5 (line segments). A formal justification of the approach is contained in the full version [13].
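This parameter selection can be sketched as follows; the snippet (our own, under the stated heuristics) works with exponent coefficients in units of \(\frac{k}{\log k}\) as in Theorem 5.2 and clips \(\alpha \) to its admissible range \([1,\frac{c_i-1}{c_{i-1}}]\):

```python
from math import log2

def magic_c(i):
    """Closed form (6): c_i = (i^2 + 3i + 4) / 2."""
    return (i * i + 3 * i + 4) // 2

def tailored_parameters(mu):
    """Given a memory budget mu (the coefficient of k/log k in log M), pick
    the minimal c_i in magic whose plain Dissection-BKW fits into mu, then
    choose alpha so that the budget is used entirely (Theorem 5.2)."""
    i = 1
    while log2(magic_c(i)) / (magic_c(i) - 1) > mu:
        i += 1                                   # larger c_i needs less memory
    c = magic_c(i)
    alpha = min(mu * (c - 1) / log2(c), (c - 1) / magic_c(i - 1))
    theta = (1 - alpha * i / (c - 1)) * log2(c)  # resulting time coefficient
    return c, alpha, theta

print(tailored_parameters(0.5))   # memory 2^(0.5 * k/log k): c_i = 7, alpha ~ 1.07
```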

Fig. 3. Distribution of the number of single-solutions over successive runs of \(c_i\text {-}\mathtt {Dissect}\). Under the Independence Heuristic this distribution is close to \(\mathrm {Bin}_{{(N/c_i)}^{c_i},2^{-b}}\). All parameters are given in Table 2.

6 Experimental Verification of Heuristics

We present experimental results to verify our Independence Heuristic as well as our Tailoring Heuristic.

6.1 Experiments for the Independence Heuristic

We tested the Independence Heuristic for \(c_i\in \{4,7\}\). We iteratively monitored the number of single-solutions found after a run of \(c_i\text {-}\mathtt {Dissect}\) on successive stripes, starting with a list of size N. For each \(c_i\) we repeatedly called \(c_i\text {-}\mathtt {Dissect}\) on three stripes. After each call we stored the number of single-solutions found. To analyze the impact of dependencies amongst the list elements—rather than influences due to variations in the number of single-solutions found—we pruned lists of more than N single-solutions down to N before starting the next run, and discarded lists where less than N single-solutions were found.

Note that during the first run all list elements are independent and uniformly distributed, even if their c-sums are not. While the list elements remain uniform on a “fresh” stripe in subsequent runs of \(c_i\text {-}\mathtt {Dissect}\), they are no longer independent, which could (besides the c-sums being dependent) affect the distribution of existing single-solutions even further. Under the Independence Heuristic the number of single-solutions found is close to being \(\text {Bin}_{{(N/c_i)}^{c_i},2^{-b}}\) distributed after any run of \(c_i\text {-}\mathtt {Dissect}\).

Our experiments with parameters as given in Table 2 lead to the plots given in Fig. 3. The measured relative frequencies are given by points, while the continuous plots are the benchmark distributions \(\text {Bin}_{{(N/c_i)}^{c_i},2^{-b}}\) for our Independence Heuristic.

Table 2. Parameters for testing the Independence Heuristic. Parameter a denotes the number of stripes of width b.

We see from Fig. 3 that the distribution of the output list-size is close to its benchmark, even after three iterations of the subroutine, where already \(4^3=64\)-sums (resp. \(7^3=343\)-sums) have been built.

We also used the Independence Heuristic in Lemma 5.8 to show that the random variable \(Z_j\), the number of single-solutions gained in the \(j^\mathrm {th}\) iteration over a constraint, is close to being binomially \(\text {Bin}_{x\cdot y,2^{-(b-i\lambda )}}\) distributed, where \(x \sim \text {Bin}_{2^{(i+1)\cdot \lambda },2^{-i\lambda }}\) and \(y \sim \text {Bin}_{2^{\lambda c_{i-1}},2^{-i\lambda }}\). In order to verify this experimentally, we computed several instances with parameter set

$$\begin{aligned} i=1\ (c_1=4)\text {, }b=25\text {, }a=3,\ \ N=8192. \end{aligned}$$

Each time we performed three consecutive runs of \(c_i\text {-}\mathtt {Dissect}\) on successive stripes and stored the number of single-solutions obtained per constraint after each iteration. The results are given in Fig. 4. Again, the obtained relative frequencies accurately match the benchmark curve \(\text {Bin}_{x\cdot y,2^{-(b-i\lambda )}}\).

Fig. 4. Distribution of the number of single-solutions per constraint in the first (Fig. 4a), second (Fig. 4b) and third run (Fig. 4c) of \(c_i\text {-}\mathtt {Dissect}\).

6.2 Experiments on the Tailoring Heuristic

As in the previous subsection, let \(Z_j\) be the number of single-solutions obtained per constraint iteration in . In Tailored Dissection it is required that the sum of these random variables is close to its expectation. The Tailoring Heuristic states that this is true with overwhelming probability, if we slightly increase the expected number of required iterations by a polynomial factor.

Table 3. Parameters and results for testing the Tailoring Heuristic, \(N=8192\).
Fig. 5. Required number of iterations to collect at least N single-solutions, \(N=8192\).

To test the Tailoring Heuristic we ran Dissection (without tailoring) on three stripes (\(a=3\)) with \(N=8192\) and varying choices of b. We summed the numbers \(Z_j\) of single-solutions found per randomized constraint \(\tau _j\) during the last run in a list, until their sum exceeded N. The results can be found in Table 3 and Fig. 5.

We see in Table 3 that the experimentally required numbers of iterations very accurately match their theoretical predictions (that were computed under the Independence Heuristic). Moreover, we experimentally need only a small factor to achieve a 99% confidence interval, even for low expectations. This means that the distribution has small variance and sharply concentrates around its mean, as can also be seen in Fig. 5. This all supports the validity of our Tailoring Heuristic for the analysis of Tailored Dissection BKW.