
1 Introduction

The Learning With Errors (\(\mathsf {LWE}\)) problem has been an important problem in cryptography since its introduction by Regev in [34]. Many cryptosystems have been proven secure assuming the hardness of this problem, including Fully Homomorphic Encryption schemes [11, 16]. The decision version of the problem can be described as follows: given m samples of the form \((\mathbf {\mathrm {a}},b)\in (\mathbb {Z}_q)^n\times \mathbb {Z}_q\), where \(\mathbf {\mathrm {a}}\) is uniformly distributed in \((\mathbb {Z}_q)^n\), distinguish whether b is uniformly chosen in \(\mathbb {Z}_q\) or is equal to \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle +e\) for a fixed secret \(\mathbf {\mathrm {s}} \in (\mathbb {Z}_q)^n\) and e a noise value in \(\mathbb {Z}_q\) chosen according to some probability distribution. Typically, the noise is sampled from some distribution concentrated on small numbers, such as a discrete Gaussian distribution with standard deviation \(\alpha q\) for \(\alpha =o(1)\). In the search version of the problem, the goal is to recover \(\mathbf {\mathrm {s}}\) given the promise that the sample instances come from the latter distribution. Initially, Regev showed that if \(\alpha q \ge 2\sqrt{n}\), solving \(\mathsf {LWE}\) on average is at least as hard as approximating lattice problems in the worst case to within \(\tilde{\mathcal {O}}(n/\alpha )\) factors with a quantum algorithm. Peikert showed in [32] a classical reduction when the modulus is large (\(q\ge 2^n\)). Finally, in [10], Brakerski et al. proved that solving \(\mathsf {LWE}\) instances with polynomial-size modulus in polynomial time implies an efficient solution to \(\mathsf {GapSVP}\).

There are essentially three approaches to solving \(\mathsf {LWE}\): the first relies on lattice reduction techniques such as the \(\mathsf {LLL}\) [23] algorithm and further improvements [12], as described in [25, 26]; the second uses combinatorial techniques [9, 35]; and the third uses algebraic techniques [6]. According to Regev in [1], the best known algorithm to solve \(\mathsf {LWE}\) is the algorithm by Blum, Kalai and Wasserman [9], originally proposed to solve the Learning Parity with Noise (\(\mathsf {LPN}\)) problem, which can be viewed as a special case of \(\mathsf {LWE}\) with \(q=2\). The time and memory requirements of this algorithm are both exponential for \(\mathsf {LWE}\), and subexponential, in \(2^{\mathcal {O}(n/\log n)}\), for \(\mathsf {LPN}\). During the first stage of the algorithm, the dimension of \(\mathbf {\mathrm {a}}\) is reduced, at the cost of a (controlled) decrease of the bias of b. During the second stage, the algorithm distinguishes between \(\mathsf {LWE}\) and uniform by evaluating the bias.

Since the introduction of \(\mathsf {LWE}\), several variants of the problem have been proposed in order to build more efficient cryptosystems. Some of the most interesting variants are \(\mathsf {Ring\text {-}LWE}\) by Lyubashevsky, Peikert and Regev [29], which aims to reduce the space of the public key using cyclic samples, and the cryptosystem by Döttling and Müller-Quade [14], which uses a short secret and error. In 2013, Micciancio and Peikert [30], as well as Brakerski et al. [10], proposed a binary version of the \(\mathsf {LWE}\) problem and obtained a hardness result.

Related Work. Albrecht et al. have presented an analysis of the BKW algorithm as applied to \(\mathsf {LWE}\) in [3, 4]. It was recently revisited by Duc et al., who use a multi-dimensional FFT in the second stage of the algorithm [15]. However, the main bottleneck is the first, reduction stage of BKW, and since the proposed algorithms do not improve this stage, the overall asymptotic complexity is unchanged.

In the case of the \(\mathsf {BinaryLWE}\) variant, where the error and secret are binary (or sufficiently small), Micciancio and Peikert show that solving this problem using \(m=n(1+\mathrm {\varOmega }(1/\log (n)))\) samples is at least as hard as approximating lattice problems in the worst case in dimension \(\varTheta (n/\log (n))\) with approximation factor \(\tilde{\mathcal {O}}(\sqrt{n} q)\). We show in the full version that existing lattice reduction techniques require exponential time. Arora and Ge describe a \(2^{\tilde{\mathcal {O}}((\alpha q)^2)}\)-time algorithm when \(q>n\) to solve the \(\mathsf {LWE}\) problem [6]. This leads to a subexponential-time algorithm when the error magnitude \(\alpha q\) is less than \(\sqrt{n}\). The idea is to transform the system into a noise-free polynomial system and then use root finding algorithms for multivariate polynomials to solve it, using either relinearization [6] or Gröbner bases [2]. In the latter work, Albrecht et al. present an algorithm whose time complexity is \(2^{\frac{(\omega +o(1)) n \log \log \log n}{8\log \log n}}\) when the number of samples \(m=(1+o(1))n \log \log n\) is super-linear, where \(\omega < 2.3728\) is the linear algebra constant, under some assumption on the regularity of the polynomial system of equations; when \(m=\mathcal {O}(n)\), the complexity becomes exponential.

Contribution. Our first contribution is to present in a unified framework the BKW algorithm and all its previous improvements in the binary case [8, 18, 21, 24] and in the general case [4]. We introduce a new quantization step, which generalizes modulus switching [4]. This yields a significant decrease in the constant in the exponent of the complexity for \(\mathsf {LWE}\). Moreover, our proof does not require Gaussian noise and does not rely on unproven independence assumptions. Our algorithm is also able to tackle problems with larger noise.

We then introduce generalizations of the \(\mathsf {BDD}\), \(\mathsf {GapSVP}\) and \(\mathsf {UniqueSVP}\) problems, and prove a reduction from these variants to \(\mathsf {LWE}\). For particular parameter choices, these variants require that the lattice point of interest (the point of the lattice that the problem essentially asks to locate: for instance, in the case of \(\mathsf {BDD}\), the lattice point closest to the target point) lies in the fundamental parallelepiped; more generally, we ask that the coordinates of this point relative to the basis defined by the input matrix \(\mathbf {\mathrm {A}}\) have infinity norm bounded by some value B. For small B, our main algorithm yields a subexponential-time algorithm for these variants of \(\mathsf {BDD}\), \(\mathsf {GapSVP}\) and \(\mathsf {UniqueSVP}\).

Through a reduction to our variant of \(\mathsf {BDD}\), we are then able to solve the subset-sum problem in subexponential time when the density is o(1), and in time \(2^{(\ln 2/2+o(1))n/\log \log n}\) if the density is \(\mathcal {O}(1/\log n)\). This is of independent interest, as existing techniques for density o(1), based on lattice reduction, require exponential time. As a consequence, the cryptosystems of Lyubashevsky, Palacio and Segev from TCC 2010 [28] can be broken in subexponential time.

As another application of our main algorithm, we show that \(\mathsf {BinaryLWE}\) with reasonable noise can be solved in time \(2^{(\ln 2/2+o(1))n/\log \log n}\) instead of \(2^{\varOmega (n)}\); the same complexity holds for secrets of size up to \(2^{\log ^{o(1)} n}\). As a consequence, we can heuristically recover the secret polynomials \(\mathbf {\mathrm {f}},\mathbf {\mathrm {g}}\) of the \(\mathsf {NTRU}\) problem in subexponential time \(2^{(\ln 2/2+o(1))n/\log \log n}\) (without contradicting its security assumption). The heuristic assumption comes from the fact that \(\mathsf {NTRU}\) samples are not random, since they are rotations of each other: the heuristic assumption is that this does not significantly hinder \(\mathsf {BKW}\)-type algorithms. Note that a large value is hidden in the o(1) term, so that our algorithm does not yield practical attacks for recommended \(\mathsf {NTRU}\) parameters.

In the full version, our results are extended to the case where the secret is small with respect to the \(L_2\) norm.

2 Preliminaries

We identify any element of \(\mathbb {Z}/q\mathbb {Z}\) with the representative of its equivalence class of smallest absolute value, taking the positive one in case of a tie. Any vector \(\mathbf {\mathrm {x}} \in \big (\mathbb {Z}/q\mathbb {Z}\big )^n\) has Euclidean norm \(||\mathbf {\mathrm {x}}||=\sqrt{\sum _{i=0}^{n-1} x_i^2}\) and infinity norm \(||\mathbf {\mathrm {x}}||_{\infty }=\mathrm {max}_i |x_i|\). A matrix \(\mathbf {\mathrm {B}}\) can be Gram-Schmidt orthogonalized into \(\widetilde{\mathbf {\mathrm {B}}}\), and its norm \(||\mathbf {\mathrm {B}}||\) is the maximum of the norms of its columns. We denote by \((\mathbf {\mathrm {x}}|\mathbf {\mathrm {y}})\) the concatenation of the vectors \(\mathbf {\mathrm {x}},\mathbf {\mathrm {y}}\). Let \(\mathbf {\mathrm {I}}\) be the identity matrix; we denote by \(\ln \) the natural logarithm and by \(\log \) the binary logarithm. A lattice is the set \(\mathrm {\Lambda }(\mathbf {\mathrm {b}}_1,\ldots ,\mathbf {\mathrm {b}}_n)=\{\sum _i x_i\mathbf {\mathrm {b_i}} : x_i \in \mathbb {Z}\}\) of all integer linear combinations of a set of linearly independent vectors \(\mathbf {\mathrm {b}}_1,\ldots ,\mathbf {\mathrm {b}}_n\), called a basis of the lattice. If \(\mathbf {\mathrm {B}}=[\mathbf {\mathrm {b}}_1,\ldots ,\mathbf {\mathrm {b}}_n]\) is the basis matrix, lattice vectors can be written as \(\mathbf {\mathrm {B}}\mathbf {\mathrm {x}}\) for \(\mathbf {\mathrm {x}}\in \mathbb {Z}^n\). The dual \(\mathrm {\Lambda }^*\) of \(\mathrm {\Lambda }\) is the set of \(\mathbf {\mathrm {x}}\in \mathbb {R}^n\) such that \(\langle \mathbf {\mathrm {x}} , \mathrm {\Lambda } \rangle \subset \mathbb {Z}\). We have \(\mathrm {\Lambda }^{**}=\mathrm {\Lambda }\). We borrow Bleichenbacher’s definition of bias [31].

Definition 1

The bias of a probability distribution \(\phi \) over \(\mathbb {Z}/q\mathbb {Z}\) is

$$\begin{aligned} \mathbb {E}_{x\sim \phi }[\exp (2i\pi x/q)]. \end{aligned}$$
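As a quick numerical check of this definition (a minimal sketch, assuming numpy; the parameters are ours), the empirical bias of the uniform distribution vanishes, while a rounded Gaussian of parameter \(\alpha =0.05\) has real bias close to \(\exp (-2\pi ^2\alpha ^2)\approx 0.95\), as predicted by Lemma 1 below:

```python
import numpy as np

rng = np.random.default_rng(0)
q, m, alpha = 97, 200_000, 0.05

def empirical_bias(xs, q):
    # Estimate E[exp(2*i*pi*x/q)] over the samples xs.
    return np.mean(np.exp(2j * np.pi * xs / q))

uniform = rng.integers(0, q, m)
gaussian = np.rint(rng.normal(0, alpha * q, m)).astype(int) % q

print(abs(empirical_bias(uniform, q)))   # ~ 1/sqrt(m), close to 0
print(empirical_bias(gaussian, q).real)  # ~ exp(-2*pi^2*alpha^2) ~ 0.95
```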

This definition extends the usual definition of the bias of a coin in \(\mathbb {Z}/2\mathbb {Z}\): it preserves the fact that any distribution with bias b can be distinguished from uniform with constant probability using \(\mathrm {\varOmega }(1/b^{2})\) samples, as a consequence of Hoeffding’s inequality; moreover, the bias of the sum of two independent variables is the product of their biases. We also have the following simple lemma:

Lemma 1

The bias of the Gaussian distribution of mean 0 and standard deviation \(q\alpha \) is \(\exp (-2\pi ^2 \alpha ^2)\).

Proof

The bias is the value of the Fourier transform at \(-1/q\).   \(\square \)
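For concreteness, this is the characteristic function of a centered Gaussian, \(\mathbb {E}[\exp (itX)]=\exp (-\sigma ^2t^2/2)\), evaluated at \(t=2\pi /q\) with \(\sigma =q\alpha \):

$$\begin{aligned} \mathbb {E}[\exp (2i\pi x/q)]=\exp \big (-(q\alpha )^2(2\pi /q)^2/2\big )=\exp (-2\pi ^2\alpha ^2). \end{aligned}$$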

We introduce a non-standard definition of the \(\mathsf {LWE}\) problem. However, as a consequence of Lemma 1, this new definition naturally extends the usual Gaussian case (as well as its standard extensions such as the bounded noise variant [10, Definition 2.14]), and it will prove easier to work with.

Definition 2

Let \(n\ge 0\) and \(q \ge 2\) be integers. Given parameters \(\alpha \) and \(\epsilon \), the \(\mathsf {LWE}\) distribution is, for \(\mathbf {\mathrm {s}} \in (\mathbb {Z}/q\mathbb {Z})^n\), a distribution on pairs \((\mathbf {\mathrm {a}},b)\in (\mathbb {Z}/q\mathbb {Z})^n \times (\mathbb {R}/q\mathbb {Z})\) such that \(\mathbf {\mathrm {a}}\) is sampled uniformly, and for all \(\mathbf {\mathrm {a}}\),

$$\begin{aligned} |\mathbb {E}[\exp (2i\pi (\langle \mathbf {a} , \mathbf {s} \rangle -b)/q)|\mathbf {\mathrm {a}}]\exp (\alpha '^2)-1|\le \epsilon \end{aligned}$$

for some universal \(\alpha '\le \alpha \).

For convenience, we define \(\beta =\sqrt{n/2}/\alpha \). In the remainder, \(\alpha \) is called the noise parameter, and \(\epsilon \) the distortion parameter. Also, we say that a \(\mathsf {LWE}\) distribution has a noise distribution \(\phi \) if b is distributed as \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle +\phi \).

Definition 3

The \(\mathsf {Decision\text {-}LWE}\) problem is to distinguish a \(\mathsf {LWE}\) distribution from the uniform distribution over \((\mathbf {\mathrm {a}},b)\). The \(\mathsf {Search\text {-}LWE}\) problem is, given samples from a \(\mathsf {LWE}\) distribution, to find \(\mathbf {\mathrm {s}}\).

Definition 4

The real \(\lambda _i\) is the radius of the smallest ball, centered at \(\mathbf {0}\), containing at least i linearly independent vectors of the lattice \(\mathrm {\Lambda }\).

We define \(\rho _s(\mathbf {\mathrm {x}})=\exp (-\pi ||\mathbf {\mathrm {x}}||^2/s^2)\) and \(\rho _s(S)=\sum _{\mathbf {\mathrm {x}} \in S} \rho _s(\mathbf {\mathrm {x}})\) (and similarly for other functions). The discrete Gaussian distribution \(D_{E,s}\) over a set E and of parameter s is such that the probability \(D_{E,s}(\mathbf {\mathrm {x}})\) of drawing \(\mathbf {\mathrm {x}} \in E\) is equal to \(\rho _s(\mathbf {\mathrm {x}})/\rho _s(E)\). To simplify notation, we will denote by \(D_E\) the distribution \(D_{E,1}\).

Definition 5

The smoothing parameter \(\eta _\epsilon \) of the lattice \(\mathrm {\Lambda }\) is the smallest s such that \(\rho _{1/s}(\mathrm {\Lambda }^*)=1+\epsilon \).

We now generalize the BDD, UniqueSVP and GapSVP problems by introducing a parameter B that bounds the coordinate vector of the target lattice point. For \(B=2^n\), we recover the usual definitions if the input matrix is reduced.

Definition 6

The \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\) (resp. \(\mathsf {BDD}_{B,\beta }^{||.||}\) \()\) problem is, given a basis \(\mathbf {\mathrm {A}}\) of the lattice \(\mathrm {\Lambda }\), and a point \(\mathbf {\mathrm {x}}\) such that \(||\mathbf {\mathrm {As}}-\mathbf {\mathrm {x}}||\le \lambda _1/\beta < \lambda _1/2\) and \(||\mathbf {\mathrm {s}}||_{\infty }\le B\) (resp. \(||\mathbf {\mathrm {s}}||\le B\) \()\), to find \(\mathbf {\mathrm {s}}\).

Definition 7

The \(\mathsf {UniqueSVP}_{B,\beta }^{||.||_\infty }\) (resp. \(\mathsf {UniqueSVP}_{B,\beta }^{||.||}\) \()\) problem is, given a basis \(\mathbf {\mathrm {A}}\) of the lattice \(\mathrm {\Lambda }\), such that \(\lambda _2/\lambda _1\ge \beta \) and there exists a vector \(\mathbf {\mathrm {s}}\) such that \(||\mathbf {\mathrm {As}}||=\lambda _1\) with \(||\mathbf {\mathrm {s}}||_{\infty }\le B\) (resp. \(||\mathbf {\mathrm {s}}||\le B\) \()\), to find \(\mathbf {\mathrm {s}}\).

Definition 8

The \(\mathsf {GapSVP}_{B,\beta }^{||.||_\infty }\) (resp. \(\mathsf {GapSVP}_{B,\beta }^{||.||}\) \()\) problem is, given a basis \(\mathbf {\mathrm {A}}\) of the lattice \(\mathrm {\Lambda }\), to distinguish between the case \(\lambda _1(\mathrm {\Lambda }) \ge \beta \) and the case where there exists \(\mathbf {\mathrm {s}} \ne \mathbf {0}\) such that \(||\mathbf {\mathrm {s}}||_{\infty }\le B\) (resp. \(||\mathbf {\mathrm {s}}||\le B\) \()\) and \(||\mathbf {\mathrm {As}}||\le 1\).

Definition 9

Given two probability distributions P and Q on a finite set S, the Kullback-Leibler (or \(\mathsf {KL}\)) divergence between P and Q is

$$D_{\mathsf {KL}}(P||Q)=\sum _{x\in S} \ln \bigg (\frac{P(x)}{Q(x)}\bigg )P(x) \ \text{ with } \ \ln (x/0)=+\infty \ \text{ if } \ x>0.$$

The following two lemmata are proven in [33]:

Lemma 2

Let P and Q be two distributions over S, such that for all x, \(|P(x)-Q(x)|\le \delta (x) P(x)\) with \(\delta (x) \le 1/4\). Then:

$$\begin{aligned} D_{\mathsf {KL}}(P||Q)\le 2\sum _{x\in S}\delta (x)^2P(x). \end{aligned}$$

Lemma 3

Let A be an algorithm which takes as input m samples from S and outputs a bit. Let x (resp. y) be the probability that it returns 1 when the input is sampled from P (resp. Q). Then:

$$\begin{aligned} |x-y|\le \sqrt{mD_{\mathsf {KL}}(P||Q)/2}. \end{aligned}$$
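The following minimal sketch (assuming numpy; the numbers are ours) checks both lemmata on a slightly biased coin:

```python
import numpy as np

def kl(P, Q):
    # Definition 9: D_KL(P||Q) = sum_x ln(P(x)/Q(x)) * P(x)
    P, Q = np.asarray(P, float), np.asarray(Q, float)
    return float(np.sum(np.log(P / Q) * P))

delta = 0.01
P = [0.5, 0.5]                              # fair coin
Q = [0.5 * (1 + delta), 0.5 * (1 - delta)]  # coin with deviation delta

m = 1000
d = kl(P, Q)
print(d)                   # ~ delta^2/2, within the 2*delta^2 bound of Lemma 2
print(np.sqrt(m * d / 2))  # Lemma 3: advantage bound for any m-sample algorithm
```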

Finally, we say that an algorithm has a negligible probability of failure if its probability of failure is \(2^{-\mathrm {\varOmega }(n)}\).

2.1 Secret-Error Switching

At a small cost in samples, it is possible to reduce any \(\mathsf {LWE}\) instance to an instance where the secret follows the same distribution as the error [5, 10].

Theorem 1

Given an oracle that solves \(\mathsf {LWE}\) with m samples in time t with the secret coming from the rounded error distribution, it is possible to solve \(\mathsf {LWE}\) with \(m+\mathcal {O}(n\log \log q)\) samples with the same error distribution (and any distribution on the secret) in time \(t + \mathcal {O}(mn^2+(n\log \log q)^3)\), with negligible probability of failure.

Furthermore, if q is prime, we lose \(n+k\) samples with probability of failure bounded by \(q^{-1-k}\).

Proof

First, select an invertible matrix \(\mathbf {\mathrm {A}}\) from the vectorial part of \(\mathcal {O}(n\log \log q)\) samples in time \(\mathcal {O}((n\log \log q)^3)\) [10, Claim 2.13].

Let \(\mathbf {b}\) be the corresponding rounded noisy dot products. Let \(\mathbf {\mathrm {s}}\) be the \(\mathsf {LWE}\) secret and \(\mathbf {\mathrm {e}}\) such that \(\mathbf {\mathrm {As}}+\mathbf {\mathrm {e}}=\mathbf {\mathrm {b}}\). Then the subsequent m samples are transformed in the following way. For each new sample \((\mathbf {\mathrm {a'}},b')\) with \(b'=\langle \mathbf {\mathrm {a'}},\mathbf {\mathrm {s}} \rangle + e'\), we give the sample \((-^{t}\mathbf {\mathrm {A}}^{-1}\mathbf {\mathrm {a'}},b'-\langle ^{t}\mathbf {\mathrm {A}}^{-1}\mathbf {\mathrm {a'}}, \mathbf {\mathrm {b}} \rangle )\) to our \(\mathsf {LWE}\) oracle.

Clearly, the vectorial part of the new samples remains uniform and since

$$\begin{aligned} b'-\langle ^{t}\mathbf {\mathrm {A}}^{-1}\mathbf {\mathrm {a'}} , \mathbf {\mathrm {b}} \rangle = \langle -^{t}\mathbf {\mathrm {A}}^{-1}\mathbf {\mathrm {a'}} , \mathbf {\mathrm {b}}-\mathbf {\mathrm {As}} \rangle + b'-\langle \mathbf {\mathrm {a'}} ,\mathbf {\mathrm {s}} \rangle = \langle -^{t}\mathbf {\mathrm {A}}^{-1}\mathbf {\mathrm {a'}} , \mathbf {\mathrm {e}} \rangle + e' \end{aligned}$$

the new errors follow the same distribution as the original, and the new secret is \(\mathbf {\mathrm {e}}\). Hence the oracle outputs \(\mathbf {\mathrm {e}}\) in time t, and we can recover \(\mathbf {\mathrm {s}}\) as \(\mathbf {\mathrm {s}}=\mathbf {\mathrm {A}}^{-1}(\mathbf {\mathrm {b}}-\mathbf {\mathrm {e}})\).

If q is prime, the probability that the first \(n+k\) samples all lie in some hyperplane is bounded by \(q^{n-1}q^{-n-k}=q^{-1-k}\).   \(\square \)
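The transformation in this proof is mechanical; here is a minimal sketch (assuming numpy, and sympy for the matrix inverse modulo q; the function name is ours):

```python
import numpy as np
from sympy import Matrix

def switch_sample(A, b, a_new, b_new, q):
    # A: invertible n x n matrix (mod q) from the first samples, b = A s + e.
    # A fresh sample (a', b') with b' = <a', s> + e' is mapped to
    # (-A^{-T} a', b' - <A^{-T} a', b>), a valid sample for the secret e.
    AinvT = np.array(Matrix(A.tolist()).inv_mod(q).T.tolist(), dtype=int)
    u = AinvT.dot(np.asarray(a_new)) % q
    return (-u) % q, (b_new - int(u.dot(b))) % q

# Once the oracle returns e, recover the original secret as
# s = A^{-1} (b - e) mod q.
```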

2.2 Low Dimension Algorithms

Our main algorithm will return samples from a \(\mathsf {LWE}\) distribution in smaller dimension, at the cost of a decreased bias. We now describe two fast algorithms for the case where the dimension is small enough.

Theorem 2

If \(n=0\) and \(m=k/b^2\), with b smaller than the real part of the bias, the \(\mathsf {Decision\text {-}LWE}\) problem can be solved with advantage \(1-2^{-\mathrm {\varOmega }(k)}\) in time \(\mathcal {O}(m)\).

Proof

The algorithm Distinguish computes \(x=\frac{1}{m}\sum _{i=0}^{m-1} \cos (2\pi b_i/q)\) and returns the boolean \(x\ge b/2\). If the distribution is uniform, then the expectation of x is 0; otherwise, it is at least b. Hoeffding’s inequality shows that the probability that \(|x-\mathbb {E}[x]|\ge b/2\) is at most \(2\exp (-k/8)=2^{-\mathrm {\varOmega }(k)}\), which gives the result.    \(\square \)

[Figure: pseudocode of the algorithms Distinguish and FindSecret]
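As the original figure does not survive extraction, here is a minimal sketch of both procedures (assuming numpy; the table of \(q^n\) complex entries matches the \(\mathcal {O}(nq^n)\) FFT cost of Theorem 3):

```python
import numpy as np

def distinguish(bs, q, b):
    # Theorem 2: x = (1/m) * sum cos(2*pi*b_i/q); answer "LWE" iff x >= b/2.
    x = np.mean(np.cos(2 * np.pi * np.asarray(bs, float) / q))
    return x >= b / 2

def find_secret(samples, n, q):
    # Theorem 3: accumulate exp(2i*pi*b/q) in a table indexed by a; one
    # multidimensional FFT then gives, for every candidate s' at once,
    # t[s'] = sum_j exp(2i*pi*(b_j - <a_j, s'>)/q); return the argmax.
    T = np.zeros((q,) * n, dtype=complex)
    for a, b in samples:
        T[tuple(a)] += np.exp(2j * np.pi * b / q)
    t = np.fft.fftn(T).real  # fftn(T)[s'] = sum_a T[a] exp(-2i*pi*<a,s'>/q)
    return np.unravel_index(np.argmax(t), T.shape)
```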

Lemma 4

For all \(\mathbf {\mathrm {s}}\ne \mathbf {0}\), if \(\mathbf {\mathrm {a}}\) is sampled uniformly, \(\mathbb {E}[\exp (2 i \pi \langle \mathbf {\mathrm {a}} , \mathbf {\mathrm {s}}\rangle /q)]=0\).

Proof

Multiplication by \(s_0\) in \(\mathbb {Z}_q\) is a \(\gcd (s_0,q)\)-to-one map because it is a group morphism, therefore \(a_0s_0\) is uniform over \(\gcd (s_0,q)\mathbb {Z}_q\). Thus, setting \(k=\gcd (q,s_0,\dots ,s_{n-1}) < q\), \(\langle \mathbf {\mathrm {a}} , \mathbf {\mathrm {s}} \rangle \) is distributed uniformly over \(k\mathbb {Z}_q\), so

$$ \mathbb {E}[\exp (2 i \pi \langle \mathbf {\mathrm {a}} , \mathbf {\mathrm {s}} \rangle /q)]=\frac{k}{q}\sum _{j=0}^{q/k-1} \exp (2i\pi jk/q)=0. $$

   \(\square \)

Theorem 3

The algorithm FindSecret, when given \(m>(8n\log q+k)/b^2\) samples from a \(\mathsf {LWE}\) problem whose bias has real part greater than b, returns the correct secret in time \(\mathcal {O}(m+n\log ^2(q)q^n)\), except with probability \(2^{-\mathrm {\varOmega }(k)}\).

Proof

The fast Fourier transform needs \(\mathcal {O}(nq^n)\) operations on numbers of bit size \(\mathcal {O}(\log (q))\). Hoeffding’s inequality shows that the difference between \(t[\mathbf {\mathrm {s'}}]\) and \(\mathbb {E}[\exp (2i\pi (b-\langle \mathbf {\mathrm {a}} , \mathbf {\mathrm {s'}} \rangle )/q)]\) is at most b / 2, except with probability at most \(2\exp (-mb^2/2)\). Consequently, this holds for all \(\mathbf {\mathrm {s'}}\) except with probability at most \(2q^n\exp (-mb^2/2)=2^{-\mathrm {\varOmega }(k)}\), using the union bound. Then \(t[\mathbf {\mathrm {s}}]\ge b-b/2=b/2\) and, for all \(\mathbf {\mathrm {s'}} \ne \mathbf {\mathrm {s}}\), \(t[\mathbf {\mathrm {s'}}] < b/2\), so the algorithm returns \(\mathbf {\mathrm {s}}\).    \(\square \)

3 Main Algorithm

In this section, we present our main algorithm, prove its asymptotic complexity, and present practical results in dimension \(n=128\).

3.1 Rationale

A natural idea for distinguishing between an instance of \(\mathsf {LWE}\) (or \(\mathsf {LPN}\)) and a uniform distribution is to select k samples whose vectorial parts add up to zero, yielding a new sample of the form \((\mathbf {0},e)\). It is then enough to distinguish between e and a uniform variable. However, if \(\delta \) is the bias of the error in the original samples, the new error e has bias \(\delta ^{k}\), hence roughly \(\delta ^{-2k}\) samples are necessary to distinguish it from uniform. Thus it is crucial that k be as small as possible.

The idea of the BKW algorithm by Blum, Kalai and Wasserman is to perform “blockwise” Gaussian elimination. The n coordinates are divided into k blocks of length \(b = n/k\). Then, samples that are equal on the first b coordinates are subtracted from one another to produce new samples that are zero on the first block. This process is iterated over each consecutive block. Eventually, samples of the form \((\mathbf {0},e)\) are obtained.

Each of these samples ultimately results from the addition of \(2^{k}\) starting samples, so k should be at most \(\mathcal {O}(\log (n))\) for the algorithm to make sense. On the other hand, \(\mathrm {\varOmega }(q^{b})\) samples are clearly required at each step in order to generate enough collisions on the b consecutive coordinates of a block. This naturally results in a complexity of roughly \(2^{(1 + o(1))n/\log (n)}\) in the original algorithm for \(\mathsf {LPN}\). This algorithm was later adapted to \(\mathsf {LWE}\) in [3], and then improved in [4].

The idea of the latter improvement is to use so-called “lazy modulus switching”. Instead of finding two vectors that are equal on a given block in order to generate a new vector that is zero on the block, one uses vectors that are merely close to each other. This may be seen as performing addition modulo p instead of q for some \(p < q\), by rounding every value \(x \in \mathbb {Z}_{q}\) to the value nearest xp / q in \(\mathbb {Z}_{p}\). Thus at each step of the algorithm, instead of generating vectors that are zero on each block, small vectors are produced. This introduces a new “rounding” error term, but essentially reduces the complexity from roughly \(q^{b}\) to \(p^{b}\). Balancing the new error term with this decrease in complexity results in a significant improvement.

However, it may be observed that this rounding error is much more costly for the first few blocks than for the last ones. Indeed, samples produced after, say, one reduction step are bound to be added together \(2^{k-1}\) times to yield the final samples, resulting in a corresponding blowup of the rounding error. By contrast, later terms will undergo fewer additions. Thus it makes sense to allow for progressively coarser approximations (i.e. decreasing the modulus) at each step. On the other hand, to maintain comparable data requirements to find collisions on each block, the decrease in modulus is compensated by progressively longer blocks.

What we propose here is a more general view of the BKW algorithm that allows for this improvement, while giving a clear view of the different complexity costs incurred by various choices of parameters. Balancing these terms is the key to finding an optimal complexity. We forego the “modulus switching” point of view entirely, while retaining its core ideas. The resulting algorithm generalizes several variants of BKW, and will later be applied in a variety of settings.

3.2 Quantization

The goal of quantization is to associate to each point of \(\mathbb {R}^k\) a center from a small set, such that the expected distance between a point and its center is small. We will then be able to produce small vectors by subtracting vectors associated to the same center.

Modulus switching amounts to a simple quantizer which rounds every coordinate to the nearest multiple of some constant. Our proven algorithm uses a similar quantizer, except that the constant depends on the index of the coordinate.

It is possible to decrease the average distance from a point to its center by a constant factor for large moduli [17], but doing so would complicate our proof without improving the leading term of the complexity. When the modulus is small, it might be worthwhile to use error-correcting codes as in [18].

3.3 Main Algorithm

Let us denote by \(\mathcal {L}_{0}\) the set of starting samples, and \(\mathcal {L}_i\) the sample list after i reduction steps. The numbers \(d_{0} = 0 \le d_{1} \le \dots \le d_{k} = n\) partition the n coordinates of sample vectors into k buckets. Let \(\mathbf {\mathrm {D}} = (D_{0},\dots ,D_{k-1})\) be the vector of quantization coefficients associated to each bucket.

[Figure: pseudocode of the algorithms Solve and Reduce]
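As the figure is lost to extraction, the following minimal sketch shows one phase of Reduce under the above notation (assuming numpy; Solve simply chains the k phases and feeds \(\mathcal {L}_k\) to Distinguish or FindSecret):

```python
import numpy as np
from collections import defaultdict

def reduce_step(samples, lo, hi, D, q):
    # One reduction phase: samples whose coordinates lo..hi-1 fall in the
    # same quantization cell of side D are paired and subtracted, leaving
    # those coordinates with magnitude at most D (exactly 0 when D = 1).
    center = lambda v: ((v + q // 2) % q) - q // 2  # centered representatives
    buckets, out = defaultdict(list), []
    for a, b in samples:
        key = tuple(np.rint(center(a[lo:hi]) / D).astype(int))
        if buckets[key]:
            a2, b2 = buckets[key].pop()
            out.append((center(a - a2), (b - b2) % q))
        else:
            buckets[key].append((a, b))
    return out  # at most |samples|/2 new samples, as used in Lemma 5
```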

In order to allow for a uniform presentation of the BKW algorithm, applicable to different settings, we do not assume a specific distribution on the secret. Instead, we assume there exists some known \(\mathbf {\mathrm {B}} = (B_{0},\dots ,B_{n-1})\) such that \(\sum _i (s_i/B_i)^2 \le n\). Note that this is in particular true if \(|s_{i}| \le B_{i}\). We shall see how to adapt this to the standard Gaussian case later on. Without loss of generality, \(\mathbf {\mathrm {B}}\) is non-increasing.

There are k phases in our reduction: in the i-th phase, the coordinates from \(d_i\) to \(d_{i+1}\) are reduced. We define \(m=|\mathcal {L}_0|\).

Lemma 5

Solve terminates in time \(\mathcal {O}(mn\log q)\).

Proof

The Reduce algorithm clearly runs in time \(\mathcal {O}(|\mathcal {L}|n \log q)\). Moreover, \(|\mathcal {L}_{i+1}|\le |\mathcal {L}_i|/2\) so that the total running time of Solve is \(\mathcal {O}(n\log q\sum _{i=0}^k m/2^i)=\mathcal {O}(mn\log q)\).   \(\square \)

Lemma 6

Write \(\mathcal {L}'_{i}\) for the samples of \(\mathcal {L}_{i}\) where the first \(d_{i}\) coordinates of each sample vector have been truncated. Assume \(|s_j|D_{i}<0.23q\) for all \(d_{i} \le j < d_{i+1}\). If \(\mathcal {L}'_{i}\) is sampled according to the \(\mathsf {LWE}\) distribution of secret \(\mathbf {\mathrm {s}}\), noise parameter \(\alpha \) and distortion parameter \(\epsilon \le 1\), then \(\mathcal {L}'_{i+1}\) is sampled according to the \(\mathsf {LWE}\) distribution of the truncated secret with parameters:

$$\begin{aligned} \alpha '^2=2\alpha ^2+4\pi ^2\sum _{j=d_i}^{d_{i+1}-1}(s_jD_{i}/q)^2\quad \text {and }\quad \epsilon '=3\epsilon . \end{aligned}$$

On the other hand, if \(D_i=1\), then \(\alpha '^2=2\alpha ^2\).

Proof

The independence of the output samples and the uniformity of their vectorial part are clear. Let \((\mathbf {\mathrm {a}},b)\) be a sample obtained by subtracting two samples from \(\mathcal {L}_{i}\). For \(\mathbf {\mathrm {a'}}\) the vectorial part of a sample, define \(\epsilon (\mathbf {\mathrm {a'}})\) such that \(\mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a'}},\mathbf {\mathrm {s}} \rangle -b')/q)|\mathbf {\mathrm {a'}}]=(1+\epsilon (\mathbf {\mathrm {a'}}))\exp (-\alpha ^2)\). By definition of \(\mathsf {LWE}\), \(|\epsilon (\mathbf {\mathrm {a'}})| \le \epsilon \), and by independence:

$$\begin{aligned} \mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle -b)/q)|\mathbf {\mathrm {a}}]=\exp (-2\alpha ^2)\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))], \end{aligned}$$

with \(|\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))]-1|\le 3\epsilon \).

Thus we have computed the noise corresponding to adding two samples of \(\mathcal {L}_{i}\). To get the noise for a sample from \(\mathcal {L}_{i+1}\), it remains to truncate the coordinates from \(d_{i}\) to \(d_{i+1}\). A straightforward induction on the coordinates shows that this noise is:

$$\begin{aligned} \exp (-2\alpha ^2)\mathbb {E}_{\mathbf {\mathrm {a'}}-\mathbf {\mathrm {a''}}=\mathbf {\mathrm {a}}}[(1+\epsilon (\mathbf {\mathrm {a'}}))(1+\epsilon (\mathbf {\mathrm {a''}}))]\prod _{j=d_{i}}^{d_{i+1}-1}\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]. \end{aligned}$$

Indeed, if we denote by \(\mathbf {\mathrm {a}}^{(j)}\) the vector \(\mathbf {\mathrm {a}}\) where the first j coordinates are truncated and \(\alpha _j\) the noise parameter of \(\mathbf {\mathrm {a}}^{(j)}\), we have:

$$\begin{aligned}&|\mathbb {E}[\exp (2i\pi (\langle \mathbf {\mathrm {a}}^{(j+1)},\mathbf {\mathrm {s}}^{(j+1)} \rangle -b)/q)|\mathbf {\mathrm {a}}^{(j+1)}]-\exp (-\alpha _j^2)\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]|\\ =\;&|\mathbb {E}[\exp (-2i\pi \mathbf {a}_j\mathbf {s}_j/q)(\exp (2i\pi (\langle \mathbf {\mathrm {a}}^{(j)},\mathbf {\mathrm {s}}^{(j)} \rangle -b)/q)-\exp (-\alpha _j^2))]|\\ \le \;&\epsilon ' \exp (-\alpha _j^2)\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]. \end{aligned}$$

It remains to compute \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j\mathbf {s}_j/q)]\) for \(d_{i} \le j<d_{i+1}\). Let \(D = D_{i}\). The distribution of \(\mathbf {a}_j\) is even, so \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\) is real. Furthermore, since \(|\mathbf {a}_j|\le D\),

$$\begin{aligned} \mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\ge \cos (2\pi \mathbf {s}_jD/q) . \end{aligned}$$

Assuming \(|\mathbf {s}_j|D<0.23q\), a simple function analysis shows that

$$\begin{aligned} \mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]\ge \exp (-4\pi ^2 \mathbf {s}_j^2D^2/q^2). \end{aligned}$$

On the other hand, if \(D_i=1\) then \(\mathbf {a}_j=0\) and \(\mathbb {E}[\exp (2i\pi \mathbf {a}_j \mathbf {s}_j/q)]=1\).   \(\square \)

Finding optimal parameters for BKW amounts to balancing various costs: the baseline number of samples required so that the final list \(\mathcal {L}_{k}\) is non-empty, and the additional factor due to the need to distinguish the final error bias. This final bias itself comes both from the blowup of the original error bias by the BKW additions, and the “rounding errors” due to quantization. Balancing these costs essentially means solving a system.

For this purpose, it is convenient to set the overall target complexity as \(2^{n(x + o(1))}\) for some x to be determined. The following auxiliary lemma essentially gives optimal values for the parameters of Solve assuming a suitable value of x. The actual value of x will be decided later on.

Lemma 7

Pick some value x (dependent on \(\mathsf {LWE}\) parameters). Choose:

$$\begin{aligned} k&\le \bigg \lfloor \log \bigg (\frac{nx}{6\alpha ^2}\bigg ) \bigg \rfloor \quad&m&= n2^k2^{nx}\\ D_i&\le \frac{q\sqrt{x/6}}{\pi B_{d_i}2^{(k-i+1)/2}}\quad&d_{i+1}&= \min \bigg (d_{i} + \bigg \lfloor \frac{nx}{\log (1+q/D_i)} \bigg \rfloor , n\bigg ). \end{aligned}$$

Assume \(d_k = n\) and \(\epsilon \le 1/(\beta ^{2}x)^{\log 3}\), and for all i and \(d_i\le j < d_{i+1}\), \(|s_j|D_i<0.23q\). Solve runs in time \(\mathcal {O}(mn)\) with negligible failure probability.

Proof

Remark that for all i,

$$\begin{aligned} |\mathcal {L}_{i+1}|\ge (|\mathcal {L}_{i}|-(1+q/D_i)^{d_{i+1}-d_i})/2 \ge (|\mathcal {L}_i|-2^{nx})/2. \end{aligned}$$

Using induction, we then have \(|\mathcal {L}_i|\ge (|\mathcal {L}_0|+2^{nx})/2^i-2^{nx}\) so that \(|\mathcal {L}_k| \ge n2^{nx}\).

By induction and using the previous lemma, the input of Distinguish is sampled from a \(\mathsf {LWE}\) distribution with noise parameter:

$$\begin{aligned} \alpha '^2=2^k\alpha ^2+4\pi ^2\sum _{i=0}^{k-1}2^{k-i-1}\sum _{j=d_i}^{d_{i+1}-1}(s_jD_i/q)^2. \end{aligned}$$

By choice of k the first term is smaller than nx/6. As for the second term, since B is non increasing and by choice of \(D_{i}\), it is smaller than:

$$ 4\pi ^2\sum _{i=0}^{k-1}2^{k-i-1}\frac{x/6}{\pi ^22^{k-i+1}}\sum _{j=d_i}^{d_{i+1}-1}\Big (\frac{s_j}{B_j}\Big )^2 \le (x/6)\sum _{j=0}^{n-1}\Big (\frac{s_{j}}{B_{j}}\Big )^{2}\le nx/6. $$

Thus the real part of the bias is at least \(\exp (-nx/3)(1-3^k\epsilon ) \ge 2^{-nx/2}\), and hence, by Theorem 2, Distinguish fails with negligible probability.   \(\square \)

Theorem 4

Assume that for all i, \(|s_i|\le B\), \(B\ge 2\), \(\max (\beta ,\log (q))=2^{o(n/\log n)}\), \(\beta =\omega (1)\), and \(\epsilon \le 1/\beta ^4\). Then Solve takes time \(2^{(n/2+o(n))/\ln (1+\log \beta /\log B)}\).

Proof

We apply Lemma 7, choosing

$$\begin{aligned} k=\lfloor \log (\beta ^2/(12\ln (1+\log \beta ))) \rfloor =(2-o(1))\log \beta \in \omega (1) \end{aligned}$$

and we set \(D_i=q/(Bk2^{(k-i)/2})\). It now remains to show that this choice of parameters satisfies the conditions of the lemma.

First, observe that \(BD_i/q\le 1/k=o(1)\) so the condition \(|s_j|D_i<0.23q\) is fulfilled. Then, \(d_k \ge n\), which amounts to:

$$\begin{aligned} \sum _{i=0}^{k-1} \frac{x}{(k-i)/2+\log \mathcal {O}(kB)} \ge 2x\ln (1+k/2/\log \mathcal {O}(kB)) \ge 1+k/n=1+o(1) \end{aligned}$$

If we have \(\log k=\omega (\log \log B)\) (so in particular \(k = \omega (\log B)\)), we get \(\ln (1+k/2/\log \mathcal {O}(kB))=(1+o(1))\ln (k)=(1+o(1))\ln (1+\log \beta /\log B)\).

Else, \(\log k=\mathcal {O}(\log \log B)=o(\log B)\) (since necessarily \(B = \omega (1)\) in this case), so we get \(\ln (1+k/2/\log \mathcal {O}(kB))=(1+o(1))\ln (1+\log \beta /\log B)\).

Thus our choice of x fits both cases and we have \(1/x\le 2\ln (1+\log \beta )\). Second, we have \(1/k=o(\sqrt{x})\) so \(D_i\), \(\epsilon \) and k are also sufficiently small and the lemma applies. Finally, note that the algorithm has complexity \(2^{\varOmega (n/\log n)}\), so a factor \(n2^k\log (q)\) is negligible.   \(\square \)

This theorem can be improved when the given parameters would yield \(D_i<1\), since \(D_i=1\) already gives a lossless quantization.

Theorem 5

Assume that for all i, \(|s_i|\le B=n^{b+o(1)}\). Let \(\beta =n^{c}\) and \(q=n^d\) with \(d\ge b\) and \(c+b\ge d\). Assume \(\epsilon \le 1/\beta ^4\). Then Solve takes time \(2^{n/(2(c-d+b)/d+2\ln (d/b)-o(1))}\).

Proof

Once again we aim to apply Lemma 7, and choose k as above:

$$\begin{aligned} k=\log (\beta ^2/(12\ln (1+\log \beta )))=(2c-o(1))\log n \end{aligned}$$

If \(i<\lceil 2(c-d+b)\log n \rceil \), we take \(D_i=1\); else we choose \(q/D_i=\varTheta (B2^{(k-i)/2})\). Satisfying \(d_{k} \ge n-1\) amounts to:

$$\begin{aligned}&2x(c-d+b)\log n/\log q+\sum _{i=\lceil 2(c-d+b)\log n \rceil }^{k-1} \frac{x}{(k-i)/2+\log \mathcal {O}(B)} \\ \ge \;&2x(c-d+b)/d+2x\ln ((k-2(c-d+b)\log n+2\log B)/2/\log \mathcal {O}(B)) \\ \ge \;&1+k/n=1+o(1) \end{aligned}$$

So that we can choose \(1/x=2(c-d+b)/d+2\ln (d/b)-o(1)\).   \(\square \)

Corollary 1

Given a \(\mathsf {LWE}\) problem with \(q=n^d\), Gaussian errors with \(\beta =n^c\), \(c>1/2\) and \(\epsilon \le n^{-4c}\), we can find a solution in \(2^{n/(1/d+2\ln (d/(1/2+d-c))-o(1))}\) time.

Proof

Apply Theorem 1: with probability 2/3, the secret is now bounded by \(B=\mathcal {O}(q\sqrt{n}/\beta \sqrt{\log n})\). The previous theorem gives the complexity of an algorithm recovering the secret, using \(b=1/2-c+d\), which works with probability \(2/3-2^{-\varOmega (n)}\). Repeating n times with different samples, the correct secret will be output at least \(n/2+1\) times, except with negligible probability. By returning the most frequent secret, the probability of failure is negligible.   \(\square \)

In particular, if \(c \le d\), it is possible to quantumly approximate lattice problems within factor \(\mathcal {O}(n^{c+1/2})\) [34]. Setting \(c=d\), the complexity is \(2^{n/(1/c+2\ln (2c)-o(1))}\), so that the constant slowly converges to 0 when c goes to infinity.
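To make this slow convergence concrete, here is a two-line computation of the leading constant in the exponent for \(c=d\) (plain Python; the sample values of c are ours):

```python
from math import log

# Corollary 1 with c = d: running time 2^{n * const(c)},
# where const(c) = 1 / (1/c + 2*ln(2c)).
for c in [1, 2, 4, 8, 16, 32]:
    print(c, round(1 / (1 / c + 2 * log(2 * c)), 4))
```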

A simple \(\mathsf {BKW}\) using the bias would have a complexity of \(2^{(d/c)n+o(n)}\); the analysis of [4] or [3] only conjectures \(2^{dn/(c-1/2)+o(n)}\) for \(c>1/2\). In [4], the authors incorrectly claim a complexity of \(2^{cn+o(n)}\) when \(c=d\), because the blowup in the error is not explicitly computed.

Finally, if we want to solve the \(\mathsf {LWE}\) problem for different secrets but with the same vectorial part of the samples, it is possible to go much faster by working with a larger final bias, since the Reduce part needs to be run only once.

3.4 Experimentation

We have implemented our algorithm, in order to test its efficiency in practice, as well as that of the practical improvements in the appendix of the full version. We have chosen dimension \(n = 128\), modulus \(q = n^{2}\), binary secret, and Gaussian errors with noise parameter \(\alpha = 1/(\sqrt{n/\pi }\log ^2 n)\). The previous best result for these parameters, using a \(\mathsf {BKW}\) algorithm with lazy modulus switching, claims a time complexity of \(2^{74}\) with \(2^{60}\) samples [4].

Using our improved algorithm, we were able to recover the secret using \(m = 2^{28}\) samples within 13 hours on a single PC equipped with a 16-core Intel Xeon. The computation time proved to be devoted mostly to the computation of \(9\cdot 10^{13}\) norms, computed in fixed point over 16 bits in SIMD.

In the appendix of the full version, we compare the different techniques to solve the LWE problem when the number of samples is large or small. We were able to solve the same problem using BKZ with block size 40, followed by an enumeration, in two minutes.

4 Applications to Lattice Problems

We first show that \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\) is easier than \(\mathsf {LWE}_{B,\beta }\) for some large enough modulus, and then that \(\mathsf {UniqueSVP}_{B,\beta }^{||.||_\infty }\) and \(\mathsf {GapSVP}_{B,\beta }^{||.||_\infty }\) are easier than \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\). In the appendix of the full version, we prove the same result for \(\mathsf {BDD}_{B,\beta }^{||.||}\).

4.1 Variant of Bounding Distance Decoding

The main result of this subsection is close to the classic reduction of [34]. However, our definition of \(\mathsf {LWE}\) allows us to simplify the proof and to gain a constant factor in the decoding radius. The use of the KL divergence instead of the statistical distance also allows us to gain a constant factor when we need an exponential number of samples, or when \(\lambda _n^*\) is very small.

The core of the reduction lies in Lemma 8, assuming access to a Gaussian sampling oracle. This hypothesis will be taken care of in Lemma 9.

Lemma 8

Let \(\mathbf {\mathrm {A}}\) be a basis of the lattice \(\mathrm {\Lambda }\) of full rank n. Assume we are given access to an oracle outputting a vector sampled according to \(D_{\mathrm {\Lambda }^*,\sigma }\) with \(\sigma \ge q\eta _\epsilon (\mathrm {\Lambda }^*)\), and to an oracle solving the \(\mathsf {LWE}\) problem in dimension n, modulus \(q\ge 2\), noise parameter \(\alpha \) and distortion parameter \(\xi \), which fails with negligible probability and uses m samples whenever the secret \(\mathbf {\mathrm {s}}\) verifies \(|s_i|\le B_i\).

Then, if we are given a point \(\mathbf {\mathrm {x}}\) such that there exists \(\mathbf {\mathrm {s}}\) with \(\mathbf {\mathrm {v}}=\mathbf {\mathrm {A}}\mathbf {\mathrm {s}}-\mathbf {\mathrm {x}}\), \(||\mathbf {\mathrm {v}}||\le \sqrt{1/\pi } \alpha q/\sigma \), \(|s_i|\le B_i\) and \(\rho _{\sigma /q}(\mathrm {\Lambda }\setminus \{\mathbf {0}\}+\mathbf {\mathrm {v}})\le \xi \exp (-\alpha ^2)/2\), we are able to find \(\mathbf {\mathrm {s}}\) in at most mn calls to the Gaussian sampling oracle, n calls to the \(\mathsf {LWE}\) solving oracle, with a probability of failure \(n\sqrt{m}\epsilon +2^{-\mathrm {\varOmega }(n)}\) and complexity \(\mathcal {O}(mn^3+n^c)\) for some c.

In the previous lemma, we required access to a \(D_{\mathrm {\Lambda }^*,\sigma }\) oracle. However, for large enough \(\sigma \), this hypothesis comes for free, as shown by the following lemma, which we borrow from [10].

Lemma 9

If we have a basis \(\mathbf {\mathrm {A}}\) of the lattice \(\mathrm {\Lambda }\), then for \(\sigma \ge \mathcal {O}(\sqrt{\log n}||\widetilde{\mathbf {\mathrm {A}}}||)\), it is possible to sample in polynomial time from \(D_{\mathrm {\Lambda },\sigma }\).

We will also need the following lemma, due to Banaszczyk [7]. For completeness, a proof is provided in the appendix of the full version.

Lemma 10

For a lattice \(\mathrm {\Lambda }\), \(\mathbf {\mathrm {c}}\in \mathbb {R}^n\), and \(t\ge 1\),

$$\begin{aligned} \frac{\rho \big ((\mathrm {\Lambda }+\mathbf {\mathrm {c}})\setminus \mathcal {B}\big (0,t\sqrt{\frac{n}{2\pi }}\big )\big )}{\rho (\mathrm {\Lambda })} \le \exp \big (-n(t^2-2\ln t-1)/2\big ) \le \exp \big (-n(t-1)^2/2\big ). \end{aligned}$$

Theorem 6

Assume we have a \(\mathsf {LWE}\) solving oracle of modulus \(q\ge 2^n\) and parameters \(\beta \) and \(\xi \) that needs m samples.

If we have a basis \(\mathbf {\mathrm {A}}\) of the lattice \(\mathrm {\Lambda }\), and a point \(\mathbf {\mathrm {x}}\) such that \(\mathbf {\mathrm {A}}\mathbf {\mathrm {s}}-\mathbf {\mathrm {x}}=\mathbf {\mathrm {v}}\) with \(||\mathbf {\mathrm {v}}||\le (1-1/n)\lambda _1/\beta /t < \lambda _1/2\) and \(4\exp (-n(t-1/\beta -1)^2/2)\le \xi \exp (-n/2/\beta ^2)\), then with \(n^2\) calls to the \(\mathsf {LWE}\) solving oracle with secret \(\mathbf {\mathrm {s}}\), we can find \(\mathbf {\mathrm {s}}\) with probability of failure \(2\sqrt{m}\exp (-n(t^2-2\ln t-1)/2)\) for any \(t\ge 1+1/\beta \).

Proof

Using Lemma 10, we can prove that \(\sigma =t\sqrt{n/2/\pi }/\lambda _1 \ge \eta _{\epsilon }(\mathrm {\Lambda }^*)\) for \(\epsilon =2\exp (-n(t^2-2\ln t-1)/2)\) and

$$\begin{aligned} \rho _{1/\sigma }\big (\mathrm {\Lambda }\setminus \{\mathbf {0}\}+\mathbf {\mathrm {v}}\big ) \le 2\exp \big (-n(t(1-1/\beta /t)-1)^2/2\big ). \end{aligned}$$

Using LLL, we can find a basis \(\mathbf {\mathrm {B}}\) of \(\mathrm {\Lambda }\) so that \(||\widetilde{\mathbf {\mathrm {B}}^*}||\le 2^{n/2}/\lambda _1\), and therefore, it is possible to sample in polynomial time from \(D_{\mathrm {\Lambda },q\sigma }\) since \(q \ge 2^n\) for sufficiently large n.

The LLL algorithm also gives a non-zero lattice vector of norm \(\ell \le 2^n\lambda _1\). For i from 0 to \(n^2\), we set \(\lambda =\ell (1-1/n)^i\) and run the algorithm of Lemma 8 with standard deviation \(tq\sqrt{n/2/\pi }/\lambda \), which uses only one call to the \(\mathsf {LWE}\) solving oracle; we return the lattice vector closest to \(\mathbf {\mathrm {x}}\) over all calls.

Since \(\ell (1-1/n)^{n^2}\le 2^n\exp (-n)\lambda _1 \le \lambda _1\), letting \(0\le i\le n^2\) be the smallest integer such that \(\lambda =\ell (1-1/n)^i \le \lambda _1\), we have \(\lambda \ge (1-1/n)\lambda _1\). Then the lemma applies since

$$\begin{aligned} ||\mathbf {\mathrm {v}}|| \le (1-1/n) \lambda _1/\beta /t \le \sqrt{1/\pi } \sqrt{n/2}/\beta q/(tq\sqrt{n/2/\pi }/\lambda )=\lambda /t/\beta . \end{aligned}$$

Finally, the distance bound makes \(\mathbf {\mathrm {As}}\) the unique lattice point closest to \(\mathbf {\mathrm {x}}\).   \(\square \)

Using self-reduction, it is possible to remove the \(1-1/n\) factor [27].

Corollary 2

It is possible to solve \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\) in time \(2^{(n/2+o(n))/\ln (1+\log \beta /\log B)}\) if \(\beta =\omega (1)\), \(\beta =2^{o(n/\log n)}\) and \(\log B=\mathcal {O}(\log \beta )\).

Proof

Apply the previous theorem and Theorem 4 with some sufficiently large constant for t, and remark that dividing \(\beta \) by some constant does not change the complexity.   \(\square \)

Note that since we can solve \(\mathsf {LWE}\) for many secrets in essentially the same time as for one, the same property holds for \(\mathsf {BDD}\).

4.2 \(\mathsf {UniqueSVP}\) and \(\mathsf {GapSVP}\)

In this section, we show how \(\mathsf {GapSVP}_{B,\beta }^{||.||_\infty }\) and \(\mathsf {UniqueSVP}_{B,\beta }^{||.||_\infty }\) can be reduced to \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\), and hence to \(\mathsf {LWE}\). Proofs are provided in the appendix of the full version.

Theorem 7

Given a \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\) oracle, it is possible to solve \(\mathsf {UniqueSVP}_{B,\beta }^{||.||_\infty }\) in time polynomial in n and \(\beta \).

Theorem 8

We can solve any \(\mathsf {GapSVP}_{o(B\sqrt{\log \log \log \beta /\log \log \beta }),\beta }^{||.||_\infty }\) instances in time \(2^{(n/2+o(n))/\ln (1+\log \beta /\log B)}\) for \(\beta =2^{o(n/\log n)}\), \(\beta =\omega (1)\), \(B\ge 2\).

Corollary 3

It is possible to solve any \(\mathsf {GapSVP}_{2^{\sqrt{\log n}},n^c}^{||.||_\infty }\) with \(c>0\) in time \(2^{(n+o(n))/\ln \ln n}\).

Proof

Use Theorem 8 with \(B=2^{\sqrt{\log n}}\log \log n\) and \(\beta =n^c\).   \(\square \)

Theorem 9

If it is possible to solve \(\mathsf {BDD}_{B,\beta }^{||.||_\infty }\) in polynomial time, then it is possible to solve in randomized polynomial time \(\mathsf {GapSVP}_{B/\sqrt{n},\beta \sqrt{n/\log n}}^{||.||_\infty }\).

5 Other Applications

5.1 Low Density Subset-Sum Problem

Definition 10

We are given a vector \(\mathbf {\mathrm {a}}\in \mathbb {Z}^n\) whose coordinates are sampled independently and uniformly in [0; M), and \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}} \rangle \) where the coordinates of \(\mathbf {\mathrm {s}}\) are sampled independently and uniformly in \(\left\{ 0,1\right\} \). The goal is to find \(\mathbf {\mathrm {s}}\). The density is defined as \(d=\frac{n}{\log M}\).

Note that this problem is trivially equivalent to the modular subset-sum problem, where we are given \(\langle \mathbf {a},\mathbf {s} \rangle \text { mod }M\), by trying all possible values of \(\lfloor \langle \mathbf {a},\mathbf {s} \rangle /M \rfloor \), as sketched below.
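A minimal sketch of this equivalence (the solver `solve_exact` for the non-modular problem is a hypothetical oracle):

```python
def solve_modular_subset_sum(a, t_mod, M, solve_exact):
    # <a, s> = t_mod (mod M) and 0 <= <a, s> < n*M, so it suffices to try
    # every possible value of floor(<a, s> / M).
    for k in range(len(a)):
        s = solve_exact(a, t_mod + k * M)  # hypothetical non-modular solver
        if s is not None:
            return s
    return None
```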

In [13, 22], Lagarias et al. reduce the subset sum problem to \(\mathsf {UniqueSVP}\), even though this problem was not defined at that time. We will show a reduction to \(\mathsf {BDD}_{1,\mathrm {\mathrm {\varOmega }}(2^{1/d})}^{||.||_\infty }\), which is essentially the same. First, we need two geometric lemmata.

Lemma 11

Let \(\mathcal {B}_n(r)\) be the number of points of \(\mathbb {Z}^n\) of norm smaller than r, and \(V_n\) the volume of the unit ball. Then,

$$\begin{aligned} \mathcal {B}_n(r)\le V_n\bigg ( r+\frac{\sqrt{n}}{2} \bigg )^n. \end{aligned}$$

Proof

For each \(\mathbf {\mathrm {x}}\in \mathbb {Z}^n\), let \(E_{\mathbf {\mathrm {x}}}\) be a cube of length 1 centered on \(\mathbf {\mathrm {x}}\). Let E be the union of all the \(E_{\mathbf {\mathrm {x}}}\) which have a non empty intersection with the ball of center \(\mathbf {0}\) and radius r. Therefore \(\mathrm {vol}(E)\ge \mathcal {B}_n(r)\) and since E is included in the ball of center \(\mathbf {0}\) and radius \(r+\frac{\sqrt{n}}{2}\), the claim is proven.   \(\square \)

Lemma 12

For \(n\ge 4\) we have

$$\begin{aligned} V_n= \frac{\pi ^{n/2}}{(n/2)!} \le (\sqrt{\pi \mathrm {e}/n})^n. \end{aligned}$$

Theorem 10

Using one call to a \(\mathsf {BDD}_{1,c2^{1/d}}^{||.||_\infty }\) oracle with any \(c< \sqrt{2/\pi /\mathrm {e}}\) and \(d=o(1)\), and polynomial time, it is possible to solve a subset-sum problem of density d, with negligible probability of failure.

Proof

With the matrix:

$$\begin{aligned} \mathbf {\mathrm {A}}=\begin{pmatrix} \mathbf {\mathrm {I}} \\ C\mathbf {\mathrm {a}} \end{pmatrix} \end{aligned}$$

for some \(C>c2^{1/d}\sqrt{n}/2\) and \(\mathbf {\mathrm {b}}=(1/2,\dots ,1/2,C\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle )\), return \(\mathsf {BDD}(\mathbf {\mathrm {A}},\mathbf {\mathrm {b}})\). It is clear that \(||\mathbf {\mathrm {A}}\mathbf {\mathrm {s}}-\mathbf {\mathrm {b}}||= \sqrt{n}/2\). Now, let \(\mathbf {\mathrm {x}}\) be such that \(||\mathbf {\mathrm {A}}\mathbf {\mathrm {x}}||=\lambda _1\). If \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {x}}\rangle \ne 0\), then \(\lambda _1=||\mathbf {\mathrm {A}}\mathbf {\mathrm {x}}||\ge C\), therefore \(\beta \ge c2^{1/d}\). Else, \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {x}}\rangle =0\). Without loss of generality \(x_0\ne 0\); we let \(y=-(\sum _{i>0} a_ix_i)/x_0\), and the probability over \(\mathbf {\mathrm {a}}\) that \(\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {x}}\rangle =0\) is:

$$\begin{aligned} \Pr [\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {x}}\rangle =0]=\Pr [a_0=y]=\sum _{z=0}^{M-1} \Pr [y=z]\Pr [a_0=z] \le \frac{1}{M}. \end{aligned}$$

Therefore, the probability of failure is at most, for sufficiently large n,

$$\begin{aligned} \mathcal {B}_n(\beta \sqrt{n}/2)/M \le&(\sqrt{\pi \mathrm {e}/n})^n(c2^{1/d}\sqrt{n}/2+\sqrt{n}/2)^n/2^{n/d} \\ =&\big (\sqrt{\pi \mathrm {e}/2}(c+2^{-1/d})\big )^n=2^{-\mathrm {\varOmega }(n)}. \end{aligned}$$

   \(\square \)
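The instance built in this proof is straightforward to write down; here is a minimal sketch (plain Python; the function name is ours, and for genuinely small densities the scaling constant C becomes astronomically large, so exact integers are required):

```python
def subset_sum_to_bdd(a, t, C):
    # Theorem 10: the basis A stacks the n x n identity over the scaled row
    # C*a, and the target is b = (1/2, ..., 1/2, C*t) for t = <a, s>.
    # Any integer C > c * 2^(1/d) * sqrt(n)/2 works; we take it as an input.
    n = len(a)
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    A.append([C * ai for ai in a])
    b = [0.5] * n + [C * t]
    return A, b  # ||A s - b|| = sqrt(n)/2 for the 0/1 solution s
```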

Corollary 4

For any \(d=o(1)\) and \(d=\omega (\log n/n)\), we can solve the subset-sum problem of density d with negligible probability of failure in time \(2^{(n/2+o(n))/\ln (1/d)}.\)

The cryptosystem of Lyubashevsky et al. [28] uses \(2^{1/d}>10n\log ^2 n\) and is therefore broken in time \(2^{(\ln 2/2+o(1))n/\log \log n}\). Current lattice reduction algorithms are slower than this one when \(d=\omega (1/(\log n\log \log n))\).

5.2 Sample Expander and Application to \(\mathsf {LWE}\) with Binary Errors

Definition 11

Let q be a prime number. The problem \(\mathsf {Small\text {-}Decision\text {-}LWE}\) is to distinguish \((\mathbf {\mathrm {A}},\mathbf {\mathrm {b}})\), with \(\mathbf {\mathrm {A}}\) sampled uniformly with n columns and m rows and \(\mathbf {\mathrm {b}}=\mathbf {\mathrm {As}}+\mathbf {\mathrm {e}}\) such that \(||\mathbf {\mathrm {s}}||^2+||\mathbf {\mathrm {e}}||^2\le nk^2\) and \(||\mathbf {\mathrm {s}}||_{\infty }\le B\), from \((\mathbf {\mathrm {A}},\mathbf {\mathrm {b}})\) sampled uniformly. The distribution of \((\mathbf {\mathrm {s}},\mathbf {\mathrm {e}})\) is required to be efficiently samplable.

The problem \(\mathsf {Small\text {-}Search\text {-}LWE}\) is to find \(\mathbf {\mathrm {s}}\) given \((\mathbf {\mathrm {A}},\mathbf {\mathrm {b}})\), with \(\mathbf {\mathrm {A}}\) sampled uniformly and \(\mathbf {\mathrm {b}}=\mathbf {\mathrm {As}}+\mathbf {\mathrm {e}}\), with the same conditions on \(\mathbf {\mathrm {s}}\) and \(\mathbf {\mathrm {e}}\).

These problems are generalizations of \(\mathsf {BinaryLWE}\), where \(\mathbf {\mathrm {s}}\) and \(\mathbf {\mathrm {e}}\) have coordinates sampled uniformly in \(\left\{ 0,1\right\} \). In this case, remark that each sample is a root of a known quadratic polynomial in the coordinates of \(\mathbf {\mathrm {s}}\), as shown below. Therefore, it is easy to solve this problem when \(m\ge n^2\). For \(m=\mathcal {O}(n)\), a Gröbner basis algorithm applied to this system will (heuristically) have a complexity of \(2^{\mathrm {\varOmega }(n)}\) [2]. For \(m=\mathcal {O}(n/\log n)\) and \(q=n^{\mathcal {O}(1)}\), it has been shown to be harder than a lattice problem in dimension \(\varTheta (n/\log n)\) [30].
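Indeed, with binary error, the noise of a sample \((\mathbf {\mathrm {a}},b)\) satisfies \(b-\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle \in \left\{ 0,1\right\} \), so each sample yields the known quadratic equation

$$\begin{aligned} \big (b-\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle \big )\big (b-\langle \mathbf {\mathrm {a}},\mathbf {\mathrm {s}}\rangle -1\big )=0 \mod q \end{aligned}$$

in the unknowns \(s_0,\dots ,s_{n-1}\); linearizing the \(\mathcal {O}(n^2)\) monomials explains why \(m\ge n^2\) samples suffice.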

In the appendix of the full version, we prove the following theorem, with the coordinates of \(\mathbf {\mathrm {x}}\) and \(\mathbf {\mathrm {y}}\) distributed according to a samplable \(\mathcal {D}\):

Theorem 11

Assume there is an efficient distinguisher which uses k samples for \(\mathsf {Decision\text {-}LWE}\) (respectively a solver for \(\mathsf {Search\text {-}LWE}\)) with error distribution \(\langle \mathbf {\mathrm {s}},\mathbf {\mathrm {y}}\rangle +\langle \mathbf {\mathrm {e}},\mathbf {\mathrm {x}}\rangle \) of advantage (resp. success probability) \(\epsilon \).

Then, either there is an efficient distinguisher for \(\mathsf {Decision\text {-}LWE}\) with samples and secret taken uniformly, and error distribution \(\mathcal {D}\) in dimension \(m-1\) and with \(n+m\) samples of advantage \(\frac{\xi }{4qk}-q^{-n}-q^{-m}\); or there is an efficient distinguisher of advantage \(\epsilon -\xi \) for \(\mathsf {Small\text {-}Decision\text {-}LWE}\) (resp. solver of success probability \(\epsilon -\xi \) for \(\mathsf {Small\text {-}Search\text {-}LWE}\)).

Lemma 13

Let \(\mathcal {D}=D_{\mathbb {Z},\sigma }\) for \(\sigma \ge 1\). Then, the advantage of a distinguisher for \(\mathsf {Decision\text {-}LWE}\) of dimension m with \(m+n\) samples of noise distribution \(\mathcal {D}\) is at most \(\sqrt{q^n/\sigma ^{n+m}}\). Furthermore, the bias of \(\langle (\mathbf {\mathrm {s}}|\mathbf {\mathrm {e}}),(\mathbf {\mathrm {x}}|\mathbf {\mathrm {y}}) \rangle \), for fixed \(\mathbf {\mathrm {s}}\) and \(\mathbf {\mathrm {e}}\), is at least \(\exp (-\pi (||\mathbf {\mathrm {s}}||^2+||\mathbf {\mathrm {e}}||^2)\sigma ^2/q^2)\).

Proof

We have \(\mathcal {D}^{m+n}(\mathbf {\mathrm {a}})\le \mathcal {D}(0)^{m+n}=1/\rho _{\sigma }(\mathbb {Z})^{m+n}\) and \(\rho _{\sigma }(\mathbb {Z})=\sigma \rho _{1/\sigma }(\mathbb {Z}) \ge \sigma \) using a Poisson summation. The first property is then a direct application of the leftover hash lemma, since q is prime.

The bias of \(\lambda \mathcal {D}\) can be computed using a Poisson summation as:

$$\begin{aligned} \sum _{a \in \mathbb {Z}} \rho _{\sigma }(a)\cos (2\pi \lambda a/q)=\rho _{1/\sigma }(\mathbb {Z}+\lambda /q)\ge \exp (-\pi \lambda ^2\sigma ^2/q^2). \end{aligned}$$

Therefore, the second property follows from the independence of the coordinates of \(\mathbf {\mathrm {x}}\) and \(\mathbf {\mathrm {y}}\).   \(\square \)

Corollary 5

Let q, n and m be such that \(m\log q/(n+m)=o(n/\log n)\), \((m-3)\log q/(n+m)-\log k=\omega (\log B)\), and \(m=\omega (1)\). Then, we can solve the \(\mathsf {Small\text {-}Decision\text {-}LWE}\) problem in time

$$\begin{aligned} 2^{(n/2+o(n))/\ln ((m\log q/(n+m)-\log k)/\log B)} \end{aligned}$$

with negligible probability of failure.

Proof

We use the previous lemma with \(\sigma =2q^{(n+2)/(n+m-1)}\), so that we have \(\beta =\varOmega (q^{(m-3)/(n+m)}/k)\). The algorithm from Theorem 4 needs \(2^{o(n)}\) samples, so the advantage of the potential distinguisher for \(\mathsf {Decision\text {-}LWE}\) is \(2^{-(1/4+o(1))n}/q\) for \(\xi =2^{-n/4}\); while the previous lemma proves it is less than \(2^{-n/2}/q\).   \(\square \)

The NTRU cryptosystem [20] is based on the hardness of finding two polynomials f and g with coefficients bounded by 1, given \(h=f/g \mod (X^n-1,q)\). Since \(hg=f \mod (X^n-1,q)\), i.e. \(hg=0\) up to an error bounded by 1, we can apply the previous algorithms of this section to heuristically recover f and g in time \(2^{(n/2+o(n))/\ln \ln q}\). This is the first subexponential-time algorithm for this problem since its introduction back in 1998.

Corollary 6

Assume we have a \(\mathsf {Search\text {-}LWE}\) problem with \(n\log q+\varOmega (n/\log q)\) samples and Gaussian noise with \(\alpha =n^{-c}\) and \(q=n^d\). Then, we can solve it in time \(2^{n/(2\ln (d/(d-c))-o(1))}\) for any failure probability in \(2^{-n^{o(1)}}\).

Proof

First, apply a secret-error switching (Theorem 1). Apply the previous corollary with \(B=n^{d-c+o(1)}\) which is a correct bound for the secret, except with probability \(2^{-n^{o(1)}}\). Lemma 10 shows that \(k^2\le \log q\sigma ^2\), except with probability \(2^{-\mathrm {\varOmega }(n)}\), so that \(\beta =n^{c+o(1)}\). We can then use \(\sigma =\varTheta (1)\) and apply Theorem 4.   \(\square \)

Note that this corollary can in fact be applied to a very large class of distributions, and in particular to the learning with rounding problem, while the distortion parameter is too large for a direct application of Theorem 4.

Also, if the reduction gives a fast (subexponential) algorithm, one may use \(\sigma =2\sqrt{n}\) and assume that there is no quantum algorithm solving the corresponding lattice problem in dimension m.

Even more heuristically, one can choose \(\sigma \) to be the smallest value such that, if the reduction does not work, we obtain an algorithm faster than the best known algorithm for the same problem.