1 Introduction

An essential tool in cryptography is the use of divergence measures to prove the security of cryptographic schemes. As an introductory example, we consider the statistical distance \(\varDelta \). It verifies a probability preservation property, which states that for any two distributions \(\mathcal {P}, \mathcal {Q}\) and any measurable event E over the support of \(\mathcal {P}\) and \(\mathcal {Q}\), we have

$$\mathcal {Q}(E) \ge \mathcal {P}(E) - \varDelta (\mathcal {P}, \mathcal {Q})$$
(1)

In a cryptographic context, a useful abstraction is to model a cryptographic scheme as relying on some ideal distribution \(\mathcal {P}\), and the success of an attacker against this scheme as an event E. If \(\varDelta (\mathcal {P}, \mathcal {Q})\) is negligible, Eq. 1 allows us to say that a scheme secure with \(\mathcal {P}\) will stay secure if one replaces \(\mathcal {P}\) by an “imperfect” distribution \(\mathcal {Q}\). Many other measures can be used to provide security arguments in cryptography (see e.g. [Cac97]).

The Rényi divergence. In the subfield of lattice-based cryptography, the Rényi divergence [R61] has been used for cryptographic proofs in several recent works. Denoted \(R_a\), it is somewhat trickier to use than the statistical distance. First, it is parameterized by a value \(a \in [0, +\infty ]\), and has different properties depending on a. It is not a distance, as it is asymmetric and does not verify the triangle inequality; the lack of these two properties can be problematic in security proofs. Interestingly, it also verifies a probability preservation property. For any event E and \(a \in (1, +\infty )\), we have

$$\mathcal {Q}(E) \ge \mathcal {P}(E)^{\frac{a}{a-1}} \big / R_a(\mathcal {P} \Vert \mathcal {Q})$$
(2)

Unlike Eq. 1, Eq. 2 is multiplicative rather than additive. We will later see that in the context of search problems, this allows tighter bounds in practice.

1.1 Floating-Point in Lattice-Based Cryptography

Lattice-based cryptography has proven to be a serious candidate for post-quantum cryptography. It is efficient and allows one to instantiate a wide range of cryptographic primitives. Some lattice-based schemes [DDLL13, ADPS16] have even already been deployed in large-scale projects.Footnote 1

A notable characteristic of lattice-based cryptography is that it often makes extensive use of floating-point arithmetic, for several reasons.

Gaussians. The first driver for the use of floating-point arithmetic in lattice-based cryptography is the widespread need to sample from discrete Gaussian distributions. When this is done by standard approaches like precomputed tables [Pei10], the required precision is rather high and renders the use of these tables cumbersome, if not impractical.

On the other hand, bitwise approaches [DDLL13] have been developed to circumvent these floating-point issues, but they can be somewhat tricky to implement.

Rejection sampling. In the early lattice-based signature schemes GGH [GGH97] and NTRUSign [HHGP+03], there existed a correlation between the secret key and the distribution of the signatures. This subsequently led to several key-recovery attacks [GJSS01, GS02, NR06, Wan10, DN12b] which broke the signature schemes and their evolutions.

A provably secure countermeasure was introduced by Lyubashevsky [Lyu09]. The idea is to use rejection sampling as a final step, in order to “factor out” the correlation between the key and the distribution of the signatures.

This paradigm was instantiated in [Lyu12, GLP12, DDLL13, PDG14, POG15]. However, in the existing implementations [DDLL13], this step is not done in floating-point: because of precision concerns, another approach based on combining Bernoulli samples was chosen instead. We will see in Sect. 4.3 that this approach also has several drawbacks.

Trapdoor sampling. In lattice-based cryptography, the tool that makes the most intensive use of floating-point arithmetic is arguably trapdoor sampling. Introduced by Gentry et al. [GPV08], it is a cornerstone of lattice-based cryptography, as it has numerous applications such as hash-and-sign and identity-based encryption in the random oracle model [GPV08], signatures in the standard model [CHKP10, Boy10], hierarchical IBE [CHKP10, ABB10a, ABB10b], attribute-based encryption [Boy13, BGG+14], and much more.

The existing algorithms [Kle00, GPV08, Pei10, MP12] heavily rely on floating-point arithmetic and they perform between \(O(n \log n)\) and \(O(n^2)\) floating-point operations. However, the best available estimations require 150 bits of precision for a security of 256 bits, which is completely impractical.

As we can see, floating-point arithmetic can be found everywhere in lattice-based cryptography. However, it often comes with high precision requirements, which make it impractical as it stands.

1.2 Our Contributions

Theory. We provide theoretical tools related to the use of the Rényi divergence in cryptographic proofs. They make it not only simpler to use, but also very efficient in some easily identifiable situations.

  1. We establish two lemmas that bound the Rényi divergence of related distributions in two situations very common in lattice-based cryptography. The first lemma concerns tailcut distributions, and for this reason we call it the tailcut lemma. The second one involves distributions whose relative error is bounded, so we call it the relative error lemma. The second lemma is particularly powerful in the sense that it often allows very aggressive parameters.

  2. We show that taking \(a = 2\lambda \) allows tight and efficient Rényi divergence-based security arguments for cryptographic schemes based on search problems. We also derive simple and explicit conditions on distributions that allow one to easily replace a distribution by another in this context.

  3. A simple and versatile measure of divergence, the max-log distance, was recently introduced by Micciancio and Walter [MW17]. We establish a “reverse Pinsker” inequality between it and the Rényi divergence. An immediate consequence is that we may benefit from the best of both worlds: the versatility of the max-log distance, and the power of the Rényi divergence.

Practice. Our results are not purely theoretical. In Sect. 4, we present five applications of them in lattice-based cryptography.

  1. We start with the study of a sampler recently introduced by Micciancio and Walter [MW17]. We show that for this sampler, the security analysis provided by [MW17] can be improved: we can claim a full security of 256 bits instead of the 100 bits claimed in [MW17].

  2. We revisit the table-based approach (see e.g. [Pei10]) for sampling distributions such as discrete Gaussians. Through a Rényi divergence-based analysis combined with a small tweak on the precomputed table, we reduce the storage size by an order of magnitude, both in theory and in practice (where we gain a factor 9). Our improvement seems highly composable with other techniques related to precomputed tables.

  3. We analyze the rejection sampling step of BLISS [DDLL13]. We show that it can be done simply and efficiently in floating-point, simultaneously eliminating the issues – code complexity, side-channel attacks, table storage, etc. – that plagued the only previously existing approach.

  4. We then study trapdoor samplers [Kle00, GPV08, Pei10]. We improve the usual bounds on the standard deviation \(\sigma \) by obtaining a new bound which is both smaller and essentially independent of the security level \(\lambda \). In practice, we gain about 30 bits of security compared to a statistical distance-based analysis.

  5. The last contribution is also related to trapdoor samplers. We show that a precision of 64 bits allows 256 bits of security, whereas previous estimations [LP15, Pre15] required a precision of 150 bits.

A word on the security parameter and number of queries. In order to make our results as simple as possible and to derive explicit bounds, we consider in this paper that the security level \(\lambda \) and the number of queries \(q_s\) verify \(\lambda \le 256\) and \(q_s \le 2^{64}\). The first choice is arguably standard.

For the bound on \(q_s\), we consider that making more than \(2^{64}\) signature queries would be extremely costly and, unlike queries to e.g. a hash function, would require the presence of the target of the attack. In addition, it would be easily detectable by the target, and so we believe it to be impractical.

Finally, a more pragmatic reason comes from NIST’s current call for proposals for post-quantum cryptography,Footnote 2 which explicitly assumes that an attacker can make no more than \(2^{64}\) signature queries (resp. decryption queries).

However, if one decides to take \(q_s > 2^{64}\), our results could be easily adapted, but their efficiency would be impacted.

1.3 Related Works

In the context of lattice-based cryptography, Stehlé, Steinfeld and their coauthors [LSS14, LPSS14, BLL+15] have used the Rényi divergence to derive better parameters for cryptographic schemes. The Rényi divergence has also been used by [BGM+16] to improve security proofs, and in [TT15], which aims to improve the proofs from [BLL+15].

A few papers [PDG14, DLP14] used a third metric, the Kullback-Leibler divergence – actually the Rényi divergence of order 1 – but the Rényi divergence has since given better results [BLL+15, this work].

Precision issues have been tackled by [DN12a], which resorted to lazy Gaussian sampling but still did not eliminate the need for high precision. A precision analysis of trapdoor samplers by Prest [Pre15] gave about 120 bits of precision for \(\lambda =192\) – which we extrapolate to 150 for \(\lambda =256\). A recent work by Saarinen [Saa15] also claimed that using p-bit fixed-point approximation achieves 2p bits of security, but this was proven incorrect by [MW17], which also introduced the max-log distance.

Finally, recent works [BS16, Mir17] have studied the usefulness of the Rényi divergence in the context of differential privacy and have independently come up with results similar to our relative error lemma.

1.4 Roadmap

Section 2 introduces the notations and tools that we will use throughout the paper, including the Rényi divergence.

Section 3 is dedicated to our theoretical results. We first present the tailcut and relative error lemmas, as well as typical use cases for their application. We then give a framework for using them in cryptographic proofs, along with explicit bounds. Finally, we establish a connection between the Rényi divergence and the max-log distance.

Section 4 presents five applications of our theoretical results. We first give a tighter analysis of a sampler from [MW17], then we revisit the standard table-based approach for sampling discrete distributions. We then show that rejection sampling in BLISS can be done simply in floating-point arithmetic. To conclude, we study trapdoor samplers and provide improved bounds on the standard deviation and precision with which they can be used.

Section 5 concludes this article and presents related open problems.

2 Preliminaries

2.1 Notations

Cryptographic parameters. When clear from context, let \(\lambda \) be the security level of a scheme and \(q_s\) the number of public queries that an attacker can make. In this article, we consider that \(\lambda \le 256\) and \(q_s \le 2^{64}\).

Probabilities. For any distribution \(\mathcal {P}\), we denote its support by \(\mathrm {Supp}(\mathcal {P})\). We may abbreviate the statistical distance and Kullback-Leibler divergence by SD and KLD. As a mnemonic device, we will often refer to \(\mathcal {P}\) as some perfect distribution, and to \(\mathcal {P}_\delta \) as a distribution close to \(\mathcal {P}\) in a sense parameterized by \(\delta \).

Matrices and vectors. Matrices will usually be in bold uppercase (e.g. \(\mathbf {B}\)), vectors in bold lowercase (e.g. \(\mathbf {b}\)) and scalars in italic (e.g. s). Vectors are represented as rows. The p-norm of a vector \(\mathbf {b}\) is denoted by \(\Vert \mathbf {b} \Vert _p\), and by convention \(\Vert \mathbf {b} \Vert = \Vert \mathbf {b} \Vert _2\). Let \(\Vert \mathbf {B} \Vert \) be the spectral norm of a matrix \(\mathbf {B}\); it is also the maximum of its singular values and is sometimes denoted by \(s_1(\mathbf {B})\). For \(\mathbf {B} = (b_{ij})_{i,j}\), we define the max norm of \(\mathbf {B}\) as \(\Vert \mathbf {B} \Vert _{\max } = \max _{i,j} |b_{ij}|\).

Gram-Schmidt orthogonalization. An important tool in lattice-based cryptography is the Gram-Schmidt orthogonalization of a full-rank matrix \(\mathbf {B}\), which is the unique factorization \(\mathbf {B} = \mathbf {L} \cdot \tilde{\mathbf {B}}\) such that \(\mathbf {L}\) is lower triangular with 1’s on the diagonal, and the rows of \(\tilde{\mathbf {B}}\) are pairwise orthogonal. Noting \(\tilde{\mathbf {b}}_i\) the rows of \(\tilde{\mathbf {B}}\), it allows us to define the Gram-Schmidt norm \(\Vert \mathbf {B} \Vert _{\mathrm {GS}} = \max _i \Vert \tilde{\mathbf {b}}_i \Vert \).

Lattices and Gaussians. A lattice will be denoted by \(\varLambda \). For a full-rank matrix \(\mathbf {B}\), let \(\varLambda (\mathbf {B})\) be the lattice generated by its rows: \(\varLambda (\mathbf {B}) = \{ \mathbf {x} \mathbf {B} \mid \mathbf {x} \in \mathbb {Z}^n \}\). We define the Gaussian function as \(\rho _{\sigma , \mathbf {c}}(\mathbf {x}) = \exp \left( - \frac{\Vert \mathbf {x} - \mathbf {c} \Vert ^2}{2 \sigma ^2} \right) \), and the Gaussian distribution over a lattice \(\varLambda \) as

$$D_{\varLambda , \sigma , \mathbf {c}}(\mathbf {x}) = \frac{\rho _{\sigma , \mathbf {c}}(\mathbf {x})}{\rho _{\sigma , \mathbf {c}}(\varLambda )}, \quad \forall \mathbf {x} \in \varLambda $$

The parameter \(\mathbf {c}\) may be omitted when it is equal to zero.

Smoothing parameter. For \(\epsilon > 0\), we define the smoothing parameter \(\eta _\epsilon (\varLambda )\) of a lattice as the smallest value \(\sigma > 0\) such that \(\rho _{1/\sigma }(\varLambda ^\star \backslash \mathbf {0}) \le \epsilon \). We carefully note that in the existing literature, some definitions take the smoothing parameter to be our definition multiplied by a factor \(\sqrt{2\pi }\). A useful bound on the smoothing parameter is given by [MR07]:

$$\eta _\epsilon (\mathbb {Z}^n) \le \frac{1}{\pi } \sqrt{\frac{1}{2} \ln \left( 2n \left( 1 + \frac{1}{\epsilon } \right) \right) }$$
(3)

2.2 The Rényi Divergence

We define the Rényi divergence in the same way as [BLL+15].

Definition 1

(Rényi divergence). Let \(\mathcal {P}, \mathcal {Q}\) be two distributions such that \(\mathrm {Supp}(\mathcal {P}) \subseteq \mathrm {Supp}(\mathcal {Q})\). For \(a \in (1, +\infty )\), we define the Rényi divergence of order a by

$$R_a(\mathcal {P} \Vert \mathcal {Q}) = \left( \sum _{x \in \mathrm {Supp}(\mathcal {P})} \frac{\mathcal {P}(x)^a}{\mathcal {Q}(x)^{a-1}} \right) ^{\frac{1}{a-1}}$$

In addition, we define the Rényi divergence of order \(+\infty \) by

$$R_\infty (\mathcal {P} \Vert \mathcal {Q}) = \max _{x \in \mathrm {Supp}(\mathcal {P})} \frac{\mathcal {P}(x)}{\mathcal {Q}(x)}$$

Again, this definition is slightly different from some other existing definitions, which take the log of ours. However, it is more convenient for our purposes. Generic (resp. cryptographic) properties of the Rényi divergence can be found in [vEH14] (resp. [BLL+15]). We recall the most important ones.

Lemma 1

[BLL+15, Lemma 2.9]. For two distributions \(\mathcal {P}, \mathcal {Q}\) and two families of distributions \((\mathcal {P}_i)_i, (\mathcal {Q}_i)_i\), the Rényi divergence verifies the following properties:

  • Data processing inequality. For any function f, \(R_a(\mathcal {P}^f \Vert \mathcal {Q}^f) \le R_a(\mathcal {P} \Vert \mathcal {Q})\).

  • Multiplicativity. \(R_a(\prod _i \mathcal {P}_i \Vert \prod _i \mathcal {Q}_i) = \prod _i R_a(\mathcal {P}_i \Vert \mathcal {Q}_i)\).

  • Probability preservation. For any event \(E \subseteq \mathrm {Supp}(\mathcal {Q})\) and \(a \in (1,+\infty )\),

$$\mathcal {Q}(E) \ge \mathcal {P}(E)^{\frac{a}{a-1}} \big / R_a(\mathcal {P} \Vert \mathcal {Q}) \quad \text {and} \quad \mathcal {Q}(E) \ge \mathcal {P}(E) \big / R_\infty (\mathcal {P} \Vert \mathcal {Q})$$

However, we note that the Rényi divergence is not a distance. In Sect. 3.4, we circumvent this issue by linking the Rényi divergence to the max-log distance.

3 Main Results

In this section, we present our theoretical results: the tailcut lemma and the relative error lemma for bounding the Rényi divergence between distributions, a generic framework for using these lemmas, and a “reverse Pinsker” inequality that connects the Rényi divergence to the max-log distance.

3.1 The Tailcut Lemma

This first lemma may arguably be considered folklore; it is already briefly mentioned in e.g. [BLL+15]. Here we make it explicit, as applications of it arise naturally in lattice-based cryptography, especially whenever Gaussian distributions are used.

Lemma 2

(Tailcut). Let \(\mathcal {P}_\delta , \mathcal {P}\) be two distributions such that \(\mathrm {Supp}(\mathcal {P}_\delta ) \subseteq \mathrm {Supp}(\mathcal {P})\) and:

  • \(\exists \delta > 0\) such that \(\mathcal {P}_\delta \le (1 + \delta ) \cdot \mathcal {P}\) over \(\mathrm {Supp}(\mathcal {P}_\delta )\)

Then for \(a \in (1,+\infty ]\):

$$R_a(\mathcal {P}_\delta \Vert \mathcal {P}) \le 1 + \delta $$

Proof

We note \(S = \mathrm {Supp}(\mathcal {P}_\delta )\). If \(a \ne + \infty \):

$$R_a(\mathcal {P}_\delta \Vert \mathcal {P})^{a-1} = \sum _{x \in S} \mathcal {P}_\delta (x) \left( \frac{\mathcal {P}_\delta (x)}{\mathcal {P}(x)} \right) ^{a-1} \le (1+\delta )^{a-1} \sum _{x \in S} \mathcal {P}_\delta (x) = (1+\delta )^{a-1}$$

which yields the result. If \(a = +\infty \), the result is immediate. \(\Box \)

Fig. 1. Typical use cases for the tailcut lemma and the relative error lemma

We may also refer to Lemma 2 as the tailcut lemma. For the rest of the paper, \(\mathcal {P}\) will typically refer to a “perfect” distribution, and \(\mathcal {P}_\delta \) to a distribution which is close to \(\mathcal {P}\) in a sense parameterized by \(\delta \).

Use cases. As its name implies, the tailcut lemma is adapted to situations where \(\mathcal {P}_\delta \) is a “tailcut” of \(\mathcal {P}\): we discard a set \(\bar{S} \subseteq \mathrm {Supp}(\mathcal {P})\) such that \(\mathcal {P}(\bar{S}) \le \frac{\delta }{1+\delta }\). In order to still have a true measure of probability, the remaining probabilities are scaled by a factor \(\frac{1}{\mathcal {P}(S)} \le 1 + \delta \), and we note \(\mathcal {P}_\delta \) the new distribution. Lemma 2 gives a relation of closeness between \(\mathcal {P}_\delta \) and \(\mathcal {P}\) in this case, which is illustrated in Fig. 1 and made concrete below.
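To make this use case concrete, here is a minimal worked instance (our own illustration; the Gaussian tail estimate is the standard one, stated up to a small constant):

$$\mathcal {P}(\bar{S}) \le \frac{\delta }{1+\delta } \ \Longrightarrow \ \mathcal {P}_\delta (z) = \frac{\mathcal {P}(z)}{\mathcal {P}(S)} \le (1+\delta ) \cdot \mathcal {P}(z) \text { over } S \ \Longrightarrow \ R_a(\mathcal {P}_\delta \Vert \mathcal {P}) \le 1 + \delta $$

For instance, for \(\mathcal {P} = D_{\mathbb {Z}, \sigma }\) tailcut to \(S = \{ z \in \mathbb {Z} : |z| \le k_0 \sigma \}\), the discarded weight decays roughly as \(e^{-k_0^2/2}\), so a tailcut bound \(k_0 \approx \sqrt{2 \ln \frac{1+\delta }{\delta }}\) suffices.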

3.2 The Relative Error Lemma

In our second lemma, the conditions are slightly stricter than for the tailcut lemma, but in compensation the result is a much stronger closeness relation. It is somewhat similar to [PDG14, Lemma 2] for the KLD, but allows tighter security arguments.

Lemma 3 (Relative error)

Let \(\mathcal {P}_\delta , \mathcal {P}\) be two distributions of same support such that:

  • \(\exists \delta > 0\) such that \(|\mathcal {P} - \mathcal {P}_\delta | \le \delta \cdot \mathcal {P}_\delta \) over \(\mathrm {Supp}(\mathcal {P}_\delta )\)

Then, for \(a \in (1,+\infty )\):

$$R_a(\mathcal {P}_\delta \Vert \mathcal {P}) \le \left( 1 + \frac{a(a-1)\delta ^2}{2(1-\delta )^{a+1}} \right) ^{\frac{1}{a-1}} \underset{\delta \rightarrow 0}{\sim } 1 + \frac{a \delta ^2}{2}$$

Proof

Let \(f_a : (x,y) \mapsto \frac{y^a}{(x+y)^{a-1}}\). We compute values of \(f_a\) and its derivatives around (0, y):

$$\begin{array}{rllll} f_a (x,y) &{} = &{} y &{} \text {for }x = 0\\ \frac{\partial f_a}{\partial x} (x,y) &{} = &{} 1-a &{} \text {for }x = 0\\ \frac{\partial ^2 f_a}{\partial x^2} (x,y) &{} = &{} a(a-1) y^a (x+y)^{-a-1} \\ &{} \le &{} \frac{a(a-1)}{(1-\delta )^{a+1} y} &{} \text {for } |x| \le \delta \cdot y \end{array}$$

We now use partial Taylor bounds. If \(|x| \le \delta \cdot y\), then:

$$ f_a(x,y) \le f_a(0,y) + \frac{\partial f_a}{\partial x}(0,y) \cdot x + \frac{a(a-1) \delta ^2}{2(1-\delta )^{a+1}} \cdot y $$

Let \(S = \mathrm {Supp}(\mathcal {P}_\delta )\). Taking \(x = \mathcal {P}(i) - \mathcal {P}_\delta (i)\), \(y = \mathcal {P}_\delta (i)\), then summing over all \(i \in S\) and using the fact that \(\sum _{i \in S} \mathcal {P}(i) = \sum _{i \in S} \mathcal {P}_\delta (i) = 1\) yields the result:

$$R_a(\mathcal {P}_\delta \Vert \mathcal {P})^{a-1} = \sum _{i \in S} f_a\left( \mathcal {P}(i) - \mathcal {P}_\delta (i), \mathcal {P}_\delta (i) \right) \le 1 + \frac{a(a-1)\delta ^2}{2(1-\delta )^{a+1}}$$

\(\Box \)

We may also refer to Lemma 3 as the relative error lemma.

Use cases. The relative error lemma can be used when the relative error between \(\mathcal {P}\) and \(\mathcal {P}_\delta \) is bounded. This may typically happen when the probabilities of \(\mathcal {P}_\delta \) are stored in floating-point with \(-\log _2 \delta \) bits of relative precision – though we will see that it is not limited to this situation. Again, this is illustrated by Fig. 1.

3.3 Security Arguments Using the Rényi Divergence

We consider a cryptographic scheme making \(q_s\) queries to either a perfect distribution \(\mathcal {P}\) or an imperfect distribution \(\mathcal {P}_\delta \). Let E be an event breaking the scheme by solving a search problem, and \(\varepsilon \) (resp. \(\varepsilon _\delta \)) the probability that this event occurs under the use of \(\mathcal {P}\) (resp. \(\mathcal {P}_\delta \)). We suppose that \(\varepsilon _\delta \ge 2^{-\lambda }\). By the data processing and probability preservation inequalities:

$$\varepsilon \ge \varepsilon _\delta ^{\frac{a}{a-1}} \Big / R_a\big (\mathcal {P}_\delta ^{q_s} \Vert \mathcal {P}^{q_s}\big ) = \varepsilon _\delta ^{\frac{a}{a-1}} \Big / R_a(\mathcal {P}_\delta \Vert \mathcal {P})^{q_s}$$

We can choose any value in \((1,+\infty )\) for a, but small values for a impact the tightness of the reduction and large values impact its efficiency. Setting \(a = 2\lambda \) seems to be a good compromise. Indeed, we then have \(\varepsilon _\delta ^{a/(a-1)} \ge \varepsilon _\delta /\sqrt{2} \), so we lose at most half a bit of security in the process.

Our goal is now to have \(R_a(\mathcal {P}_\delta \Vert \mathcal {P})^{q_s} \le \sqrt{2}\), so that we have an almost tight security reduction. In this regard, having \(R_a(\mathcal {P}_\delta \Vert \mathcal {P}) \le 1 + \frac{1}{4 q_s}\) is enough, since it yields \(\left( 1 + \frac{1}{4 q_s} \right) ^{q_s} \le e^{1/4} \le \sqrt{2}\) by a classic inequality.Footnote 3

This yields \(\varepsilon \ge 2^{-\lambda - 1}\). By contraposition, a \((\lambda +1)\)-bit secure scheme with \(\mathcal {P}\) will be at least \(\lambda \)-bit secure when replacing \(\mathcal {P}\) by \(\mathcal {P}_\delta \) if the following condition is met:

$$R_{2\lambda }(\mathcal {P}_\delta \Vert \mathcal {P}) \le 1 + \frac{1}{4 q_s}$$
(4)

We make two important remarks: first, this analysis is valid only for cryptographic schemes relying on search problems. This is the case for all the applications we consider in this paper, but for cryptographic schemes relying on decision problems, one may rather rely on SD-based, KLD-based analyses, or on specific Rényi divergence-based analyses as in [BLL+15, Sect. 4].

Second, the savings provided by our analysis heavily rely on the fact that the number of queries is limited. This was already observed in [BLL+15].

Practical Implications. We consider a cryptographic scheme with \(\lambda +1\le 257\) bits of security making \(q_s~\le ~2^{64}\) queries to a distribution \(\mathcal {P}\). Replacing \(\mathcal {P}\) by another distribution \(\mathcal {P}_\delta \) will make the scheme lose at most one bit of security, provided that one of these conditions is verified:

$$\mathcal {P}_\delta \le (1 + \delta ) \cdot \mathcal {P} \ \text { with } \ \delta \le \frac{1}{4 q_s}$$
(5)
$$|\mathcal {P} - \mathcal {P}_\delta | \le \delta \cdot \mathcal {P}_\delta \ \text { with } \ \delta \le \frac{1}{2 \sqrt{\lambda q_s}}$$
(6)

Equation 5 comes from the tailcut lemma with Eq. 4, and Eq. 6 from the relative error lemma with Eq. 4. For \(\lambda \le 256\) and \(q_s~\le ~2^{64}\) (a small numeric check follows this list):

  • the condition 5 translates to \(\delta \le 2^{-66}\),

  • the condition 6 translates to \(\delta \le 2^{-37}\).
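The small C program below makes these two numbers explicit (a sketch; all names and the log-domain computation are ours):

```c
#include <math.h>
#include <stdio.h>

/* Bit sizes of the thresholds in conditions (5) and (6):
 *   tailcut:        delta <= 1/(4*q_s)
 *   relative error: delta <= 1/sqrt(4*lambda*q_s)
 * We work with log2 values so that q_s = 2^64 causes no overflow. */
int main(void) {
    double lambda = 256.0;
    double log2_qs = 64.0;
    double tailcut_bits = 2.0 + log2_qs;                     /* -log2(1/(4 q_s)) */
    double relerr_bits  = 0.5 * (log2(4.0 * lambda) + log2_qs);
    printf("condition (5): delta <= 2^-%g\n", tailcut_bits); /* prints 66 */
    printf("condition (6): delta <= 2^-%g\n", relerr_bits);  /* prints 37 */
    return 0;
}
```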

3.4 Relation to the Max-Log Distance

In [MW17], Micciancio and Walter introduced a new metric, the max-log distance. They argue that this metric is both easy to use and yields sharp bounds in cryptographic proofs.

In Lemma 4, we show that the log of the Rényi divergence is bounded (up to a constant) by the square of the max-log distance. It can be seen as a “reverse” analogue of Pinsker’s inequality for the SD and KLD, so we call it the reverse Pinsker inequality.

Definition 2

(max-log distance [MW17]). The max-log distance between two distributions \(\mathcal {P}\) and \(\mathcal {Q}\) over the same support S is

$$\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {Q}) = \max _{x \in S} | \ln \mathcal {P}(x) - \ln \mathcal {Q}(x) |$$

Lemma 4

(Reverse Pinsker inequality). For two distributions \(\mathcal {P}, \mathcal {Q}\) of common support, we have:

$$R_a(\mathcal {P} \Vert \mathcal {Q}) \le \left( 1 + \frac{a(a-1)\delta ^2}{2(1-\delta )^{a+1}} \right) ^{\frac{1}{a-1}}, \quad \text {where } \delta = e^{\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {Q})} - 1$$

Proof

We note \(\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {Q}) = \ln (1+\delta )\) for some \(\delta \ge 0\). We have:

$$\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {Q}) \le \ln (1+\delta ) \ \Rightarrow \ \left| \ln \frac{\mathcal {P}}{\mathcal {Q}} \right| \le \ln (1+\delta ) \ \Rightarrow \ |\mathcal {Q} - \mathcal {P}| \le \delta \cdot \mathcal {P} \ \Rightarrow \ R_a(\mathcal {P} \Vert \mathcal {Q}) \le \left( 1 + \frac{a(a-1)\delta ^2}{2(1-\delta )^{a+1}} \right) ^{\frac{1}{a-1}}$$

The first implication applies the definition of the max-log distance, the second one passes to the exponential, the third one applies the relative error lemma. \(\Box \)

There are two implications from Lemma 4. First, we can add the max-log distance to our tools. Unlike the Rényi divergence, it is actually a distance, which is often useful when performing security analyses.

Second, Lemma 4 provides evidence that the Rényi divergence gives sharper bounds than the max-log distance, as the log of the former is essentially bounded by the square of the latter.

In addition, we point out that the max-log distance is defined only for distributions with a common support. For example, it cannot be applied to tailcut distributions. It is nevertheless a useful measure. One may for example use it if a true distance is needed, and then fall back to the Rényi divergence using Lemma 4.

4 Applications

In this section, we provide five applications of our results. In all the cases studied, we manage to claim 256 bits of security while lowering the precision requirements to less than 53 bits (or 61 bits for the last application). All the concrete bounds are obtained for \(\lambda \le 256\) and \(q_s \le 2^{64}\).

This bound of 53 bits is important. Floating-point with 53 bits of precision corresponds to the double precision type in the IEEE 754 standard, and is very often available in software – see e.g. the type double in C. In many cases, it can also be simulated using 64-bit fixed-point numbers, which can be done easily and efficiently, in particular on 64-bit architectures.

4.1 Tighter Analysis of the Micciancio-Walter Sampler

The first application of our results is also arguably the simplest. A new Gaussian sampler over \(\mathbb {Z}\) was recently introduced by Micciancio and Walter [MW17]. They provide a security analysis using the max-log distance [MW17, Lemma 5.5].

Later, at the end of [MW17, Sect. 5.3], this lemma is used to argue that for a given set of parameters, if we note \(\mathcal {P}\) a perfect Gaussian distribution and \(\mathcal {P}_\delta \) the output of the new sampler, we have \(\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {P}_\delta ) \le 2^{-52}\). This in turn allows them to claim about 100 bits of security.

A tighter analysis. We now prove that a Rényi divergence-based analysis gives tighter bounds than the max-log distance-based analysis from [MW17]. This analysis is done completely in black box, as we do not need to know anything about the sampler, except the fact that \(\varDelta _{\mathrm {ML}}(\mathcal {P}, \mathcal {P}_\delta ) \le 2^{-52}\). Applying the reverse Pinsker inequality (Lemma 4) yields \(R_a(\mathcal {P}_\delta \Vert \mathcal {P}) \lesssim 1 + a \cdot 2^{-105}\) for any \(a \le 512\).

Following the security argument of Sect. 3.3 and in particular Eqs. 4 and 6, this allows us to claim that the use of this sampler is secure for 256 bits of security and \(q_s = 2^{64}\) queries. This remains the case even if the attacker makes up to \(2^{94}\) queries, which we believe is more than enough for any practical application.

4.2 Revisiting the Table Approach

We now study a more generic problem, namely sampling distributions over \(\mathbb {Z}\). We consider situations where the use of precomputed tables is practical: this includes but is not limited to (pseudo-)Gaussians with parameters known in advance.

We revisit the table-based approach. First, we show that the standard approach based on the cumulative distribution function (see e.g. [Pei10]) suffers from precision issues for a large class of distributions: light-tailed distributions. Informally, these are distributions whose tails have a negligible weight (like Gaussians). They also happen to be widespread in lattice-based cryptography.

We then introduce a new approach based on the conditional density function. We show that for light-tailed distributions, it behaves in a much nicer way. To conclude, we take a real-life example and show that in terms of space, the new approach allows us to gain an order of magnitude compared to the standard approach.

Definition 3

For a distribution \(\mathcal {P}\) over \(S \subseteq \mathbb {Z}\), we call cumulative distribution function of \(\mathcal {P}\) and note \(\textsc {CDF}\) the function defined over S by \(\textsc {CDF}(z) = \sum _{y \in S, y \le z} \mathcal {P}(y)\).

Classical CDF sampling. To sample from \(\mathcal {P}\), a standard approach is to store a precomputed table of \(\textsc {CDF}\), draw a uniform deviate \(u \leftarrow [0,1]\) and output \(\min \{ z \in S : \textsc {CDF}(z) \ge u \}\) — a sketch is given below. In practice, we will not store the complete CDF table. If \(\mathcal {P}\) is a discrete Gaussian \(D_{\mathbb {Z}, \sigma , c}\), then we store the values \(\textsc {CDF}(z)\) for \(|z - c| \le k_0 \sigma \) with a given precision \(p_0\); here, \(k_0\) is a “tailcut bound” which we can fix by either a SD or Rényi divergence argument. We now estimate the requirements in the context of \(\lambda \) bits of security and \(m \cdot q_s \) queries.Footnote 4
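A minimal sketch of this table-based sampler is given below (the table layout and all names are ours; the uniform deviate u is assumed to be provided by the caller):

```c
#include <stddef.h>

/* Classical CDF sampling: cdf[i] stores (an approximation of)
 * CDF(z_min + i) over the tailcut support {z_min, ..., z_min + len - 1}.
 * Given a uniform deviate u in [0,1), we return the smallest z such that
 * CDF(z) >= u. A linear scan is shown for clarity; a binary search, or a
 * full-table scan for constant-time implementations, works equally well. */
int cdf_sample(const double *cdf, size_t len, int z_min, double u) {
    for (size_t i = 0; i + 1 < len; i++)
        if (cdf[i] >= u) return z_min + (int)i;
    return z_min + (int)(len - 1);   /* last entry: CDF is (rounded to) 1 */
}
```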

SD-based analysis. Using [GPV08, Lemma 4.2], we have \(k_0 = \sqrt{2(\lambda + \log _2 m)}\). Each \(\textsc {CDF}(z)\) should be known with an absolute precision of \(\lambda + \log _2 m\) bits, so we may take \(p_0 = \lambda + \log _2 m\).

Rényi divergence-based analysis. From the tailcut lemma (see also Eq. 5), it is sufficient to take \(k_0 = \sqrt{2\log _2 (4mq_s)}\). From the relative error lemma, each \(\textsc {CDF}(z)\) should be known with a relative error \(\delta \) verifying Eq. 6. For our choices of \(\lambda \) and \(q_s\), this yields \(k_0 \le \sqrt{2(66 + \log _2 m)}\) and a relative precision \(p_0 = 37 + \log _2 m\).

For \(\lambda = 256\), we divide the number of precomputed elements by about 1.87. A naive reading of the analyses above may also lead us to divide the precision \(p_0\) by \((\lambda + \log _2 m) / (37 + \log _2 m) \approx 6.9\). However, the next paragraph exposes why we cannot simply do that.

Precision issues in the case of light-tailed distributions. In the previous paragraph, there is a slight but important difference between the SD and Rényi divergence analyses. The precision is given in absolute terms in the first case, and in relative terms in the second case. It is actually this relativity that allows us to use the relative error lemma in the second case, but it comes at a price: it is no longer efficient to use the table.

We present here an example explaining why this is the case: let \(\mathcal {P}\) be the distribution defined over \(\mathbb {N}^*\) by \(\mathcal {P}(k) = 2^{-k}\). One can show that \(\textsc {CDF}(k) = 1 - 2^{-k}\), so from a machine perspective, \(\textsc {CDF}(k)\) will be rounded to 1 as soon as \(k > p_0\). As a consequence, the probability output by the table-based algorithm will be 0 for any \(k > p_0 + 1\) and we will not be able to use the relative error lemma at all.

This problem is common to light-tailed distributions, including Gaussian-like distributions. As the \(\textsc {CDF}\) converges very fast to 1, we have to store it in high precision in order for it to be meaningful. This is not satisfactory from a practical viewpoint.

Conditional density sampling. A simple way around the aforementioned problem is to use the conditional density function instead of the \(\textsc {CDF}\). First, we give its definition.

Definition 4

For a distribution \(\mathcal {P}\) over \(S \subseteq \mathbb {Z}\), we call conditional density function of \(\mathcal {P}\) and note \(\textsc {CoDF}\) the function defined by \(\textsc {CoDF}(z) = \mathcal {P}(z) \big / \sum _{y \in S, y \ge z} \mathcal {P}(y)\).

In other words, \(\textsc {CoDF}(z)\) is the probability that a random variable X of distribution \(\mathcal {P}\) takes the value z, conditioned on the fact that X is bigger than or equal to z.Footnote 5 A way to use the \(\textsc {CoDF}\) to sample from \(\mathcal {P}\) is given by Algorithm 1, a variation of the \(\textsc {CDF}\)-based sampler.

Algorithm 1: sampling from \(\mathcal {P}\) using its \(\textsc {CoDF}\)
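In code, Algorithm 1 may look as follows (a sketch over the support \(\{0, 1, 2, \dots \}\); codf_sample and uniform01 are placeholder names):

```c
#include <stddef.h>

extern double uniform01(void);   /* placeholder: uniform deviate in [0,1) */

/* Algorithm 1: codf[z] stores (an approximation of) CoDF(z), i.e. the
 * probability that X = z given X >= z. Starting from z = 0, we output z
 * with probability CoDF(z) and move on to z+1 otherwise; by the
 * telescopic product (7), the output follows the stored distribution.
 * The table is tailcut to len entries, the last one being 1 so that the
 * loop always terminates. */
int codf_sample(const double *codf, size_t len) {
    for (size_t z = 0; z + 1 < len; z++)
        if (uniform01() < codf[z]) return (int)z;
    return (int)(len - 1);
}
```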

It is easy to show that the expected number of loops in Algorithm 1 is essentially the mean of \(\mathcal {P}\). It outputs z with probability \(\textsc {CoDF}(z) \cdot \prod _{y < z} \left( 1 - \textsc {CoDF}(y) \right) \), which by a telescopic product is equal to

$$\textsc {CoDF}(z) \cdot \prod _{y < z} \left( 1 - \textsc {CoDF}(y) \right) = \mathcal {P}(z)$$
(7)

and therefore, Algorithm 1 is correct. However, in practice Algorithm 1 will be used with precomputed values which are only correct up to a given precision. Lemma 5 provides an analysis of the algorithm in this case.

Lemma 5

For a distribution \(\mathcal {P}\) of support \(S = \mathbb {Z}^+\) (or a tailcut of it), let f be the \(\textsc {CoDF}\) of \(\mathcal {P}\), and \(f_\delta \) be an approximation of f such that, over S:

$$\begin{aligned} \begin{array}{cccccc} 1- \delta &{}\le &{} \frac{f_\delta }{f} &{}\le &{} 1 + \delta \\ 1- \delta &{}\le &{} \frac{1-f_\delta }{1-f} &{}\le &{} 1 + \delta \\ \end{array} \end{aligned}$$
(8)

Let \(\mathcal {P}_\delta \) be the output distribution of Algorithm 1 using a precomputed table of \(f_\delta \) instead of f. Then, for any \(z \in S\):

$$(1 - \delta )^{z+1} \ \le \ \frac{\mathcal {P}_\delta (z)}{\mathcal {P}(z)} \ \le \ (1 + \delta )^{z+1}$$

Proof

We have

$$\frac{\mathcal {P}_\delta (z)}{\mathcal {P}(z)} = \frac{f_\delta (z)}{f(z)} \cdot \prod _{y < z} \frac{1 - f_\delta (y)}{1 - f(y)} \ \in \ \left[ (1-\delta )^{z+1}, (1+\delta )^{z+1} \right] $$

The equality comes from Eq. 7, and the bounds from Eq. 8. \(\Box \)

Provided that the \(\textsc {CoDF}\) is stored with enough precision, Lemma 5 gives us an inequality that allows us to use the relative error lemma. Now, the interesting part is that for light-tailed distributions, the \(\textsc {CoDF}\) does not converge to 1 as fast as the \(\textsc {CDF}\), which is important if we want the lower part of Eq. 8 to be true. For example, if \(\mathcal {P}(k) = 2^{-k}\), we have \(\textsc {CoDF}(k) = \frac{1}{2}\) for every k, whereas \(\textsc {CDF}(k) = 1 - 2^{-k}\). This allows us to store the \(\textsc {CoDF}\) in small precision and still remain able to use Lemma 5.

Of course, one may argue that z can be arbitrarily big. However, in practice we will not sample from a distribution \(\mathcal {P}\) of infinite support directly, but rather from a tailcut distribution \(\mathcal {P}_\delta \) of \(\mathcal {P}\), within the bounds provided by the tailcut lemma; so z will not take too large values and we will be able to store the \(\textsc {CoDF}\) table efficiently.

Solving the precision issues. Going back to the example of the distribution \(\mathcal {P}(k) = 2^{-k}\), Table 1 shows how the \(\textsc {CDF}\) and the \(\textsc {CoDF}\) are stored in machine precision, and how it impacts the associated sampler.

For the \(\textsc {CDF}\)-based sampler, due to precision issues, it samples from a distribution which has a probability 0 for elements in the tail of \(\mathcal {P}\). In contrast, the \(\textsc {CoDF}\)-based sampler approximates \(\mathcal {P}\) correctly even for elements in the tail of \(\mathcal {P}\).

Table 1. Precomputed values of the \(\textsc {CDF}\) and of the \(\textsc {CoDF}\) of \(\mathcal {P}\) as stored in 53 bits of precision. The stored value of the \(\textsc {CDF}\) quickly becomes 1, leading to the associated algorithm sampling from some incorrect distribution instead of \(\mathcal {P}\).

Application: sampling over \(\mathbb {Z}^+\) in BLISS. An important step of the signature scheme BLISS consists of sampling \(z \leftarrow D_{\mathbb {Z}^+, \sigma _2}\), where \(\sigma _2 \approx 0.85\).

In BLISS, this is done in a bitwise rejection sampling fashion [DDLL13, Algorithm 10], which is very efficient in hardware but not so much in software. In addition, the structure of Algorithm 10 from [DDLL13] exposes it to side-channel attacks along the lines of [EFGT17] (see also Sect. 4.3). Instead, one can sample efficiently from \(D_{\mathbb {Z}^+, \sigma _2}\) using a precomputed table T:

  • With a \(\textsc {CDF}\)+SD approach, T must have 20 elements of 266 bits each, which amounts to about 5 300 bits.

  • With a \(\textsc {CoDF}\)+Rényi divergence approach and using Lemma 5, T must have 11 elements of about 53 bits each, which amounts to about 600 bits.Footnote 6

Here, the \(\textsc {CoDF}\)+Rényi divergence approach makes us gain an order of magnitude in storage requirements. Another notable advantage is that it is particularly suited to a fixed-point implementation, which might make it easier to implement in hardware. In addition, it is generic in the sense that it can be applied to a large class of distributions over \(\mathbb {Z}^+\) (or \(\mathbb {Z}\)).

An open question is how to make Algorithm 1 constant-time and protected against side-channel attacks. The trivial way to make it constant-time is to always read the whole table, but this may incur a significant overhead.

4.3 Simpler and More Secure Rejection Sampling in BLISS

We recall that the context and motivation of doing rejection sampling in lattice-based cryptography is exposed in Sect. 1.1. We now focus our attention on the signature scheme BLISS [DDLL13]. In BLISS, the final step of the signature consists of accepting it with probability

$$p = 1 \Big / \left( M \exp \left( -\frac{\Vert \mathbf {Sc} \Vert ^2}{2\sigma ^2} \right) \cosh \left( \frac{\langle \mathbf {z}, \mathbf {Sc} \rangle }{\sigma ^2} \right) \right) $$
(9)

where \(\mathbf {S}\) is the secret key, \(\sigma ,M\) are public parameters and \(\mathbf {z}, \mathbf {c}\) are part of the signature. In the original scheme and all the implementations that we are aware of [LD13, Pop14, Str14], this step is implemented by means of combining several Bernoulli distributions depending on the bits of \(\Vert \mathbf {Sc} \Vert ^2\) and \(\langle \mathbf {z}, \mathbf {Sc} \rangle \).

There are two drawbacks to this approach. First, the algorithm described in [DDLL13] for performing this step is rather sophisticated, and as a result it takes up a significant portion of the coding effort in [LD13, Pop14, Str14].

The second drawback is that this algorithm is actually vulnerable to side-channel attacks: Espitau et al. [EFGT17] have shown that a side-channel analysis of the signature traces can recover both \(\Vert \mathbf {Sc} \Vert ^2\) and \(\langle \mathbf {z}, \mathbf {Sc} \rangle \), and from them the secret key. Interestingly, it might be possible to extend this attack to a timing attack, in which case the implementation of Strongswan [Str14], deployed on Windows, Linux, Mac OS, Android and iOS platforms, could also suffer from it.

Simple Rejection Sampling. We observe that step (9) doesn’t need to be performed exactly. We can simply compute a value \(p_\delta \) such that \( 1 - \delta \le \frac{p_\delta }{p} \le 1 + \delta \), sample \(u \leftarrow [0,1]\) uniformly and accept if and only if \(p_\delta \ge u\). By Eq. 6, it is sufficient that p is computed with a relative error at most \(2^{-37}\). This can be done easily:

  1. In software, one may simply resort to a standard implementation of the \(\exp \) function, such as the one provided by math.h for the C language. As long as the relative precision provided is more than 37 bits, we can use Eq. 6. We note that implementations of \(\exp (\cdot )\) usually provide at least 53 bits of precision, which is more than enough for our purposes; a full sketch of the resulting rejection step is given below.

  2. In hardware, an implementation of the \(\exp \) function may not always be available. There are many ways around this issue; we present two of them:

    • One may use Padé approximants as an efficient way to compute \(\exp \). Padé approximants are generalizations of Taylor series: they approximate a function f by a polynomial fraction \(\frac{P_n}{Q_m}\) instead of a polynomial \(P_n\). They usually converge extremely fast, and in the case of the \(\exp \) function, the relative error between \(\exp (z)\) and its Padé approximant is less than \(2^{-37}\) for an approximation of order 4 and \(|z| < 1/2\).Footnote 7 A more detailed analysis is provided in appendix, Sect. A.1.

    • Another solution is to precompute the values \(\exp (\frac{2^i}{2\sigma ^2})\) for a small number of values i. This then allows computing \(\exp (\frac{z}{2\sigma ^2})\) for any \(z = \sum _i z_i 2^i\), since \(\exp (\frac{z}{2\sigma ^2}) = \prod _{z_i = 1} \exp (\frac{2^i}{2\sigma ^2})\).Footnote 8 For the parameters given by [DDLL13], \(\Vert \mathbf {Sc} \Vert ^2\) and \(\langle \mathbf {z}, \mathbf {Sc} \rangle \) are integers of less than 37 bits, which means that we would need to store at most 37 precomputed values. A sketch of this table-based evaluation is given right after this list.

    For the two proposed solutions, a very pessimistic analysis estimates that we perform fewer than 80 elementary floating-point operations to compute p. While this might seem a lot for 3 exponentials, it is negligible compared to the total cost of a signature, which is around \(O(n \log n)\) operations for \(n = 512\) in the BLISS scheme. In addition, all the techniques we propose are easy to protect against side-channel attacks.
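As an illustration of the second hardware solution, here is a sketch of the table-based evaluation (the names, the negative-exponent convention and the initialization via exp() are ours; a hardware implementation would hard-code the constants, e.g. in fixed-point):

```c
#include <math.h>
#include <stdint.h>

#define EXP_BITS 37

static double exp_tab[EXP_BITS];   /* exp_tab[i] = exp(-2^i / (2*sigma^2)) */

void exp_tab_init(double sigma) {
    for (int i = 0; i < EXP_BITS; i++)
        exp_tab[i] = exp(-ldexp(1.0, i) / (2.0 * sigma * sigma));
    /* Entries for the highest bits may underflow to 0 in double
     * precision; a fixed-point variant would store them exactly. */
}

/* Evaluates exp(-x / (2*sigma^2)) for an integer x of at most EXP_BITS
 * bits as the product of the entries selected by the bits of x, using
 * exp(-x/(2s^2)) = prod_{x_i = 1} exp(-2^i/(2s^2)). Each multiplication
 * adds a tiny relative error, so 37 factors at double precision stay
 * well within the 2^-37 budget of Eq. 6. */
double exp_neg(uint64_t x) {
    double r = 1.0;
    for (int i = 0; i < EXP_BITS; i++)
        if ((x >> i) & 1) r *= exp_tab[i];
    return r;
}
```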

We note that our software solution and our hardware solution based on Padé approximants do not require storing any precomputed table.
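To illustrate the software solution, here is a sketch of the whole rejection step, using the expression of p from (9) (the integer inputs, the uniform sampler and all names are assumptions of this sketch, not part of any existing implementation):

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

extern double uniform01(void);   /* placeholder: uniform deviate in [0,1) */

/* Rejection step (9) in plain floating-point: norm_Sc2 = ||Sc||^2 and
 * dot_zSc = <z, Sc> are the (at most 37-bit) integers computed from the
 * signature candidate; sigma and M are public parameters. exp() and
 * cosh() from math.h provide ~53 bits of relative precision, comfortably
 * above the 37 bits required by Eq. 6. */
bool bliss_accept(int64_t norm_Sc2, int64_t dot_zSc, double sigma, double M) {
    double s2 = sigma * sigma;
    double p = 1.0 / (M * exp(-(double)norm_Sc2 / (2.0 * s2))
                        * cosh((double)dot_zSc / s2));
    return uniform01() <= p;     /* accept iff u <= p_delta */
}
```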

In BLISS, explicitly computing the rejection bound as we did was discarded because of precision concerns. We note that the whole security analysis in BLISS was performed using the SD, with only subsequent works [PDG14, BLL+15] using more adequate measures of divergence. Using the SD in our case would have required us to compute transcendental functions with an absolute error less than \(2^{-\lambda }\), which is impractical. The relative error lemma is the key that allows us to argue that a floating-point approach is secure.

4.4 Squeezing the Standard Deviation of Trapdoor Samplers

Context. The last two sections are related to the most generic and powerful type of Gaussian sampling: trapdoor sampling. Algorithms for performing trapdoor sampling [Kle00, GPV08, Pei10, MP12] are essentially randomized variants of Babai’s round-off and nearest plane algorithms [Bab85, Bab86]. For suitable parameters, their outputs are statistically indistinguishable from a perfect Gaussian \(D_{\varLambda , \sigma , \mathbf {c}}\).

For a cryptographic use, we want \(\sigma \) to be as small as possible in order to have the highest security guarantees. However, \(\sigma \) cannot be too small: if it is, then the trapdoor samplers will no longer behave like perfect Gaussian oracles.Footnote 9 In the extreme case \(\sigma = 0\), the samplers become deterministic and leak the shape of the basis used for sampling, exposing the associated schemes to the key-recovery attacks described earlier. To avoid that, samplers usually come with lower bounds on \(\sigma \) for secure use (see e.g. Theorem 1 for Klein’s sampler [Kle00, GPV08]).

Roadmap. Before continuing, we establish the roadmap for this section and the next one. In this section, we show that, if \(\sigma \) is large enough, a Gaussian sampler with infinite precision is as secure as an ideal Gaussian. In the next one, we show that a Gaussian sampler with finite precision is as secure as one with infinite precision. Of course, such analyses are already known. Our contribution here is to use the Rényi divergence to obtain more aggressive parameters for \(\sigma \) and the precision of the sampler (Fig. 2).

Fig. 2. Roadmap for asserting the security of a practical Gaussian sampler

Klein’s sampler. We cannot analyse all the existing samplers in this article, so we now focus our attention on Klein’s sampler [Kle00, GPV08]. It is described in Algorithm 2.

Algorithm 2: Klein’s sampler over the input \((\mathbf {B}, \sigma , \mathbf {c})\)
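For reference, here is a sketch of Algorithm 2 in the row-vector convention of this paper (the flattened matrix layout, the helper names and the integer Gaussian sampler sample_z are assumptions of this sketch):

```c
#include <math.h>
#include <stddef.h>

extern long sample_z(double center, double sigma); /* placeholder: D_{Z,sigma,center} */

static double dot(const double *x, const double *y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++) s += x[i] * y[i];
    return s;
}

/* Klein's sampler: B and Bt are n x n matrices stored row-wise, row j of
 * Bt being the Gram-Schmidt vector of row j of B; t is a working copy of
 * the target c (it is consumed) and v receives the output lattice
 * vector. At step j, an integer Gaussian is sampled around the
 * projection of the current target onto the j-th Gram-Schmidt vector,
 * with standard deviation sigma / ||bt_j||. */
void klein_sample(double *v, double *t, const double *B, const double *Bt,
                  size_t n, double sigma) {
    for (size_t i = 0; i < n; i++) v[i] = 0.0;
    for (size_t j = n; j-- > 0; ) {
        const double *bj = B + j * n, *btj = Bt + j * n;
        double nj2 = dot(btj, btj, n);        /* ||bt_j||^2 */
        double cj  = dot(t, btj, n) / nj2;    /* per-step center */
        double zj  = (double)sample_z(cj, sigma / sqrt(nj2));
        for (size_t i = 0; i < n; i++) {
            t[i] -= zj * bj[i];               /* update remaining target */
            v[i] += zj * bj[i];               /* accumulate output */
        }
    }
}
```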

An associated lower bound on \(\sigma \) for using Algorithm 2 is given in Theorem 1.

Theorem 1

([DN12a, Theorem 1], concrete version of [GPV08, Theorem 4.1]). Let \(\epsilon = 2^{-\lambda }\). If \(\sigma \ge \eta _{\epsilon }(\mathbb {Z}^n) \cdot \Vert \mathbf {B} \Vert _{\mathrm {GS}}\), then the SD between the output of Algorithm 2 and the perfect discrete Gaussian \(D_{\varLambda (\mathbf {B}), \sigma , \mathbf {c}}\) is upper bounded by \(2^{-\lambda }\).

Combined with a standard SD-based argument, Theorem 1 establishes that \(\sigma \) must be proportional to \(\sqrt{\lambda }\) in order to claim \(\lambda \) bits of security when using Algorithm 2. A better bound was established in [DLP14], but it remains proportional to \(\sqrt{\lambda }\). In Lemma 6, we establish a bound that is both (almost) independent of \(\lambda \) and smaller.

Lemma 6

(Rényi divergence of Klein’s sampler). For any \(\epsilon \in (0,1/4)\), if \(\sigma \ge \eta _{\epsilon /n}(\mathbb {Z}) \cdot \Vert \mathbf {B} \Vert _{\mathrm {GS}}\), then the Rényi divergence between the perfect Gaussian \(D_{\varLambda (\mathbf {B}), \sigma , \mathbf {c}}\) and the output distribution \(\mathcal {K}\) of Algorithm 2 verifies

$$R_a(\mathcal {K} \Vert D_{\varLambda (\mathbf {B}), \sigma , \mathbf {c}}) \le \left( 1 + \frac{a(a-1)\delta ^2}{2(1-\delta )^{a+1}} \right) ^{\frac{1}{a-1}}$$

where \(\delta = \left( \frac{1+\epsilon /n}{1 - \epsilon /n} \right) ^n - 1 \approx 2\epsilon \).

Proof

We note \(\varLambda = \varLambda (\mathbf {B})\) and \(\mathcal {D} = D_{\varLambda , \sigma , \mathbf {c}}\). As detailed in [GPV08], the probability that \(\mathcal {K}\) outputs a given \(\mathbf {z} \in \varLambda \) is proportional to

$$\rho _{\sigma , \mathbf {c}}(\mathbf {z}) \Big / \prod _j \rho _{\sigma _j, c_j}(\mathbb {Z})$$

for \(\sigma _j = \sigma / \Vert \tilde{\mathbf {b}}_j \Vert \) and some centers \(c_j\) that depend on \(\mathbf {z}\) and \(\mathbf {c}\). By assumption, \(\sigma _j \ge \eta _{\epsilon /n}(\mathbb {Z})\), therefore \(\rho _{\sigma _j, c_j}(\mathbb {Z}) \in \left[ \frac{1 - \epsilon /n}{1 + \epsilon /n}, 1 \right] \cdot \rho _{\sigma _j}(\mathbb {Z})\) by [MR04, Lemma 4.4]. Since \(\mathcal {D}\) is proportional to \(\rho _{\sigma , \mathbf {c}}\) and both \(\mathcal {K}\) and \(\mathcal {D}\) sum up to one, we have

$$|\mathcal {K} - \mathcal {D}| \le \delta \cdot \mathcal {D}$$

from which we may conclude by using the relative error lemma. \(\Box \)

Plugging this result into the relative error lemma, we may use Klein’s sampler with \(\delta \approx 2\epsilon \) verifying Eq. 6, instead of \(\epsilon \le 2^{-\lambda }\) with the SD and \(\epsilon \le 2^{-\lambda /2}\) with the KLD [DLP14]. Compared to a SD-based analysis, this allows squeezing \(\sigma \) by a factor \(\sqrt{\lambda /38}\), which can be as large as \({\approx }2.60\) for \(\lambda = 256\).

While it might seem a small gain, the security of trapdoor samplers is very sensitive to variations of the standard deviation. We estimate that this factor 2.60 allows us to gain up to 30 bits of security (this claim is supported by e.g. [Pre15, Table 6.1]). A similar analysis for Peikert’s sampler [Pei10] yields a similar gain.
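For concreteness, here is the back-of-the-envelope arithmetic behind this factor, assuming the dominant term of the bound (3), i.e. \(\sigma \propto \sqrt{\ln (1/\epsilon )}\):

$$\frac{\sigma _{\mathrm {SD}}}{\sigma _{\mathrm {R\acute{e}nyi}}} \approx \sqrt{\frac{\ln (2^{\lambda })}{\ln (2^{38})}} = \sqrt{\frac{\lambda }{38}} \ \underset{\lambda = 256}{\approx } \ 2.60$$

where \(2^{-\lambda }\) (resp. \(2^{-38} \approx \delta /2\)) is the value of \(\epsilon \) allowed by the SD-based (resp. Rényi divergence-based) analysis.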

4.5 Trapdoor Sampling in Standard Precision

For our last application of the Rényi divergence, we conclude our analysis of Klein’s sampler (Algorithm 2) by performing its precision analysis. This section shows that it can be used safely in small precision.

First, we give a lemma that bounds the ratio of two Gaussian sums over \(\mathbb {Z}\) with slightly different centers and standard deviations.

Lemma 7

(Ratio of Gaussian Sums in \(\mathbb {Z}\)). Let \(t, \bar{t} \in \mathbb {R}\) be two arbitrary centers and \(\sigma ,\bar{\sigma }> 0\) two standard deviations. Let the Gaussian functions \(\rho (z) = \rho _{\sigma ,t}(z)\), \(\bar{\rho }(z) = \rho _{\bar{\sigma },\bar{t}}(z)\) and the distributions \(\mathcal {D} = \rho / \rho (\mathbb {Z})\), \(\bar{\mathcal {D}} = \bar{\rho } / \bar{\rho }(\mathbb {Z})\). Let \( u(z) = \frac{(z-\bar{t})^2}{2\bar{\sigma }^2} - \frac{(z-t)^2}{2\sigma ^2}\). Then

$$\exp \left( \mathbb {E}_{\bar{\mathcal {D}}}[u] \right) \ \le \ \frac{\rho (\mathbb {Z})}{\bar{\rho }(\mathbb {Z})} \ \le \ \exp \left( \mathbb {E}_{\mathcal {D}}[u] \right) $$

Proof

We first prove the left inequality. We have

$$\frac{\rho (\mathbb {Z})}{\bar{\rho }(\mathbb {Z})} = \frac{\sum _z \bar{\rho }(z) e^{u(z)}}{\bar{\rho }(\mathbb {Z})} = \mathbb {E}_{\bar{\mathcal {D}}}\left[ e^{u} \right] \ge e^{\mathbb {E}_{\bar{\mathcal {D}}}[u]}$$

where the last inequality comes from Jensen’s inequality: since the exponential is convex, \(\mathbb {E}[e^X] \ge e^{\mathbb {E}[X]}\). Following the same reasoning, one gets

$$\frac{\bar{\rho }(\mathbb {Z})}{\rho (\mathbb {Z})} = \mathbb {E}_{\mathcal {D}}\left[ e^{-u} \right] \ge e^{-\mathbb {E}_{\mathcal {D}}[u]}$$

\(\Box \)

This lemma is useful in the sense that it provides a relative error bound, which will be used in the next lemma in order to use the relative error lemma. We now give a bound on the required precision for safely using Klein’s sampler.

Lemma 8

Let \(\mathcal {P}\) (resp. \(\mathcal {P}_\delta \)) be the output distribution of Algorithm 2 over the input \((\mathbf {B}, \sigma , \mathbf {c})\) (resp. \((\bar{\mathbf {B}}, \bar{\sigma }, \bar{\mathbf {c}})\)), using precomputed values \((\sigma _j)_j\) (resp. \((\bar{\sigma }_j)_j\)). Let \(\delta , \epsilon \in (0, .01)\). We note:

If we have the following (error) bounds on the input of Algorithm 2:

  • \(|\bar{\sigma _j} - \sigma _j| \le \delta \sigma _j\) for all j

Then we have this inequality:

Lemma 8 covers – but is not limited to – the case where \(\bar{\mathbf {B}}\) and the \((\sigma _j)_j\)’s are known up to a relative error, and the center \(\bar{\mathbf {c}}\) up to an absolute error; for practical parameters, these are perfectly reasonable assumptions.

Proof

This proof is rather long, so we explain its outline first. In step 1, we establish a bound \(A \le \ln \frac{\mathcal {P}_\delta (\mathbf {z})}{\mathcal {P}(\mathbf {z})} \le B\), for some expressions A and B. In step 2, we establish \(|A| \le C\), and in step 3, we establish \(|B| \le C\). We conclude in step 4.

Step 1.

Let \(\mathbf {z}\) be a possible output of both samplers. There exists a unique n-tuple \((c_j)_j\) (resp. \((\bar{c_j})_j\)) such that at each step j, the first (resp. second) sampler samples a discrete Gaussian in \(\mathbb {Z}\) around \(c_j\) (resp. \(\bar{c_j}\)).

The probability that \(\mathbf {z}\) is output by the first sampler is \(\mathcal {P}(\mathbf {z}) = \prod _j \frac{\rho _{\sigma _j, c_j}(z_j)}{\rho _{\sigma _j, c_j}(\mathbb {Z})}\), where \((z_j)_j\) is uniquely defined by \(\mathbf {z}\). Similarly, \(\mathcal {P}_\delta (\mathbf {z}) = \prod _j \frac{\rho _{\bar{\sigma }_j, \bar{c_j}}(z_j)}{\rho _{\bar{\sigma }_j, \bar{c_j}}(\mathbb {Z})}\). We have

$$\frac{\mathcal {P}_\delta (\mathbf {z})}{\mathcal {P}(\mathbf {z})} = \prod _j \left[ \frac{\rho _{\bar{\sigma }_j, \bar{c_j}}(z_j)}{\rho _{\sigma _j, c_j}(z_j)} \cdot \frac{\rho _{\sigma _j, c_j}(\mathbb {Z})}{\rho _{\bar{\sigma }_j, \bar{c_j}}(\mathbb {Z})} \right] $$

For each j, let \(u_j(z) = \frac{(z - \bar{c_j})^2}{2\bar{\sigma }_j^2} - \frac{(z - c_j)^2}{2 \sigma _j^2}\), and let \(\mathcal {D}_j\) (resp. \(\bar{\mathcal {D}}_j\)) denote the discrete Gaussian \(D_{\mathbb {Z}, \sigma _j, c_j}\) (resp. \(D_{\mathbb {Z}, \bar{\sigma }_j, \bar{c_j}}\)). Lemma 7 yields:

$$\exp \left( \mathbb {E}_{\bar{\mathcal {D}}_j}[u_j] \right) \le \frac{\rho _{\sigma _j, c_j}(\mathbb {Z})}{\rho _{\bar{\sigma }_j, \bar{c_j}}(\mathbb {Z})} \le \exp \left( \mathbb {E}_{\mathcal {D}_j}[u_j] \right) $$

So that we have:

$$\sum _j \left( \mathbb {E}_{\bar{\mathcal {D}}_j}[u_j] - u_j(z_j) \right) \ \le \ \ln \frac{\mathcal {P}_\delta (\mathbf {z})}{\mathcal {P}(\mathbf {z})} \ \le \ \sum _j \left( \mathbb {E}_{\mathcal {D}_j}[u_j] - u_j(z_j) \right) $$
(10)

Let A and B be the left and right terms of Eq. 10. If we can bound A and B, then we will be able to conclude by the relative error lemma.

Step 2.

Now, we bound A. We write \(\bar{\sigma }_j = (1+\delta _{\sigma _j})\sigma _j\), where each \(|\delta _{\sigma _j}| \le \delta \) by hypothesis. Developing \(u_j\) yields:

$$\begin{aligned} {\scriptstyle u_j(z_j) = \frac{1}{2(1+\delta _{\sigma _j})^2\sigma _j^2}\left[ (c_j - \bar{c_j})^2 + 2(c_j- \bar{c_j})(z_j -c_j) - (2\delta _{\sigma _j}+ \delta _{\sigma _j}^2)(z_j - c_j)^2 \right] } \end{aligned}$$
(11)

In order to bound \(c_j- \bar{c_j}\), we note that numerically, \(c_j\) is exactly , where is the j-th row of . Noting , and , we have:

Thus

(12)

In Eq. 12, we used the fact that:

  • , with the last inequality coming from [MR07, Lemma 4.4] (see Lemma 10 in the appendix)

We have:

(13)

In Eq. 13, the first line develops the formula for A by using Eq. 11. For the second line, we use [MR07, Lemma 4.2] (see Lemma 9 in the appendix) to bound the two expected values and the term 1.1 to absorb parasitic terms in \(\delta _{\sigma _j}\) and \(\epsilon \).

The third line injects the bound \(\delta \cdot T\) from Eq. 12. For the fourth line, we use two equalities that follow directly from Lemma 4.4 of [GPV08].

In the fifth line, we use two further bounds: the first one comes from [MR07, Lemma 4.4], and the second one follows from the fact that there exists a vector with coefficients \(\pm 1\) realizing the relevant norm, combined with the Cauchy-Schwarz inequality. The last line simplifies the expression as much as possible.

Step 3.

We now bound B, the right part of Eq. 10. We can write \(u_j\) as follows:

$$\begin{aligned} {\scriptstyle u_j(z_j) = \frac{1}{\bar{\sigma }_j^2}\left[ - (1+\delta _{\sigma _j})^2(c_j - \bar{c_j})^2 + 2(1+\delta _{\sigma _j})^2(c_j- \bar{c_j})(z_j -c_j) - (2\delta _{\sigma _j}+ \delta _{\sigma _j}^2)(z_j - \bar{c_j})^2 \right] } \end{aligned}$$
(14)

To bound B, we replace the \(u_j\) in each \(u_j(\hat{z}_j)\) by the expression in Eq. 11, and the \(u_j\) in each expected value by the expression of Eq. 14. This yields:

where the bound over |B| is obtained using the same techniques as for |A|. Overall, we see that \(|A|, |B| \le C\).

Step 4.

To conclude, we have \(|A|, |B| \le C\), so \(e^{-C} \le \frac{\mathcal {P}_\delta (\mathbf {z})}{\mathcal {P}(\mathbf {z})} \le e^{C}\), which is the announced inequality. \(\Box \)

Practical implications of Lemma 8. We can now easily – given a few simplifications – apply the relative error lemma. Even though in theory we have , this is a worst-case bound [Pei10, Lemma 5.1]. In practice, it is reasonable to assume , with a small constant factor in the big O [Pre15, Sect. 6.5.2].Footnote 10

In addition, we make the simplification ,Footnote 11 which gives . It is also easy to make , so we consider that this is the case. Removing terms which are clearly negligible, and since \(e^C \underset{C \rightarrow 0}{\sim } 1 + C\), we have

(15)

For typical values of n (say, \(n=1024\)), we can take \(\delta = 2^{-37}/C' \approx 2^{-61}\), which is secure as per the argument of Sect. 3.3. Therefore, a precision of 61 bits is sufficient to use Klein’s sampler securely.

5 Conclusion and Open Problems

To conclude, we expose a few perspectives and open problems that we have encountered. Most of them are related to implementing the techniques we have introduced, but in our opinion extending our techniques to decision problems is probably the most challenging question.

The revisited table approach. It remains to be seen how the CoDF-based algorithm we proposed in Sect. 4.2 can be efficiently implemented and protected against side-channel attacks. Our approach also seems highly composable with existing techniques, and it would be interesting to find combinations that achieve better overall efficiency.Footnote 12 For example, a natural question would be how to combine it with Knuth-Yao trees (see e.g. [DG14]).

Rejection sampling in practice. The techniques that we described in Sect. 4.3 remain to be implemented, to assess their efficiency and whether they can easily be made impervious to side-channel attacks.

Precision analysis of trapdoor samplers. It would be interesting to apply the precision analysis of Sect. 4.5 to other samplers, such as the one of [Pei10]. A promising candidate would be a randomized variant of Ducas and Prest’s fast Fourier nearest plane [DP16]. The fast Fourier transform is known to be very stable numerically, and since this algorithm has the same structure, it seems likely that it will inherit this stability and require less than 53 bits of precision.

Decision problems. All the applications that we give are in the context of search problems. We would like to achieve the same efficiency for decision problems: as of today, one can use decision-to-search tricks in the random oracle model as in e.g. [DLP14, Sect. 4], or the results from [BLL+15, Sect. 4]. However, none of these solutions is fully satisfying, and having efficient and generic Rényi security arguments for decision problems remains open.