1 Introduction

Multilinear maps, starting with bilinear ones, are popular tools for designing cryptosystems. When pairings were introduced to cryptography [Jou04], many previously unreachable cryptographic primitives, such as identity-based encryption [BF03], became possible to construct. Maps of higher degree of linearity were conjectured to be hard to find – at least in the “realm of algebraic geometry” [BS03]. But in 2013, Garg, Gentry and Halevi [GGH13a] proposed a construction, relying on ideal lattices, of a so-called “graded encoding scheme” that approximates the concept of a cryptographic multilinear map.

As expected, graded encoding schemes quickly found many applications in cryptography. Already in [GGH13a] the authors showed how to generalise the 3-partite Diffie-Hellman key exchange first constructed with cryptographic bilinear maps [BS03] to N parties: the protocol allows N users to share a secret key with only one broadcast message each. Furthermore, a graded encoding scheme also allows constructing very efficient broadcast encryption [BS03, BWZ14]: a broadcaster can encrypt a message and send it to a group where only a part of it (decided by the broadcaster before encrypting) will be able to read it. Moreover, [GGH+13b] introduced indistinguishability obfuscation (iO) and functional encryption based on a variant of multilinear maps — multilinear jigsaw puzzles — and some additional assumptions.

The GGH Scheme. For a multilinearity parameter \(\kappa \), the principle of the symmetric GGH graded encoding scheme is as follows: given a ring R and a principal ideal \(\mathcal {I}\) generated by a small secret element \(g \in R\), a plaintext is a small element of \(R/\mathcal {I}\) and is viewed as a level-0 encoding. Given a level-0 encoding, it is easy to increase the level to a higher level \(i \leqslant \kappa \), but it is assumed to be hard to go back to a lower level. The encodings are additively homomorphic at the same level, and multiplicatively homomorphic up to \(\kappa \) operations. The multiplication of a level-i and a level-j encoding gives a level-\((i+j)\) encoding. Additionally, a zero-testing parameter \(p_{zt} \) allows testing if a level-\(\kappa \) element is an encoding of 0, and hence also allows testing if two level-\(\kappa \) encodings encode the same element. Finally, the extraction procedure uses \(p_{zt} \) to extract \(\ell \) bits which are a “canonical” representation of a ring element given its level-\(\kappa \) encoding.

More precisely, in GGH we are given \(R = {{\mathbb {Z}}} [X] /(X^n +1)\), where n is a power of 2, a secret element z uniformly sampled in \(R_q=R/qR\) (for a certain prime number q), and a public element y which is a level-1 encoding of 1 of the form \({\left[ a/z\right] }_{q} \) for some small a in the coset \(1+\mathcal {I}\). We are also given m level-i encodings of 0 named \(x^{({i})}_{j} \), for all \(1 \leqslant i \leqslant \kappa \), and a zero-testing parameter \(p_{zt} \). To encode an element of \(R/\mathcal {I}\) at level-i (for \(i \leqslant \kappa \)), we multiply it by \(y^{i}\) in \(R_q\) (which gives an element of the form \({\left[ c/z^{i}\right] }_{q} \), where c is an arbitrary small coset representative). Then, we add to it a linear combination of encodings of 0 at level-i of the form \(\sum _j \rho _j x_j^{(i)}\), where the \(\rho _j\) are sampled from a certain discrete Gaussian. This last step is the re-randomisation process and ought to ensure that the analogue of the discrete logarithm problem is hard: going from level-i to level-0, for example by multiplying the encoding by \(y^{-i}\). We will see later that the encodings of zero made public for this step are a problem for the security of the scheme.

The asymmetric variant of this scheme replaces levels by “groups” which are identified with subsets of \(\{1,\dots ,\kappa \}\). Addition of two elements in the same group stays within the group; multiplying two elements of different groups with disjoint index sets produces an element in the group defined by the union of their index sets. These groups are realised by defining one \(z_i\) for each index \(1 \leqslant i \leqslant \kappa \) and then dividing by the appropriate product of the \(z_i\). Given a group characterised by \(S \subseteq \{1,\dots ,\kappa \}\), we call the cardinality of S its level.

We can distinguish between GGH instances where encodings of zero are made publicly available to allow anyone to encode elements and those where this is not the case. The latter are also called “Multilinear Jigsaw Puzzles” and were introduced in [GGH+13b] as a building block for indistinguishability obfuscation. Such instances can be thought of as secret-key graded encoding schemes. To distinguish the two cases, we denote those instances where no encodings of zero \(x^{({i})}_{j} \) are published as GGH\(_{s}\). In such instances the secret elements g and \(z_i\) are required to encode elements at levels above zero.

Security. Already in [GGH13a] it was shown that an attacker can recover the ideal (g) and the coset of (g) for any encoding at level \(\leqslant \kappa \) if encodings of zero are made available. However, since these representatives of either (g) or the cosets are not small, it was believed that these “weak discrete log” attacks would not undermine the central security goal of GGH – the analogue of the BDDH assumption. However, in [HJ15] it was shown that these attacks can be extended to recover short representatives of the cosets. As a consequence, if encodings of zero are published, then [HJ15] breaks the GGH security goals in many scenarios and it is not clear, at present, if and how GGH-like graded encoding schemes can be defended against such attacks. A candidate countermeasure against weak discrete logarithm attacks was proposed in [CLT15, Appendix G], where the strategy is to change zero testing to make it non-linear in the encodings such that the attack does not work anymore. However, no security analysis was provided in [CLT15] and revision 20150516:083005 of [CLT15] drops any mention of this candidate fix. Hence, the status of GGH-like schemes where encodings of zero are published is currently unclear. However, we note that GGH\(_s\), where no encodings of zero are made available, does not appear to be vulnerable to weak discrete log attacks if the freedom of an attacker to produce encodings of zero at the higher levels is also severely restricted to prevent generalisations of “zeroizing” attacks such as [CGH+15]. Such variants are the central building block of indistinguishability obfuscation, i.e. this case has important applications despite being more limited in functionality. Indeed, at present no known attack threatens the security of indistinguishability obfuscation constructed from graded encoding schemes such as GGH.

Alternative Constructions. An alternative instantiation of graded encoding schemes over the integers promising practicality was proposed by Coron, Lepoint and Tibouchi [CLT13]. This first proposal was also broken in polynomial time using public encodings of zero in [CHL+15]. The attack was later generalised in [CGH+15] and a candidate defence against these attacks was proposed in [CLT15]. The authors of [CLT15] also provided a C++ implementation of a heuristic variant of this scheme. They report that the Setup phase of a 7-partite Diffie-Hellman key exchange takes 4528 s (parallelised on 16 cores), publishing a share (Publish) takes 7.8 s per party (single core) and the final key derivation (KeyGen) takes 23.9 s per party (single core) for a level of security \(\lambda = 80\).

Instantiation. The implementation reported in [CLT15] is to date the only implementation of a candidate graded encoding scheme. This is partly because instantiating the original GGH construction is too costly in practice for anything but toy instances. In 2014, Langlois, Stehlé and Steinfeld [LSS14a] proposed a variant of GGH called GGHLite, improving the re-randomisation process of the original scheme. It reduces the number m of re-randomisers, public encodings of zero, needed from \(\Omega (n\log n)\) to 2 and also the size of the parameter \(\sigma _i^\star \) of the Gaussian used to sample multipliers \(\rho _j\) during the re-randomisation phase from \(\widetilde{\mathcal {O}}(2^{\lambda }\, \lambda \, n^{4.5} \kappa )\) to \(\widetilde{\mathcal {O}}(n^{5.5} \sqrt{\kappa })\). These improvements allow reducing the size of the public parameters and improving the overall efficiency of the scheme. But even though [LSS14a] made a step forward towards efficiency and in some cases no public re-randomisation is required at all (GGH\(_s\)), GGH-like schemes are still far from being practical.

Our contribution. Our main contribution is a first and efficient implementation of improved GGH-like schemes which we make publicly available under an open-source license. This implementation covers symmetric and asymmetric flavours and we allow encodings of zero to be published or not. However, since the security of GGH-like constructions is unclear when encodings of zero are published, we do not discuss this variant in this paper. We note, however, that our implementation provides a good basis for implementing any future fixes and improvements for GGH-based graded encoding schemes.

Implementing GGH-like schemes efficiently such that non-trivial levels of multilinearity and security can be achieved is not straightforward, and to obtain an implementation we had to address several issues. In particular, we contribute the following improvements to make GGH-like multilinear maps instantiable:

  • We show that we do not require (g) to be a prime ideal for the existing proofs to go through. Indeed, sampling an element \(g \in {\mathbb {Z}} [X]/(X^n+1)\) such that the ideal it generates is prime, as required by GGH and GGHLite, is a prohibitively expensive operation. Avoiding this check is then a key step to allow us to go beyond toy instances.

  • We give a strategy to choose practical parameters for the scheme and extend the analysis of [LSS14a] to ensure the correctness of all the procedures of the scheme. Our refined analysis reduces the bitsize of q by a factor of about 4, which in turn reduces the required dimension n.

  • We apply the analyses from [CS97] to pick parameters to defend against lattice attacks.

  • For all steps during the instance generation we provide implementations and algorithms which work in quasi-linear time and efficiently in practice. In particular, we provide algorithms and implementations for inverting in some cyclotomic number fields, for computing norms of ideals in some cyclotomic number rings, for producing short representatives of elements modulo (g) and for sampling from discrete Gaussians in \(\widetilde{\mathcal {O}}(n)\). For the latter we use Ducas and Nguyen’s strategy [Duc13]. Our implementation of these operations might be of independent interest (cf. [LP15] for recent work on efficient sampling from a discrete Gaussian distribution), which is why they are available as a separate module in our code.

  • We discuss our implementation and report on experimental results.

Our results (cf. Table 1) are promising, as we manage to compute up to multilinearity level \(\kappa =52\) (resp. \(\kappa =38\)) at security level \(\lambda =52\) (resp. \(\lambda = 80\)) in the asymmetric GGH\(_{\textsc {s}}\) case. We note that much smaller levels of multilinearity have been used to realise non-trivial functionality in the literature. For example, [BLR+15] reports on comparisons between 16-bit encrypted values using a 9-linear map (however, this result holds in a generic multilinear map model). We note that the results in Table 1, where no encodings of zero are made available, are not directly comparable with those reported in [CLT15].

Table 1. Computing a \(\kappa \)-level asymmetric multilinear map with our implementation without encodings of zero. Column \(\lambda \) gives the minimum security level we accepted, column \(\lambda '\) gives the actually expected security level based on the best known attacks for the given parameter sizes. Timings produced on Intel Xeon CPU E5–2667 v2 3.30 GHz with 256 GB of RAM, parallelised on 16 cores, but not all operations took full advantage of all cores. Setup gives the time for generating the GGH instance. Encode lists the time it takes to reduce an element \(\in {\mathbb {Z}} _p\) with \(p = \mathcal {N}({\mathcal {I}}) \) to a small element in \({\mathbb {Z}} [X]/\left( X^n+1 \right) \) modulo (g). Mult lists the time to multiply \(\kappa \) elements. All times are wall times.

Technical Overview. Our implementation relies on FLINT [HJP14]. However, we provide our own specialised implementations for operations in the ring of integers of cyclotomic number fields where the degree is a power of two and related rings as listed above.

Our variant of GGH forgoes checking if g generates a prime ideal. During instance generation, [GGH13a, LSS14a] specify that g be sampled such that (g) is a prime ideal. This condition is needed in [GGH13a, LSS14a] to ensure that no non-zero encoding passes the zero-testing test and to argue that the non-interactive N-partite key exchange produces a shared key with sufficient entropy. We show that for both arguments we can drop the requirement that g generates a prime ideal. This was already mentioned as a potential improvement in [Gar13, Section 6.3] but not shown there. As rejection sampling until a prime ideal (g) is found is prohibitively expensive due to the low density of prime ideals in \({\mathbb {Z}} [X]/(X^n+1)\), this allows speeding up instance generation such that non-trivial instances are possible. We also provide fast algorithms and implementations for checking if \((g) \subset {\mathbb {Z}} [X]/(X^n+1)\) is prime for applications which still require prime (g).

We also improve the size of the two parameters q and \(\ell \) compared to [LSS14a]. We first perform a finer analysis than [LSS14a], which allows us to reduce the size of the parameter q by a factor 2. Then, we introduce a new parameter \(\xi \), which controls what fraction of q is considered “small”, i.e. passes the zero-testing test, which reduces the size of q further. This also reduces the number of bits extracted from each coefficient \(\ell \). Indeed, instead of setting \(\ell = 1/4 \log q - \lambda \) where \(\lambda \) is the security parameter, we set \(\ell = \xi \log q - \lambda \) with \(0 < \xi \leqslant 1/4\). We then show that for a good choice of \(\xi \) this is enough to ensure the correctness of the extraction procedure and the security of the scheme. Overall, our refined analysis allows us to reduce the size of \(q \approx {(3n^{\frac{3}{2}}\sigma ^{\star }_1 \sigma ')}^{8\kappa }\) in [LSS14a] to \(q \approx {(3n^{\frac{3}{2}}\sigma ^{\star }_1 \sigma ')}^{(2+\varepsilon )\kappa }\) which, in turn, allows reducing the dimension n. When no encodings of zero are published we simply set \(\sigma _1^\star = 1\) and apply the same analysis.
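The effect of the smaller exponent on the bit size of q can be illustrated with a few lines of arithmetic. The sketch below only evaluates the two approximations quoted above; the function name `log2_q` and all numeric inputs are hypothetical and are not parameters recommended by the paper. It also holds n fixed, so it does not capture the further gain from reducing the dimension.

```python
from math import log2

def log2_q(n, sigma1_star, sigma_prime, kappa, exponent_per_level):
    """Approximate log2(q) for q ~ (3 n^(3/2) sigma1_star sigma_prime)^(exponent_per_level * kappa)."""
    base = 3 * n ** 1.5 * sigma1_star * sigma_prime
    return exponent_per_level * kappa * log2(base)

# Hypothetical inputs, chosen only to exercise the formula.
n, sigma1_star, sigma_prime, kappa, eps = 2 ** 15, 1.0, 2.0 ** 20, 6, 0.1

bits_lss14 = log2_q(n, sigma1_star, sigma_prime, kappa, 8)          # exponent 8 * kappa
bits_refined = log2_q(n, sigma1_star, sigma_prime, kappa, 2 + eps)  # exponent (2 + eps) * kappa
ratio = bits_lss14 / bits_refined  # equals 8 / (2 + eps) while n is held fixed
```

For small \(\varepsilon \) the ratio approaches 4, which is consistent with the reduction of the bit size of q by a factor of about 4 stated among the contributions.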

Open Problems. The most pressing question at this point is whether GGH-like constructions are secure. There exist no security proofs for any variant and recent cryptanalysis results recommend caution. Even speculating that secure variants of GGH-like multilinear maps can be found, performance is still an issue. While we manage to compute approximate multilinear maps for relatively high levels of \(\kappa \) in this work, all known schemes are still at least quadratic in \(\kappa \) which presents a major obstacle to efficiency. Any improvement which would reduce this to something linear in \(\kappa \) would mean a significant step forward. Finally, establishing better estimates for lattice reduction and tuning the parameter choices of our schemes are areas of future work.

Roadmap. We give some preliminaries in Sect. 2. In Sect. 3 we describe the GGH-like asymmetric graded encoding schemes and the multilinear jigsaw puzzles used for iO. In Sect. 4, we explain our modifications to GGH-like schemes, especially concerning the parameter q. We also recall a lattice attack to derive the parameter n and show that we do not require (g) to be prime. In Sect. 5, we give the details of our implementation.

2 Preliminaries

Lattices and Ideal Lattices. An m-dimensional lattice L is a discrete additive subgroup of \(\mathbb {R}^m\). A lattice L can be described by its basis \(B = \{b_1,b_2,\dots ,b_k\}\), with \(b_i \in \mathbb {R}^m\), consisting of k linearly independent vectors, for some \(k \leqslant m\), called the rank of the lattice. If \(k=m\), we say that the lattice has full rank. The lattice L spanned by B is given by \(L = \{\sum _{i=1}^k c_i \cdot b_i, c_i \in {\mathbb {Z}} \}\). The volume of the lattice L, denoted by \(\text {vol} (L)\), is the volume of the parallelepiped defined by its basis vectors. We have \(\text {vol} (L) = \sqrt{\det (B^T B)}\), where B is any basis of L.
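As a small illustration of the volume formula, the following sketch computes \(\sqrt{\det (B^T B)}\) from the Gram matrix of a basis using exact rational arithmetic. The helper names `det` and `volume` are illustrative and are not part of the implementation described in this paper.

```python
from fractions import Fraction
from math import sqrt

def det(M):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d  # a row swap flips the sign
        d *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return d

def volume(basis):
    """vol(L) = sqrt(det(G)), where G[i][j] = <b_i, b_j> is the Gram matrix."""
    gram = [[sum(x * y for x, y in zip(u, v)) for v in basis] for u in basis]
    return sqrt(det(gram))
```

Two different bases of the same lattice, for example `[(2, 0), (0, 3)]` and `[(2, 3), (0, 3)]`, yield the same volume, and the formula also covers lattices that are not full rank.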

For n a power of two, let \(f(X) \in {\mathbb {Z}} [X]\) be a monic polynomial of degree n (in our case, \(f(X) = X^n + 1\)). Then, the polynomial ring \(R = {\mathbb {Z}} [X]/f(X)\) is isomorphic to the integer lattice\(~{\mathbb {Z}} ^n\), i.e. we can identify an element \(u(X) = \sum _{i=0}^{n-1}u_i\cdot X^i \in R\) with its corresponding coefficient vector \((u_0,u_1,\dots ,u_{n-1})\). We also define \(R_q = R/qR = {\mathbb {Z}} _q[X]/(X^n+1)\) (isomorphic to \(~{\mathbb {Z}} _q^n\)) for a large prime q and \(K = {{\mathbb {Q}}} [X] / (X^n+1)\) (isomorphic to\(~{\mathbb {Q}} ^n\)).
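Arithmetic in \(R\) and \(R_q\) on coefficient vectors can be sketched as follows. This is a schoolbook quadratic-time illustration of the reduction rule \(X^n = -1\), not the quasi-linear arithmetic used in our implementation; the function names are illustrative.

```python
def ring_add(a, b, q=None):
    """Add two elements of Z[X]/(X^n + 1), given as coefficient lists of length n."""
    c = [x + y for x, y in zip(a, b)]
    return [x % q for x in c] if q is not None else c

def ring_mul(a, b, q=None):
    """Multiply two elements of Z[X]/(X^n + 1); reduce coefficients mod q if q is given."""
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                c[k] += ai * bj
            else:
                c[k - n] -= ai * bj  # X^n = -1, so X^k = -X^(k - n)
    return [x % q for x in c] if q is not None else c
```

For example, with \(n = 4\) the product \(X^3 \cdot X\) wraps around to \(-1\), which becomes \(q - 1\) in \(R_q\).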

Given an element \(g\in R\), we denote by \(\mathcal {I}\) the principal ideal in R generated by g: \(\mathcal {I} = (g) = \{g\cdot u: u\in R\}\). The ideal (g) is also called an ideal lattice and can be represented by its \({\mathbb {Z}} \)-basis \((g,X \cdot g,\dots ,X^{n-1}\cdot g)\). We denote by \(\mathcal {N}(g)\) its norm. For any \(y \in R\), let \([y]_g\) be the reduction of y modulo \(\mathcal {I}\). That is, \([y]_g\) is the unique element in R such that \(y-[y]_g \in (g)\) and \([y]_g = \sum _{i=0}^{n-1} y_i X^ig\), with \(y_i \in [-1/2,1/2), \forall i, 0\leqslant i \leqslant n-1\). Following [LSS14a] we abuse notation and let \(\sigma _n(b)\) denote the last singular value of the matrix \(\text {rot} (b) \in {\mathbb {Z}} ^{n \times n}\), for any \(b \in \mathcal {I}\). For \(z \in R\), we denote by \(\text {MSB}_{\ell }(z) \in \{ 0,1 \}^{\ell \cdot n}\) the \(\ell \) most significant bits of each of the n coefficients of z in R.

Gaussian Distributions. For a vector \(c \in \mathbb {R}^n\) and a positive parameter \(\sigma \in \mathbb {R}\), we define the Gaussian distribution of centre c and width parameter \(\sigma \) as \( \rho _{\sigma ,c}(x) = \exp (-\pi \frac{||x-c||^2}{\sigma ^2} ), \text { for all } x \in \mathbb {R}^n.\) This notion can be extended to ellipsoid Gaussian distribution by replacing the parameter \(\sigma \) with the square root of the covariance matrix \(\varSigma = BB^t \in ~\mathbb {R}^{n\times n}\) with \(\det (B) \ne 0\). We define it by \(\rho _{\sqrt{\varSigma },c}(x) = \exp (-\pi \cdot (x-c)^t(B^tB)^{-1}(x-c))\), for all \(x \in \mathbb {R}^n\). For L a subset of \({\mathbb {Z}} ^n\), let \(\rho _{\sigma ,c}(L) = \sum _{x\in L}\rho _{\sigma ,c}(x)\). Then, the discrete Gaussian distribution over L with centre c and standard deviation \(\sigma \) (resp. \(\sqrt{\varSigma }\)) is defined as \(D_{L,\sigma ,c}(y) = \frac{\rho _{\sigma ,c}(y)}{\rho _{\sigma ,c}(L)}, \text { for all } y \in L.\) We use the notations \(\rho _\sigma \) (resp. \(\rho _{\sqrt{\varSigma }}\)) and \(D_{L,\sigma }\) (resp. \(D_{L,\sqrt{\varSigma }}\)) when c is 0.
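To make the definitions concrete, the sketch below tabulates \(D_{{\mathbb {Z}},\sigma ,c}\) on a truncated support and samples from the table. It is a naive illustration under the assumption that the mass beyond `tail` multiples of \(\sigma \) is negligible; it is not the quasi-linear sampler used in our implementation, and the function names are illustrative.

```python
import math
import random

def rho(x, sigma, c=0.0):
    """Gaussian function rho_{sigma,c}(x) = exp(-pi * (x - c)^2 / sigma^2) in dimension one."""
    return math.exp(-math.pi * (x - c) ** 2 / sigma ** 2)

def discrete_gaussian_table(sigma, c=0.0, tail=10):
    """Return (support, probabilities) of D_{Z,sigma,c} truncated to c +/- tail * sigma."""
    lo = math.floor(c - tail * sigma)
    hi = math.ceil(c + tail * sigma)
    support = list(range(lo, hi + 1))
    weights = [rho(x, sigma, c) for x in support]
    total = sum(weights)  # approximates rho_{sigma,c}(Z)
    return support, [w / total for w in weights]

def sample(sigma, c=0.0, rng=random, tail=10):
    """Draw one integer from the truncated distribution by table lookup."""
    support, probs = discrete_gaussian_table(sigma, c, tail)
    return rng.choices(support, weights=probs, k=1)[0]
```

The table is rebuilt on every call to `sample`, which keeps the sketch short; a practical sampler would precompute it once per pair \((\sigma , c)\).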

Finally, for a fixed \(Y = (y_1,y_{2}) \in R^2\), we define: \(\widetilde{\mathcal {E}}_{Y,s} = y_1 D_{R,s} + y_2 D_{R,s}\) as the distribution induced by sampling \(\mathbf {u} = (u_1,u_2) \in R^2\) from a discrete spherical Gaussian with parameter s, and outputting \(y = y_1 u_1 + y_2 u_2\). It is shown in [LSS14a, Theorem 5.1] that if \(Y \cdot R^{2} = \mathcal {I}\) and \(s \geqslant \max (\Vert g^{-1}y_1\Vert _{\infty },\Vert g^{-1}y_2\Vert _{\infty }) \cdot n \cdot \sqrt{2 \log ( 2 n (1+1/\varepsilon )) / \pi }\) for \(\varepsilon \in (0,1/2)\), this distribution is statistically close to the Gaussian distribution \(D_{\mathcal {I},s Y^T}\).
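The lower bound on s quoted from [LSS14a, Theorem 5.1] is a closed-form expression and can be evaluated directly. The sketch below assumes that \(\log \) denotes the natural logarithm and takes the two infinity norms as inputs rather than computing them; the function name `min_width` is illustrative.

```python
from math import log, pi, sqrt

def min_width(norm_y1, norm_y2, n, eps):
    """Lower bound on s: max(|g^-1 y1|_inf, |g^-1 y2|_inf) * n * sqrt(2 log(2n(1 + 1/eps)) / pi).

    norm_y1 and norm_y2 are the infinity norms of g^-1 * y1 and g^-1 * y2.
    Assumption: log is the natural logarithm.
    """
    if not 0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 1/2)")
    return max(norm_y1, norm_y2) * n * sqrt(2 * log(2 * n * (1 + 1 / eps)) / pi)
```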

3 GGH-like Asymmetric Graded Encoding Scheme

We now recall the definitions given in [GGH+13b, Section 2.2] for the notions of Jigsaw specifier, Multilinear Form and Multilinear Jigsaw puzzle.

Definition 1

([GGH+13b, Definition 5]). A Jigsaw specifier is a tuple \((\kappa ,\ell ,A)\) where \(\kappa , \ell \in {\mathbb {Z}} ^+\) are parameters and A is a probabilistic circuit with the following behavior: On input a prime number q, A outputs the prime q and an ordered set of \(\ell \) pairs \((S_1,a_1), \ldots , (S_{\ell },a_{\ell })\) where each \(a_i \in {\mathbb {Z}} _q\) and each \(S_i \subseteq [\kappa ]\).

Definition 2

([GGH+13b, Definitions 6 and 7]). A Multilinear Form is a tuple \(\mathcal {F}=(\kappa ,\ell ,\varPi ,F)\) where \(\kappa , \ell \in {\mathbb {Z}} ^+\) are parameters and \(\varPi \) is a circuit with \(\ell \) input wires, made out of binary and unary gates. F is an assignment of an index set \(I \subseteq [\kappa ]\) to every wire of \(\varPi \). A multilinear form must satisfy the constraints given in the original definition (on gates, and the output wire is assigned to \([\kappa ]\)).

We say that a Multilinear Form \(\mathcal {F}=(\kappa ',\ell ',\varPi ,F)\) is compatible with \(X=((S_1,a_1), \ldots , (S_{\ell },a_{\ell }))\) if \(\kappa =\kappa '\), \(\ell =\ell '\) and the input wires of \(\varPi \) are assigned to the sets \(S_1, \ldots , S_{\ell }\). The evaluation of \(\mathcal {F}\) on X is then doing arithmetic operations on the inputs depending on the gates. We say that the evaluation succeeds if the final output is \(([\kappa ],0)\).
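The evaluation of a form on plaintext pairs can be sketched as follows. The gate names `"add"`, `"mul"` and `"neg"` and the gate-list representation of \(\varPi \) are assumptions made for this illustration; the precise gate set and constraints are those of [GGH+13b]. The sketch enforces that addition requires equal index sets and that multiplication requires disjoint index sets and outputs their union.

```python
def evaluate_form(kappa, q, inputs, gates):
    """Evaluate a circuit on pairs (S, a) with S a subset of {1..kappa} and a in Z_q.

    gates is a list of tuples ("add", i, j), ("mul", i, j) or ("neg", i), where
    i and j index earlier wires (inputs first, then gate outputs in order).
    Returns True iff the last wire equals ([kappa], 0).
    """
    wires = [(frozenset(S), a % q) for S, a in inputs]
    for gate in gates:
        op = gate[0]
        if op == "neg":
            S, a = wires[gate[1]]
            wires.append((S, (-a) % q))
            continue
        (S1, a1), (S2, a2) = wires[gate[1]], wires[gate[2]]
        if op == "add":
            if S1 != S2:
                raise ValueError("addition requires equal index sets")
            wires.append((S1, (a1 + a2) % q))
        elif op == "mul":
            if S1 & S2:
                raise ValueError("multiplication requires disjoint index sets")
            wires.append((S1 | S2, (a1 * a2) % q))
        else:
            raise ValueError("unknown gate " + repr(op))
    S, a = wires[-1]
    return S == frozenset(range(1, kappa + 1)) and a == 0
```

For instance, with \(\kappa = 2\) the form that computes \(a_1 \cdot a_2 - a_3\) succeeds on the inputs \((\{1\},3)\), \((\{2\},5)\), \((\{1,2\},15)\) and fails if the last value is changed.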

We now define the Multilinear Jigsaw Puzzles.

  • Jigsaw Generator: \(\mathsf {JGen}(\lambda ,\kappa ,\ell ,A) \rightarrow (q,X,\mathsf {puzzle})\). This algorithm takes as input \(\lambda \), and a Jigsaw specifier \((\kappa ,\ell ,A)\). It outputs a prime q, a private output X and a public output \(\mathsf {puzzle}\). The generator uses a pair of PPT algorithms \(\mathsf {JGen} = (\mathsf {InstGen},\mathsf {Encode}) \).

    • \(\mathsf {InstGen}(\lambda , \kappa ) \rightarrow (q,\mathsf {params}, s)\). This algorithm takes \(\lambda \) and \(\kappa \) as inputs and outputs \((q,\mathsf {params}, s)\), where q is a prime of size at least \(2^{\lambda }\), \(\mathsf {params}\) is a description of public parameters, and s is a secret state to pass to the encoding algorithm.

    • \(\mathsf {Encode}(q,\mathsf {params},s,(S,a)) \rightarrow (S,u)\). The encoding algorithm takes as inputs the prime q, the parameters \(\mathsf {params}\), the secret state s, and a pair \((S,a)\) with \(S \subseteq [\kappa ]\) and \(a \in {\mathbb {Z}} _q\) and outputs u, an encoding of a relative to S.

    More precisely, the algorithm runs the Jigsaw specifier on input q to get \(\ell \) pairs \((S_1,a_1), \ldots , (S_{\ell },a_{\ell })\). It then encodes all the plaintext elements by running the \(\mathsf {Encode}\) algorithm on each \((S_i,a_i)\), which returns \((S_i,u_i)\). We have:

    $$X=(q,(S_1,a_1), \ldots ,(S_{\ell },a_{\ell })) \text { and } \mathsf {puzzle}=(\mathsf {params}, (S_1,u_1), \ldots ,(S_{\ell },u_{\ell })).$$
  • Jigsaw Verifier: \(\mathsf {JVer}(\mathsf {puzzle}, \mathcal {F}) \rightarrow \{0,1\}\). This algorithm takes as input the public output of a Jigsaw Generator \(\mathsf {puzzle}\), and a multilinear form \(\mathcal {F}\). It outputs either accept (1) or reject (0).

Correctness. For an output \((q,X,\mathsf {puzzle})\) and a form \(\mathcal {F}\) compatible with X, we say that the verifier \(\mathsf {JVer}\) is correct if either the evaluation of \(\mathcal {F}\) on X succeeds and \(\mathsf {JVer}(\mathsf {puzzle}, \mathcal {F})=1\) or the evaluation fails and \(\mathsf {JVer}(\mathsf {puzzle}, \mathcal {F})=0\). We require that with high probability over the randomness of the generator, the verifier will be correct on all forms.

Security. The hardness assumption for the Multilinear Jigsaw puzzle requires that for two different polynomial-size families of Jigsaw Specifiers \({\{ (\kappa _{\lambda },\ell _{\lambda },A_{\lambda }) \}}_{\lambda \in {\mathbb {Z}} ^+}\) and \({\{ (\kappa _{\lambda },\ell _{\lambda },A'_{\lambda }) \}}_{\lambda \in {\mathbb {Z}} ^+}\) the public output of the Jigsaw Generator on \((\kappa _{\lambda },\ell _{\lambda },A_{\lambda })\) is computationally indistinguishable from the public output of the Jigsaw Generator on \((\kappa _{\lambda },\ell _{\lambda },A'_{\lambda })\).

3.1 Using GGH to Construct Jigsaw Puzzles

In Fig. 1, we describe a GGH-like asymmetric graded encoding scheme without encodings of zero based on the definition of GGHLite from [LSS14a]. Below, we explain how to use those procedures to construct the Jigsaw Generator, described in [GGH+13b, Appendix A].

Fig. 1. \(\mathsf {GGH}\)-like asymmetric graded encoding scheme adapted from [LSS14a].

  • Jigsaw Generator. The Jigsaw Generator uses \(\mathsf {InstGen}\) to generate all the public (\(\mathsf{params}\) and \(p_{zt} \)) and secret parameters of the multilinear map. Each level of the multilinear map is associated with a subset of the set \([\kappa ]\). To create the puzzle pieces, which are encodings of some elements of R at different levels, the Generator simply encodes some random elements at levels \(S \subset [1,\kappa ]\); these encodings are given as \(\mathsf {puzzle}\).

  • Jigsaw Verifier. The verifier is given the public parameters \(\mathsf{params}\) and \(p_{zt} \), a valid form \(\varPi \) (which is defined in [GGH+13b, Def. 6] as a circuit made of binary and unary gates) and \(\mathsf {puzzle}\), an input for \(\varPi \) (which consists of some encodings). The verifier then evaluates \(\varPi \) on these inputs using \(\mathsf {Add}\) for addition gates and \(\mathsf {Mult}\) for multiplication gates. The verifier must succeed if the evaluation of \(\mathcal {F}\) on X succeeds, which means that the final output of the evaluation is an encoding of zero at level \(\kappa \). The verifier invokes the zero-testing procedure, and outputs 1 if the test passes, 0 otherwise.

4 Modifications to and Parameters for GGH-like Schemes

In this section, we first show that we do not require a prime (g) and then describe a method which allows us to reduce the size of two parameters: the modulus q and the number \(\ell \) of extracted bits. Then, in Sect. 4.3, we describe the lattice attack against the scheme which we use to pick the dimension n. Finally, we describe our strategy to choose parameters that satisfy all these constraints.

4.1 Non-prime (g)

Both GGHLite and GGH-like jigsaw puzzles as specified in Fig. 1 require sampling a g such that (g) is a prime ideal. However, finding such a g is prohibitively expensive. While checking whether a given g generates a prime ideal is asymptotically not slower than polynomial multiplication, finding such a g requires running this check many times. The probability that an element generates a prime ideal is assumed to be roughly \(1/(n^c)\) for some constant \(c>1\) [Gar13, Conjecture 5.18], so we expect to run this check \(n^c\) times. Hence, the overall complexity is at least quadratic in n, which is too expensive for anything but toy instances.

Primality of (g) is used in two proofs. Firstly, to ensure that after multiplying \(\kappa +1\) elements in \(R_g\) the product contains enough entropy. This is used to argue entropy of the N-partite non-interactive key exchange. Secondly, to prove that \(c\cdot h/g\) is big if \(c, h \not \in (g)\) (cf. Lemma 2). Below, we show that we can relax the conditions on g for these two arguments to still go through, which then allows us to drop the condition that (g) should be prime. We note, though, that some other applications might still require (g) to be prime and that future attacks might find a way to exploit non-prime (g).

Entropy of the Product. The next lemma shows that excluding prime factors \(\leqslant 2N\) and guaranteeing \(\mathcal {N}({g}) \geqslant 2^n\) is sufficient to ensure \(2\lambda \) bits of entropy in a product of \(\kappa +1\) elements in \(R_g\) with overwhelming probability. We note that both conditions hold with high probability, are easy to check and are indeed checked in our implementation.

Lemma 1

Let \(\kappa \geqslant 2\), \(\lambda \) be the security parameter and \(g \in {\mathbb {Z}} [X]/(X^n+1)\) with norm \(p = \mathcal {N}({g}) \geqslant 2^n\) such that p has no prime factors \(\leqslant 2\kappa +2\), and such that \(n \geqslant \kappa \cdot \lambda \cdot \log (\lambda )\). Then, with overwhelming probability, the product of \(\kappa +1\) uniformly random elements in \(R_g\) has at least \(\kappa \cdot \lambda \cdot \log (\lambda )/4\) bits of entropy.

Proof

Write \(p = \prod _{i=1}^{r} p_i^{e_i}\) where \(p_i\) are distinct primes and \(e_i \ge 1\) for all i. Let us consider the set \(\mathcal {S} = \{i \in \{1,\dots ,r\}: e_i = 1 \}\). Then, following [CDKD14] we define \(p_s = \prod _{i \in \mathcal {S}} p_i\) as the square-free part of p. Asymptotically, it holds that \(\#\{p \leqslant x : p/p_s > p_s\}\) is \(cx^{3/4}\) for some computable constant c (cf. [CDKD14]). Since in our case we have \(x\geqslant 2^n\), this implies that with overwhelming probability it holds that \(p_s \geqslant \sqrt{p}\) and hence \(\log (p_s) \geqslant n/2\).

By the Chinese Remainder Theorem, \(R_g\) is isomorphic to \(F_1 \times \cdots \times F_{r}\) where each “slot” \(F_i = {\mathbb {Z}} _{p_i^{e_i}}\). The set of \(F_i\), for \(i \in \mathcal {S}\) corresponds to the square-free part of p. Those \(F_i\) are fields, and each of them has order \(p_i \geqslant 2N\) which means that a random element in such \(F_i\) is zero with probability \(1/p_i\). In those slots, the product of N elements has \(E_s\) bits of entropy, where

$$\begin{aligned} E_s = \sum _{i \in \mathcal {S}} \left( 1 - \frac{N}{p_i} \right) \log (p_i). \end{aligned}$$

First, as \(p_i \geqslant 2N\) for all \(i \in \mathcal {S}\), the quotient \(N/p_i \leqslant 1/2\) and then \(\left( 1 - \frac{N}{p_i} \right) \geqslant 1/2\) for all \(i \in \mathcal {S}\). This implies that

$$E_s \geqslant 1/2 \sum _{i \in \mathcal {S}} \log (p_i) = 1/2 \log \Big (\prod _{i \in \mathcal {S}} p_i\Big ) = 1/2\log (p_s).$$

Because \( \log (p_s) \geqslant n/2\), we conclude that \(E_s \geqslant \frac{n}{4} \geqslant \frac{\kappa \cdot \lambda \cdot \log (\lambda )}{4}\).    \(\square \)
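The bound \(E_s \geqslant \frac{1}{2}\log (p_s)\) used in the proof can be checked numerically on a toy factorisation. In the sketch below the factorisation, the value of N and the function names are hypothetical; the toy norm is far smaller than \(2^n\) for any realistic n, so the example only illustrates the inequality on the square-free slots, not the full statement of the lemma. Base-2 logarithms are used, and the inequality is independent of the base.

```python
from math import log2, prod

def square_free_primes(factorisation):
    """Primes p_i with exponent e_i = 1, i.e. the index set S of the proof."""
    return [p for p, e in factorisation.items() if e == 1]

def slot_entropy(primes, N):
    """E_s = sum over the square-free slots of (1 - N / p_i) * log(p_i)."""
    return sum((1 - N / p) * log2(p) for p in primes)

# Toy example: p = 11 * 13 * 17 * 19^2 and N = kappa + 1 = 3, so 2N = 6 < every prime factor.
factorisation = {11: 1, 13: 1, 17: 1, 19: 2}
N = 3
S = square_free_primes(factorisation)
p_s = prod(S)
E_s = slot_entropy(S, N)
lower_bound = 0.5 * log2(p_s)
```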

Probability of False Positive. It remains to be shown that we can ensure that there are no false positives even if (g) is not prime. In [GGH13a, Lemma 3] false positives are ruled out as follows. Let \(u = {\left[ c / z^{\kappa }\right] }_{q} \) where c is a short element in some coset of \(\mathcal {I}\), and let \(w = {\left[ p_{zt} \cdot u\right] }_{q} \), then we have \(w = {\left[ c \cdot h / g\right] }_{q} \). The first step in [GGH13a] is to suppose that \(\left\| {g \cdot w}\right\| _{} \) and \(\left\| {c \cdot h}\right\| _{} \) are each at most \(q/2\), then, since \( g \cdot w = c \cdot h \mod q\) we have that \( g \cdot w = c \cdot h\) exactly. We also have an equality of ideals: \((g) \cdot (w) = (c) \cdot (h)\), and then several cases are possible. If (g) is prime as in [GGH13a, Lemma 3], then (g) divides either (c) or (h) and either c or h is in (g). As, by construction, none of them is in (g) if c is not in \(\mathcal {I}\), either \( \left\| {g \cdot w}\right\| _{} \) or \(\left\| {c \cdot h}\right\| _{} \) is more than \(q/2\). Using this, they conclude that there is no small c (not in \(\mathcal {I}\)) such that w is small enough to be accepted by the zero-test.

Our approach is to simply notice that all we require is that (g) and (h) are co-prime. Checking if (g) and (h) are co-prime can be done by checking \(\gcd (\mathcal {N}({g}),\mathcal {N}({h})) = 1\). However, computing \(\mathcal {N}({h}) \) is rather costly because h is sampled from \(D_{{\mathbb {Z}} ^n,\sqrt{q}}\) and hence has a large norm. To deal with this issue we notice that if \(\gcd (\mathcal {N}({g}),\mathcal {N}({h})) \ne 1\) then we also have \(\gcd (\mathcal {N}({g}),\mathcal {N}({h \mod g})) \ne 1\) which can be verified with a simple calculation. Now, interpreting \(h \mod g\) as “a small representative of h modulo g”, we can compute \(h \mod g\) as \(h - g \cdot \lfloor g^{-1} \cdot h\rceil \), which produces an element of size \(\approx \sqrt{n}\cdot \left\| {g}\right\| _{} \). We can use this observation to reduce the complexity of checking if (g) and (h) are co-prime to computing two norms for elements of size \(\left\| {g}\right\| _{} \) and \(\approx \sqrt{n}\cdot \left\| {g}\right\| _{} \) and taking their gcd. Furthermore, this condition holds with high probability, i.e. we only have to perform this test \(\mathcal {O}(1)\) times. Indeed, by ruling out likely common prime factors first, we expect to run this test exactly once. Hence, checking co-primality of (g) and (h) is much cheaper than finding a prime (g) but still rules out false positives.
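The co-primality check described above can be sketched with exact rational arithmetic. The code computes \(h \bmod g = h - g \cdot \lfloor g^{-1} \cdot h\rceil \) by solving a linear system in the basis \((g, X g, \dots , X^{n-1} g)\), takes the norm as \(|\det (\text {rot} (g))|\), and compares the two norms with a gcd. It runs in cubic time and is only an illustration; all function names are hypothetical and it is not the quasi-linear algorithm of our implementation.

```python
from fractions import Fraction
from math import floor, gcd

def rot_columns(g):
    """Coefficient vectors of g, X*g, ..., X^(n-1)*g in Z[X]/(X^n + 1)."""
    cols, cur = [], list(g)
    for _ in range(len(g)):
        cols.append(cur)
        cur = [-cur[-1]] + cur[:-1]  # multiply by X, using X^n = -1
    return cols

def solve_and_det(cols, y):
    """Solve sum_i c_i * cols[i] = y over Q; also return det of the matrix."""
    n = len(y)
    M = [[Fraction(cols[i][r]) for i in range(n)] + [Fraction(y[r])] for r in range(n)]
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None, Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det
        det *= M[col][col]
        inv = 1 / M[col][col]
        M[col] = [v * inv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)], det

def norm(g):
    """N(g) = |det(rot(g))|."""
    _, det = solve_and_det(rot_columns(g), [0] * len(g))
    return abs(int(det))

def small_rep(h, g):
    """h mod g = h - g * round(g^-1 * h), rounding each coefficient to the nearest integer."""
    cols = rot_columns(g)
    c, _ = solve_and_det(cols, h)
    rounded = [floor(ci + Fraction(1, 2)) for ci in c]
    n = len(g)
    return [h[r] - sum(rounded[i] * cols[i][r] for i in range(n)) for r in range(n)]

def ideals_coprime(g, h):
    """Accept if gcd(N(g), N(h mod g)) = 1, as in the test described above."""
    return gcd(norm(g), norm(small_rep(h, g))) == 1
```

For \(n = 2\) the ring is the Gaussian integers, which makes the results easy to verify by hand: with \(g = 3 + X\) of norm 10, the element \(h = 7 + 5X\) of norm 74 reduces to \(-1 - X\) of norm 2 and is rejected, whereas \(h = 4 + X\) of norm 17 reduces to 1 and is accepted.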

Finally, we note that recent proposals of indistinguishability obfuscation from multilinear maps [Zim15, AB15] require composite order maps. These are not the maps we are concerned with here as in [Zim15, AB15] it is assumed that the factorisation of (g) is known. However, we note that our techniques and implementation easily extend to this case by considering \(g = g_1 \cdot g_2\) for known co-prime \(g_1\) and \(g_2\).

4.2 Reducing the Size of q

In this section, we show how to reduce q. We consider the case where re-randomisers are published for level-1 but for no other levels. This matches the requirements of the N-partite Diffie-Hellman key exchange but not the Jigsaw puzzle case. However, when no re-randomisers are published we may simply set \(\sigma ^\star _{1} = 1\) and apply the same analysis. Hence, assuming that re-randomisers are published fits our framework in all cases and makes our analysis compatible with previous work. We note that the analysis can be easily generalised to accommodate re-randomisers at higher levels than one by increasing q to accommodate “numerator growth”.

The size of q is driven by both correctness and security considerations. To ensure the correctness of the zero-testing procedure, [LSS14a] showed the two following lower bounds on q. Equation 1 implies that false negatives do not exist, and Eq. 2 implies that the probability of false positive occurrence is negligible:

$$\begin{aligned} q&> \max \left( {(n\ell _{g^{-1}})}^8, {(3n^{\frac{3}{2}} \sigma _1^\star \sigma ')}^{8\kappa }\right) , \end{aligned}$$
(1)
$$\begin{aligned} q&> {(2n\sigma )}^4. \end{aligned}$$
(2)

The strongest constraint for q is given by the inequality \(q > {(3 n^{\frac{3}{2}}\sigma _1^\star \sigma ')}^{8\kappa }\). It comes from the fact that for any level-\(\kappa \) encoding u of 0, the inequality \(\Vert p_{zt} u\Vert _\infty < q^{3/4}\) has to hold. The condition is needed for the correctness of zero-testing and extraction.

New parameter \(\xi \) . The choice suggested in [LSS14a] is to extract \(\ell = \log (q)/4 - \lambda \) bits from each element of the level-\(\kappa \) encoding. We show that this supplies much more entropy than needed and that we can sample a smaller fraction, \(\ell = \xi \log (q) - \lambda \) bits. The equation for q can be rewritten in terms of the variable \(\xi \), by setting the initial condition \(\Vert p_{zt} \, u\Vert _\infty < q^{1-\xi }\).

Lemma 2

(Adapted from Lemma A.1 in  [LSS14b]). Let \(g \in R\) and \(\mathcal {I}=(g)\), let \(c,h \in R\) such that \(c \notin \mathcal {I}\), (g) and (h) are co-prime, \(\left\| {c \cdot h}\right\| _{} < q/2\) and \(q>{(2tn\sigma )}^{1/\xi }\) for some \(t \geqslant 1\) and any \(0 < \xi \leqslant 1/4\). Then \(\Vert {\left[ c \cdot h/ g\right] }_{q} \Vert > t\cdot q^{1-\xi }\).

Proof

From [GGH13a, Lemma 3] and the discussion in Sect. 4.1 we know that since \(\left\| {c \cdot h}\right\| _{} < q/2\) we must have \(\left\| {g \cdot {\left[ c \cdot h/ g\right] }_{q}}\right\| _{} > q/2\) if (g) and (h) are co-prime (note that \(c\cdot h \ne g \cdot {\left[ c \cdot h/ g\right] }_{q} \) in \(R/(X^n+1)\)). So we have \(\left\| {g \cdot {\left[ c \cdot h/g\right] }_{q}}\right\| _{} > q/2 \Longrightarrow \sqrt{n} \left\| {g}\right\| _{} \cdot \left\| {{\left[ c \cdot h/g\right] }_{q}}\right\| _{} > q/2 \Longrightarrow \left\| {{\left[ c \cdot h/g\right] }_{q}}\right\| _{} > q/(2n \sigma )\). We have \(t \cdot q^{1-\xi } = t \cdot q/q^{\xi } < t \cdot q/(2tn \sigma ) = q/(2n \sigma )\) and the claim follows.    \(\square \)

Correctness of Zero-Testing. We can obtain a tighter bound on q by refining the analysis in [LSS14a]. Recall that \(\Vert {\left[ p_{zt} \, u\right] }_{q} \Vert _\infty = \Vert {\left[ hc/g\right] }_{q} \Vert _\infty = \Vert h ~\cdot ~ c/g\Vert _\infty \leqslant \Vert h\Vert \cdot \Vert c/g\Vert \leqslant \Vert h\Vert \cdot \Vert c\Vert \cdot \Vert g^{-1}\Vert \sqrt{n}\). The first inequality is a direct application of the inequalities between the infinity norm of a product and the product of the Euclidean norms, the second comes from [Gar13, Lemma 5.9].

Since \(h \leftarrow D_{R,\sqrt{q}}\), we have \(\Vert h\Vert \leqslant \sqrt{n}q^{1/2}\). Moreover, c can be written as a product of \(\kappa \) level-1 encodings \(u_i\), for \(i=1,\dots ,\kappa \), i.e., \(c = \prod _{i=1}^{\kappa }u_i\). Thus, \(\left\| {c}\right\| _{} \leqslant {(\sqrt{n})}^{\kappa -1}{(\max _{i=1,\dots ,\kappa }\Vert u_i\Vert )}^\kappa \) since each of the \(\kappa -1\) multiplications brings an extra \(\sqrt{n}\) factor. Let \(u_{\max }\) be one of the \(u_i\) of largest norm. It can be written as \(u_{\max } = e\cdot a + \rho _1\cdot {b_1}^{(1)}+\rho _2\cdot {b_2}^{(1)}\). As we sampled the polynomial g such that \(\left\| {g^{-1}}\right\| _{} \leqslant l_{g^{-1}}\) the inequality \(\Vert {\left[ p_{zt} \, u\right] }_{q} \Vert _\infty < q^{1-\xi }\) holds if:

$$\begin{aligned} nl_{g^{-1}} {(\sqrt{n})}^{\kappa -1} \Vert (e\cdot a+\rho _1\cdot b_1^{(1)}+\rho _2\cdot b_2^{(1)})\Vert ^\kappa < q^{1/2-\xi }. \end{aligned}$$
(3)

Then, since

$$\Vert e\cdot a+\rho _1\cdot b_1^{(1)}+\rho _2\cdot b_2^{(1)}\Vert ^\kappa \leqslant {(\Vert e\Vert \cdot \Vert a\Vert \sqrt{n} + \Vert \rho _1\Vert \cdot \Vert b_1^{(1)}\Vert \sqrt{n} + \Vert \rho _2\Vert \cdot \Vert b_2^{(1)}\Vert \sqrt{n})}^\kappa ,$$

\(e\leftarrow D_{R,\sigma '}, a\leftarrow D_{1+I,\sigma '}, b_1^{(1)},b_2^{(1)} \leftarrow D_{I,\sigma '}\) and \(\rho _1,\rho _2 \leftarrow D_{R,\sigma _1^\star }\), we can bound each of these values as \(\Vert e\Vert ,\Vert a\Vert ,\Vert b_1^{(1)}\Vert ,\Vert b_2^{(1)}\Vert \leqslant \sigma '\sqrt{n}, \Vert \rho _1\Vert ,\Vert \rho _2\Vert \leqslant \sigma _1^\star \sqrt{n}\) to get:

$$\begin{aligned} nl_{g^{-1}}{(\sqrt{n})}^{\kappa -1}{(\sigma '\sqrt{n}\cdot \sigma '\sqrt{n}\cdot \sqrt{n} + 2\cdot \sigma _1^\star \sqrt{n}\cdot \sigma '\sqrt{n}\cdot \sqrt{n})}^\kappa < q^{1/2-\xi }, \end{aligned}$$
$$\begin{aligned} {\bigg (nl_{g^{-1}}{(\sqrt{n})}^{\kappa -1}{({(\sigma ')}^2n^{\frac{3}{2}}+2\sigma _1^\star \sigma ' n^{\frac{3}{2}})}^\kappa \bigg )}^{\frac{2}{1-2\xi }}< q. \end{aligned}$$
(4)

In [LSS14a], we had \(\xi =1/4\) (which gives \(2/(1-2 \xi )=4\)). Hence, this analysis saves a factor of 2 in the size of q even for the same \(\xi \). If we additionally consider \(\xi < 1/4\), bigger improvements are possible. For practical parameter sizes we reduce the size of q by a factor of almost 4 because \(\xi \) tends towards zero as \(\kappa \) and \(\lambda \) grow.

Correctness of Extraction. As in [LSS14a], we need that two level-\(\kappa \) encodings u and \(u'\) of different elements have different extracted elements, which implies that we need: \(\Vert {\left[ p_{zt} (u-u') \right] }_{q} \Vert _{\infty } > 2^{L-\ell +1}\) with \(L=\lfloor \log q \rfloor \). This condition follows from Lemma 2 with t satisfying \(t \cdot q^{1-\xi } > 2^{L-\ell +1}\), which holds for \(t=q^{\xi } \cdot 2^{-\ell +1}\). As a consequence, the condition \(q > {(2tn\sigma )}^{1/\xi }\) is still satisfied if we have \(\ell > \log _2(8n\sigma )\), and to ensure that \(t>1\) we need that \(\ell < \xi \log q + 2\). Finally, to ensure that \(\varepsilon _{ext}\), the probability of the extraction to be the same for two different elements, is negligible, we need that \( \ell \leqslant \xi \log _2 q - \log _2 (2n / \varepsilon _{ext})\).

Picking \(\xi \) and q . Putting all constraints together, we let \(\ell = \log (8 n \sigma )\) and

$$\begin{aligned} \tilde{q} = n l_{g^{-1}}{(\sqrt{n})}^{\kappa -1} {\bigg ({(\sigma ')}^2n^{\frac{3}{2}}+2\sigma _1^\star \sigma ' n^\frac{3}{2}\bigg )}^\kappa . \end{aligned}$$

To find \(\xi \) we solve \(\ell + \lambda = {\frac{2\xi }{1-2\xi }} \cdot \log \tilde{q}\) for \(\xi \) and set \(q = \tilde{q}^{\frac{2}{1-2\xi }}\).
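Solving the equation above for \(\xi \) gives a closed form: with \(A = (\ell + \lambda )/\log \tilde{q}\) we get \(\xi = A/(2(1+A))\) and hence \(\log q = 2(\log \tilde{q} + \ell + \lambda )\). The following sketch evaluates these quantities. It assumes base-2 logarithms throughout, and the function name and the sample inputs in the usage note are hypothetical placeholders rather than the parameters of Table 2.

```python
import math

def pick_xi_and_log_q(n, kappa, lam, sigma, sigma_prime, sigma_star, l_g_inv):
    """Return (xi, ell, log2(q~), log2(q)), assuming all logs are base 2."""
    log_n = math.log2(n)
    # q~ = n * l_{g^-1} * sqrt(n)^(kappa-1)
    #        * (sigma'^2 n^(3/2) + 2 sigma*_1 sigma' n^(3/2))^kappa
    inner = (sigma_prime ** 2 + 2 * sigma_star * sigma_prime) * n ** 1.5
    log_q_tilde = (log_n + math.log2(l_g_inv)
                   + (kappa - 1) * log_n / 2
                   + kappa * math.log2(inner))
    ell = math.log2(8 * n * sigma)
    # Solve ell + lam = 2 xi / (1 - 2 xi) * log q~ for xi.
    ratio = (ell + lam) / log_q_tilde
    xi = ratio / (2 * (1 + ratio))
    log_q = 2 * log_q_tilde / (1 - 2 * xi)
    return xi, ell, log_q_tilde, log_q
```

A call such as `pick_xi_and_log_q(2**10, 2, 80, 100.0, 1e4, 1e6, 0.1)` returns a \(\xi \) below 1/2 together with the bit size of q.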

4.3 Lattice Attacks

To pick a dimension n we rely on lattice attacks. The most efficient lattice attacks described in [GGH13a] rely on computing weak discrete logarithms and hence do not seem to be applicable to either the case where no encodings of zero are published or the case where such attacks are ruled out in some other way. However, we may mount the attack from [CS97] against GGH-like graded encoding schemes. We explain it in the symmetric setting. Assume two encodings of random elements: \(u_{1} = {\left[ e_{1}/z\right] }_{q} \) and \(u_{2} = {\left[ e_{2}/z\right] }_{q} \). We have

$$\begin{aligned} {\left[ \frac{u_{1}}{u_{2}}\right] }_{q} = {\left[ \frac{e_{1}/z}{e_{2}/z}\right] }_{q} = {\left[ \frac{e_{1}}{e_{2}}\right] }_{q} \end{aligned}$$

with \(e_{1} \) and \(e_{2} \) small. We set up the lattice \(\varLambda = \left( \begin{array}{cc} q I & 0\\ U & I\\ \end{array}\right) \) where I is the \(n \times n\) identity matrix, 0 is the \(n \times n\) zero matrix, and U is a rotational basis for \({\left[ u_{1}/u_{2} \right] }_{q} \). By construction \(\varLambda \) contains the vector \( (e_{1},e_{2}) \) which is short. We have \(\det (\varLambda ) = q^n\) and \(\left\| {(e_{1}, e_{2})}\right\| _{} \approx \sqrt{2n}\sigma '\). In contrast, a random lattice with determinant \(q^n\) and dimension 2n is expected to have a shortest vector of norm \(\approx q^{n/2n} = \sqrt{q}\) which is much longer than \(\left\| {(e_{1},e_{2})}\right\| _{} \). While \(\varLambda \) does not constitute a Unique-SVP instance because there are many short elements of norm roughly \(\sqrt{2n}\sigma '\), we may consider all of these “interesting”. Clearly, there is a gap between those “interesting” vectors and the expected length of short vectors for random lattices. To hedge against potential attacks exploiting this gap, we may hence want to ensure that finding those “interesting” short vectors is hard. The hardness of Unique-SVP instances is determined by the ratio of the second shortest \(\lambda _2(\varLambda )\) and the shortest vector \(\lambda _1(\varLambda )\) of the lattice. We assume that the complexity of finding a short element in \(\varLambda \) depends on the gap between \( (e_{1},e_{2}) \) and \(\sqrt{q}\) in a similar way.

In order to succeed, an attacker needs to solve something akin to a Unique-SVP instance with gap \(\lambda _2(\varLambda )/\lambda _1(\varLambda )\). We need to pick parameters such that this problem takes at least \(2^\lambda \) operations. The most efficient technique known in the literature to produce short lattice vectors is to run lattice reduction. The quality of lattice reduction is typically expressed as the root-Hermite factor \(\sigma _0\). An algorithm with root-Hermite factor \(\sigma _0\) is expected to output a vector v in a lattice L such that \(\left\| { v }\right\| _{} = \sigma _0^n\, \text {vol} (L)^{1/n}\). Hence, in our case we require \(\tau \cdot \sigma _0^{2n} \leqslant \lambda _2(\varLambda )/\lambda _1(\varLambda )\) and thus

$$\begin{aligned} \sigma _0 \leqslant {\left( \frac{\sqrt{q}}{\sqrt{2n} \cdot \sigma '\cdot \tau }\right) }^{1/(2n)}, \end{aligned}$$
(5)

where \(\tau \) is a constant which depends on the lattice structure and on the reduction algorithm used. Typically \(\tau \approx 0.3\) [APS15], which we will use as an approximation.
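Equation (5) is straightforward to evaluate once it is taken to logarithms, which avoids overflow for realistic sizes of q. The sketch below does this; the function name and its arguments are hypothetical, and it only computes the bound on \(\sigma _0\), not the cost of achieving it, which has to come from estimates such as those in [APS15].

```python
import math

def max_root_hermite_factor(n, log2_q, sigma_prime, tau=0.3):
    """Right-hand side of Eq. (5), computed via base-2 logarithms.

    n           : ring dimension (the lattice has dimension 2n)
    log2_q      : bit size of the modulus q
    sigma_prime : Gaussian parameter of the encoding numerators
    tau         : constant from the text, approximately 0.3
    """
    log2_gap = (0.5 * log2_q
                - 0.5 * math.log2(2 * n)
                - math.log2(sigma_prime)
                - math.log2(tau))
    return 2.0 ** (log2_gap / (2 * n))
```

A smaller returned value means the attacker needs stronger lattice reduction, so the overall strategy is to grow n until this bound corresponds to at least \(2^\lambda \) operations.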

Currently, the most efficient algorithm for lattice reduction is a variant of the BKZ algorithm [SE94] referred to as BKZ 2.0 [CN11]. However, its running time and behaviour, especially in high dimensions, are not very well understood: there is no consensus in the literature as to how to relate a given \(\sigma _{0}\) to computational cost. We estimate the cost of lattice reduction as in [APS15].

We stress, though, that these assumptions require further scrutiny. Firstly, this attack does not use \(p_{zt} \) which means we expect that better lattice attacks can be found eventually. Secondly, we are assuming that the lattice reduction estimates in [APS15] are accurate. However, should these assumptions be falsified, then this part of the analysis can simply be replaced by refined estimates.

4.4 Putting Everything Together

Our overall strategy is as follows. Pick an n and compute parameters \(\sigma \), \(\sigma '\), \(\sigma _1^\star \) as in [LSS14a] and \(\ell _g\) and q as in Sect. 4.2. Now, establish the root-Hermite factor required to carry out the attack in Sect. 4.3 using Equation (5). If this \({\sigma }_0\) is small enough to satisfy security level \(\lambda \) terminate, otherwise double n and restart the procedure.

We give choices of parameters in Table 2.

Table 2. Parameter choices for multilinear jigsaw puzzles.

5 Implementation

Our implementation relies on FLINT [HJP14]. We use its data types to encode elements in \({\mathbb {Z}} [X]\), \({\mathbb {Q}} [X]\), and \({{\mathbb {Z}_q}} [X]\) but re-implement most non-trivial operations for the ring of integers of a cyclotomic number field where the degree is a power of two. Other operations, such as Gaussian sampling or taking approximate inverses, are not readily available in FLINT and are hence provided by our implementation. For computation with elements in \(\mathbb {R}\) we use MPFR’s mpfr_t [The13] with precision \(2\lambda \) if not stated otherwise. Our implementation is available under the GPLv2+ license at https://bitbucket.org/malb/gghlite-flint. We give experimental results for computing multilinear maps using our implementation in Table 1.

For all operations considered in this section naive algorithms are available in \(\mathcal {O}\left( n^2 \log q\right) \) or \(\mathcal {O}\left( n^3 \log n\right) \) bit operations. However, the smallest set of parameters we consider in Table 1 is \(n=2^{15}\), which implies that, if implemented naively, each operation would take \(2^{49}\) bit operations. Even quadratic algorithms can be prohibitively expensive. Hence, in order to be feasible, all algorithms should run in quasi-linear time in n, or more precisely in \(\mathcal {O}\left( n \log n\right) \) or \(\mathcal {O}\left( n \log ^2 n\right) \). All algorithms discussed in this section run in quasi-linear time.

5.1 Polynomial Multiplication in \({\mathbb {Z}_q} [X]/(X^n+1)\)

During the evaluation of a GGH-style graded encoding scheme multiplications of polynomials in \({\mathbb {Z}} _q[X]/(X^n+1)\) are performed. Naive multiplication takes \(\mathcal {O}\left( n^2\right) \) time in n. Asymptotically fast multiplication in this ring can be realised by first reducing to multiplication in \({\mathbb {Z}} [X]\) and then to the Schönhage-Strassen algorithm for multiplying large integers in \(\mathcal {O}(n \log n \log \log n)\). This is the strategy implemented in FLINT, which has a highly optimised implementation of the Schönhage-Strassen algorithm. Alternatively, we can get an \(\mathcal {O}(n\log n)\) algorithm by using the Number-Theoretic Transform (NTT). Furthermore, using a negative wrapped convolution we can avoid reductions modulo \((X^n+1)\):

Theorem 1

(Adapted from [Win96]). Let \(\omega _n\) be an nth root of unity in \({\mathbb {Z}_q} \) and \(\varphi ^2 = \omega _n\). Let \(a = \sum _{i=0}^{n-1} a_i X^i\) and \(b = \sum _{i=0}^{n-1} b_i X^i\) \(\in {\mathbb {Z}_q} [X]/(X^n+1)\). Let \(c = a \cdot b \in {\mathbb {Z}_q} [X]/(X^n+1)\) and let \(\overline{a} = (a_0, \varphi a_1, \dots , \varphi ^{n-1}a_{n-1})\) and define \(\overline{b}\) and \(\overline{c}\) analogously. Then \(\overline{c} = 1/n \cdot \text{ NTT }_{\omega _n}^{-1}(\text{ NTT }_{\omega _n}(\overline{a})\odot \text{ NTT }_{\omega _n}(\overline{b}))\).

The NTT with a negative wrapped convolution has been used in lattice-based cryptography before, e.g. [LMPR08]. We note that if we are doing many operations in \({\mathbb {Z}_q} [X]/(X^n+1)\) we can avoid repeated conversions between coefficient and “evaluation” representations, \(\left( f(1),f(\omega _n),\dots ,f(\omega _n^{n-1})\right) \), of our elements, which reduces the amortised cost from \(\mathcal {O}(n \log n)\) to \(\mathcal {O}(n)\). That is, we can convert encodings to their evaluation representation once on creation and back only when running extraction. We implemented this strategy. We observe a considerable overall speed-up with the strategy of avoiding the conversions where possible. We also note that operations on elements in their evaluation representation are embarrassingly parallel.
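The negative wrapped convolution of Theorem 1 can be checked on a toy instance with the following sketch. It assumes that \(\varphi \) has order exactly 2n in \({\mathbb {Z}_q} \) (so that \(\varphi ^n = -1\)), and for readability it evaluates the transform with a quadratic-time loop rather than a fast butterfly network. The function names are hypothetical and `pow(x, -1, q)` requires Python 3.8 or later.

```python
def ntt(vec, root, q):
    """Naive O(n^2) transform: evaluate vec at the powers of root modulo q."""
    n = len(vec)
    return [sum(vec[j] * pow(root, i * j, q) for j in range(n)) % q
            for i in range(n)]

def negacyclic_mul_ntt(a, b, q, phi):
    """a * b in Z_q[X]/(X^n + 1), where phi has order 2n modulo q."""
    n = len(a)
    omega = phi * phi % q
    a_bar = [a[i] * pow(phi, i, q) % q for i in range(n)]  # twist by phi^i
    b_bar = [b[i] * pow(phi, i, q) % q for i in range(n)]
    pointwise = [x * y % q
                 for x, y in zip(ntt(a_bar, omega, q), ntt(b_bar, omega, q))]
    n_inv = pow(n, -1, q)
    c_bar = [n_inv * x % q for x in ntt(pointwise, pow(omega, -1, q), q)]
    phi_inv = pow(phi, -1, q)
    return [c_bar[i] * pow(phi_inv, i, q) % q for i in range(n)]  # untwist

def negacyclic_mul_schoolbook(a, b, q):
    """Reference product with explicit reduction modulo X^n + 1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c
```

With \(n = 8\), \(q = 17\) and \(\varphi = 3\) (which has order 16 modulo 17), both functions agree on arbitrary inputs. Keeping elements in the transformed domain between multiplications corresponds to skipping the calls to `ntt` for operands that are already transformed.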

5.2 Computing Norms in \({\mathbb {Z}} [X]/(X^n+1)\)

During instance generation we have to compute several norms of elements in \({\mathbb {Z}} [X]/(X^n+1)\). The norm \(\mathcal {N}({f}) \) of an element f in \({\mathbb {Z}} [X]/(X^n+1)\) is equal to the resultant \(\text{ res }(f,X^n+1)\). The usual strategy for computing resultants over the integers is to use a multi-modular approach. That is, we compute resultants modulo many small primes \(q_i\) and then combine the results using the Chinese Remainder Theorem. Resultants modulo a prime \(q_i\) can be computed in \(\mathcal {O}(M(n)\log n)\) operations where M(n) is the cost of one multiplication in \({\mathbb {Z}} _{q_i}[X]/(X^n+1)\). Hence, in our setting computing the norm costs \(\mathcal {O}(n \log ^2 n)\) operations without specialisation.

However, we can observe that \(\text{ res }(f,X^n+1) \mod q_i\) can be rewritten as \(\prod _{(X^n+1)(x) = 0} f(x) \mod q_i\) as \(X^n+1 \) is monic, i.e. as evaluating f on all roots of \(X^n+1 \). Picking \(q_i\) such that \(q_i \equiv 1 \mod 2n\) this can be accomplished using the NTT reducing the cost mod \(q_i\) to \(\mathcal {O}(M(n))\) saving a factor of \(\log n\), which in our case is typically \(>15\).
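The observation above amounts to multiplying the values of f at the n roots of \(X^n+1\) in \({\mathbb {Z}} _{q_i}\), which are the odd powers of an element \(\varphi \) of order 2n. A minimal sketch follows; the function name is hypothetical, and it uses Horner evaluation at each root (quadratic time) where the implementation described here would use a single NTT.

```python
def norm_mod_prime(f, q, phi):
    """res(f, X^n + 1) mod q as the product of f over the roots of X^n + 1.

    Assumes q is prime with q = 1 mod 2n and phi^n = -1 mod q, so that the
    roots of X^n + 1 are phi, phi^3, ..., phi^(2n-1).
    """
    n = len(f)
    assert pow(phi, n, q) == q - 1, "phi must satisfy phi^n = -1 mod q"
    phi_sq = phi * phi % q
    root = phi
    result = 1
    for _ in range(n):
        value = 0
        for coeff in reversed(f):  # Horner evaluation of f at root
            value = (value * root + coeff) % q
        result = result * value % q
        root = root * phi_sq % q   # step to the next odd power of phi
    return result
```

A result of zero for a prime \(q_i\) means that \(q_i\) divides the norm, which is the condition used for the quick pre-checks on candidate elements.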

5.3 Checking if (g) is a Prime Ideal

While we show in Sect. 4.1 that we do not necessarily require a prime (g), some applications might still rely on this property. We hence provide an implementation for sampling such g.

To check whether the ideal generated by g is prime in \({\mathbb {Z}} [X]/(X^n+1)\) we compute the norm \(\mathcal {N}({g})\) and check if it is prime which is a sufficient but not necessary condition. However, before computing full resultants, we first check if \(\text{ res }(g,X^n+1) = 0 \mod q_i\) for several “interesting” primes \(q_i\). These primes are 2 and then all primes up to some bound with \(q_i \equiv 1 \mod n\) because these occur with good probability as factors. We list timings in Table 3.

Table 3. Average time of checking primality of a single (g) on Intel Xeon CPU E5–2667 v2 3.30 GHz with 256 GB of RAM using 16 cores.

5.4 Verifying that \((b^{({1})}_{1}, b^{({1})}_{2}) = (g)\)

If re-randomisation elements are required, then it is necessary that they generate all of \(\left( g\right) \), i.e. \((b^{({1})}_{1},b^{({1})}_{2}) = (g)\). If \(b^{({1})}_{i} = \tilde{b}^{({1})}_{i} \cdot g\) for \(0 < i \leqslant 2\) then this condition is equivalent to \((\tilde{b}^{({1})}_{1}) + (\tilde{b}^{({1})}_{2}) = R\). We check the sufficient but not necessary condition \(\text{ gcd }(\text{ res }(\tilde{b}^{({1})}_{1},X^n+1),\, \text{ res }(\tilde{b}^{({1})}_{2},X^n+1)) = 1\), i.e. if the respective ideal norms are co-prime. This check, which we have to perform for every candidate pair \((\tilde{b}^{({1})}_{1},\tilde{b}^{({1})}_{2})\), involves computing two resultants and their gcd which is quite expensive. However, we observe that \(\text{ gcd }(\text{ res }(\tilde{b}^{({1})}_{1},X^n+1),\, \text{ res }(\tilde{b}^{({1})}_{2},X^n+1)) \ne 1\) when \(\text{ res }(\tilde{b}^{({1})}_{1},X^n+1) = 0 = \text{ res }(\tilde{b}^{({1})}_{2},X^n+1) \mod q_i\) for any modulus \(q_i\). Hence, we first check this condition for several “interesting” primes and resample if this condition holds. These “interesting” primes are the same as in the previous section. Only if these tests pass, we compute two full resultants and their gcd. Indeed, after having ruled out small common prime factors it is quite unlikely that the gcd of the norms is not equal to one which means that with good probability we will perform this expensive step only once as a final verification. However, this step is still by far the most time consuming step during setup even with our optimisations applied. We note that a possible strategy for reducing setup time is to sample \(m>2\) re-randomisers \(b^{({1})}_{i} \) and to apply some bounds on the probability of m elements \(\tilde{b}^{({1})}_{i} \) sharing a prime factor (after excluding small prime factors).

5.5 Computing the Inverse of a Polynomial Modulo \(X^n+1 \)

Instance generation relies on inversion in \({\mathbb {Q}} [X]/(X^n+1)\) in two places. Firstly, when sampling g we have to check that the norm of its inverse is bounded by \(\ell _g\). Secondly, to set up our discrete Gaussian samplers we need to run many inversions in an iterative process. We note that for computing the zero-testing parameter we only need to invert g in \({\mathbb {Z}} _q[X]/(X^n+1)\) which can be realised in n inversions in \({\mathbb {Z}_q} \) in the NTT representation.

In both cases where inversion in \({\mathbb {Q}} [X]/(X^n+1)\) is required approximate solutions are sufficient. In the first case we only need to estimate the size of \(g^{-1}\) and in the second case inversion is a subroutine of an approximation algorithm (see below). Hence, we implemented a variant of [BCMM98] to compute the approximate inverse of a polynomial in \({\mathbb {Q}} [X]/(X^n+1)\), with n a power of two.

The core idea is similar to the FFT, i.e. to reduce the inversion of f to the inversion of an element of degree n / 2. Indeed, since \(n\) is even, f(X) is invertible modulo \(X^n+1 \) if and only if \(f(-X)\) is also invertible. By setting \(F(X^2) = f(X)f(-X) \mod X^n+1 \), the inverse \(f^{-1}(X)\) of f(X) satisfies

$$\begin{aligned} F(X^2)\,f^{-1}(X) = f(-X) \pmod {X^n+1}. \end{aligned}$$
(6)

Let \(f^{-1}(X) = g(X) = G_e(X^2) + X G_o(X^2)\) and \(f(-X) = F_e(X^2) + X F_o(X^2)\) be split into their even and odd parts respectively. From Eq. 6, we obtain \(F(X^2)(G_e(X^2) + X G_o(X^2)) =F_e(X^2) + X F_o(X^2) \pmod {X^n+1}\) which is equivalent to

$$\left\{ \begin{array}{l} F(X^2) G_e(X^2) = F_e(X^2) \pmod {X^n+1} \\ F(X^2) G_o(X^2) = F_o(X^2) \pmod {X^n+1}. \\ \end{array}\right. $$
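Algorithm 1. Approximate inverse of f(X) in \({\mathbb {Q}} [X]/(X^n+1)\) for n a power of two (listing not reproduced here).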

From this, inverting f(X) can be done by inverting \(F(X^2)\) and multiplying polynomials of degree n / 2. It remains to recursively call the inversion of F(Y) modulo \((X^{n/2}+1)\) (by setting \(Y=X^2)\). This leads to an algorithm for approximately inverting elements of \({\mathbb {Q}} [X]/(X^n+1)\) when n is a power of 2 which can be performed in \(\mathcal {O}(n \log ^2(n))\) operations in \({\mathbb {Q}} \).

We give experimental results comparing Algorithm 1 with FLINT’s extended GCD algorithm in Table 4 which highlights that computing approximate inverses instead of exact inverses is necessary for anything but toy instances.
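The recursion behind the inversion can be sketched as follows. This is an illustration only: it uses exact rational arithmetic via Python's `fractions` module and schoolbook multiplication, so it corresponds to the variant without truncation and does not achieve quasi-linear running time. The function names are hypothetical.

```python
from fractions import Fraction

def negacyclic_mul(a, b):
    """Schoolbook product in Q[X]/(X^n + 1), n = len(a) = len(b)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]  # X^n = -1
    return c

def inverse(f):
    """Exact inverse of a non-zero f in Q[X]/(X^n + 1), n a power of two."""
    n = len(f)
    if n == 1:
        return [1 / Fraction(f[0])]
    f_neg = [c if i % 2 == 0 else -c for i, c in enumerate(f)]  # f(-X)
    F = negacyclic_mul(f, f_neg)[0::2]        # F(Y) with Y = X^2
    F_inv = inverse(F)                        # recurse modulo Y^(n/2) + 1
    G_e = negacyclic_mul(F_inv, f_neg[0::2])  # even part of f^(-1)
    G_o = negacyclic_mul(F_inv, f_neg[1::2])  # odd part of f^(-1)
    out = [0] * n
    out[0::2] = G_e
    out[1::2] = G_o
    return out
```

For instance, `inverse([2, 1])` returns the coefficients of \((2 - X)/5\), the inverse of \(2 + X\) modulo \(X^2+1\). Truncating the intermediate rationals to a fixed precision after each step would give an approximate variant.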

Table 4. Inverting \(g \hookleftarrow D_{{\mathbb {Z}} ^n,\sigma }\) with FLINT’s extended Euclidean algorithm (“xgcd”), our implementation with precision 160 (“160”), iterating our implementation until \(\Vert \tilde{f}^{-1}(X) \cdot f(X)\Vert < 2^{-160}\) (“160iter”) and our implementation without truncation (“\(\infty \)”) on Intel Core i7–4850HQ CPU at 2.30 GHz, single core.

5.6 Small Remainders

The Jigsaw Generator as defined in [GGH+13b, Definition 8] takes as input elements \(a_i\) in \({\mathbb {Z}} _p\) where \(p = \mathcal {N}({\mathcal {I}}) \) and produces level encodings with respect to some source group \(S_i\). In particular, this algorithm produces some small representative of the coset \(a_i\) modulo \(\left( g\right) \) from large integers of size \(\approx {(\sigma \sqrt{n})}^n\) if we represent elements in \({\mathbb {Z}} _p\) as integers \(0 \leqslant a_i < p\). This can be accomplished by using Babai’s trick and the fact that g is small, i.e. by computing \(a_i - g \cdot \lfloor g^{-1} \cdot a_i \rceil \) in \({\mathbb {Q}} [X]/(X^n+1)\). However, in order for this operation to produce sufficiently small elements, we need \(g^{-1}\) either exactly or with high precision. Computing such a high quality approximation of \(g^{-1}\) can be prohibitively expensive in terms of memory and time. Our strategy for computing with a lower precision is to rewrite \(a_i\) as

$$\begin{aligned} a_i = \sum _{j=0}^{\lceil \log _2(a_i)/B\rceil } 2^{B\cdot j}\cdot a_{ij} \end{aligned}$$

where \(a_{ij} < 2^B\) for some B. Then, we compute small representatives for all \(2^{B\cdot j}\) and \(a_{ij}\) using an approximation of \(g^{-1}\) with precision B. Finally, we multiply the small representatives for \(2^{B\cdot j}\) and \(a_{ij}\) and add up their products. This produces a somewhat short element which we then reduce using our approximation of \(g^{-1}\) with precision B until its size does not decrease any more.
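The following sketch illustrates both the rounding step \(a - g \cdot \lfloor g^{-1} \cdot a \rceil \) and the decomposition into B-bit digits. It is a simplification: `g_inv` is passed in as exact rationals, so the final loop that reduces until the size stops decreasing collapses to a single reduction. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor

def negacyclic_mul(a, b):
    """Schoolbook product in Q[X]/(X^n + 1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return c

def babai_reduce(x, g, g_inv):
    """Return x - g * round(g^(-1) * x), a small representative of x mod (g)."""
    t = negacyclic_mul(g_inv, x)
    rounded = [floor(c + Fraction(1, 2)) for c in t]  # round to nearest
    gx = negacyclic_mul(g, rounded)
    return [xi - yi for xi, yi in zip(x, gx)]

def small_remainder_chunked(a, g, g_inv, B):
    """Reduce a non-negative integer a modulo (g) via its base-2^B digits."""
    n = len(g)
    acc = [0] * n
    j = 0
    while a >> (B * j):
        digit = (a >> (B * j)) & ((1 << B) - 1)
        r_pow = babai_reduce([1 << (B * j)] + [0] * (n - 1), g, g_inv)
        r_dig = babai_reduce([digit] + [0] * (n - 1), g, g_inv)
        prod = negacyclic_mul(r_pow, r_dig)
        acc = [s + t for s, t in zip(acc, prod)]
        j += 1
    return babai_reduce(acc, g, g_inv)  # final reduction of the sum
```

With \(g = 2 + X\) modulo \(X^2+1\) and `g_inv = [Fraction(2, 5), Fraction(-1, 5)]`, both `babai_reduce([1003, 0], g, g_inv)` and `small_remainder_chunked(1003, g, g_inv, 4)` produce a representative with coefficients of absolute value at most one.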

5.7 Sampling from a Discrete Gaussian

While the strategy in Sect. 5.6 produces short elements it does not necessarily produce elements which follow a spherical Gaussian distribution and hence do not leak geometric information about g. To produce such samples we need to sample from the discrete Gaussian \(D_{(g),\sigma ',c}\) where c is a small representative of a coset of (g). Furthermore, if encodings of zero are published, we are required to sample from \(D_{(g),\sigma ',0}\) and \(D_{(g),\sigma ',1}\). For this, a fundamental building block is to sample from the integer lattice. We implemented a discrete Gaussian sampler over the integers both in arbitrary precision (using MPFR) and in double precision (using machine doubles). For both cases we implemented rejection sampling from a uniform distribution with and without table (“online”) lookups [GPV08] and Ducas et al.’s sampler which samples from \(D_{{\mathbb {Z}},k\sigma _2}\) where \(\sigma _2\) is a constant [DDLL13, Algorithm 12]. Our implementation automatically chooses the best algorithm based on \(\sigma \), c and \(\tau \) (the tail cut). In our case \(\sigma \) is typically relatively large, so we call the latter whenever sampling with a centre \(c \in {\mathbb {Z}} \) and the former when \(c \not \in {\mathbb {Z}} \). We list example timings of our discrete Gaussian sampler in Table 5. We note that in our implementation we conservatively only make use of the arbitrary precision implementation of this sampler with precision \(2\lambda \).
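A minimal double-precision sketch of rejection sampling from a uniform distribution without table lookups is given below. It assumes the density is proportional to \(\exp (-(x-c)^2/(2\sigma ^2))\), which may differ from the normalisation used elsewhere in this work by a constant factor in \(\sigma \). The function name and defaults are hypothetical, and the sketch makes no attempt at constant-time behaviour.

```python
import math
import random

def sample_dgauss_z(sigma, c=0.0, tau=6.0, rng=random):
    """Sample an integer x with probability proportional to
    exp(-(x - c)^2 / (2 sigma^2)), restricted to [c - tau*sigma, c + tau*sigma].
    """
    lo = math.floor(c - tau * sigma)
    hi = math.ceil(c + tau * sigma)
    while True:
        x = rng.randint(lo, hi)  # uniform candidate inside the tail cut
        # Accept with probability equal to the unnormalised Gaussian weight.
        if rng.random() < math.exp(-((x - c) ** 2) / (2 * sigma ** 2)):
            return x
```

The expected number of candidates per sample grows linearly with \(\tau \), which is one reason why table-based or specialised samplers are preferable for large \(\sigma \).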

Table 5. Example timings for discrete Gaussian sampling over \({\mathbb {Z}} \) on Intel Core i7–4850HQ CPU at 2.30 GHz, single core.

Using our discrete Gaussian sampler over the integers we implemented discrete Gaussian samplers over lattices. Implemented naively this takes \(\mathcal {O}(n^3 \log n)\) operations even if we ignore issues of precision. Following [Duc13], we implemented a variant of [Pei10] which we reproduce in Algorithm 2. Namely, we first observe that \(D_{(g),\sigma ',0} = g \cdot D_{R,\sigma '\cdot g^{-T}}\) and then use [Pei10, Algorithm 1] to sample from \(D_{R,\sigma '\cdot g^{-T}}\) where \(g^{-T}\) is the conjugate of \(g^{-1}\). That is, \(g^{T}_0 = g_0\) and \(g^{T}_{n-i} = -g_{i}\) for \(1 \leqslant i < n\) for \(\deg (g) = n-1\). We then proceed as follows. We first compute an approximate square root (see below) of \(\varSigma _2' = g^{-T} \cdot g^{-1}\) up to \(\lambda \) bits of precision. We perform operations with \(\log (n) + 4\,(\log (\sqrt{n}\parallel \sigma \parallel ))\) bits of precision. If the square root does not converge for this precision, we double it and start over. We then use this value, scaled appropriately, as the initial value from which to start computing a square-root of \(\varSigma _2 = \sigma '^2 \cdot g^{-T} \cdot g^{-1} - r^2\) with \(r=2\cdot \lceil \sqrt{\log n}\, \rceil \). We terminate when the square of the approximation is within distance \(2^{-2\lambda }\) to \(\varSigma _2\). This typically happens quickly because our initial candidate is already very close to the target value.

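Algorithms 2 and 3 (figures b and c): the variant of [Pei10] referred to above and the complete sampling procedure (listings not reproduced here).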

Given an approximation \(\sqrt{\varSigma _2}'\) of \(\sqrt{\varSigma _2}\) we then sample a vector \(x \hookleftarrow \mathbb {R}^n\) from a standard normal distribution and interpret it as a polynomial in \({\mathbb {Q}} [X]/(X^n+1)\). We then compute \(y = \sqrt{\varSigma _2}' \cdot x\) in \({\mathbb {Q}} [X]/(X^n+1)\) and return \(g \cdot (\lfloor y \rceil _r)\), where \(\lfloor y \rceil _r\) denotes sampling a vector in \({\mathbb {Z}} ^n\) where the i-th component follows \(D_{{\mathbb {Z}},r,y_i}\). This algorithm is then easily extended to sample from arbitrary centres c. The whole algorithm is summarised in Algorithm 3 and we give experimental results in Table 6.

5.8 Approximate Square Roots

Our Gaussian sampler requires an (approximate) square root in \({\mathbb {Q}} [X]/(X^n+1)\). That is, for some input element \(\varSigma \) we want to compute some element \(\sqrt{\varSigma }' \in {\mathbb {Q}} [X]/(X^n+1)\) such that \(\Vert \sqrt{\varSigma }'\cdot \sqrt{\varSigma }' - \varSigma \Vert < 2^{-2\lambda }\). We use iterative methods as suggested in [Duc13, Section 6.5] which iteratively refine the approximation of the square root similar to Newton’s method. Computing approximate square roots of matrices is a well studied research area with many algorithms known in the literature (cf. [Hig97]). All algorithms with global convergence invoke approximate inversions in \({\mathbb {Q}} [X]/(X^n+1)\) for which we call our inversion algorithm.

Table 6. Approximate square roots of \(\varSigma _2 = \sigma '^2 \cdot g^{-T} \cdot g - r^2 \cdot I\) for discrete Gaussian sampling over g with parameter \(\sigma '\) on Intel Core i7–4850HQ CPU at 2.30 GHz, 2 cores for Denman-Beavers, 4 cores for estimating the scaling factor, one core for sampling. The last column lists the rate (samples per second) of sampling from \(D_{(g),\sigma '}\).

We implemented the Babylonian method, the Denman-Beavers iteration [DB76] and the Padé iteration [Hig97]. Although the Babylonian method only involves one inversion which allows us to compute with lower precision, we used Denman-Beavers, since it converges faster in practice and can be parallelised on two cores. While the Padé iteration can be parallelised on arbitrarily many cores, the workload on each core is much greater than in the Denman-Beavers iteration and in our experiments only improved on the latter when more than 8 cores were used.

Most algorithms have quadratic convergence but in practice this does not assure rapid convergence as error can take many iterations to become small enough for quadratic convergence to be observed. This effect can be mitigated, i.e. convergence improved, by scaling the operands appropriately in each loop iteration of the approximation [Hig97, Section 3]. A common scaling scheme is to scale by the determinant which in our case means computing \(\text{ res }(f,X^n+1)\) for some \(f \in {\mathbb {Q}} [X]/(X^n+1)\). Computing resultants in \({\mathbb {Q}} [X]/(X^n+1)\) reduces to computing resultants in \({\mathbb {Z}} [X]/(X^n+1)\). As discussed above, computing resultants in \({\mathbb {Z}} [X]/(X^n+1)\) can be expensive. However, since we are only interested in an approximation of the determinant for scaling, we can compute with reduced precision. For this, we clear all but the most significant bit for each coefficient’s numerator and denominator of f to produce \(f'\) and compute \(\text{ res }(f',X^n+1)\). The effect of clearing out the lower order bits of f is to reduce the size of the integer representation in order to speed up the resultant computation. With this optimisation scaling by an approximation of the determinant is both fast and precise enough to produce fast convergence. See Table 6 for timings.
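For illustration, the unscaled Denman-Beavers iteration \(Y_{k+1} = (Y_k + Z_k^{-1})/2\), \(Z_{k+1} = (Z_k + Y_k^{-1})/2\) with \(Y_0 = \varSigma \) and \(Z_0 = 1\) can be sketched as below. The sketch departs from the implementation described here in one important way, which is an assumption made for brevity: it works with machine-precision complex numbers in the embedding given by evaluating at the complex roots of \(X^n+1\), where multiplication and inversion are coordinate-wise, instead of calling approximate inversions on coefficient vectors. Function names are hypothetical.

```python
import cmath

def _roots(n):
    """The n complex roots of X^n + 1."""
    return [cmath.exp(1j * cmath.pi * (2 * k + 1) / n) for k in range(n)]

def embed(f):
    """Evaluate f at all complex roots of X^n + 1."""
    return [sum(c * r ** j for j, c in enumerate(f)) for r in _roots(len(f))]

def unembed(values):
    """Inverse of embed, returning real coefficients."""
    n = len(values)
    roots = _roots(n)
    return [(sum(v * r.conjugate() ** j for v, r in zip(values, roots)) / n).real
            for j in range(n)]

def denman_beavers_sqrt(sigma, iterations=30):
    """Approximate square root of sigma in R[X]/(X^n + 1)."""
    y = embed(sigma)
    z = [1.0 + 0j] * len(sigma)
    for _ in range(iterations):
        # Both updates use the values from the previous iteration.
        y, z = ([(yi + 1 / zi) / 2 for yi, zi in zip(y, z)],
                [(zi + 1 / yi) / 2 for yi, zi in zip(y, z)])
    return unembed(y)

def negacyclic_mul(a, b):
    """Schoolbook product in R[X]/(X^n + 1), used to check the result."""
    n = len(a)
    c = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - n] -= a[i] * b[j]
    return c
```

As a check, for \(g = 2 + X\) and \(n = 4\) the element \(g^{T} \cdot g = 5 + 2X - 2X^3\) is positive in every embedding, and squaring the output of `denman_beavers_sqrt([5.0, 2.0, 0.0, -2.0])` with `negacyclic_mul` recovers the input up to floating-point error.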