
Large FHE Gates from Tensored Homomorphic Accumulator


Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10831)

Abstract

The main bottleneck of all known Fully Homomorphic Encryption schemes lies in the bootstrapping procedure invented by Gentry (STOC’09). The cost of this procedure can be mitigated either by using Homomorphic SIMD techniques, or by performing a larger computation per bootstrapping procedure.

In this work, we propose new techniques that allow performing more operations per bootstrapping in FHEW-type schemes (EUROCRYPT’15). While maintaining the quasi-quadratic \(\tilde{O}(n^2)\) complexity of the whole cycle, our new scheme allows the evaluation of gates with \(\varOmega (\log n)\) input bits, which constitutes a quasi-linear speed-up. Our scheme is also very well adapted to large threshold gates, natively admitting up to \(\varOmega (n)\) inputs. This could be helpful for homomorphic evaluation of neural networks.

Our theoretical contribution is backed by a preliminary prototype implementation, which can perform 6-to-6 bit gates in less than 10 s on a single core, as well as threshold gates over 63 input bits even faster.

G. Bonnoron—Funded and supported by Ecole Navale, IMT Atlantique, Naval Group and Thales.

L. Ducas is supported by a Veni Innovational Research Grant from NWO under project number 639.021.645.


Notes

  1.

    More precisely, \(t \le q / \sqrt{n \cdot \log 1/p_{\text {fail}}}\), where \(p_{\text {fail}}\) is the failure probability. In this paper, we will always aim for exponentially small failure probability.

  2.

    And maybe even impossible due to dimensionality constraints.

  3.

    We wish to clarify that our scheme does not require the NTRU assumption, namely the assumption that \(f/g \bmod q\) is indistinguishable from random even for small f and g. Up to the usual circular-security assumption, our scheme is based on a ring-LWE type of assumption.

  4.

    This is simply a special case of the usual definition of the trace function, but we do not need the general definition here.

  5.

    More formally, for some event E with \(p(E) \le 2\min (p, q)\exp (-\lambda )\), when conditioning on \(\overline{E}\), \(A\otimes B\) is subgaussian with parameter \(\sqrt{2\lambda }\gamma \delta \).

  6.

    https://github.com/gbonnoron/Borogrove.

  7.

    More formally, for some event E with \(p(E) \le 2\min (p, q)\exp (-\lambda )\), when conditioning on \(\overline{E}\), \(A\otimes B\) is subgaussian with parameter \(\sqrt{2\lambda }\gamma \delta \).

  8.

    While \(\tilde{K}_d\) is a field, \(K_d\) is only a ring, but we keep this notation for coherence.

  9.

    Of size roughly \(d/\ell + 2\) assuming the public vector \(\mathbf {y} \in \mathbb Z_d^\ell \) is uniformly random.

  10.

    This is assuming the FFT can handle numbers of bit-size \(\varTheta (\log (n))\). In practice, more FFTs at double precision will be needed to avoid numerical errors.

References

  1. Gentry, C.: Fully homomorphic encryption using ideal lattices. In: Mitzenmacher, M. (ed.) 41st ACM STOC, Bethesda, MD, USA, 31 May–2 June 2009, pp. 169–178. ACM Press (2009)


  2. Gentry, C.: A fully homomorphic encryption scheme. Ph.D. thesis, Stanford University (2009). crypto.stanford.edu/craig

  3. Smart, N.P., Vercauteren, F.: Fully homomorphic encryption with relatively small key and ciphertext sizes. In: Nguyen, P.Q., Pointcheval, D. (eds.) PKC 2010. LNCS, vol. 6056, pp. 420–443. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13013-7_25


  4. Brakerski, Z., Vaikuntanathan, V.: Efficient fully homomorphic encryption from (standard) LWE. In: Ostrovsky, R. (ed.) 52nd FOCS, Palm Springs, CA, USA, 22–25 October 2011, pp. 97–106. IEEE Computer Society Press (2011)


  5. Gentry, C., Halevi, S., Smart, N.P.: Fully homomorphic encryption with polylog overhead. In: Pointcheval, D., Johansson, T. (eds.) EUROCRYPT 2012. LNCS, vol. 7237, pp. 465–482. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-29011-4_28


  6. Brakerski, Z., Gentry, C., Vaikuntanathan, V.: (Leveled) fully homomorphic encryption without bootstrapping. In: Goldwasser, S. (ed.) ITCS 2012, Cambridge, MA, USA, 8–10 January 2012, pp. 309–325. ACM (2012)


  7. Halevi, S., Shoup, V.: Bootstrapping for HElib. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015, Part I. LNCS, vol. 9056, pp. 641–670. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_25


  8. Gentry, C., Sahai, A., Waters, B.: Homomorphic encryption from learning with errors: conceptually-simpler, asymptotically-faster, attribute-based. In: Canetti, R., Garay, J.A. (eds.) CRYPTO 2013, Part I. LNCS, vol. 8042, pp. 75–92. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40041-4_5


  9. Barrington, D.A.M.: Bounded-width polynomial-size branching programs recognize exactly those languages in \(\text{NC}^1\). In: 18th ACM STOC, Berkeley, CA, USA, 28–30 May 1986, pp. 1–5. ACM Press (1986)


  10. Brakerski, Z., Vaikuntanathan, V.: Lattice-based FHE as secure as PKE. In: Naor, M. (ed.) ITCS 2014, Princeton, NJ, USA, 12–14 January 2014, pp. 1–12. ACM (2014)


  11. Alperin-Sheriff, J., Peikert, C.: Faster bootstrapping with polynomial error. In: Garay, J.A., Gennaro, R. (eds.) CRYPTO 2014, Part I. LNCS, vol. 8616, pp. 297–314. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-662-44371-2_17


  12. Ducas, L., Micciancio, D.: FHEW: bootstrapping homomorphic encryption in less than a second. In: Oswald, E., Fischlin, M. (eds.) EUROCRYPT 2015, Part I. LNCS, vol. 9056, pp. 617–640. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46800-5_24


  13. Biasse, J.-F., Ruiz, L.: FHEW with efficient multibit bootstrapping. In: Lauter, K., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 119–135. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22174-8_7


  14. Gama, N., Izabachène, M., Nguyen, P.Q., Xie, X.: Structural lattice reduction: generalized worst-case to average-case reductions and homomorphic cryptosystems. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016, Part II. LNCS, vol. 9666, pp. 528–558. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49896-5_19


  15. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Faster fully homomorphic encryption: bootstrapping in less than 0.1 seconds. In: Cheon, J.H., Takagi, T. (eds.) ASIACRYPT 2016, Part I. LNCS, vol. 10031, pp. 3–33. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-53887-6_1


  16. Riordan, J., Shannon, C.E.: The number of two-terminal series-parallel networks. Stud. Appl. Math. 21(1–4), 83–93 (1942)


  17. Hoffstein, J., Pipher, J., Silverman, J.H.: NTRU: a ring-based public key cryptosystem. In: Buhler, J.P. (ed.) ANTS 1998. LNCS, vol. 1423, pp. 267–288. Springer, Heidelberg (1998). https://doi.org/10.1007/BFb0054868


  18. Lyubashevsky, V., Peikert, C., Regev, O.: On ideal lattices and learning with errors over rings. In: Gilbert, H. (ed.) EUROCRYPT 2010. LNCS, vol. 6110, pp. 1–23. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13190-5_1


  19. Halevi, S., Halevi, T., Shoup, V., Stephens-Davidowitz, N.: Implementing BP-obfuscation using graph-induced encoding. Cryptology ePrint Archive, Report 2017/104 (2017). http://eprint.iacr.org/2017/104

  20. Chillotti, I., Gama, N., Georgieva, M., Izabachène, M.: Improving TFHE: faster packed homomorphic operations and efficient circuit bootstrapping. Cryptology ePrint Archive, Report 2017/430 (2017). http://eprint.iacr.org/2017/430

  21. Vershynin, R.: Introduction to the non-asymptotic analysis of random matrices. In: Eldar, Y., Kutyniok, G. (eds.) Compressed Sensing, Theory and Applications, pp. 210–268. Cambridge University Press, Cambridge (2012)


  22. Rivasplata, O.: Subgaussian Random Variables: An Expository Note (2012). https://sites.ualberta.ca/~omarr/publications/subgaussians.pdf

  23. Hoffstein, J., Howgrave-Graham, N., Pipher, J., Silverman, J.H., Whyte, W.: NTRUSign: digital signatures using the NTRU lattice. In: Joye, M. (ed.) CT-RSA 2003. LNCS, vol. 2612, pp. 122–140. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36563-X_9


  24. Blum, A., Furst, M., Kearns, M., Lipton, R.J.: Cryptographic primitives based on hard learning problems. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 278–291. Springer, Heidelberg (1994). https://doi.org/10.1007/3-540-48329-2_24


  25. Regev, O.: On lattices, learning with errors, random linear codes, and cryptography. In: Gabow, H.N., Fagin, R. (eds.) 37th ACM STOC, Baltimore, MA, USA, 22–24 May 2005, pp. 84–93. ACM Press (2005)


  26. Applebaum, B., Cash, D., Peikert, C., Sahai, A.: Fast cryptographic primitives and circular-secure encryption based on hard learning problems. In: Halevi, S. (ed.) CRYPTO 2009. LNCS, vol. 5677, pp. 595–618. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03356-8_35


  27. Gentry, C., Halevi, S., Peikert, C., Smart, N.P.: Ring switching in BGV-style homomorphic encryption. In: Visconti, I., De Prisco, R. (eds.) SCN 2012. LNCS, vol. 7485, pp. 19–37. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-32928-9_2


  28. Gentry, C., Halevi, S., Peikert, C., Smart, N.P.: Field switching in BGV-style homomorphic encryption. Cryptology ePrint Archive, Report 2012/240 (2012). http://eprint.iacr.org/2012/240

  29. Frigo, M., Johnson, S.G.: The design and implementation of FFTW3. Proc. IEEE 93(2), 216–231 (2005). Special issue on “Program Generation, Optimization, and Platform Adaptation”


  30. Albrecht, M.R., Player, R., Scott, S.: On the concrete hardness of learning with errors. Cryptology ePrint Archive, Report 2015/046 (2015). http://eprint.iacr.org/2015/046

  31. Albrecht, M.R.: On dual lattice attacks against small-secret LWE and parameter choices in HElib and SEAL. In: Coron, J.-S., Nielsen, J.B. (eds.) EUROCRYPT 2017, Part II. LNCS, vol. 10211, pp. 103–129. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-56614-6_4


  32. Castryck, W., Iliashenko, I., Vercauteren, F.: Provably weak instances of ring-LWE revisited. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016, Part I. LNCS, vol. 9665, pp. 147–167. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_6


  33. Ducas, L., Durmus, A.: Ring-LWE in polynomial rings. In: Fischlin, M., Buchmann, J., Manulis, M. (eds.) PKC 2012. LNCS, vol. 7293, pp. 34–51. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30057-8_3


  34. Lyubashevsky, V., Peikert, C., Regev, O.: A toolkit for ring-LWE cryptography. In: Johansson, T., Nguyen, P.Q. (eds.) EUROCRYPT 2013. LNCS, vol. 7881, pp. 35–54. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38348-9_3


  35. Peikert, C.: An efficient and parallel Gaussian sampler for lattices. In: Rabin, T. (ed.) CRYPTO 2010. LNCS, vol. 6223, pp. 80–97. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14623-7_5


  36. Peikert, C.: How (not) to instantiate ring-LWE. In: Zikas, V., De Prisco, R. (eds.) SCN 2016. LNCS, vol. 9841, pp. 411–430. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-44618-9_22



Corresponding author

Correspondence to Guillaume Bonnoron.


Appendices

A Proofs for Section 2 (Preliminaries)

Lemma 1

Let A be a \(\delta \)-subgaussian random variable over \(\mathcal {R}_{pq}\). Then \(\mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*(A)\) is \(\delta \)-subgaussian as well.

Proof

Let \(b\in \mathcal {R}_p\). Then, \(\mathrm{Tr}_{\mathcal {R}_p/\mathbb Z}^*\bigl (\mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*(Ab)\bigr )/\Vert b\Vert = \mathrm{Tr}^*(Ab)/\Vert b\Vert \) which is \(\delta \)-subgaussian by assumption.

B Proofs for Section 3 (Encryption Schemes)

Lemma 2

If the decisional \(\tilde{\mathcal {R}} \mathsf {{\text {-}}LWE}\) problem is hard, then the Circulant-LWE scheme is CPA-secure for messages of the form \(m=X^k\).

Proof

If \(\tilde{\mathcal {R}} \mathsf {{\text {-}}LWE}\) is hard, then by Lemma 3, the Circulant-LWE distribution is indistinguishable from the uniform distribution over \(\mathcal S_{d,Q}^2\). To prove CPA-security, it suffices to show that, for any \(k\in \mathbb Z/d\mathbb Z\) and \(u = \left\lfloor Q/t \right\rceil \), we have \(\mathcal S_{d,Q}+uX^k = \mathcal S_{d,Q}+u\). This then shows that a Circulant-LWE encryption of \(m = X^k\) is indistinguishable from a uniformly random sample from \(\mathcal S_{d,Q}\times (\mathcal S_{d,Q}+u)\). Indeed, \(\mathcal S_{d,Q}+u=\{\sum _{i=0}^{d-1}a_iX^i\mid \sum a_i = u\bmod Q\} =\mathcal S_{d,Q} + uX^{k}\).
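As a quick illustration of the coset argument, the following toy computation (the values of d, Q and t are illustrative, not the paper's parameters) checks that adding \(uX^k\) to an element of \(\mathcal S_{d,Q}\) lands in the coset \(\mathcal S_{d,Q}+u\) regardless of k, since only the coefficient sum modulo Q matters.

```python
import numpy as np

# Toy parameters (illustrative only): d prime, Q the ciphertext modulus, t the plaintext modulus.
d, Q, t = 11, 97, 4
u = round(Q / t)                        # the scaling factor floor(Q/t] used in the proof
rng = np.random.default_rng(0)

# A random element of S_{d,Q}: coefficient vector summing to 0 modulo Q.
a = rng.integers(0, Q, d)
a[0] = -a[1:].sum() % Q
assert a.sum() % Q == 0

# Adding u*X^k for any k shifts the coefficient sum to u mod Q, just like adding u*X^0:
for k in range(d):
    shifted = a.copy()
    shifted[k] = (shifted[k] + u) % Q
    assert shifted.sum() % Q == u % Q   # S_{d,Q} + u*X^k and S_{d,Q} + u are the same coset
```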

Lemma 3

If the decisional \(\tilde{\mathcal {R}} \mathsf {{\text {-}}LWE}\) problem is hard, then the Circulant-GSW scheme is CPA-secure.

Proof

Let \(\mathbf {C}\) be a Circulant-GSW ciphertext. Each row of \(\mathbf {C}\) is of the form \((a, b) + (0, u B^im)\) or \((a, b) + (u B^im, 0)\) where \(m = X^k\) and (a, b) is a Circulant-LWE sample, and thus indistinguishable from a random element of \(\mathcal S_{d,Q}^2\). By the same argument as in the previous proof, each row of \(\mathbf {C}\) is indistinguishable from a uniformly random sample from either \((\mathcal S_{d,Q} + uB^i)\times \mathcal S_{d,Q}\), or \(\mathcal S_{d,Q}\times (\mathcal S_{d,Q} + uB^i)\) where i only depends on the row number, not on m.

C Proofs for Section 4 (Homomorphic Operations)

Lemma 4

Algorithm 1 is correct. Furthermore, if \(e = \mathsf {err}(\mathbf {c})\) and \(\mathbf {e}_i = \mathsf {err}(\mathbf {S}_i)\), then the error term of the output ciphertext is \(e + \sum _{i=1}^n \mathbf {d}_i^T \mathbf {e}_i\), where each \(\mathbf {d}_i\) is a vector whose entries have operator norm at most d.

Proof

By definition of \(\mathbf {g}^{-T}\), it is easy to see that the error term is \(e - \sum _{i=1}^n \mathbf {g}^{-T}(\mathbf {a}_i)\mathbf {e}_i\) and each component of \(g^{-T}(\mathbf {a}_i)\) is in \(\mathcal {R}_d/2\mathcal {R}_d\). Thus, the second part of the lemma follows. The first part holds because for every i, \(g^{-T}(\mathbf {a}_i)\mathbf {e}_i\) is subgaussian with parameter at most \(\sqrt{K}d\sigma \). If the error terms are independent, it follows that the error parameter is as stated in the algorithm.
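For intuition, here is a minimal scalar sketch of a base-B gadget \(\mathbf {g} = (1, B, B^2, \dots )\) and its decomposition \(\mathbf {g}^{-1}\); in the scheme the decomposition is applied coefficient-wise to elements of \(\mathcal {R}_d\) (with base 2 in this lemma), and the modulus and input below are illustrative values.

```python
# Minimal scalar sketch of a base-B gadget g = (1, B, B^2, ...) and its decomposition g^{-1}.
# B, Q and x are illustrative values; the scheme applies this coefficient-wise over R_d.
Q, B = 2**20, 2
K = Q.bit_length()                      # number of digits needed
g = [B**i for i in range(K)]            # gadget vector

def g_inv(x):
    """Digit decomposition of x in base B: every digit lies in {0, ..., B-1}."""
    digits = []
    for _ in range(K):
        digits.append(x % B)
        x //= B
    return digits

x = 123456
d = g_inv(x)
assert all(0 <= di < B for di in d)                       # small entries
assert sum(di * gi for di, gi in zip(d, g)) % Q == x % Q  # <g^{-1}(x), g> = x mod Q
```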

Lemma 5

Algorithm 2 is correct. Furthermore, for \(e = \mathsf {err}(\mathbf {c})\) and \(\mathbf {e} = \mathsf {err}(\mathbf {C})\), the error term of the output is \(X^k\cdot e + \mathbf {d}^T\mathbf {e}\) for some k and a random vector \(\mathbf {d}\in \mathcal {R}_d^{2K}\) independent of \(\mathbf {e}\) with \(\Vert \mathbf {d}_i\Vert \le d\) for every i.

Proof

Write \(u = \left\lfloor Q/t \right\rceil \), so \(\mathbf {c} = (a, as + e + \left\lfloor Q/t \right\rceil T^m)\) and \(\mathbf {C} = (\mathbf {a}, \mathbf {a} s + \mathbf {e}) + u T^{m'}\mathbf {G}\). Let \(\mathbf {d} = \mathbf {G}^{-1}(u^{-1} \cdot \mathbf {c})\). We have:

$$\begin{aligned} \mathbf {d}^T \cdot \mathbf {C}&= \mathbf {d}^T \cdot (\mathbf {a}, \mathbf {a} s + \mathbf {e}) + u T^{m'} \mathbf {d}^T \mathbf {G} \\&= (\mathbf {d}^T \mathbf {a}, \mathbf {d}^T \mathbf {a} s + \mathbf {d}^T\mathbf {e}) + uu^{-1}T^{m'} \left( a, as + e + \left\lfloor \frac{Q}{t} \right\rceil T^m\right) \\&= \left( a', a's + e' + \left\lfloor \frac{Q}{t} \right\rceil T^{m+m'}\right) \end{aligned}$$

where \(a' = \mathbf {d}^T \mathbf {a} + aT^{m'}\) and \(e' = \mathbf {d}^T\mathbf {e} + eT^{m'}\). Each component of \(\mathbf {e}\) is independent and subgaussian with parameter \(E'\), and \(\mathbf {d}\) is a vector in \(\mathcal {R}_d^{2K}\), where each entry has binary coefficients. Thus, for every i, we have \(|\mathbf {d}_i| \le d\). Using the following Lemma 2, the error parameter follows.

Lemma 2

Let e be a \(\gamma \)-subgaussian variable over \(\mathcal {R}_d\) and \(\mathbf {e} = (\mathbf {e}_1,\ldots , \mathbf {e}_n)\) be a vector of independent \(\delta \)-subgaussian random variables over \(\mathcal {R}_d\). Let \(\mathbf {d}\) be a random variable over \(\mathcal {R}_d^n\) such that \(|\mathbf {d}_i|\le k\) for all i. If \(\mathbf {d}\) and \(\mathbf {e}\) are independent and e and \(\mathbf {e}\) are independent, then \(e + \langle \mathbf {d},\mathbf {e}\rangle \) is \(\sqrt{\gamma ^2 + k^2n\delta ^2}\)-subgaussian.

We first consider the case where \(e = 0\) and \(\mathbf {d}\) is a fixed vector instead of a random variable. For every \(b\in \mathcal {R}_d\) and every i, we have \( \mathrm{Tr}(\mathbf {d}_i\mathbf {e}_i b)/\Vert b\Vert \le k\mathrm{Tr}(\mathbf {e}_i (\mathbf {d}_ib))/\Vert \mathbf {d}_ib\Vert \) which is \((k\delta )\)-subgaussian. From the independence of the \(\mathbf {e}_i\), it follows that \(\mathrm{Tr}(\langle \mathbf {d},\mathbf {e}\rangle b)/\Vert b\Vert \) is \((\sqrt{n}k\delta )\)-subgaussian.

If \(\mathbf {d}\) and e are random variables independent of \(\mathbf {e}\), it holds for every \(b\in \mathcal {R}_d\) that

$$\begin{aligned}&\mathbb E[\exp (t\mathrm{Tr}(eb + \langle \mathbf {d}, \mathbf {e}\rangle b)/\Vert b\Vert )] \\&= \sum _{e^*, \mathbf {d}^*}P[e = e^*, \mathbf {d} = \mathbf {d}^*]\cdot \mathbb E[\exp (t\mathrm{Tr}(eb + \langle \mathbf {d}, \mathbf {e}\rangle b)/\Vert b\Vert )\mid e = e^*,\mathbf {d}=\mathbf {d}^*]\\&= \sum _{e^*, \mathbf {d}^*}P[e = e^*,\mathbf {d} = \mathbf {d}^*]\cdot \exp (t\mathrm{Tr}(e^*b)/\Vert b\Vert )\cdot \mathbb E[\exp (t\mathrm{Tr}(\langle \mathbf {d}^*, \mathbf {e}\rangle b)/\Vert b\Vert )]\\&\le \sum _{e^*,\mathbf {d}^*}P[e = e^*,\mathbf {d} = \mathbf {d}^*]\cdot \exp (t\mathrm{Tr}(e^*b)/\Vert b\Vert )\cdot \exp \bigl (t^2 nk^2\delta ^2/2\bigr )\\&= \mathbb E[\exp (t\mathrm{Tr}(eb)/\Vert b\Vert )]\cdot \exp \bigl (t^2 nk^2\delta ^2/2\bigr )\\&\le \exp \bigl (t^2\bigl (\gamma ^2 + nk^2\delta ^2\bigr )/2\bigr ) \end{aligned}$$

which concludes the proof.

Lemma 6

Algorithm 3 is correct and runs in time \(O(pq\log (pq)\log Q')\).

Proof

We can compute \(\mathrm{Tr}^*_{\mathcal {R}_{pq}/\mathcal {R}_p}(x)\) by examining p coefficients of x, and \(\mathrm{Tr}^*_{\mathcal {R}_p/\mathbb Z}(x)\) is simply the constant term of x. Thus, the runtime is dominated by the key-switch, which runs in time \(O(pq\log (pq)K)\). After the multiplication and key-switch, it holds that

$$ c\in \mathcal {R}_{pq}\mathsf {LWE}^{t|Q'}_{\mathbf {s}^{(pq)}}\Biggl (\sum _{i\in \mathbb Z_{pq}}F(i)Z^{m-i\bmod pq}; |F|\sqrt{E^2 + 3\sigma ^2p^2q^2K}\Biggr ) $$

since \(|f|\le |F|\). Using Lemma 1, the linearity of the trace function, and the fact that \(s'\in \mathcal {R}_p\), we conclude that after the trace,

$$ (a, b) \in \mathcal {R}_p\mathsf {LWE}^{t|Q'}_{s'}\Biggl ( \mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*\Biggl (\sum _{i\in \mathbb Z_{pq}}F(i)Z^{m-i\bmod pq}\Biggr ); |F|\sqrt{E^2 + 3\sigma ^2p^2q^2K} \Biggr ) $$

It holds that

$$ \mathrm{Tr}^*(b) = \mathrm{Tr}^*(a\cdot s') + \left\lfloor Q'/t \right\rceil \mathrm{Tr}^*_{\mathcal {R}_{pq}/\mathbb Z}\Biggl (\sum _{i\in \mathbb Z_{pq}}F(i)Z^{m-i\bmod pq}\Biggr ) + \mathrm{Tr}^*(e) $$

and by Lemma 1, \(\mathrm{Tr}^*\bigl (\sum _{i\in \mathbb Z_{pq}}F(i)Z^{m-i\bmod pq}\bigr ) = F(j)\) if \(m = j\). Since \(\mathrm{Tr}^*(as) = a_0s_0 + \sum _{i=1}^{p-1}a_{p-i}s_i = \langle \mathbf {a}, \mathbf {s}\rangle \) and \(\mathrm{Tr}^*\) does not increase the error parameter, the correctness of our algorithm follows.
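The identity \(\mathrm{Tr}^*(as) = a_0s_0 + \sum _{i=1}^{p-1}a_{p-i}s_i\) used in the last step is just a re-indexing of the constant coefficient of a circular convolution; a toy numerical check (the dimension and coefficient ranges are chosen arbitrarily):

```python
import numpy as np

p = 7                                   # toy dimension
rng = np.random.default_rng(1)
a = rng.integers(-5, 6, p)              # an element of R_p
s = rng.integers(-1, 2, p)              # a small element of R_p

# Constant coefficient of a*s in R_p = Z[X]/(X^p - 1): sum over i + j = 0 mod p.
const_coeff = sum(int(a[i]) * int(s[(-i) % p]) for i in range(p))

# Rearrangement used in the proof: a_0*s_0 + sum_{i=1}^{p-1} a_{p-i}*s_i.
inner = int(a[0]) * int(s[0]) + sum(int(a[p - i]) * int(s[i]) for i in range(1, p))

assert const_coeff == inner             # Tr*(a*s) = <a, s> with this indexing
```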

Lemma 7

Let A and B be independent subgaussian random variables on \(\mathcal {R}_p\) and \(\mathcal {R}_q\), respectively, with parameters \(\gamma \) and \(\delta \). Then, for every \(\lambda \in \mathbb {R}\), \(A\otimes B\) is subgaussian with parameter \(\sqrt{2\lambda }\gamma \delta \) except with probability \(2\min (p, q)\exp (-\lambda )\).Footnote 7

Proof

We want to show that for every \(y\in \mathcal {R}_{pq}\setminus \{0\}\), \(\mathrm{Tr}^*((A\otimes B)y)/\Vert y\Vert \) is subgaussian (except with a small probability). Let \(y\in \mathcal {R}_{pq}\setminus \{0\}\). We can write \(y = \sum _{i=0}^{q-1} y_i\otimes Y^i\). It holds that \(\Vert y\Vert = \sqrt{\sum _i \Vert y_i\Vert ^2}\). Thus,

$$ \frac{\mathrm{Tr}^*((A\otimes B)y)}{\Vert y\Vert } = \sum _i \frac{\mathrm{Tr}^*(Ay_i\otimes BY^i)}{\Vert y\Vert } = \sum _i \frac{\mathrm{Tr}^*(Ay_i)\cdot \mathrm{Tr}^*(BY^i)}{\Vert y\Vert } $$

Let \(E_i\) be the event that \(|\mathrm{Tr}^*(Ay_i)|\ge \sqrt{2\lambda }\gamma \Vert y_i\Vert \). Applying the subgaussian tail estimate, we conclude that for each i, \(p(E_i) \le 2\exp (-\lambda )\). By the union bound, it follows that, for \(E = \bigcup _i E_i\), \(p(E) \le 2q\exp (-\lambda )\). We now proceed similarly to the proof of Lemma 2. For every fixed value \(a\in \mathcal {R}_p\) such that \(|\mathrm{Tr}^*(ay_i)| < \sqrt{2\lambda }\gamma \Vert y_i\Vert \) for all i, we have

$$ \sum _i \frac{\mathrm{Tr}^*(ay_i)\cdot \mathrm{Tr}^*(BY^i)}{\Vert y\Vert } = \frac{\mathrm{Tr}^*\bigl (\sum _i B\mathrm{Tr}^*(ay_i)Y^i\bigr )}{\Vert y\Vert } $$

which is subgaussian with parameter

$$ \frac{\left\| \sum _i \mathrm{Tr}^*(ay_i)Y^i\right\| \delta }{\Vert y\Vert } = \frac{\sqrt{\sum _i \mathrm{Tr}^*(ay_i)^2}\delta }{\sqrt{\sum _j \Vert y_j\Vert ^2}} < \frac{\sqrt{2\lambda }\gamma \delta \sqrt{\sum _i \Vert y_i\Vert ^2}}{\sqrt{\sum _j \Vert y_j\Vert ^2}} = \sqrt{2\lambda }\gamma \delta $$

We can then use the independence of A and B to conclude that, conditioned on \(\overline{E}\), \(\mathrm{Tr}^*((A\otimes B)y)/\Vert y\Vert \) is \((\sqrt{2\lambda }\gamma \delta )\)-subgaussian, as claimed.

Using a similar argument, this time writing \(y = \sum _{i=0}^{p-1}X^i\otimes y_i\), it also follows that \(\mathrm{Tr}^*((A\otimes B)y)/\Vert y\Vert \) is \((\sqrt{2\lambda }\gamma \delta )\)-subgaussian except with probability \(2p\exp (-\lambda )\). This proves our claim.

Lemma 8

Algorithm 4 is correct and runs in time \(\varTheta (pq)\).

Proof

Let \(m' = \alpha qm_p + \beta p m_q\bmod pq\). It holds that \(m'\bmod p = m_p\) and \(m'\bmod q = m_q\). Thus, by the Chinese Remainder Theorem, \(m' = m\). Let \(s_p' = \psi _\alpha (s_p)\) and \(s_q' = \psi _\beta (s_q)\). Let us write \(b_p' = uX^{\alpha m_p} + e_p\) and \(b_q' = Q_{\otimes }Y^{\beta m_q}/t + e_q\). We have

$$\begin{aligned} tb_p\otimes b_q&= ta_ps_p \otimes b_q + tb_p'\otimes a_qs_q + tb_p'\otimes b_q'\\&= -ta_ps_p'\otimes a_qs_q' + ta_ps_p' \otimes b_q' + tb_p\otimes a_qs_q' + tb_p'\otimes b_q' \end{aligned}$$

and \(tb_p'\otimes b_q' = \left\lfloor Q_{\otimes }/t \right\rceil X^{\alpha m_p}\otimes Y^{\beta m_q} + X^{\alpha m_p}\otimes e_q + e_p\otimes Y^{\beta m_q} + te_p\otimes e_q\). Since \(X^{\alpha m_p}\otimes Y^{\beta m_q} = Z^m\), the error term is

$$ e_{pq} = X^{\alpha m_p}\otimes e_q + e_p \otimes Y^{\beta m_q} + te_p\otimes e_q\text{. } $$

Since \(e_p\) and \(e_q\) are independent, the sum of the first two terms is subgaussian with parameter \(\sqrt{E_p^2 + E_q^2}\). The third term is subgaussian with parameter \(t\sqrt{2\lambda }E_pE_q\), except with probability \(2\min (p, q)\exp (-\lambda )\) by Lemma 7. In total, \(e_{pq}\) is subgaussian with parameter \(\sqrt{E_p^2 + E_q^2} + t\sqrt{2\lambda }E_pE_q\) except with probability \(2\min (p, q)\exp (-\lambda )\).

Thus, with \(\mathbf {a}= (a_p\otimes a_q, a_p\otimes b_q, b_p\otimes a_q)\) and \(\mathbf {s} = (-\psi _\alpha (s_p)\otimes \psi _\beta (s_q), \psi _\alpha (s_p)\otimes 1, 1 \otimes \psi _\beta (s_q)) = (-s_p' \otimes s_q', s_p' \otimes 1, 1 \otimes s_q')\), an easy computation shows that \(tb_p \otimes b_q - t\langle \mathbf {a}, \mathbf {s}\rangle = tb_p' \otimes b_q' = \left\lfloor Q_{\otimes }/t \right\rceil Z^m + e_{pq}\). The algorithm is correct.

The running time is dominated by the cost of tensoring the ring elements, which takes time \(\varTheta (pq)\).
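The exponent arithmetic behind this tensoring is plain CRT reconstruction; a toy sketch (illustrative p and q) checking that \(m' = \alpha q m_p + \beta p m_q \bmod pq\) with \(\alpha = q^{-1} \bmod p\) and \(\beta = p^{-1} \bmod q\) indeed recovers m:

```python
# Toy coprime dimensions; alpha and beta are the CRT coefficients from the proof.
p, q = 5, 7
alpha = pow(q, -1, p)      # alpha * q = 1 mod p
beta = pow(p, -1, q)       # beta * p = 1 mod q

for m in range(p * q):
    m_p, m_q = m % p, m % q
    m_prime = (alpha * q * m_p + beta * p * m_q) % (p * q)
    assert m_prime % p == m_p and m_prime % q == m_q
    assert m_prime == m    # CRT: the exponent of Z^m is recovered from X^{alpha*m_p}, Y^{beta*m_q}
```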

Theorem 9

Algorithm 6 is correct and runs in time \(\varTheta (\ell K d\log d)\).

Proof

By induction, we prove that the error term of \(\mathbf {c}\) in the i-th iteration of the for-loop is of the form \(e_1 + e_2\) where \(e_1\) is \((2id\sqrt{K}\sigma )\)-subgaussian, and \(e_2 = \sum _{j=1}^i\left\langle \mathbf {d}^{(j)},\psi _{\mathbf {y}_j}(\mathbf {e}^{(j)})\right\rangle \) with the following properties: \(\mathbf {e}^{(j)}\) is the error vector of \(\mathbf {C}_j\), and \(\mathbf {d}^{(j)}\in \mathcal {R}_d^{2K}\) is a random vector with \(|\mathbf {d}^{(j)}_n|\le d\) that is independent of \(\mathbf {e}^{(k)}\) for all \(k\ge j\).

Clearly, our claim holds prior to the loop (with \(i = 0\)) since \(\mathbf {c}\) has no error term at this point. Suppose now that the claim holds for \(i-1\). Let \(\alpha = \mathbf {y}_i\) and \(\beta = \alpha ^{-1}\bmod d\). During the \(\mathsf {ExtExpMultAdd}\) operation, we first apply a \(\mathsf {Galois}\) operation, which results in an error term of \(\psi _\beta (e_1) + \psi _\beta (e_2)\). This is followed by a key-switch, which, by Lemma 4, changes the error to \(\psi _\beta (e_1) + \psi _\beta (e_2) + e_{ks, 1}\) where \(e_{ks, 1}\) is independent of \(\mathbf {e}^{(j)}\) for all j, and subgaussian with parameter \(\sqrt{K}d\sigma \). Next comes an \(\mathsf {ExtMult}\) operation which changes it to \(X^k\psi _\beta (e_1) + X^k\psi _\beta (e_2) + X^ke_{ks, 1} + \left\langle \mathbf {d},\mathbf {e}^{(i)}\right\rangle \) for some k, where \(\mathbf {e}\) is the error in \(\mathbf {C}_i\), and \(\mathbf {d}\in \mathcal {R}_d^{2K}\) is a random vector independent of \(\mathbf {e}^{(j)}\) for \(j\ge i\) which satisfies \(|\mathbf {d}_n| \le d\) for every n, by Lemma 5. After the second \(\mathsf {Galois}\) and key-switch, the error term becomes \(X^{\alpha k}e_1+X^{\alpha k}e_2+X^{\alpha k}\psi _\alpha (e_{ks,1})+\psi _\alpha \left( \left\langle \mathbf {d}, \mathbf {e}^{(i)}\right\rangle \right) +\psi _\alpha (e_{ks,2})\) where \(e_{ks,2}\) is again subgaussian with parameter \(\sqrt{K}d\sigma \). We can reorder the error terms and write

$$ \underbrace{X^{\alpha k}e_1 + X^{\alpha k}\psi _\alpha (e_{ks, 1}) + \psi _\alpha (e_{ks,2})}_{e_1'} + \underbrace{X^{\alpha k}e_2 + \psi _\alpha (\langle \mathbf {d}, \mathbf {e}^{(i)}\rangle )}_{e_2'} $$

By the induction hypothesis and since \(e_{ks,1}\) and \(e_{ks,2}\) are subgaussian with parameter \(d\sqrt{K}\sigma \), it follows that \(e_1'\) is \((2id\sqrt{K}\sigma )\)-subgaussian (because we do not assume that \(e_1\), \(e_{ks,1}\) and \(e_{ks,2}\) are independent). Finally, it holds that

$$ X^{\alpha k}e_2 = \sum _{j=1}^{i-1}\left\langle X^{\alpha k}\mathbf {d}^{(j)}, \psi _{\mathbf {y}_j}\left( \mathbf {e}^{(j)}\right) \right\rangle $$

and thus, setting \(\mathbf {d}'^{(j)} = X^{\alpha k}\mathbf {d}^{(j)}\) for \(j < i\) and \(\mathbf {d}'^{(i)} = \psi _{\mathbf {y}_i}(\mathbf {d})\), we have \(e_2' = \sum _{j=1}^i\left\langle \mathbf {d}'^{(j)},\psi _{\mathbf {y}_j}(\mathbf {e}^{(j)})\right\rangle \), which completes the induction step. Finally, by repeated applications of Lemma 2, we conclude that the error term in the output is subgaussian with parameter \(\sqrt{4K\ell ^2d^2\sigma ^2 + 2K\ell d^2E'^2}\).

It is easy to see that the algorithm has the claimed runtime by adding up the runtimes of the algorithms used in \(\mathsf {ExtExpMultAdd}\).

D Proofs for Section 5 (Joining the Building Blocks)

Theorem 10

Algorithm 7 is correct and runs in time \(\tilde{O}(n^2)\). Moreover, there exists \(Q = O(\gamma '|f|n^{6.5}\sigma ^{1.5})\) such that the output of \(\mathsf {EvalBootstrap}\) can be used as input for another execution of \(\mathsf {EvalBootstrap}\) with coefficients \(\gamma '_1,\ldots ,\gamma '_k\) such that \(\gamma ' = \sum _i|\gamma '_i|\) (with failure probability exponentially small in n).

Proof

It is straightforward to verify the error parameters for each step in the comments of the algorithm. There are two steps where failures might occur: the \(\mathsf {ExpCRT}\) step, and the \(\mathsf {FunExpExtract}\) step. The failure probability for \(\mathsf {ExpCRT}\) is \(2\exp (-\lambda )\). \(\mathsf {FunExpExtract}\) will not fail to extract the value of F, but if the error term in \(\mathbf {c}\) is too large, the output might not be an encryption of f(m). The subgaussian tail estimate guarantees that the failure probability is at most \(2\exp (-\lambda )\) if \(\sqrt{r^2\gamma ^2E_{\text {in}}^2 + p+1}\le pq/(2t\sqrt{2\lambda })\) where \(r = \left\lfloor pq/t \right\rceil /\left\lfloor Q'/t \right\rceil \). Since \(t\le \sqrt{pq}/4\) and \(\lambda \le q\), this condition is satisfied if \(\sqrt{r^2\gamma ^2E_{\text {in}}^2 + p+1}\le \sqrt{2p}\), or equivalently,

$$ \gamma E_{\text {in}}\le \underbrace{\sqrt{\frac{p-1}{r^2}}}_{T} = \varTheta \Biggl ( \sqrt{\frac{Q'^2}{pq^2}} \Biggr ) = \varTheta \Biggl ( \frac{Q}{n^2\sqrt{\sigma }} \Biggr ) $$

The runtime is dominated by \(\mathsf {ExtExpInner}\) and \(\mathsf {FunExpExtract}\), which run in time \(O(nKd\log d)\) and \(O(Kpq\log (pq))\), respectively. Given our asymptotic parameter choices, both of those are \(\tilde{O}(n^2)\).

If we want to use outputs of \(\mathsf {EvalBootstrap}\) as inputs for another execution of \(\mathsf {EvalBootstrap}\), where the absolute values of the coefficients sum up to \(\gamma '\), we require that \(\gamma 'E_{\text {out}}\le T\). From the asymptotic formulas for \(E_{\text {out}}\) and T, it is easy to see that this inequality can be satisfied by a Q in \(O(|f|\gamma ' n^{6.5}\sigma ^{1.5})\).

E More Details on Circulant LWE

1.1 E.1 Circulant LWE and Reduction to Ring-LWE

Throughout this subsection, we assume d to be prime. It is well known that the naive decisional version of Ring-LWE is insecure over circulant rings, simply by exploiting the CRT decomposition \(\mathcal {R}_d/Q\mathcal {R}_d \simeq \tilde{\mathcal {R}}_d/Q\tilde{\mathcal {R}}_d\times \mathbb Z/Q\mathbb Z\) when Q is coprime to d, and mounting an attack on the \(\mathbb Z/Q\mathbb Z\) part (projecting to this part corresponds to evaluating the polynomial at 1, and therefore maintains the smallness of the error). However, this does not mean that such rings are inherently insecure: the NTRU cryptosystems [17, 23] use circulant rings, choosing secret keys and errors that evaluate to a fixed known value (say 0) at 1.

This suggests a strategy to construct a variant of Ring-LWE over circulant rings that would be as secure as the cyclotomic Ring-LWE, simply by lifting all elements \(\tilde{x} \in \tilde{\mathcal {R}}_d/Q\tilde{\mathcal {R}}_d\) to \(x \simeq (\tilde{x}, 0)\), yet this reverse CRT operation may not keep small elements small.

Instead, one can construct such a lift without working modulo Q, in order to preserve smallness of coefficients (up to some reasonable distortion). We also note that such a lift should actually start from the co-different ideal \(\tilde{\mathcal {R}}_d^\vee \), so as to match the Ring-LWE instances admitting worst-case hardness proofs [18], yet a reduction (with some loss on the error parameter) to Ring-LWE without the co-different was given in [33].

Because \(1-X\) and \(\varPhi _d(X)\) are not coprime over \(\mathbb Z[X]\) (their \(\gcd \) is d, not 1), we do not have a CRT decomposition of \(\mathcal {R}_d\) as \(\tilde{\mathcal {R}}_d \times \mathbb Z\). Yet, those polynomials are coprime over \(\mathbb {Q}[X]\), which allows us to write

$$ K_d = \tilde{K}_d \times \mathbb {Q}$$

where \(K_d = \mathbb {Q}[X]/(X^d - 1)\) and \(\tilde{K}_d = \mathbb {Q}[X]/\varPhi _d(X)\).Footnote 8 We write L for the canonical inclusion map \(L : \tilde{K}_d \rightarrow K_d\), which is explicitly given by

$$\begin{aligned} L: \sum _{i=0}^{d-1} a_i X^i \mapsto \sum _{i=0}^{d-1} a_i X^i - \frac{1}{d} (\sum _{i=0}^{d-1} a_i)(\sum _{i=0}^{d-1} X^i). \end{aligned}$$

Note that the above formula can be extended to a \(\mathbb {Q}\)-linear map \(K_d \rightarrow K_d\), viewing \(\tilde{K}_d\) as a subspace of \(K_d\) according to the above isomorphism \(K_d = \tilde{K}_d \times \mathbb {Q}\). This extension of L is the projection orthogonal to the all-1 vector in coefficient representation. Unfortunately the image \(L(\tilde{\mathcal {R}}_d)\) is not included in \(\mathcal {R}_d\): the projection does not maintain integrality of coefficients. Yet, one notes that a small ideal of \(\tilde{\mathcal {R}}_d\) does have an integer lift: namely, the ideal \(\tilde{\mathfrak I} = (1-X) \tilde{\mathcal {R}}_d\) satisfies \(L(\tilde{\mathfrak I}) \subset \mathcal {R}_d\). Moreover, for \(a \in \tilde{\mathfrak I}\), it holds that \(\sum a_i = 0\); in particular, L preserves sizes of elements of \(\tilde{\mathfrak I}\).

Also consider the lift L taken modulo Q (assuming Q is coprime to d), denoted by \(L_Q\), obtained by simply replacing \(\frac{1}{d} \in \mathbb {Q}\) by the inverse of d in \(\mathbb Z/Q\mathbb Z\). Consider a Ring-LWE sample as defined in [33]: \((\tilde{a}, \tilde{b} = \tilde{a}\tilde{s} + \tilde{e}) \in (\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}})^2\) for small \(\tilde{s}, \tilde{e} \in \tilde{\mathcal {R}}\). We lift this sample to \(\mathcal {R}/Q\mathcal {R}\):

$$\begin{aligned} a = L_Q(\tilde{a}), b = L_Q((1-X)\tilde{b}). \end{aligned}$$
(11)

We define \(s = L((1-X)\tilde{s})\) and \(e = L((1-X)\tilde{e})\), and it holds that \(s = L_Q((1-X)\tilde{s}) \mod Q\) and \(e = L_Q((1-X)\tilde{e}) \mod Q\) since s and e are integral. Therefore,

$$\begin{aligned} b&= L_Q((1-X) \tilde{a}\cdot \tilde{s} + (1-X)\tilde{e}) \\&= L_Q(\tilde{a})\cdot L_Q((1-X)\tilde{s}) + L_Q((1-X)\tilde{e}) \\&= L_Q(\tilde{a}) s + e \bmod Q\\&= as + e \bmod Q \end{aligned}$$

We also note that s and e are still small since the operator norm of \(1-X\) is less than 2: these Circulant-LWE samples are useful.

It remains to explain what this transformation does to uniform samples \((\tilde{a}, \tilde{b}) \in (\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}})^2\). Assuming that Q is coprime to d, it holds that Q and \(1-X\) are coprime in the ring \(\tilde{\mathcal {R}}_d\). Therefore, multiplication by \(1-X\) over \(\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}}\) is a bijection, so the sample \((\tilde{a}, (1-X) \tilde{b}) \in (\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}})^2\) is also uniform in \((\tilde{\mathcal {R}}/Q\tilde{\mathcal {R}})^2\). Finally, the lift \(L_Q\) is injective, so the final sample \((a, b) \in (\mathcal {R}/Q\mathcal {R})^2\) is uniform over \((L_Q(\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}}))^2\). One easily characterizes the image \(L_Q(\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}})\) of \(L_Q\) as the set \(\mathcal S_{d,Q}= \{\sum _{i=0}^{d-1} a_i X^i \mid \sum a_i = 0 \bmod Q\} \) of elements of \(\mathcal {R}/ Q\mathcal {R}\) whose coefficients sum to 0 modulo Q.
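The lift is straightforward to exercise on toy parameters. The sketch below (d, Q and the small secret and error distributions are illustrative choices) applies \(L_Q\) coefficient-wise, lifts a Ring-LWE sample computed with representatives modulo \(X^d - 1\), and checks that the resulting Circulant-LWE sample satisfies \(b = as + e \bmod Q\) in \(\mathcal {R}_d\) and lies in \(\mathcal S_{d,Q}\):

```python
import numpy as np

# Toy parameters (illustrative): d prime, Q coprime to d.
d, Q = 11, 97
dinv = pow(d, -1, Q)
rng = np.random.default_rng(2)

def circ_mul(a, b):
    """Product in R_d = Z[X]/(X^d - 1), i.e. circular convolution of coefficient vectors."""
    c = np.zeros(d, dtype=np.int64)
    for i in range(d):
        c[(np.arange(d) + i) % d] += a[i] * b
    return c

def LQ(x):
    """The lift L_Q: subtract (1/d)*(sum of coefficients) from every coefficient, modulo Q."""
    return (x - dinv * x.sum()) % Q

one_minus_X = np.zeros(d, dtype=np.int64)
one_minus_X[0], one_minus_X[1] = 1, -1

a_t = rng.integers(0, Q, d)                      # representative of a uniform element of R~/QR~
s_t = rng.integers(-1, 2, d)                     # small secret
e_t = rng.integers(-2, 3, d)                     # small error
b_t = (circ_mul(a_t, s_t) + e_t) % Q             # Ring-LWE sample, computed modulo X^d - 1

a = LQ(a_t)
s = LQ(circ_mul(one_minus_X, s_t))
e = LQ(circ_mul(one_minus_X, e_t))
b = LQ(circ_mul(one_minus_X, b_t))

assert np.array_equal(b, (circ_mul(a, s) + e) % Q)    # b = a*s + e holds in R_d/QR_d
assert a.sum() % Q == 0 and b.sum() % Q == 0          # the lifted samples lie in S_{d,Q}
```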

Lemma 3

(Hardness of Circulant-LWE). Assume that d is prime, and Q is coprime to d. If it is hard to distinguish samples \((\tilde{a}_i, \tilde{b}_i = \tilde{a}_i\tilde{s} + \tilde{e}_i) \in (\tilde{\mathcal {R}} / Q\tilde{\mathcal {R}})^2\) from uniform where \(\tilde{e}_i\) are independent random variables drawn from a distribution \(\psi \), then the samples \((a_i = L_Q(\tilde{a}_i), b_i = L_Q((1-X)\tilde{b}_i)) \in \mathcal S_{d,Q}^2 \subset (\mathcal {R}/Q\mathcal {R})^2\) are also hard to distinguish from uniform samples in \(\mathcal S_{d,Q}^2\).

1.2 E.2 Simpler Error Distribution in CLWE for Practice

In practice, most FHE schemes do not follow precisely the Ring-LWE problem definition admitting a reduction to worst-case problems [18, 34]. For example, HElib [7] uses Ring-LWE with spherical errors in the coefficient embedding and very sparse ternary secrets, ignoring the co-different ideal \(\mathcal {R}^\vee \). The TFHE scheme [15] also relies on Ring-LWE with ternary secrets, which is not known to reduce to the regular Ring-LWE. Cutting such corners appears quite crucial to error growth management and therefore efficiency. We will follow this approach, and adjust the distributions as follows.

  • we proceed to sample secrets and errors isotropically in \(\mathcal S_{d,Q}\), while the above reduction leads to errors with a distortion factor \((1-X)\). This distortion seems to be an artefact of the proof, as it breaks symmetries: one could choose a different way of breaking those symmetries by replacing \(1-X\) by \(1-X^e\) for any e coprime to d. Respecting the symmetries seems a better idea in the light of recent analysis [32, 36].

    This variant could also be proved secure (with a loss of a constant factor of about \(\sqrt{2}\) on the size of the error), simply by adding more noise to make it spherical again, using the convolution lemma of [35], but this would drag us away from the topic of this paper.

  • we choose to use ternary secrets s, which, as in previous schemes, leads to serious performance improvements due to smaller error growth. It has recently been shown that such choices make lattice attacks somewhat faster [31], especially when s is very sparse: we will account for this refined analysis when measuring the concrete security of our proposed parameters.

Sampling of a. We sample a uniformly in \(\mathcal {R}_d/(Q\mathcal {R}_d)\) under the constraint \(a(1) = 0 \bmod Q\) by choosing all the coefficients \(a_i\) for \(i\ge 1\) uniformly at random, and setting \(a_0 = - \sum _{i>0} a_i \bmod Q\).

Sampling of s. When d is prime, we sample a ternary s of density \(\delta = 2/3\) by choosing exactly \(\lfloor \delta d / 2 \rfloor \) coefficients set to 1 and \(\lfloor \delta d /2 \rfloor \) coefficients set to \(-1\). This implies that \(s(1) = 0\), and \(\Vert s\Vert ^2 = 2 \lfloor \delta d/2 \rfloor \). Indeed, we find it preferable to fix its length to avoid sampling sparse keys that would be substantially weaker.

Sampling of e. We wish to sample errors e with standard deviation \(\sigma \) in a way that ensures \(e(1) = 0\). We set:

$$\begin{aligned} e = \sum _{i=0}^{\sigma ^2 d/2} T^{a_i} - T^{b_i}, \end{aligned}$$

where the \(a_i\)’s and \(b_i\)’s are independent uniform exponents modulo d. One notes that this distribution is invariant under permutation of \(\{1, T, \dots ,T^{d-1}\}\): we have preserved the symmetries of the ring. Note that this procedure would get rather slow for large \(\sigma \), yet we will not exceed \(\sigma \le 8\) in our parameter choices.
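A minimal sketch of the three sampling procedures just described (the parameter values and the function names sample_a, sample_s and sample_e are ours, chosen for illustration):

```python
import numpy as np
rng = np.random.default_rng(0)

def sample_a(d, Q):
    """Uniform a in R_d/(Q R_d) subject to a(1) = 0 mod Q."""
    a = rng.integers(0, Q, d)
    a[0] = -a[1:].sum() % Q
    return a

def sample_s(d, density=2 / 3):
    """Ternary s with exactly floor(density*d/2) coefficients +1 and as many -1, so s(1) = 0."""
    k = int(density * d / 2)
    s = np.zeros(d, dtype=np.int64)
    idx = rng.choice(d, 2 * k, replace=False)
    s[idx[:k]], s[idx[k:]] = 1, -1
    return s

def sample_e(d, sigma):
    """Error as a sum of about sigma^2*d/2 differences T^{a_i} - T^{b_i}, so that e(1) = 0."""
    e = np.zeros(d, dtype=np.int64)
    n_terms = int(sigma**2 * d / 2)
    np.add.at(e, rng.integers(0, d, n_terms), 1)
    np.add.at(e, rng.integers(0, d, n_terms), -1)
    return e

d, Q = 97, 2**20
a, s, e = sample_a(d, Q), sample_s(d), sample_e(d, sigma=4)
assert a.sum() % Q == 0 and s.sum() == 0 and e.sum() == 0
```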

Remark 6

The above procedure would not be suitable for composite degree d, as more care is required to construct a lift as done in Sect. E.1. Yet, while we will make use of the circulant ring \(\mathcal {R}_d\) with composite degree \(d=pq\), we will never directly construct ciphertexts over that ring. Indeed, the ciphertexts in \(\mathcal {R}_d\) will be publicly constructed by tensoring two ciphertexts from \(\mathcal {R}_p\) and \(\mathcal {R}_q\), and are therefore no easier to decrypt than the original ciphertexts over \(\mathcal {R}_p\) and \(\mathcal {R}_q\).

F Optimizations

In this section, we present some optimizations of the scheme for practice. Our implementation includes the optimizations from Sects. F.1, F.2 and F.3. We left out the optimization from Sect. F.4, which requires substantial modifications to our code base.

1.1 F.1 Accelerating \(\mathsf {ExtExpInner}\)

Factoring \(\mathsf {Galois}\)-\(\mathsf {KeySwitch}\) Sequences. We note that it is possible to factor some operations when chaining \(\mathsf {ExtExpMultAdd}^\alpha \) and \(\mathsf {ExtExpMultAdd}^{\alpha '}\), by applying \(\mathsf {Galois}^{\alpha \beta '}\) rather than \(\mathsf {Galois}^{\alpha }\) followed by \(\mathsf {Galois}^{\beta }\) (together with the appropriate Key Switches), cf. Fig. 2.

Furthermore, if \(\mathbf {y} \in \mathbb Z_d^\ell \) contains repeated values, it is possible to re-index the inner product to make equal values contiguous, and skip useless \(\mathsf {Galois}^{1}\) operations. Those tricks also decrease the final error E by constant factors.

Pushing this trick to its limits, if \(\ell \) is large enough, one could re-index the inner product so that the \(\alpha \beta '\) all belong to a smallFootnote 9 subset of \(\mathbb Z_d^*\), allowing us to decrease the size of the key material. In combination with the following optimization, this should reduce the overall key size by a significant factor.

Fig. 2. Optimized \(\mathsf {ExtExpInner}\) (external inner product in exponent) overview

Decreasing \(\mathsf {LWE}\) Dimension. In our theoretical scheme, the homomorphic inner product in exponent operation is done over vectors of length \(\ell = p+1\) where p is the dimension of the secret in the LWE scheme.

In practice, we remark that this dimension is substantially larger than needed for security, given the amount of noise and the modulus pq of those ciphertexts. We therefore proceed with an extra \(\mathsf {LWE}\) key-switch just after the combination of the \(\mathsf {LWE}\) ciphertexts. In practice this allows us to decrease the dimension by a factor between 2 and 3, which accelerates the \(\mathsf {ExtExpInner}\) operations by the same factor. As a small added bonus, it also slightly decreases the error in the ciphertexts output by this function.

1.2 F.2 Heuristic Error Propagation

Our theoretical analysis of the scheme used sub-gaussian analysis [21] to provide bounds on error propagation that are already significantly better than worst-case bounds. Yet those bounds are asymptotic, without explicit constants, and for some operations may not be perfectly tight. As in previous work [12, 15], when it comes to choosing practical parameters, we rely on a tighter but heuristic analysis of error propagation, essentially treating all random variables as independent gaussians. More precisely, considering that the critical random variable for correctness is obtained as the sum of many random variables, we only compute its variance as the sum of the variances of its terms, and treat this final result as Gaussian in accordance with the central limit theorem (which is formally not applicable due to potential dependencies).

Linear Operations. For the linear operations \(\mathsf {Add}\), \(\mathsf {Mult}\) and \(\mathsf {Galois}\), we use the same Eqs. (3) and (8) as in our sub-gaussian analysis, since they are tight in this case, but apply them to the standard deviation of each variable rather than to the sub-gaussianity parameter.

Modulus Switching. For our analysis, we needed to randomize the rounding step to ensure sub-gaussianity without resorting to the randomness of the input ciphertext. Instead, in practice we use deterministic rounding and account for the randomness of the input ciphertext. Treating the rounding errors as independent uniform random variables in the interval \([-\nicefrac 1 2, \nicefrac 1 2]\) allows us to heuristically improve the error bound (4) down to

$$\begin{aligned} \mathsf {ModSwitch}: \mathcal {R}_d\mathsf {LWE}_{\mathbf {s}}^{t|Q}(m;\, E) \rightarrow \mathcal {R}_d\mathsf {LWE}_{\mathbf {s}}^{t|Q'} \Biggl (m;\, \sqrt{\frac{Q'^2}{Q^2} E^2 + \frac{\Vert \mathbf {s}\Vert ^2}{12}} \Biggr ) \end{aligned}$$
(12)

Key Switching, External Multiplication and Inner Product in the Exponent. We first note that, according to Remark 2, the bounds given by (5) and (6) must be amended to account for the use of a Gadget matrix in base B rather than in base 2. Additionally, we note that this bound accounts for the worst output of \(\mathbf {G}^{-1}\). Instead, we treat the output of \(\mathbf {G}^{-1}\) as a uniform random vector with coordinates uniform in the integer interval \(I_B = \{- \left\lfloor \frac{B-1}{2}\right\rfloor , \dots , \left\lceil \frac{B-1}{2} \right\rceil \}\). Each such coordinate has variance \(V_B = \frac{1}{B} \sum _{i \in I_B} i^2 \approx B^2 / 12\).
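As a quick sanity check of the approximation \(V_B \approx B^2/12\), the exact second moment of a coordinate uniform in \(I_B\) can be computed for a few bases (a toy computation, independent of the scheme):

```python
from fractions import Fraction

def V_B(B):
    """Exact second moment of a coordinate uniform in I_B = {-floor((B-1)/2), ..., ceil((B-1)/2)}."""
    lo, hi = -((B - 1) // 2), B // 2          # note: ceil((B-1)/2) = B // 2
    return Fraction(sum(i * i for i in range(lo, hi + 1)), B)

for B in (2, 4, 16, 256):
    print(B, float(V_B(B)), B**2 / 12)        # V_B approaches B^2/12 as B grows
```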

For our heuristic analysis, we therefore amend (5) to

$$\begin{aligned} \mathsf {KeySwitch}: \mathcal {R}_d\mathsf {LWE}_{\mathbf {s}}^{t|Q}(m;\, E) \rightarrow \mathcal {R}_d\mathsf {LWE}_{s'}^{t|Q} \left( m;\, \sqrt{E^2 + \sigma ^2 dn KV_B} \right) . \end{aligned}$$
(13)

Similarly, (6) is heuristically changed to

(14)

Note that assuming independence decreased the factor \(d^2\) to a factor d. Similarly, a factor \(4\ell ^2\) can be decreased to \(2\ell \), ignoring the potential dependences discussed in Remark 5. The trick described in Sect. F.1 further decreases this \(2\ell \) factor to \(\ell \).

In conclusion, the accumulated error in the error propagation of the whole \(\mathsf {ExtExpInner}\) operation (10) is now heuristically given by:

$$\begin{aligned} \mathsf {ExtExpInner}: \bigoplus _{i = 1}^ \ell \mathcal {R}_d\mathsf {GSW}_{s}^{t|Q}(T^{x_i};\, E') \rightarrow \mathcal {R}_d\mathsf {LWE}_{s}^{t|Q} \left( T^{\langle \mathbf {x}, \mathbf {y}\rangle };\, \sqrt{dK \ell V (\sigma ^2 + E'^2)} \right) . \end{aligned}$$
(15)

Tensoring. Looking only at the variance of individual coefficients, one may save the factor \(\sqrt{2\lambda }\) in the error propagation of \(\mathsf {ExpCRT}\), namely, (9) becomes:

(16)

We could successfully confirm all these heuristic equations by measuring the actual errors in our implementation.

1.3 F.3 Amortising \(\mathsf {FunExpExtract}\)

The costly steps of the \(\mathsf {FunExpExtract}\) algorithm consist in computing

$$\begin{aligned} c^{(pq)} \mapsto \mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*(f \cdot \mathbf {G}^{-T}(c^{(pq)}) \cdot \mathbf {S}) \end{aligned}$$

where f represents the function F to extract and \(\mathbf {S}\) is a Key-Switching Key (see Fig. 1 and Algorithm 7). We note here that the most expensive part of the computation, \(\mathbf {G}^{-T}(c^{(pq)}) \cdot \mathbf {S}\), can be re-used for several different f’s.

This amortization allows us to extend our technique so that not only the input of the function but also its output can be large.

1.4 F.4 Accelerating \(\mathsf {FunExpExtract}\)

As mentioned above, the practical cost of the \(\mathsf {FunExpExtract}\) step as described in Sect. 4 is prohibitive. The costly steps consist in the computation of

$$\begin{aligned} \mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*(f \cdot \mathbf {G}^{-T}(\mathbf {x} \otimes \mathbf {y}) \cdot \mathbf {S}) \end{aligned}$$

where f represents the function F to extract, \(\mathbf {x}\), \(\mathbf {y}\) are the ciphertexts output by \(\mathsf {ExtExpInner}\), and \(\mathbf {S}\) is a Key-Switching Key. Naively, even using precomputations of f and \(\mathbf {S}\), this operation would require \(4K + 1\) FFTs in dimension pq: one forward FFT for each component of \(\mathbf {G}^{-1}(\mathbf {c})\), and one backward FFT.Footnote 10 We show here how to get completely rid of those large FFTs, requiring only small FFTs (in dimensions p and q) and a few additions of vectors of dimension pq.

FFT of Pure Tensors. To tackle these costly FFT operations, one should first note that FFT and \(\otimes \) can be commuted. Indeed, one may first rewrite \(x \otimes y = (x \otimes 1) \cdot (1 \otimes y)\), and note that the FFT coefficients of \(x \otimes 1 \in \mathcal {R}_{pq}\) are easily derived from the FFT coefficients of \(x \in \mathcal {R}_p\) by simply repeating the coefficients q times (and similarly for \(1 \otimes y\)). This remark allows us to decrease the naive cost of the FFT operation over pure tensors from \(\varTheta (pq \log pq)\) to \(\varTheta (pq + p \log p + q \log q)\).
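This commutation is easy to check numerically. The sketch below embeds \(\mathcal {R}_p\) into \(\mathcal {R}_{pq}\) by sending X to \(Z^q\), which is only an illustrative choice of identification (the paper's tensor map uses specific CRT exponents), and verifies that the length-pq FFT of the embedded element is the length-p FFT of x tiled q times:

```python
import numpy as np

p, q = 5, 7                                  # toy coprime dimensions
rng = np.random.default_rng(3)
x = rng.standard_normal(p)                   # an element of R_p in coefficient representation

# Embed R_p into R_{pq} by sending X to Z^q (a ring homomorphism, since (Z^q)^p = Z^{pq} = 1).
x_tensor_1 = np.zeros(p * q)
x_tensor_1[np.arange(p) * q] = x

# FFT of the embedded element versus the length-p FFT of x tiled q times:
full = np.fft.fft(x_tensor_1)
tiled = np.fft.fft(x)[np.arange(p * q) % p]  # coefficient j of the big FFT is FFT_p(x)[j mod p]

assert np.allclose(full, tiled)
```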

The CRT-Gadget. To provide an asymptotic improvement for gadget inversion of pure tensors, we need to rely on a different Gadget matrix construction, based on the Chinese-Remainder Theorem. We describe it over the integers \(\mathbb Z\), yet it naturally extends coefficient-wise to any ring \(\mathcal {R}_d\).

Consider a modulus Q such that we can write \(Q = \prod _{i=1}^K q_i\) where the \(q_i\) are small coprime integers. Consider the CRT isomorphism \(\mu : r \in \mathbb Z_Q \mapsto (r\bmod q_1,\ldots , r\bmod q_K)\), and let \(\mathbf {g} \in \mathbb Z^K_Q\) be the vector of the Bezout coefficients, i.e., the coefficients such that \(\mu ^{-1}(\mathbf {x}) = \mathbf {x}^T\mathbf {g} \bmod Q\). This gadget also permits efficiently finding small pre-images. Indeed, define \(\mathbf {g}^{-T}(x) = (x_1, \dots , x_K) \in \mathbb Z^K\) where \(x_i\) is the representative of \(x \bmod q_i\) in the range \((-q_i/2, q_i/2]\).
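A minimal sketch of this CRT gadget over \(\mathbb Z\) (the moduli below are toy values): it builds \(\mathbf {g}\) from the standard CRT reconstruction coefficients and checks that \(\mathbf {g}^{-T}\) produces a small pre-image.

```python
from math import prod

qs = [5, 7, 9, 11, 13]                       # toy pairwise-coprime moduli
Q = prod(qs)

# CRT reconstruction coefficients: g_i = 1 mod q_i and g_i = 0 mod q_j for j != i.
g = [(Q // qi) * pow(Q // qi, -1, qi) % Q for qi in qs]

def g_inv_T(x):
    """Small pre-image of x: the i-th entry is the representative of x mod q_i in (-q_i/2, q_i/2]."""
    out = []
    for qi in qs:
        r = x % qi
        out.append(r - qi if r > qi // 2 else r)
    return out

x = 123456 % Q
d = g_inv_T(x)
assert sum(di * gi for di, gi in zip(d, g)) % Q == x    # <g^{-T}(x), g> = x mod Q
assert all(abs(di) <= qi // 2 for di, qi in zip(d, qs)) # entries are small
```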

Gadget Inversion of Pure Tensors (in FFT Format). This new gadget has the advantage that gadget inversion is somewhat homomorphic. Let us write \(\odot \) for the coefficient-wise product of vectors. While in general we have \(\mathbf {g}^{-T}(xy) \ne \mathbf {g}^{-T}(x) \odot \mathbf {g}^{-T}(y)\), it nevertheless holds that

$$\begin{aligned} (\mathbf {g}^{-T}(x) \odot \mathbf {g}^{-T}(y)) \mathbf {g} = xy \bmod Q . \end{aligned}$$

It also holds that \(\mathbf {g}^{-T}(x) \odot \mathbf {g}^{-T}(y)\) is rather small, namely, its i-th coefficient has absolute value less than \(q_i^2 / 4\). This will allow us, at the cost of increased error propagation, to swap the gadget-inversion and the tensoring.

More precisely, we define

$$\mathbf {g}^{-T}_\otimes (x, y) = (\mathbf {g}^{-T}(x)_i \otimes \mathbf {g}^{-T}(y)_i)_{i=1 \dots K}, $$

and note that it is a proper gadget inversion: \(\mathbf {g}^{-T}_\otimes (x, y)\mathbf {g} = x\otimes y \bmod Q\), and the coefficients of \(\mathbf {g}^{-T}_\otimes (x, y)_i\) are less than \(q_i^2 / 4\).
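Both properties of \(\mathbf {g}^{-T}_\otimes \), correct reconstruction and the \(q_i^2/4\) size bound, can be checked on scalars (the ring case applies them coefficient-wise); a toy sketch with the same illustrative moduli as above:

```python
from math import prod

qs = [5, 7, 9, 11, 13]                       # same toy moduli as above
Q = prod(qs)
g = [(Q // qi) * pow(Q // qi, -1, qi) % Q for qi in qs]
g_inv_T = lambda x: [(x % qi) - qi if (x % qi) > qi // 2 else (x % qi) for qi in qs]

x, y = 4321 % Q, 7890 % Q
d_xy = [dx * dy for dx, dy in zip(g_inv_T(x), g_inv_T(y))]         # coefficient-wise product

assert sum(di * gi for di, gi in zip(d_xy, g)) % Q == (x * y) % Q  # still a valid pre-image of x*y
assert all(abs(di) <= qi * qi // 4 for di, qi in zip(d_xy, qs))    # entries bounded by q_i^2/4
```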

For inputs \((x,y)\in \mathcal {R}_p \times \mathcal {R}_q\), one may compute \(\mathbf {g}^{-T}_\otimes (x, y)\) in FFT format in time \(\varTheta (K pq + K p\log p + K q \log q)\), that is, in time linear in the size of the output. Indeed, one may compute each \((\mathbf {g}^{-T}(x)_i, \mathbf {g}^{-T}(y)_i)\), convert them to FFT format, and then only perform the tensoring step using the remark above. In comparison, the naive algorithm would have cost \(\varTheta (K pq \log pq)\): asymptotically, our new trick improves the complexity by a logarithmic factor \(\varTheta (\log pq)\). The impact in practice may be quite substantial, also considering the large hidden constants in FFT operations.

Tracing Down in the FFT Domain. Finally, we note that the trace operation \(\mathrm{Tr}_{\mathcal {R}_{pq}/\mathcal {R}_p}^*\) can also be performed directly in the FFT domain in time \(\varTheta (pq)\) by summing the appropriate FFT coefficients. This allows us to replace the final large backward FFT (in dimension pq) by a cheap backward FFT in dimension p. The cost of this step decreases from \(\varTheta (pq \log pq)\) down to \(\varTheta (pq + p\log p)\).


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Bonnoron, G., Ducas, L., Fillinger, M. (2018). Large FHE Gates from Tensored Homomorphic Accumulator. In: Joux, A., Nitaj, A., Rachidi, T. (eds) Progress in Cryptology – AFRICACRYPT 2018. AFRICACRYPT 2018. Lecture Notes in Computer Science, vol 10831. Springer, Cham. https://doi.org/10.1007/978-3-319-89339-6_13


  • DOI: https://doi.org/10.1007/978-3-319-89339-6_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89338-9

  • Online ISBN: 978-3-319-89339-6

