Accelerating number theoretic transform in GPU platform for fully homomorphic encryption


In scientific computing and cryptography, many applications involve large integer multiplication, which is a time-consuming operation. To reduce its computational complexity, the number theoretic transform (NTT) is widely used: the multiplication becomes a pointwise product in the frequency domain, which reduces the cost from O(n^2) for schoolbook multiplication to O(n log n) modular operations for n-digit operands. However, large integer multiplication remains slow when the operands are very large (e.g., more than 100K bits). To address this, several researchers have proposed accelerating the NTT on massively parallel GPU architectures. In this paper, we propose several techniques that improve the performance of NTT implementations, yielding an implementation that is faster than the state-of-the-art work by Dai et al. The proposed techniques include register-based twiddle factor storage and multi-stream asynchronous computation, which leverage features offered by newer GPU architectures. The proposed NTT implementation was applied to the CMNT fully homomorphic encryption scheme proposed by Coron et al. With the proposed implementation techniques, a homomorphic multiplication in CMNT takes 0.27 ms on a GTX 1070 desktop GPU and 7.49 ms on a Jetson TX1 embedded system. This shows that the proposed implementation is suitable for practical applications in server environments as well as in embedded systems.
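To make the frequency-domain idea concrete, here is a minimal Python sketch of NTT-based large integer multiplication. It is not the authors' GPU implementation: the prime P = 998244353 = 119·2^23 + 1, its primitive root 3, the 8-bit digit size, and the helper names `ntt`, `to_limbs` and `ntt_multiply` are illustrative assumptions. Both operands are split into digits, transformed with an iterative radix-2 Cooley–Tukey NTT, multiplied pointwise, transformed back, and the resulting convolution coefficients are recombined with carries.

```python
"""Sketch: multiply large integers via a number theoretic transform (NTT).

Illustrative parameters only; not the paper's modulus or digit size.
"""

P = 998244353    # assumed NTT-friendly prime: P = 119 * 2**23 + 1
G = 3            # primitive root modulo P
LIMB_BITS = 8    # assumed digit size in bits


def ntt(a, invert=False):
    """In-place iterative radix-2 Cooley-Tukey NTT over Z_P.

    len(a) must be a power of two (at most 2**23 for this P).
    Returns the same list for convenience.
    """
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Principal length-th root of unity (inverted for the inverse NTT).
        w_len = pow(G, (P - 1) // length, P)
        if invert:
            w_len = pow(w_len, P - 2, P)
        half = length // 2
        for start in range(0, n, length):
            w = 1  # twiddle factor for the current butterfly
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % P
                a[k] = (u + v) % P
                a[k + half] = (u - v) % P
                w = w * w_len % P
        length <<= 1
    if invert:
        # Scale by n^{-1} mod P to complete the inverse transform.
        n_inv = pow(n, P - 2, P)
        for i in range(n):
            a[i] = a[i] * n_inv % P
    return a


def to_limbs(x):
    """Split a non-negative integer into little-endian LIMB_BITS-bit digits."""
    mask = (1 << LIMB_BITS) - 1
    limbs = []
    while x:
        limbs.append(x & mask)
        x >>= LIMB_BITS
    return limbs or [0]


def ntt_multiply(x, y):
    """Multiply non-negative integers x and y via NTT-based convolution."""
    assert x >= 0 and y >= 0, "sketch handles non-negative operands only"
    a, b = to_limbs(x), to_limbs(y)
    # Exactness bound: every convolution coefficient must stay below P.
    max_digit = (1 << LIMB_BITS) - 1
    assert min(len(a), len(b)) * max_digit ** 2 < P, "operands too large for this P"
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    # Pointwise product in the transform ("frequency") domain.
    fc = [u * v % P for u, v in zip(fa, fb)]
    c = ntt(fc, invert=True)
    # Recombine coefficients; Python's big integers absorb the carries.
    result = 0
    for i, coeff in enumerate(c):
        result += coeff << (i * LIMB_BITS)
    return result


if __name__ == "__main__":
    x = 3 ** 2000
    y = 7 ** 1500 + 12345
    assert ntt_multiply(x, y) == x * y
    print("ok")
```

The digit size trades transform length against exactness: each output coefficient is a sum of at most min(len(a), len(b)) products of two 8-bit digits, so it must stay below P. Larger digits would need a larger modulus or several moduli combined with the Chinese remainder theorem. On a GPU, the butterflies within one stage are independent and can be assigned to separate threads, and the `w` values computed in the inner loop are the twiddle factors that the register-based storage technique mentioned in the abstract is concerned with.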


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


1. Harvey D (2015) Computing zeta functions of arithmetic schemes. Proc Lond Math Soc 111(6):1379–1401

2. Wang W, Hu Y, Chen L, Huang X, Sunar B (2015) Exploring the feasibility of fully homomorphic encryption. IEEE Trans Comput 64(3):698–706

3. Dai W, Sunar B (2015) cuHE: a homomorphic encryption accelerator library. In: International Conference on Cryptography and Information Security in the Balkans. Springer, Berlin

4. Öztürk E, Doröz Y, Savas E, Sunar B (2016) A custom accelerator for homomorphic encryption applications. IEEE Trans Comput 66(1):3–16

5. Gentry C (2009) A fully homomorphic encryption scheme. Ph.D. thesis, Stanford University

6. Van Dijk M, Gentry C, Halevi S, Vaikuntanathan V (2010) Fully homomorphic encryption over the integers. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp 24–43

7. Coron J-S, Mandal A, Naccache D, Tibouchi M (2011) Fully homomorphic encryption over the integers with shorter public keys. In: Advances in Cryptology – CRYPTO 2011. Lecture Notes in Computer Science, vol 6841. Springer, Berlin, Heidelberg

8. Schönhage A, Strassen V (1971) Schnelle Multiplikation grosser Zahlen. Computing 7:281–292

9. Gentry C, Halevi S (2011) Implementing Gentry’s fully-homomorphic encryption scheme. In: Advances in Cryptology – EUROCRYPT 2011, pp 129–148

10. CUDA Homomorphic Encryption Library. Accessed 1 Apr 2019

11. Doröz Y, Shalverdi A, Eisenbarth T, Sunar B (2014) Toward practical homomorphic evaluation of block ciphers using PRINCE. In: 2nd Workshop on Applied Homomorphic Cryptography and Encrypted Computing

12. Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301

13. Barrett P (2006) Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Advances in Cryptology – CRYPTO ’86. Lecture Notes in Computer Science, vol 263, pp 311–323

14. Li F, Ye Y, Tian Z, Zhang X (2018) CPU versus GPU: which can perform matrix computation faster–performance comparison for basic linear algebra subprograms. Neural Comput Appl 31:4353–4365

15. Hernández DE, Olague G, Hernández B, Clemente E (2018) CUDA-based parallelization of a bio-inspired model for fast object classification. Neural Comput Appl 30(10):3007–3018

16. Emmart N, Weems CC (2011) High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process Lett 21(3):359–375

17. Kepler – The World’s Fastest, Most Efficient HPC Architecture. Accessed 21 Apr 2019

18. Fully-Homomorphic-DGHV-and-Variants. Accessed 1 Apr 2019



This project is funded by Universiti Tunku Abdul Rahman Research Fund (UTARRF) under the grant number IPSR/RMC/UTARRF/2016-C1/G1. Part of the research work is funded by Fundamental Research Grant Scheme (FRGS), Malaysia, under project number FRGS/1/2018/STG06/UTAR/03/1.

Author information



Corresponding author

Correspondence to Wai-Kong Lee.



About this article


Cite this article

Goey, JZ., Lee, WK., Goi, BM. et al. Accelerating number theoretic transform in GPU platform for fully homomorphic encryption. J Supercomput 77, 1455–1474 (2021).



  • Number theoretic transform
  • Homomorphic encryption
  • Graphics processing unit
  • Cryptography