In scientific computing and cryptography, many applications involve large integer multiplication, which is a time-consuming operation. To reduce its computational complexity, the number theoretic transform (NTT) is widely used: the operands are transformed so that multiplication can be performed in the frequency domain with lower complexity. However, large integer multiplication remains slow when the operands are very large (e.g., more than 100K bits). Several researchers have therefore proposed accelerating the NTT on massively parallel GPU architectures. In this paper, we propose several techniques that improve the performance of the NTT implementation, making it faster than the state-of-the-art work by Dai et al. The proposed techniques include register-based twiddle factor storage and multi-stream asynchronous computation, which leverage features offered by newer GPU architectures. The proposed NTT implementation was applied to the CMNT fully homomorphic encryption scheme proposed by Coron et al. With the proposed implementation techniques, a homomorphic multiplication in CMNT takes 0.27 ms on a GTX1070 desktop GPU and 7.49 ms on a Jetson TX1 embedded system. These results show that the proposed implementation is suitable for practical applications in server environments as well as embedded systems.
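The idea of multiplying large integers "in the frequency domain" can be illustrated with a minimal sequential sketch: split each operand into small limbs, apply a forward NTT, multiply pointwise, apply the inverse NTT, and propagate carries. This is not the authors' GPU implementation. The modulus 998244353 (= 119 · 2^23 + 1, primitive root 3), the 8-bit limb size, and the helper names `ntt`, `to_limbs` and `ntt_multiply` are assumptions chosen for illustration, and the GPU-specific optimizations described above (register-based twiddle factors, multi-stream execution) are not modeled.

```python
# Minimal sketch of NTT-based large integer multiplication (illustrative only).
# Assumptions: modulus, primitive root and limb size are NOT taken from the paper.
MOD = 998244353   # 119 * 2**23 + 1, so power-of-two transform lengths up to 2**23 exist
ROOT = 3          # primitive root modulo MOD
LIMB_BITS = 8     # hypothetical limb size; keeps convolution coefficients below MOD


def ntt(a, invert=False):
    """In-place iterative radix-2 Cooley-Tukey NTT; len(a) must be a power of two."""
    n = len(a)
    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        # Twiddle factor: a primitive length-th root of unity mod MOD,
        # generated on the fly here rather than precomputed.
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # inverse root via Fermat's little theorem
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % MOD
                a[start + k] = (u + v) % MOD
                a[start + k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)  # scale by 1/n to complete the inverse transform
        for i in range(n):
            a[i] = a[i] * n_inv % MOD
    return a


def to_limbs(x):
    """Split a non-negative integer into little-endian LIMB_BITS-bit limbs."""
    limbs = []
    while x:
        limbs.append(x & ((1 << LIMB_BITS) - 1))
        x >>= LIMB_BITS
    return limbs or [0]


def ntt_multiply(x, y):
    """Multiply two non-negative integers via forward NTT, pointwise product, inverse NTT."""
    assert x >= 0 and y >= 0
    a, b = to_limbs(x), to_limbs(y)
    n = 1
    while n < len(a) + len(b):
        n <<= 1
    # Each convolution coefficient is a sum of at most min(len(a), len(b)) limb products,
    # so it must stay below MOD to be recovered exactly.
    max_limb = (1 << LIMB_BITS) - 1
    assert min(len(a), len(b)) * max_limb * max_limb < MOD
    fa = ntt(a + [0] * (n - len(a)))
    fb = ntt(b + [0] * (n - len(b)))
    fc = [u * v % MOD for u, v in zip(fa, fb)]  # pointwise product in the transform domain
    c = ntt(fc, invert=True)                    # limb-wise convolution of the operands
    # Carry propagation: recombine the (possibly oversized) limbs into one integer.
    result = 0
    for i, coeff in enumerate(c):
        result += coeff << (LIMB_BITS * i)
    return result
```

For example, `ntt_multiply(123456789, 987654321)` equals `123456789 * 987654321`. The single small prime keeps the sketch short but bounds the operand size through the coefficient check; practical implementations typically use a larger NTT-friendly modulus or several moduli combined with the Chinese remainder theorem.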
References
Harvey D (2015) Computing zeta functions of arithmetic schemes. Proc Lond Math Soc 111(6):1379–1401
Wang W, Hu Y, Chen L, Huang X, Sunar B (2015) Exploring the feasibility of fully homomorphic encryption. IEEE Trans Comput 64(3):698–706
Dai W, Sunar B (2015) cuHE: a homomorphic encryption accelerator library. In: International Conference on Cryptography and Information Security in the Balkans. Springer, Berlin
Öztürk E, Doröz Y, Savaş E, Sunar B (2016) A custom accelerator for homomorphic encryption applications. IEEE Trans Comput 66(1):3–16
Gentry C (2009) A fully homomorphic encryption scheme. Ph.D. thesis, Stanford University
Van Dijk M, Gentry C, Halevi S, Vaikuntanathan V (2010) Fully homomorphic encryption over the integers. In: Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp 24–43
Coron J-S, Mandal A, Naccache D, Tibouchi M (2011) Fully homomorphic encryption over the integers with shorter public keys. In: Advances in Cryptology (CRYPTO 2011). Lecture Notes in Computer Science, vol 6841. Springer, Berlin, Heidelberg
Schönhage A, Strassen V (1971) Schnelle Multiplikation großer Zahlen. Computing 7:281–292
Gentry C, Halevi S (2011) Implementing Gentry’s fully-homomorphic encryption scheme. In: Advances in Cryptology (EUROCRYPT 2011), pp 129–148
CUDA Homomorphic Encryption Library. https://github.com/vernamlab/cuHE. Accessed 1 Apr 2019
Doröz Y, Shahverdi A, Eisenbarth T, Sunar B (2014) Toward practical homomorphic evaluation of block ciphers using PRINCE. In: 2nd Workshop on Applied Homomorphic Cryptography and Encrypted Computing
Cooley JW, Tukey JW (1965) An algorithm for the machine calculation of complex Fourier series. Math Comput 19(90):297–301
Barrett P (2006) Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor. In: Advances in cryptology—CRYPTO’ 86. Lecture Notes in Computer Science, vol 263, pp 311–323
Li F, Ye Y, Tian Z, Zhang X (2018) CPU versus GPU: which can perform matrix computation faster–performance comparison for basic linear algebra subprograms. Neural Comput Appl 31:4353–4365
Hernández DE, Olague G, Hernández B, Clemente E (2018) CUDA-based parallelization of a bio-inspired model for fast object classification. Neural Comput Appl 30(10):3007–3018
Emmart N, Weems CC (2011) High precision integer multiplication with a GPU using Strassen’s algorithm with multiple FFT sizes. Parallel Process Lett 21(3):359–375
Kepler: the world’s fastest, most efficient HPC architecture. https://www.NVIDIA.in/object/NVIDIA-kepler-in.html. Accessed 21 Apr 2019
Fully-Homomorphic-DGHV-and-Variants. https://github.com/deevashwer/Fully-Homomorphic-DGHV-and-Variants. Accessed 1 Apr 2019
Acknowledgements
This project was funded by the Universiti Tunku Abdul Rahman Research Fund (UTARRF) under grant number IPSR/RMC/UTARRF/2016-C1/G1. Part of the research work was funded by the Fundamental Research Grant Scheme (FRGS), Malaysia, under project number FRGS/1/2018/STG06/UTAR/03/1.
Cite this article
Goey, JZ., Lee, WK., Goi, BM. et al. Accelerating number theoretic transform in GPU platform for fully homomorphic encryption. J Supercomput 77, 1455–1474 (2021). https://doi.org/10.1007/s11227-020-03156-7
Keywords
- Number theoretic transform
- Homomorphic encryption
- Graphics processing unit