Abstract
RSA is a public key cryptography widely used for end-to-end authentication and key exchange in various Internet protocols, such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA is much more time consuming. This brings pressure on performance to service providers using secure protocols, and hinders these protocols from being more widely used. Graphics Processing Units (GPUs) are increasingly used for intensive data parallelism general purpose computing. GPUs often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelize Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and construct a parallel RSA implementation based on this approach, combining with other optimization techniques both in the algorithmic level and implementation level. The performance evaluation shows our RSA implementation achieves a record-breaking latency for RSA decryption implementations on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughtput of decryptions per second of our implementation reaches 5,244 for RSA-2048 and 34,981 for RSA-1024 respectively, which is much faster than existing integer-based implementations. The peak throughput of our implementation is slightly slower than the fastest floating-point based implementation, while the latency of our implementation is 3 times faster.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
NVIDIA kepler-GK110-architecture whitepaper, http://www.nvidia.com/object/nvidia-kepler.html
Netcrafts SSL survey. Tech. rep. (2013), http://news.netcraft.com/SSL-survey
CUDA c programming guide 6.5 (August 2014), http://docs.nvidia.com/cuda/cuda-c-programming-guide/
Barker, E., Roginsky, A.: Transitions: Recommendation for transitioning the use of cryptographic algorithms and key lengths. NIST Special Publication 800, 131 (2011), http://www.gocs.eu/pages/fachberichte/archiv/075-sp800-131A.pdf
Cui, S., Großschädl, J., Liu, Z., Xu, Q.: High-speed elliptic curve cryptography on the NVIDIA GT200 graphics processing unit. In: Huang, X., Zhou, J. (eds.) ISPEC 2014. LNCS, vol. 8434, pp. 202–216. Springer, Heidelberg (2014)
Dussé, S.R., Kaliski Jr., B.S.: A cryptographic library for the motorola DSP 56000. In: Damgård, I.B. (ed.) EUROCRYPT 1990. LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991)
Harrison, O., Waldron, J.: Efficient acceleration of asymmetric cryptography on graphics hardware. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 350–367. Springer, Heidelberg (2009)
Jang, K., Han, S., Han, S., Moon, S.B., Park, K.: SSLShader: Cheap SSL acceleration with commodity processors. In: NSDI (2011)
Koc, C.K.: High-speed RSA implementation. Tech. rep., Technical Report, RSA Laboratories (1994)
Koc, E.K., Acar, T.: Analyzing and comparing montgomery multiplication algorithms. IEEE Micro 16, 26–33 (1996)
Lee, V.W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A.D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., Dubey, P.: Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA 2010, pp. 451–460. ACM, New York (2010)
Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In: IEEE International Conference on Signal Processing and Communications, ICSPC 2007, pp. 65–68 (November 2007)
Montgomery, P.L.: Modular multiplication without trial division. Mathematics of Computation 44(170), 519–521 (1985)
Moss, A., Page, D., Smart, N.P.: Toward acceleration of RSA using 3D graphics hardware. In: Galbraith, S.D. (ed.) Cryptography and Coding 2007. LNCS, vol. 4887, pp. 364–383. Springer, Heidelberg (2007)
Neves, S., Araujo, F.: On the performance of GPU public-key cryptography. In: 2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 133–140 (September 2011)
Quisquater, J.-J., Couvreur, C.: Fast decipherment algorithm for RSA public-key cryptosystem. Electronics Letters 18(21), 905–907 (1982)
Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)
Szerwinski, R., Güneysu, T.: Exploiting the power of gPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008)
Yang, Y., Guan, Z., Zhu, J., Dong, Q., Chen, Z.: Accelerating AES in javaScript with webGL. In: Qing, S., Zhou, J., Liu, D. (eds.) ICICS 2013. LNCS, vol. 8233, pp. 275–287. Springer, Heidelberg (2013)
Yanik, T., Savas, E., Koc, C.K.: Incomplete reduction in modular arithmetic. IEE Proceedings Computers and Digital Techniques 149(2), 46–52 (2002)
Zhao, L., Iyer, R., Makineni, S., Bhuyan, L.: Anatomy and performance of SSL processing. In: IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2005, pp. 197–206 (2005)
Zheng, F., Pan, W., Lin, J., Jing, J., Zhao, Y.: Exploiting the floating-point computing power of gPUs for RSA. In: Chow, S.S.M., Camenisch, J., Hui, L.C.K., Yiu, S.M. (eds.) ISC 2014. LNCS, vol. 8783, pp. 198–215. Springer, Heidelberg (2014)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, Y., Guan, Z., Sun, H., Chen, Z. (2015). Accelerating RSA with Fine-Grained Parallelism Using GPU. In: Lopez, J., Wu, Y. (eds) Information Security Practice and Experience. ISPEC 2015. Lecture Notes in Computer Science(), vol 9065. Springer, Cham. https://doi.org/10.1007/978-3-319-17533-1_31
Download citation
DOI: https://doi.org/10.1007/978-3-319-17533-1_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-17532-4
Online ISBN: 978-3-319-17533-1
eBook Packages: Computer ScienceComputer Science (R0)