Accelerating RSA with Fine-Grained Parallelism Using GPU

  • Yang Yang
  • Zhi Guan
  • Huiping Sun
  • Zhong Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9065)

Abstract

RSA is a public-key cryptosystem widely used for end-to-end authentication and key exchange in Internet protocols such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA are much more time-consuming. This puts performance pressure on service providers that use secure protocols and hinders wider adoption of these protocols. Graphics Processing Units (GPUs) are increasingly used for data-parallel general-purpose computing and often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelizing Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and construct a parallel RSA implementation based on this approach, combined with other optimization techniques at both the algorithmic and implementation levels. The performance evaluation shows that our RSA implementation achieves record-breaking latency for RSA decryption on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughput of our implementation reaches 34,981 decryptions per second for RSA-1024 and 5,244 for RSA-2048, which is much faster than existing integer-based implementations. The peak throughput is slightly lower than that of the fastest floating-point-based implementation, while the latency of our implementation is 3 times lower.
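
For readers unfamiliar with the two building blocks named here, the following is a minimal sequential sketch of Montgomery multiplication and CRT-based RSA decryption in Python. It is not the parallel SIMT scheme proposed in the paper, which the abstract does not detail. All function names (`montgomery_setup`, `mont_mul`, `mont_pow`, `rsa_crt_decrypt`) are hypothetical, and the choice R = 2^k with k equal to the bit length of the modulus is an assumption made for illustration. It requires Python 3.8 or later for `pow(x, -1, m)`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n, with R = 2**k > n (assumed)."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # satisfies n * n_prime = -1 (mod R)
    r2 = (r * r) % n                 # R^2 mod n, used to enter Montgomery form
    return r, n_prime, r2


def mont_mul(a, b, n, k, n_prime):
    """Montgomery product: returns a * b * R^(-1) mod n for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask   # m = (t mod R) * n' mod R
    u = (t + m * n) >> k                # exact division by R; u < 2n
    return u - n if u >= n else u       # single conditional subtraction


def mont_pow(base, exp, n):
    """Left-to-right binary exponentiation using Montgomery products."""
    k = n.bit_length()
    r, n_prime, r2 = montgomery_setup(n, k)
    x = mont_mul(base % n, r2, n, k, n_prime)   # base in Montgomery form
    acc = r % n                                 # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)      # leave Montgomery form


def rsa_crt_decrypt(c, p, q, d):
    """RSA decryption via the CRT (Garner recombination), primes p and q."""
    dp, dq = d % (p - 1), d % (q - 1)
    q_inv = pow(q, -1, p)
    mp = mont_pow(c % p, dp, p)     # half-size exponentiation mod p
    mq = mont_pow(c % q, dq, q)     # half-size exponentiation mod q
    h = (q_inv * (mp - mq)) % p
    return mq + h * q


# Toy parameters for illustration only; real keys use 512- or 1024-bit primes.
p, q, e, d = 61, 53, 17, 2753
message = 65
cipher = pow(message, e, p * q)
assert rsa_crt_decrypt(cipher, p, q, d) == message
```

The CRT split replaces one full-size exponentiation with two half-size ones, and Montgomery multiplication replaces trial division by the modulus with shifts and masks, which is why both are common in RSA implementations.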

Keywords

RSA, GPGPU, CUDA, Montgomery Multiplication, CRT

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yang Yang (1, 2, 3)
  • Zhi Guan (1, 2, 3)
  • Huiping Sun (2, 3, 4)
  • Zhong Chen (1, 2, 3)
  1. Institute of Software, School of EECS, Peking University, Beijing, China
  2. MoE Key Lab of High Confidence Software Technologies (PKU), Beijing, China
  3. MoE Key Lab of Network and Software Security Assurance (PKU), Beijing, China
  4. School of Software and Microelectronics, Peking University, Beijing, China