Skip to main content

Accelerating RSA with Fine-Grained Parallelism Using GPU

  • Conference paper
Information Security Practice and Experience (ISPEC 2015)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 9065))

Abstract

RSA is a public key cryptography widely used for end-to-end authentication and key exchange in various Internet protocols, such as SSL and TLS. Compared with symmetric cryptography, the cryptographic operations in RSA is much more time consuming. This brings pressure on performance to service providers using secure protocols, and hinders these protocols from being more widely used. Graphics Processing Units (GPUs) are increasingly used for intensive data parallelism general purpose computing. GPUs often provide better throughput than CPUs at the same cost. In this paper, we propose a new approach to parallelize Montgomery multiplication under the Single Instruction Multiple Thread (SIMT) threading model of GPUs, and construct a parallel RSA implementation based on this approach, combining with other optimization techniques both in the algorithmic level and implementation level. The performance evaluation shows our RSA implementation achieves a record-breaking latency for RSA decryption implementations on GPUs: 2.6 ms for RSA-1024 and 6.5 ms for RSA-2048. The peak throughtput of decryptions per second of our implementation reaches 5,244 for RSA-2048 and 34,981 for RSA-1024 respectively, which is much faster than existing integer-based implementations. The peak throughput of our implementation is slightly slower than the fastest floating-point based implementation, while the latency of our implementation is 3 times faster.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. NVIDIA kepler-GK110-architecture whitepaper, http://www.nvidia.com/object/nvidia-kepler.html

  2. Netcrafts SSL survey. Tech. rep. (2013), http://news.netcraft.com/SSL-survey

  3. CUDA c programming guide 6.5 (August 2014), http://docs.nvidia.com/cuda/cuda-c-programming-guide/

  4. Barker, E., Roginsky, A.: Transitions: Recommendation for transitioning the use of cryptographic algorithms and key lengths. NIST Special Publication 800, 131 (2011), http://www.gocs.eu/pages/fachberichte/archiv/075-sp800-131A.pdf

    Google Scholar 

  5. Cui, S., Großschädl, J., Liu, Z., Xu, Q.: High-speed elliptic curve cryptography on the NVIDIA GT200 graphics processing unit. In: Huang, X., Zhou, J. (eds.) ISPEC 2014. LNCS, vol. 8434, pp. 202–216. Springer, Heidelberg (2014)

    Chapter  Google Scholar 

  6. Dussé, S.R., Kaliski Jr., B.S.: A cryptographic library for the motorola DSP 56000. In: Damgård, I.B. (ed.) EUROCRYPT 1990. LNCS, vol. 473, pp. 230–244. Springer, Heidelberg (1991)

    Chapter  Google Scholar 

  7. Harrison, O., Waldron, J.: Efficient acceleration of asymmetric cryptography on graphics hardware. In: Preneel, B. (ed.) AFRICACRYPT 2009. LNCS, vol. 5580, pp. 350–367. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  8. Jang, K., Han, S., Han, S., Moon, S.B., Park, K.: SSLShader: Cheap SSL acceleration with commodity processors. In: NSDI (2011)

    Google Scholar 

  9. Koc, C.K.: High-speed RSA implementation. Tech. rep., Technical Report, RSA Laboratories (1994)

    Google Scholar 

  10. Koc, E.K., Acar, T.: Analyzing and comparing montgomery multiplication algorithms. IEEE Micro 16, 26–33 (1996)

    Article  Google Scholar 

  11. Lee, V.W., Kim, C., Chhugani, J., Deisher, M., Kim, D., Nguyen, A.D., Satish, N., Smelyanskiy, M., Chennupaty, S., Hammarlund, P., Singhal, R., Dubey, P.: Debunking the 100x GPU vs. CPU myth: An evaluation of throughput computing on CPU and GPU. In: Proceedings of the 37th Annual International Symposium on Computer Architecture, ISCA 2010, pp. 451–460. ACM, New York (2010)

    Google Scholar 

  12. Manavski, S.A.: CUDA compatible GPU as an efficient hardware accelerator for AES cryptography. In: IEEE International Conference on Signal Processing and Communications, ICSPC 2007, pp. 65–68 (November 2007)

    Google Scholar 

  13. Montgomery, P.L.: Modular multiplication without trial division. Mathematics of Computation 44(170), 519–521 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  14. Moss, A., Page, D., Smart, N.P.: Toward acceleration of RSA using 3D graphics hardware. In: Galbraith, S.D. (ed.) Cryptography and Coding 2007. LNCS, vol. 4887, pp. 364–383. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  15. Neves, S., Araujo, F.: On the performance of GPU public-key cryptography. In: 2011 IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 133–140 (September 2011)

    Google Scholar 

  16. Quisquater, J.-J., Couvreur, C.: Fast decipherment algorithm for RSA public-key cryptosystem. Electronics Letters 18(21), 905–907 (1982)

    Article  Google Scholar 

  17. Rivest, R.L., Shamir, A., Adleman, L.: A method for obtaining digital signatures and public-key cryptosystems. Commun. ACM 21(2), 120–126 (1978)

    Article  MathSciNet  MATH  Google Scholar 

  18. Szerwinski, R., Güneysu, T.: Exploiting the power of gPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  19. Yang, Y., Guan, Z., Zhu, J., Dong, Q., Chen, Z.: Accelerating AES in javaScript with webGL. In: Qing, S., Zhou, J., Liu, D. (eds.) ICICS 2013. LNCS, vol. 8233, pp. 275–287. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  20. Yanik, T., Savas, E., Koc, C.K.: Incomplete reduction in modular arithmetic. IEE Proceedings Computers and Digital Techniques 149(2), 46–52 (2002)

    Article  Google Scholar 

  21. Zhao, L., Iyer, R., Makineni, S., Bhuyan, L.: Anatomy and performance of SSL processing. In: IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2005, pp. 197–206 (2005)

    Google Scholar 

  22. Zheng, F., Pan, W., Lin, J., Jing, J., Zhao, Y.: Exploiting the floating-point computing power of gPUs for RSA. In: Chow, S.S.M., Camenisch, J., Hui, L.C.K., Yiu, S.M. (eds.) ISC 2014. LNCS, vol. 8783, pp. 198–215. Springer, Heidelberg (2014)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Yang, Y., Guan, Z., Sun, H., Chen, Z. (2015). Accelerating RSA with Fine-Grained Parallelism Using GPU. In: Lopez, J., Wu, Y. (eds) Information Security Practice and Experience. ISPEC 2015. Lecture Notes in Computer Science(), vol 9065. Springer, Cham. https://doi.org/10.1007/978-3-319-17533-1_31

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-17533-1_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17532-4

  • Online ISBN: 978-3-319-17533-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics