Abstract
RSA is widely used to protect the key exchange between two parties in secure mobile and wireless communication. Its main operation is modular exponentiation, which is very time consuming when the operand size is large, typically 1024 to 4096 bits. Speed becomes a concern when a server must handle thousands or millions of authentication requests at a time from a massive number of connected mobile and wireless devices. RSA performance can be improved by using a parallel computing architecture or by enhancing the existing modular exponentiation algorithm. In this paper, we exploit the massively parallel architecture of the GPU to perform RSA computations. We propose various optimization techniques to achieve higher RSA throughput on two GPU platforms, and we incorporate signed-digit recoding to improve performance further. To allow a fair comparison with existing implementation techniques, we propose evaluating speed in the best case (least ‘0’ in exponent bits), the average case (random exponent bits) and the worst case (all ‘1’ in exponent bits). The overall throughput of our implementation is about 12% higher for random exponent bits and 50% higher for all-1s exponent bits than the implementation without signed-digit recoding. Our implementation achieves 17713 and 89043 1024-bit modular exponentiations per second on random exponent bits on the GTX 960M and GTX 1080 respectively, which represent two state-of-the-art GPU architectures.
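To make the effect of signed-digit recoding on the all-1s case concrete, the following is a minimal sketch and not the paper's GPU implementation. It assumes the non-adjacent form (NAF) as the recoding, which is one common choice; the abstract does not say which recoding the authors use. It also uses plain modular reduction in place of Montgomery multiplication, and the function names `naf_digits` and `modexp_signed` are hypothetical.

```python
def naf_digits(e):
    """Recode a non-negative exponent into digits in {-1, 0, 1} (NAF),
    least significant digit first. Assumption: NAF stands in for the
    paper's signed-digit recoding, which the abstract does not specify."""
    digits = []
    while e > 0:
        if e & 1:
            d = 2 - (e & 3)  # +1 when e % 4 == 1, -1 when e % 4 == 3
            e -= d
        else:
            d = 0
        digits.append(d)
        e >>= 1
    return digits


def modexp_signed(base, e, n):
    """Left-to-right square-and-multiply over signed digits.
    Returns (base**e mod n, number of non-squaring multiplications).
    Requires gcd(base, n) == 1 so that the inverse of base exists."""
    inv = pow(base, -1, n)  # modular inverse, Python 3.8+
    result, mults = 1, 0
    for d in reversed(naf_digits(e)):
        result = result * result % n      # squaring for every digit
        if d == 1:
            result = result * base % n    # multiply for digit +1
            mults += 1
        elif d == -1:
            result = result * inv % n     # multiply by inverse for digit -1
            mults += 1
    return result, mults


if __name__ == "__main__":
    e = 0xFFFF  # 16-bit exponent with every bit set
    value, mults = modexp_signed(7, e, 3233)
    print(value == pow(7, e, 3233), bin(e).count("1"), mults)
```

For an all-1s exponent the plain binary method multiplies once per set bit, whereas the recoded exponent has only two nonzero digits. This matches the direction of the larger gain the abstract reports for all-1s exponents. The sketch says nothing about the reported throughput figures themselves.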
Acknowledgements
This research is partially supported by the Malaysia Ministry of Science, Technology & Innovation (MOSTI) eScience fund 01-02-11-SF0201 and 01-02-11-SF0202.
Copyright information
© 2018 Springer Science+Business Media Singapore
About this paper
Cite this paper
Wong, XF., Goi, BM., Lee, WK., Phan, R.CW. (2018). Speeding up the Montgomery Exponentiation with CMM-SDR Over GPU with Maxwell and Pascal Architecture. In: Kim, K., Joukov, N. (eds) Mobile and Wireless Technologies 2017. ICMWT 2017. Lecture Notes in Electrical Engineering, vol 425. Springer, Singapore. https://doi.org/10.1007/978-981-10-5281-1_9
DOI: https://doi.org/10.1007/978-981-10-5281-1_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5280-4
Online ISBN: 978-981-10-5281-1
eBook Packages: Engineering, Engineering (R0)