Abstract
GPU computing with CUDA has developed considerably in recent years, especially in high-performance computing and in asymmetric cryptographic applications. Much of the existing work is based on the coarse-grained method, in which each thread within a thread block carries out a complete task on its own. In this paper, we develop a fine-grained parallel approach for Montgomery multiplication, which differs substantially from previous work: all the threads within a GPU thread block cooperate to process a single task. Experiments show that the approach performs better when the number of tasks is small and performs roughly as well in other cases. It also achieves a clear speedup over a CPU-based implementation. The idea can be adopted in many other acceleration applications.
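The abstract does not spell out the algorithm, so the following is only a minimal sketch of limb-wise Montgomery multiplication, not the authors' kernel. It is written in Python rather than CUDA for readability. The inner loop over `j` touches one limb column per iteration and has no dependency between columns, which is the kind of work a fine-grained scheme could hand to cooperating threads of one block. The names `W`, `to_limbs` and `mont_mul_limbs`, the 32-bit limb size, and the deferred-carry layout are all assumptions made for illustration.

```python
# Sketch only: limb-wise Montgomery multiplication with deferred carries.
# All names and the 32-bit limb width are illustrative assumptions.
W = 32                      # bits per limb (assumed)
MASK = (1 << W) - 1


def to_limbs(x, n):
    """Split integer x into n little-endian limbs of W bits."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def mont_mul_limbs(a, b, m, n):
    """Return a * b * R^-1 mod m, with R = 2**(W*n).

    Requires m odd and 0 <= a, b < m < R.
    """
    A, B, M = to_limbs(a, n), to_limbs(b, n), to_limbs(m, n)
    # -m^-1 mod 2^W, the usual Montgomery constant (needs Python 3.8+).
    m_inv = (-pow(m, -1, 1 << W)) & MASK

    # One accumulator column per limb. Columns may temporarily hold more
    # than W bits; Python integers make this deferred carry trivial.
    t = [0] * n
    for i in range(n):
        # Quotient digit chosen so that the lowest column becomes
        # divisible by 2^W after the update below.
        q = (((t[0] + A[0] * B[i]) & MASK) * m_inv) & MASK

        # Per-limb update: each j is independent of the others, so on a
        # GPU "thread j" could own column j in this step.
        for j in range(n):
            t[j] += A[j] * B[i] + q * M[j]

        # Divide by 2^W: push the carry of column 0 into column 1 and
        # shift every column down by one position.
        carry = t[0] >> W
        t = t[1:] + [0]
        t[0] += carry

    # Resolve the deferred carries and apply the final subtraction.
    result = sum(v << (W * k) for k, v in enumerate(t))
    if result >= m:
        result -= m
    return result
```

In a coarse-grained design, one thread would run this whole function for one task; in a fine-grained design, the `for j` loop is spread across the threads of a block, with a synchronization point before the shift step.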
Copyright information
© 2012 Springer-Verlag GmbH Berlin Heidelberg
About this chapter
Cite this chapter
Li, T., Li, H., Xiang, J. (2012). A GPU-Based Fine-Grained Parallel Montgomery Multiplication Algorithm. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25766-7_19
DOI: https://doi.org/10.1007/978-3-642-25766-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25765-0
Online ISBN: 978-3-642-25766-7
eBook Packages: Engineering, Engineering (R0)