A GPU-Based Fine-Grained Parallel Montgomery Multiplication Algorithm

Chapter in: Recent Advances in Computer Science and Information Engineering

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 126)

Abstract

GPU computing with CUDA has developed considerably in recent years, especially in high-performance computing and in asymmetric cryptographic applications. Much of the existing work is based on a coarse-grained method, in which each thread within a thread block carries out a complete task on its own. In this paper, we develop a fine-grained parallel approach to Montgomery multiplication that differs substantially from previous work: all the threads within a GPU thread block cooperate on a single task. Experiments show that the approach performs better when the number of tasks is small and performs roughly as well in other cases. It also achieves a good speedup over a CPU-based implementation. The idea can be adopted in many other acceleration applications.
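To make the fine-grained idea concrete, the following is a minimal Python sketch of word-serial Montgomery multiplication, not the chapter's actual GPU kernel (the full text is not available here). The limb width `W`, the helper names `to_limbs`, `from_limbs`, and `mont_mul`, and the data layout are all assumptions chosen for illustration. The list comprehension over `j` marks the part of the work that could be split across the threads of one block, one limb per thread. The carry propagation is written sequentially for clarity.

```python
W = 32                      # limb width in bits (an assumption for this sketch)
MASK = (1 << W) - 1


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian W-bit limbs."""
    return [(x >> (W * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian W-bit limbs."""
    return sum(v << (W * i) for i, v in enumerate(limbs))


def mont_mul(a, b, m, n):
    """Return a * b * R^-1 mod m, with R = 2**(W*n), m odd, 0 <= a, b < m < R."""
    m_prime = (-pow(m, -1, 1 << W)) & MASK      # -m^-1 mod 2^W (Python 3.8+)
    a_l, b_l, m_l = to_limbs(a, n), to_limbs(b, n), to_limbs(m, n)
    t = [0] * (n + 1)                           # accumulator, always below 2*m
    for i in range(n):
        # q makes the lowest limb of t + a_i*b + q*m equal to zero.
        q = ((t[0] + a_l[i] * b_l[0]) * m_prime) & MASK
        # Fine-grained step: every column j is independent of the others,
        # so on a GPU each thread of the block could compute one column.
        cols = [t[j] + a_l[i] * b_l[j] + q * m_l[j] for j in range(n)]
        # Carry propagation combined with the division by 2^W (shift by one limb).
        carry = cols[0] >> W                    # low word of cols[0] is zero
        for j in range(1, n):
            s = cols[j] + carry
            t[j - 1] = s & MASK
            carry = s >> W
        s = t[n] + carry
        t[n - 1] = s & MASK
        t[n] = s >> W
    r = from_limbs(t)
    return r - m if r >= m else r               # final conditional subtraction
```

In a coarse-grained scheme, each thread would run the whole of `mont_mul` for its own operands. In the fine-grained scheme sketched here, one call is shared by a block, which is consistent with the abstract's observation that the approach helps most when there are few tasks to distribute.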

Author information

Corresponding author

Correspondence to Tieniu Li.

Copyright information

© 2012 Springer-Verlag GmbH Berlin Heidelberg

About this chapter

Cite this chapter

Li, T., Li, H., Xiang, J. (2012). A GPU-Based Fine-Grained Parallel Montgomery Multiplication Algorithm. In: Qian, Z., Cao, L., Su, W., Wang, T., Yang, H. (eds) Recent Advances in Computer Science and Information Engineering. Lecture Notes in Electrical Engineering, vol 126. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25766-7_19

  • DOI: https://doi.org/10.1007/978-3-642-25766-7_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25765-0

  • Online ISBN: 978-3-642-25766-7

  • eBook Packages: Engineering, Engineering (R0)
