Abstract
This work studies fast modular multiplication on a state-of-the-art digital signal processor (DSP). Specifically, Montgomery multiplication over a prime field with an arbitrary 256-bit prime p is implemented on the TMS320C6678 DSP by Texas Instruments. Two implementations are designed, one optimized for latency and one for throughput. Both are based on the k-bit divided Montgomery modular multiplication algorithm by Kornerup. The algorithm is extended to run two independent Montgomery multiplications in parallel, so that it runs efficiently on the target DSP by exploiting its symmetric functional units. The proposed implementations outperform the previous implementation by Itoh et al. in both latency and throughput. The latency of the proposed implementation is 0.496 μs, only 17% of the 2.86 μs reported for the implementation by Itoh et al. Moreover, the throughput of \(4.03 \times 10^6\) Montgomery multiplications per second (MM/s) is more than 10 times the \(0.37 \times 10^6\) MM/s of the previous work.
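For readers unfamiliar with the technique, the following is a minimal Python sketch of word-serial (radix \(2^k\)) Montgomery multiplication, the general family to which Kornerup's k-bit divided algorithm belongs. It is not the authors' DSP implementation: the function names `mont_mul` and `to_mont`, the word size k = 32, and the choice of the NIST P-256 prime as an example modulus are all assumptions made for illustration. It also does not model the two-way parallelism across the DSP's functional units.

```python
# Illustrative sketch only, not the paper's implementation.
# All names and parameters (mont_mul, to_mont, k=32, n_words=8) are hypothetical.

def mont_mul(a, b, p, k=32, n_words=8):
    """Return a * b * R^(-1) mod p, where R = 2^(k * n_words).

    Assumes p is odd and 0 <= a, b < p < R.
    """
    assert p % 2 == 1, "Montgomery reduction needs an odd modulus"
    base = 1 << k
    mask = base - 1
    # Precomputed constant: -p^(-1) mod 2^k (requires Python 3.8+ for pow with -1).
    p_prime = (-pow(p, -1, base)) % base
    t = 0
    for i in range(n_words):
        a_i = (a >> (k * i)) & mask          # i-th k-bit word of a
        t += a_i * b                         # partial product
        q = ((t & mask) * p_prime) & mask    # makes t + q*p divisible by 2^k
        t = (t + q * p) >> k                 # exact division by 2^k
    if t >= p:                               # single conditional subtraction
        t -= p
    return t


def to_mont(x, p, k=32, n_words=8):
    """Map x into the Montgomery domain: x * R mod p."""
    return (x << (k * n_words)) % p


if __name__ == "__main__":
    # Example modulus (assumption): the NIST P-256 prime.
    p = 2**256 - 2**224 + 2**192 + 2**96 - 1
    a, b = 0x1234567890ABCDEF, 0xFEDCBA0987654321
    ab_mont = mont_mul(to_mont(a, p), to_mont(b, p), p)
    # Multiplying by 1 leaves the Montgomery domain.
    print(mont_mul(ab_mont, 1, p) == (a * b) % p)
```

Each loop iteration consumes one k-bit word of `a` and removes one word from the running sum, so no trial division by p is needed. This is the property that makes the method attractive on multiplier-rich hardware such as a DSP.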
This work was supported by SECOM Science and Technology Foundation.
Notes
1. Many algorithms support the case in which M is odd.
References
Costello, C.: Pairings for beginners. http://www.craigcostello.com.au/pairings/PairingsForBeginners.pdf
Gura, N., Patel, A., Wander, A., Eberle, H., Shantz, S.C.: Comparing elliptic curve cryptography and RSA on 8-bit CPUs. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 119–132. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28632-5_9
Kargl, A., Pyka, S., Seuschek, H.: Fast arithmetic on ATmega128 for elliptic curve cryptography. Cryptology ePrint Archive, Report 2008/442 (2008). http://eprint.iacr.org/2008/442
Szerwinski, R., Güneysu, T.: Exploiting the power of GPUs for asymmetric cryptography. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 79–99. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_6
Gouvêa, C.P.L., Oliveira, L.B., López, J.: Efficient software implementation of public-key cryptography on sensor networks using the MSP430x microcontroller. J. Cryptograph. Eng. 2(1), 19–29 (2012)
Itoh, K., Takenaka, M., Torii, N., Temma, S., Kurihara, Y.: Fast implementation of public-key cryptography on a DSP TMS320C6201. In: Koç, Ç.K., Paar, C. (eds.) CHES 1999. LNCS, vol. 1717, pp. 61–72. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48059-5_7
Texas Instruments: C66x CPU and instruction set reference guide. http://www.ti.com/lit/ug/sprugh7/sprugh7.pdf
Texas Instruments: AM572x Sitara processors. http://www.tij.co.jp/jp/lit/ds/symlink/am5728.pdf
Kornerup, P.: High-radix modular multiplication for cryptosystems. In: Proceedings of the 11th Symposium on Computer Arithmetic 1993, pp. 277–283. IEEE (1993)
Texas Instruments: C66x DSP cache user’s guide. http://www.ti.com/lit/ug/sprugy8/sprugy8.pdf
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985)
Kaihara, M.E., Takagi, N.: Bipartite modular multiplication. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 201–210. Springer, Heidelberg (2005). https://doi.org/10.1007/11545262_15
Mentens, N., Sakiyama, K., Preneel, B., Verbauwhede, I.: Efficient pipelining for modular multiplication architectures in prime fields. In: Proceedings of the 17th ACM Great Lakes symposium on VLSI, pp. 534–539. ACM (2007)
Fan, J., Sakiyama, K., Verbauwhede, I.: Montgomery modular multiplication algorithm on multi-core systems. In: 2007 IEEE Workshop on Signal Processing Systems, pp. 261–266. IEEE (2007)
Knežević, M., Vercauteren, F., Verbauwhede, I.: Speeding up bipartite modular multiplication. In: Hasan, M.A., Helleseth, T. (eds.) WAIFI 2010. LNCS, vol. 6087, pp. 166–179. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-13797-6_12
Koç, Ç.K., Acar, T., Kaliski, B.S.: Analyzing and comparing Montgomery multiplication algorithms. IEEE Micro 16(3), 26–33 (1996)
Recommended elliptic curves for federal government use. http://csrc.nist.gov/groups/ST/toolkit/documents/dss/NISTReCur.pdf
Tenca, A.F., Koç, Ç.K.: A scalable architecture for modular multiplication based on Montgomery’s algorithm. IEEE Trans. Comput. 52(9), 1215–1221 (2003)
Brown, M., Hankerson, D., López, J., Menezes, A.: Software implementation of the NIST elliptic curves over prime fields. In: Naccache, D. (ed.) CT-RSA 2001. LNCS, vol. 2020, pp. 250–265. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45353-9_19
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Miyamoto, E., Sugawara, T., Sakiyama, K. (2018). Efficient Software Implementation of Modular Multiplication in Prime Fields on TI’s DSP TMS320C6678. In: Kang, B., Kim, T. (eds) Information Security Applications. WISA 2017. Lecture Notes in Computer Science(), vol 10763. Springer, Cham. https://doi.org/10.1007/978-3-319-93563-8_22
DOI: https://doi.org/10.1007/978-3-319-93563-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-93562-1
Online ISBN: 978-3-319-93563-8
eBook Packages: Computer Science, Computer Science (R0)