Abstract
This paper presents an optimized software implementation of the module-lattice-based key-encapsulation mechanism Kyber for the ARM Cortex-M4 microcontroller. Kyber is one of the round-2 candidates in the NIST post-quantum project. In the center of our work are novel optimization techniques for the number-theoretic transform (NTT) inside Kyber, which make very efficient use of the computational power offered by the “vector” DSP instructions of the target architecture. We also present results for the recently updated parameter sets of Kyber which equally benefit from our optimizations.
As a result of our efforts we present software that is 18% faster than an earlier implementation of Kyber optimized for the Cortex-M4 by the Kyber submitters. Our NTT is more than twice as fast as the NTT in that software. Our software runs at about the same speed as the latest speed-optimized implementation of the other module-lattice based round-2 NIST PQC candidate Saber. However, for our Kyber software, this performance is achieved with a much smaller RAM footprint. Kyber needs less than half of the RAM of what the considerably slower RAM-optimized version of Saber uses. Our software does not make use of any secret-dependent branches or memory access and thus offers state-of-the-art protection against timing attacks.
This work has been supported by the European Commission through the ERC Starting Grant 805031 (EPOQUE). Date: May 10, 2019.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
- 2.
- 3.
We also benchmarked our code using the February 2019 release of arm-none-eabi-gcc (8.3.0) which produced the same results.
- 4.
\(2n-1\) coefficients of 2 bytes each.
References
Alagic, G., et al.: Status report on the first round of the NIST post-quantum cryptography standardization process. National Institute of Standards and Technology Internal Report 8240 (2019). https://doi.org/10.6028/NIST.IR.8240
Alkim, E., et al.: NewHope: algorithm specification and supporting documentation. Submission to the NIST Post-Quantum Cryptography Standardization Project (2017). https://cryptojedi.org/papers/#newhopenist
Alkim, E., Ducas, L., Pöppelmann, T., Schwabe, P.: Post-quantum key exchange – a new hope. In: Holz, T., Savage, S. (eds.) Proceedings of the 25th USENIX Security Symposium. USENIX Association (2016). https://eprint.iacr.org/2015/1092
Alkim, E., Jakubeit, P., Schwabe, P.: NewHope on ARM cortex-M. In: Carlet, C., Hasan, M.A., Saraswat, V. (eds.) SPACE 2016. LNCS, vol. 10076, pp. 332–349. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49445-6_19. http://cryptojedi.org/papers/#newhopearm
Avanzi, R., et al.: ARM Cortex-M4 optimized implementation of Kyber. https://github.com/pq-crystals/kyber/tree/cm4/cm4. Accessed 07 Mar 2019
Avanzi, R., et al.: CRYSTALS-Kyber: algorithm specification and supporting documentation. Submission to the NIST Post-Quantum Cryptography Standardization Project (2017). https://pq-crystals.org/kyber
Avanzi, R., et al.: CRYSTALS-Kyber: algorithm specification and supporting documentation (version 2.0). Submission to the NIST Post-Quantum Cryptography Standardization Project (2019). https://pq-crystals.org/kyber
Bertoni, G., Daemen, J., Peeters, M., Assche, G.V.: The Keccak reference. Submission to the NIST SHA-3 competition (round 3) (2011). https://keccak.team/files/Keccak-reference-3.0.pdf
Bos, J.W., et al.: CRYSTALS – kyber: A cca-secure module-lattice-based KEM. In: 2018 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 353–367. IEEE (2018). https://eprint.iacr.org/2017/634
Bos, J.W., Friedberger, S., Martinoli, M., Oswald, E., Stam, M.: Fly, you fool! Faster Frodo for the ARM Cortex-M4. Cryptology ePrint Archive, Report 2018/1116 (2018). https://eprint.iacr.org/2018/1116
de Clercq, R., Roy, S.S., Vercauteren, F., Verbauwhede, I.: Efficient software implementation of ring-LWE encryption. In: Design, Automation & Test in Europe Conference & Exhibition, DATE 2015, pp. 339–344. EDA Consortium (2015). http://eprint.iacr.org/2014/725
Cook, S.: On the Minimum Computation Time of Functions. Ph.D. thesis, Harvard University (1966)
Cooley, J.W., Tukey, J.W.: An algorithm for the machine calculation of complex fourier series. Math. Comput. 19(90), 297–301 (1965). https://www.jstor.org/stable/2003354
Daemen, J., Hoffert, S., Peeters, M., Assche, G.V., Keer, R.V.: eXtended Keccak Code Package. https://github.com/XKCP/XKCP. Accessed 07 Mar 2019
Fujisaki, E., Okamoto, T.: Secure integration of asymmetric and symmetric encryption schemes. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 537–554. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_34
Goldstine, H.H.: A History of Numerical Analysis from the 16th through the 19th Century. Springer, New York (1977). https://doi.org/10.1007/978-1-4684-9472-3
Güneysu, T., Oder, T., Pöppelmann, T., Schwabe, P.: Software speed records for lattice-based signatures. In: Gaborit, P. (ed.) PQCrypto 2013. LNCS, vol. 7932, pp. 67–82. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38616-9_5. Document ID: d67aa537a6de60813845a45505c313, http://cryptojedi.org/papers/#lattisigns
Heideman, M.T., Johnson, D.H., Burrus, C.S.: Gauss and the history of the fast fourier transform. IEEE ASSP Mag. 1(4) (1984). http://www.cis.rit.edu/class/simg716/Gauss_History_FFT.pdf
Hülsing, A., Rijneveld, J., Schanck, J.M., Schwabe, P.: NTRU-KEM-HRSS: algorithm specification and supporting documentation. Submission to the NIST Post-Quantum Cryptography Standardization Project (2017). https://ntru-hrss.org
Kannwischer, M.J., Rijneveld, J., Schwabe, P.: Faster multiplication in \(\mathbb{Z}_{2^m}[x]\) on Cortex-M4 to speed up NIST PQC candidates (2018). https://eprint.iacr.org/2018/1018
Kannwischer, M.J., Rijneveld, J., Schwabe, P., Stoffelen, K.: PQM4: post-quantum crypto library for the ARM Cortex-M4. https://github.com/mupq/pqm4. Accessed 07 Mar 2019
Karatsuba, A., Ofman, Y.: Multiplication of multidigit numbers on automata. Sov. Phys. Dokl. 7, 595–596 (1963). Translated from Doklady Akademii Nauk SSSR, vol. 145, no. 2, pp. 293–294, July 1962. Scanned version on http://cr.yp.to/bib/1963/karatsuba.html
Karmakar, A., Mera, J.M.B., Roy, S.S., Verbauwhede, I.: Saber on ARM CCA-secure module lattice-based key encapsulation on ARM. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(3), 243–266 (2018). https://eprint.iacr.org/2018/682
Lyubashevsky, V., Seiler, G.: NTTRU: Truly fast NTRU using NTT. Cryptology ePrint Archive, Report 2019/040 (2019). https://eprint.iacr.org/2019/040
Montgomery, P.L.: Modular multiplication without trial division. Math. Comput. 44(170), 519–521 (1985). http://www.ams.org/journals/mcom/1985-44-170/S0025-5718-1985-0777282-X/S0025-5718-1985-0777282-X.pdf
National Institute for Standards and Technology: Submission requirements and evaluation criteria for the post-quantum cryptography standardization process (2017). https://csrc.nist.gov/csrc/media/projects/post-quantum-cryptography/documents/call-for-proposals-final-dec-2016.pdf
Oder, T., Pöppelmann, T., Güneysu, T.: Beyond ECDSA and RSA: lattice-based digital signatures on constrained devices. In: 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. ACM (2014). https://www.sha.rub.de/media/attachments/files/2014/06/bliss_arm.pdf
Pöppelmann, T., Oder, T., Güneysu, T.: High-performance ideal lattice-based cryptography on 8-bit ATxmega microcontrollers. In: Lauter, K., Rodríguez-Henríquez, F. (eds.) LATINCRYPT 2015. LNCS, vol. 9230, pp. 346–365. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-22174-8_19. Extended version, https://eprint.iacr.org/2015/382
Saarinen, M.J.O., Bhattacharya, S., Garcia-Morchon, O., Rietman, R., Tolhuizen, L., Zhang, Z.: Shorter messages and faster post-quantum encryption with Round5 on Cortex M. Cryptology ePrint Archive, Report 2018/723 (2018). https://eprint.iacr.org/2018/723
Seiler, G.: Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography. Cryptology ePrint Archive, Report 2018/039 (2018). https://eprint.iacr.org/2018/039
Reference manual for STM32F405/415, STM32F407/417, STM32F427/437, and STM32F429/439 advanced ARM-based 32-bit MCUs (2019). https://www.st.com/resource/en/reference_manual/dm00031020.pdf
Toom, A.L.: The complexity of a scheme of functional elements realizing the multiplication of integers. Sov. Math. Dokl. 3, 714–716 (1963). www.de.ufpe.br/~toom/my-articles/engmat/MULT-E.PDF
Zhang, Z., Chen, C., Hoffstein, J., Whyte, W.: NTRUEncrypt: algorithm specification and supporting documentation. Submission to the NIST Post-Quantum Cryptography Standardization Project (2017). https://csrc.nist.gov/projects/post-quantum-cryptography/round-1-submissions
Acknowledgments
The authors would like to thank Pedro Massolino, Joost Rijneveld, and Ko Stoffelen for their help with obtaining reasonable cycle counts on the ARM Cortex-M4.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Botros, L., Kannwischer, M.J., Schwabe, P. (2019). Memory-Efficient High-Speed Implementation of Kyber on Cortex-M4. In: Buchmann, J., Nitaj, A., Rachidi, T. (eds) Progress in Cryptology – AFRICACRYPT 2019. AFRICACRYPT 2019. Lecture Notes in Computer Science(), vol 11627. Springer, Cham. https://doi.org/10.1007/978-3-030-23696-0_11
Download citation
DOI: https://doi.org/10.1007/978-3-030-23696-0_11
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23695-3
Online ISBN: 978-3-030-23696-0
eBook Packages: Computer ScienceComputer Science (R0)