Memory-Efficient High-Speed Implementation of Kyber on Cortex-M4

  • Leon Botros
  • Matthias J. Kannwischer
  • Peter Schwabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11627)

Abstract

This paper presents an optimized software implementation of the module-lattice-based key-encapsulation mechanism Kyber for the ARM Cortex-M4 microcontroller. Kyber is one of the round-2 candidates in the NIST post-quantum project. At the center of our work are novel optimization techniques for the number-theoretic transform (NTT) inside Kyber, which make very efficient use of the computational power offered by the “vector” DSP instructions of the target architecture. We also present results for the recently updated parameter sets of Kyber, which benefit equally from our optimizations.

As a result of our efforts, we present software that is 18% faster than an earlier implementation of Kyber optimized for the Cortex-M4 by the Kyber submitters. Our NTT is more than twice as fast as the NTT in that software. Our software runs at about the same speed as the latest speed-optimized implementation of Saber, the other module-lattice-based round-2 NIST PQC candidate. However, our Kyber software achieves this performance with a much smaller RAM footprint: it needs less than half of the RAM used by the considerably slower RAM-optimized version of Saber. Our software does not make use of any secret-dependent branches or memory accesses and thus offers state-of-the-art protection against timing attacks.

Keywords

ARM Cortex-M4, Number-theoretic transform, Lattice-based cryptography, Kyber

Notes

Acknowledgments

The authors would like to thank Pedro Massolino, Joost Rijneveld, and Ko Stoffelen for their help with obtaining reasonable cycle counts on the ARM Cortex-M4.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Radboud University, Nijmegen, The Netherlands
