Memory-Efficient High-Speed Implementation of Kyber on Cortex-M4

  • Leon Botros
  • Matthias J. Kannwischer
  • Peter Schwabe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11627)


This paper presents an optimized software implementation of the module-lattice-based key-encapsulation mechanism Kyber for the ARM Cortex-M4 microcontroller. Kyber is one of the round-2 candidates in the NIST post-quantum cryptography standardization project. At the center of our work are novel optimization techniques for the number-theoretic transform (NTT) inside Kyber, which make very efficient use of the computational power offered by the “vector” DSP instructions of the target architecture. We also present results for the recently updated parameter sets of Kyber, which benefit equally from our optimizations.
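To make the NTT building block concrete, the following is a minimal sketch in plain C of one Cooley-Tukey butterfly with a signed Montgomery reduction, in the style of the public Kyber reference code. It is not the packed DSP assembly this paper describes. The modulus q = 3329 (the round-2 parameter), the function names, and the scalar one-coefficient-at-a-time structure are assumptions made for illustration.

```c
#include <stdint.h>

#define KYBER_Q 3329   /* assumed modulus: the round-2 Kyber prime */
#define QINV   (-3327) /* q^(-1) mod 2^16, since 3329 * 3327 = 3328^2 - 1 */

/* Signed Montgomery reduction: for |a| < q * 2^15, returns r with
 * -q < r < q and r * 2^16 congruent to a modulo q.
 * Assumes the usual two's-complement wrap-around on narrowing casts. */
int16_t montgomery_reduce(int32_t a)
{
    /* t is congruent to a * q^(-1) modulo 2^16 */
    int16_t t = (int16_t)((int16_t)a * QINV);
    /* a - t*q is an exact multiple of 2^16, so the division is exact */
    return (int16_t)((a - (int32_t)t * KYBER_Q) / 65536);
}

/* Multiplication in Z_q followed by Montgomery reduction */
int16_t fqmul(int16_t a, int16_t b)
{
    return montgomery_reduce((int32_t)a * b);
}

/* One Cooley-Tukey butterfly: (lo, hi) -> (lo + zeta*hi, lo - zeta*hi).
 * zeta is a twiddle factor, assumed to be stored in Montgomery form. */
void ct_butterfly(int16_t *lo, int16_t *hi, int16_t zeta)
{
    int16_t t = fqmul(zeta, *hi);
    *hi = (int16_t)(*lo - t);
    *lo = (int16_t)(*lo + t);
}
```

A full NTT layer would apply `ct_butterfly` to every coefficient pair at a given distance. On the Cortex-M4, the DSP instructions operate on two 16-bit halves of a 32-bit register, which is what allows two coefficients to be handled per instruction; the scalar version above only shows the arithmetic.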

As a result of our efforts we present software that is 18% faster than an earlier implementation of Kyber optimized for the Cortex-M4 by the Kyber submitters. Our NTT is more than twice as fast as the NTT in that software. Our software runs at about the same speed as the latest speed-optimized implementation of the other module-lattice-based round-2 NIST PQC candidate, Saber. However, our Kyber software achieves this performance with a much smaller RAM footprint: it needs less than half the RAM used by the considerably slower RAM-optimized version of Saber. Our software does not make use of any secret-dependent branches or memory accesses and thus offers state-of-the-art protection against timing attacks.
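As an illustration of what avoiding secret-dependent branches and memory accesses can look like in practice, here is a minimal sketch in C of a branch-free comparison and conditional move, two helpers commonly used in KEM decapsulation. The function names and signatures are hypothetical and are not taken from the software described here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 0 if the two buffers are equal and 1 otherwise.
 * Every byte is always read; there is no early exit on a mismatch. */
int ct_differ(const uint8_t *a, const uint8_t *b, size_t len)
{
    uint8_t acc = 0;
    for (size_t i = 0; i < len; i++) {
        acc |= (uint8_t)(a[i] ^ b[i]);
    }
    /* Negating a nonzero value in 64 bits sets the top bit */
    return (int)((-(uint64_t)acc) >> 63);
}

/* Copies src into dst if move == 1 and leaves dst unchanged if
 * move == 0. The same loads and stores happen in both cases. */
void ct_cmov(uint8_t *dst, const uint8_t *src, size_t len, uint8_t move)
{
    uint8_t mask = (uint8_t)(0 - move); /* 0x00 or 0xFF */
    for (size_t i = 0; i < len; i++) {
        dst[i] ^= (uint8_t)(mask & (dst[i] ^ src[i]));
    }
}
```

The loop bounds depend only on the public length, and the selection is done with a mask rather than an `if`, so the instruction sequence and the addresses touched are the same for every secret value. Whether a compiler preserves this property should still be checked on the generated code for the target.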


Keywords: ARM Cortex-M4 · Number-theoretic transform · Lattice-based cryptography · Kyber



The authors would like to thank Pedro Massolino, Joost Rijneveld, and Ko Stoffelen for their help with obtaining reasonable cycle counts on the ARM Cortex-M4.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Radboud University, Nijmegen, The Netherlands
