Abstract
The NIST PQC standardization project evaluates multiple new designs for post-quantum Key Encapsulation Mechanisms (KEMs). Some of them present challenging tradeoffs between communication bandwidth and computational overhead. An interesting case is the set of QC-MDPC based KEMs. Here, schemes that use the Niederreiter framework require only half the communication bandwidth of schemes that use the McEliece framework. However, the Niederreiter framework requires a costly polynomial inversion during key generation, which is prohibitive when ephemeral keys are used. One example is BIKE, where the BIKE-1 variant uses McEliece and the BIKE-2 variant uses Niederreiter. This paper shows an optimized constant-time polynomial inversion method that makes the computation costs of BIKE-2 key generation tolerable. We report a speedup of \(11.8{\times }\) over the commonly used NTL library, and \(55.5{\times }\) over OpenSSL. We achieve additional speedups by leveraging Intel's latest vector-PCLMULQDQ instructions on a laptop machine: \(14.3{\times }\) over NTL and \(96.8{\times }\) over OpenSSL. With this, BIKE-2 becomes a competitive variant of BIKE.
References
Aguilar Melchor, C., et al.: Hamming Quasi-Cyclic (HQC) (2017). https://pqc-hqc.org/doc/hqc-specification_2017-11-30.pdf
Amazon Web Services: s2n (2020). https://github.com/awslabs/s2n. Accessed 16 Feb 2020
Aragon, N., et al.: BIKE: Bit Flipping Key Encapsulation (2017). https://bikesuite.org/files/round2/spec/BIKE-Spec-2019.06.30.1.pdf
Baldi, M., Barenghi, A., Chiaraluce, F., Pelosi, G., Santini, P.: LEDAcrypt (2019). https://www.ledacrypt.org/
Bernstein, D.J., Yang, B.Y.: Fast constant-time GCD computation and modular inversion. IACR Trans. Crypt. Hardw. Embed. Syst. 2019(3), 340–398 (2019). https://doi.org/10.13154/tches.v2019.i3.340-398
Bos, J.W., Kleinjung, T., Niederhagen, R., Schwabe, P.: ECC2K-130 on cell CPUs. In: Bernstein, D.J., Lange, T. (eds.) AFRICACRYPT 2010. LNCS, vol. 6055, pp. 225–242. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12678-9_14
Wu, C.-H., Wu, C.-M., Shieh, M.-D., Hwang, Y.-T.: High-speed, low-complexity systolic designs of novel iterative division algorithms in GF(\(2^m\)). IEEE Trans. Comput. 53(3), 375–380 (2004). https://doi.org/10.1109/TC.2004.1261843
Drucker, N., Gueron, S.: A toolbox for software optimization of QC-MDPC code-based cryptosystems. J. Crypt. Eng. 9(4), 341–357 (2019). https://doi.org/10.1007/s13389-018-00200-4
Drucker, N., Gueron, S., Kostic, D.: Additional implementation of BIKE. https://bikesuite.org/additional.html (2019)
Drucker, N., Gueron, S., Kostic, D.: QC-MDPC decoders with several shades of gray. Technical report. Report 2019/1423, December 2019. https://eprint.iacr.org/2019/1423
Drucker, N., Gueron, S., Krasnov, V.: Fast multiplication of binary polynomials with the forthcoming vectorized VPCLMULQDQ instruction. In: 2018 IEEE 25th Symposium on Computer Arithmetic (ARITH), pp. 115–119, June 2018. https://doi.org/10.1109/ARITH.2018.8464777
Gueron, S.: October 2018. https://github.com/open-quantum-safe/openssl/issues/42#issuecomment-433452096
Guimarães, A., Borin, E., Aranha, D.F.: Introducing arithmetic failures to accelerate QC-MDPC code-based cryptography. Code-Based Cryptogr. 2, 44–68 (2019). https://doi.org/10.1007/978-3-030-25922-8
Guimarães, A., Aranha, D.F., Borin, E.: Optimized implementation of QC-MDPC code-based cryptography. Concurr. Comput.: Pract. Exp. 31(18), e5089 (2019). https://doi.org/10.1002/cpe.5089
Hülsing, A., Rijneveld, J., Schanck, J., Schwabe, P.: High-speed key encapsulation from NTRU. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 232–252. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_12
Itoh, T., Tsujii, S.: A fast algorithm for computing multiplicative inverses in GF(2\(^{{\rm m}}\)) using normal bases. Inf. Comput. 78(3), 171–177 (1988). https://doi.org/10.1016/0890-5401(88)90024-7
McEliece, R.J.: A public-key cryptosystem based on algebraic coding theory. Deep Space Netw. Prog. Rep. 44, 114–116 (1978). https://ui.adsabs.harvard.edu/abs/1978DSNPR..44..114M
Misoczki, R.: BIKE - bit-flipping key encapsulation (2019). https://csrc.nist.gov/CSRC/media/Presentations/bike-round-2-presentation/images-media/bike-misoczki.pdf. Accessed 18 Feb 2020
Niederreiter, H.: Knapsack-type cryptosystems and algebraic coding theory. Prob. Contr. Inform. Theory 15(2), 157–166 (1986). https://ci.nii.ac.jp/naid/80003180051/en/
NIST: Post-Quantum Cryptography (2019). https://csrc.nist.gov/projects/post-quantum-cryptography. Accessed 20 Aug 2019
Open Quantum Safe Project: liboqs (2020). https://github.com/open-quantum-safe/liboqs. Accessed 16 Feb 2020
Gaudry, P., Brent, R., Zimmermann, P., Thomé, E.: gf2x-1.2, July 2017. https://gforge.inria.fr/projects/gf2x/
Sendrier, N., Vasseur, V.: On the decoding failure rate of QC-MDPC bit-flipping decoders. In: Ding, J., Steinwandt, R. (eds.) PQCrypto 2019. LNCS, vol. 11505, pp. 404–416. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-25510-7_22
Shoup, V.: Number theory C++ library (NTL) version 11.3.2, November 2018. http://www.shoup.net/ntl
The OpenSSL Project: OpenSSL 1.1.1: The open source toolkit for SSL/TLS. https://github.com/openssl/openssl
Acknowledgements
This research was partly supported by: NSF-BSF Grant 2018640; The BIU Center for Research in Applied Cryptography and Cyber Security, and the Center for Cyber Law and Policy at the University of Haifa, both in conjunction with the Israel National Cyber Bureau in the Prime Minister’s Office.
We would also like to thank Thorsten Kleinjung for his valuable comments on this work.
Appendices
A Squaring Using PCLMULQDQ and VPCLMULQDQ
This appendix describes our C implementation for squaring in \(\mathcal {R}\), using these instructions. For brevity, we replace the long names of the C intrinsics with shorter macros as follows.
When PCLMULQDQ (and not vector-PCLMULQDQ) is available, the square function is
When vector-PCLMULQDQ is available, four multiplications can be executed in parallel and the code is
The permutation and thus some of the flow’s serialization can be removed by using the instruction.
However, our experiments show slower results with this instruction.
Table 6 compares squaring and k-squaring in \(\mathcal {R}\) using our code. Our implementation starts with squaring up to the described threshold and then continues with k-squaring. The threshold depends on the platform.
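The relation between squaring and k-squaring can be illustrated with a small portable sketch. It assumes that \(\mathcal {R}\) is the ring of binary polynomials modulo \(x^r - 1\) with odd \(r\), as in BIKE, so that \(a^{2^k}\) simply moves coefficient \(i\) to exponent \(i \cdot 2^k \bmod r\). The toy size `R_BITS` and all function names are hypothetical, and the code uses plain C rather than the intrinsics of this appendix; it is not the implementation measured in Table 6.

```c
#include <stdint.h>
#include <string.h>

#define R_BITS  67u                      /* hypothetical toy size; r must be odd */
#define R_WORDS ((R_BITS + 63u) / 64u)

/* Coefficient i of a polynomial packed 64 coefficients per word. */
static uint64_t get_bit(const uint64_t *a, unsigned i)
{
    return (a[i / 64] >> (i % 64)) & 1u;
}

/* out = a^(2^k): coefficient i moves to exponent i * 2^k mod r. */
void k_square(uint64_t *out, const uint64_t *a, unsigned k)
{
    unsigned step = 1;
    for (unsigned j = 0; j < k; j++)
        step = (2 * step) % R_BITS;      /* step = 2^k mod r */

    memset(out, 0, R_WORDS * sizeof(uint64_t));
    for (unsigned i = 0; i < R_BITS; i++) {
        unsigned dst = (i * step) % R_BITS;
        out[dst / 64] ^= get_bit(a, i) << (dst % 64);
    }
}

/* Reference schoolbook product a * b mod (x^r - 1), used only for checking. */
void mul_mod(uint64_t *out, const uint64_t *a, const uint64_t *b)
{
    memset(out, 0, R_WORDS * sizeof(uint64_t));
    for (unsigned i = 0; i < R_BITS; i++)
        for (unsigned j = 0; j < R_BITS; j++) {
            unsigned dst = (i + j) % R_BITS;
            out[dst / 64] ^= (get_bit(a, i) & get_bit(b, j)) << (dst % 64);
        }
}

/* Returns 1 if one k-squaring equals k successive products x * x. */
int k_square_matches_repeated(unsigned k)
{
    const uint64_t a[R_WORDS] = {0x0123456789abcdefULL, 0x5ULL};
    uint64_t x[R_WORDS], t[R_WORDS], y[R_WORDS];

    memcpy(x, a, sizeof(x));
    for (unsigned j = 0; j < k; j++) {
        mul_mod(t, x, x);
        memcpy(x, t, sizeof(x));
    }
    k_square(y, a, k);
    return memcmp(x, y, sizeof(x)) == 0;
}
```

Because a single k-squaring costs one pass over the coefficients regardless of k, it plausibly overtakes repeated squaring once k is large enough, which is consistent with the platform-dependent threshold described above.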
B A \(16 \times 16\) Quad-Words Multiplication Using VPCLMULQDQ
This appendix describes the C code of our recursive Karatsuba multiplication.
The function performs four 128-bit Karatsuba multiplications in parallel.
Then, we define the function that receives two 512-bit  registers (a, b) as input, multiplies them and writes the result into the two registers zh||zl. The function performs several permutations to reorganize the quad-words. The relevant masks are:
The variables that are used in this function are: a) , . These hold the lower and upper parts of the 128-bit Karatsuba sub-multiplications; b) , , , , . These are used for the middle term of the 256-bit Karatsuba sub-multiplications; c) , , , , . These are used for the middle term of the top 512-bit Karatsuba multiplication; d) that holds all the temporary products to of the middle words.
Define
Then set
where
and set the lower 128 bits of , to (ignoring the upper bits)
The implementation invokes three times: a) for calculating the lower and the upper 512-bit words; b) for the two middle 256-bit words; c) for the middle 512-bit word in the top-level Karatsuba. The number of invocations of for the entire is only 9.
Finally, we complete the four 128-bit Karatsuba by
and the 512-bit result is
For higher efficiency, our Karatsuba implementation holds the data in  registers in order to save memory operations when invoking .
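The 128-bit Karatsuba step that the recursion bottoms out in can be sketched in portable C. This is only an illustration under stated assumptions: `clmul64` is a hypothetical software stand-in for a single 64x64-bit carry-less multiplication, the function names are invented for this example, and nothing here runs four multiplications in parallel or keeps data in vector registers as the code of this appendix does.

```c
#include <stdint.h>

/* Portable stand-in for one 64x64 -> 128-bit carry-less multiplication. */
void clmul64(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    uint64_t h = 0, l = 0;
    for (unsigned i = 0; i < 64; i++) {
        uint64_t mask = (uint64_t)0 - ((b >> i) & 1u);  /* all ones if bit i of b is set */
        l ^= (a << i) & mask;
        if (i != 0)                                      /* avoid an undefined 64-bit shift */
            h ^= (a >> (64 - i)) & mask;
    }
    *hi = h;
    *lo = l;
}

/* z[0..3] = a[0..1] * b[0..1] over GF(2), using three clmul64 calls. */
void karatsuba128(uint64_t z[4], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t lo_h, lo_l, hi_h, hi_l, mid_h, mid_l;

    clmul64(&lo_h, &lo_l, a[0], b[0]);                   /* a0 * b0 */
    clmul64(&hi_h, &hi_l, a[1], b[1]);                   /* a1 * b1 */
    clmul64(&mid_h, &mid_l, a[0] ^ a[1], b[0] ^ b[1]);   /* (a0 + a1) * (b0 + b1) */

    /* Middle term: subtracting over GF(2) is XOR. */
    mid_l ^= lo_l ^ hi_l;
    mid_h ^= lo_h ^ hi_h;

    z[0] = lo_l;
    z[1] = lo_h ^ mid_l;
    z[2] = hi_l ^ mid_h;
    z[3] = hi_h;
}

/* Reference with four clmul64 calls, used only for checking. */
void schoolbook128(uint64_t z[4], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t h, l;
    clmul64(&h, &l, a[0], b[0]); z[0] = l;  z[1] = h;
    clmul64(&h, &l, a[1], b[1]); z[2] = l;  z[3] = h;
    clmul64(&h, &l, a[0], b[1]); z[1] ^= l; z[2] ^= h;
    clmul64(&h, &l, a[1], b[0]); z[1] ^= l; z[2] ^= h;
}
```

The same three-product pattern applies one level up, with the 128-bit products taking the place of `clmul64`, which is how a recursive Karatsuba multiplication reduces the number of base multiplications.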
C Fast Permutation
The inverted bit permutation (\(a=map(b)\)) of Sect. 4 can be implemented in a straightforward way as follows. We first convert the map to two maps and , where [i] is the byte index of map(b[i]) and [i] is the position of the relevant bit inside this byte.
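The two-map approach can be sketched as follows. The sketch assumes the convention that output bit \(i\) is read from input bit map[i], with bits stored LSB-first inside each byte; the names `byte_idx`, `bit_pos`, `split_map`, and `permute_bits` are hypothetical and stand in for the identifiers of the original listing.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Split each map entry into a byte index and a bit position inside that byte. */
void split_map(uint32_t *byte_idx, uint8_t *bit_pos, const uint32_t *map, size_t n_bits)
{
    for (size_t i = 0; i < n_bits; i++) {
        byte_idx[i] = map[i] >> 3;
        bit_pos[i]  = (uint8_t)(map[i] & 7u);
    }
}

/* a[i] = b[map[i]] on packed bit arrays (LSB-first inside each byte). */
void permute_bits(uint8_t *a, const uint8_t *b, const uint32_t *byte_idx,
                  const uint8_t *bit_pos, size_t n_bits)
{
    memset(a, 0, (n_bits + 7) / 8);
    for (size_t i = 0; i < n_bits; i++) {
        unsigned bit = (b[byte_idx[i]] >> bit_pos[i]) & 1u;
        a[i >> 3] |= (uint8_t)(bit << (i & 7u));
    }
}

/* Toy use: reverse the 16 bits of x with map[i] = 15 - i. */
uint16_t reverse16(uint16_t x)
{
    uint32_t map[16], byte_idx[16];
    uint8_t bit_pos[16], a[2];
    const uint8_t b[2] = {(uint8_t)(x & 0xffu), (uint8_t)(x >> 8)};

    for (uint32_t i = 0; i < 16; i++)
        map[i] = 15u - i;
    split_map(byte_idx, bit_pos, map, 16);
    permute_bits(a, b, byte_idx, bit_pos, 16);
    return (uint16_t)(a[0] | (a[1] << 8));
}
```

Splitting the map once up front moves the division and remainder out of the per-bit loop, which matters when the same map is applied many times.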
A simpler way to apply the map is possible if we store every bit in a byte (that has the value or ).
This can involve a costly conversion between the two representations, but fortunately we can speed it up with AVX512 (when available)
Note that the instruction can also be used for the latter conversion (see next). This instruction requires the extension while the instruction requires only which is more common.
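The byte-per-bit alternative can be sketched in portable C. The sketch assumes that each byte holds 0 or 1 and that output position \(i\) is read from input position map[i]; the function names are hypothetical, and the two conversion loops are scalar stand-ins for the AVX512-accelerated conversions mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Expand packed bits into one byte per bit (each byte is 0 or 1). */
void bits_to_bytes(uint8_t *bytes, const uint8_t *bits, size_t n_bits)
{
    for (size_t i = 0; i < n_bits; i++)
        bytes[i] = (uint8_t)((bits[i >> 3] >> (i & 7u)) & 1u);
}

/* Pack one-byte-per-bit data back into a bit array. */
void bytes_to_bits(uint8_t *bits, const uint8_t *bytes, size_t n_bits)
{
    memset(bits, 0, (n_bits + 7) / 8);
    for (size_t i = 0; i < n_bits; i++)
        bits[i >> 3] |= (uint8_t)((bytes[i] & 1u) << (i & 7u));
}

/* With one byte per bit, applying the map is a plain gather. */
void permute_bytes(uint8_t *a, const uint8_t *b, const uint32_t *map, size_t n_bits)
{
    for (size_t i = 0; i < n_bits; i++)
        a[i] = b[map[i]];
}

/* Toy use: rotate a 16-bit value left by s through the byte representation. */
uint16_t rotl16_via_bytes(uint16_t x, unsigned s)
{
    uint32_t map[16];
    uint8_t in_bytes[16], out_bytes[16], out_bits[2];
    const uint8_t in_bits[2] = {(uint8_t)(x & 0xffu), (uint8_t)(x >> 8)};

    for (uint32_t i = 0; i < 16; i++)
        map[i] = (i + 16u - (s % 16u)) % 16u;   /* output bit i comes from bit i - s */

    bits_to_bytes(in_bytes, in_bits, 16);
    permute_bytes(out_bytes, in_bytes, map, 16);
    bytes_to_bits(out_bits, out_bytes, 16);
    return (uint16_t)(out_bits[0] | (out_bits[1] << 8));
}
```

The gather itself needs no shifts or masks, so the cost moves entirely into the two conversions, which is why accelerating them matters.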
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Drucker, N., Gueron, S., Kostic, D. (2020). Fast Polynomial Inversion for Post Quantum QC-MDPC Cryptography. In: Dolev, S., Kolesnikov, V., Lodha, S., Weiss, G. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2020. Lecture Notes in Computer Science, vol 12161. Springer, Cham. https://doi.org/10.1007/978-3-030-49785-9_8
DOI: https://doi.org/10.1007/978-3-030-49785-9_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49784-2
Online ISBN: 978-3-030-49785-9
eBook Packages: Computer Science, Computer Science (R0)