Fast Polynomial Inversion for Post Quantum QC-MDPC Cryptography

  • Conference paper
Cyber Security Cryptography and Machine Learning (CSCML 2020)

Abstract

The NIST PQC standardization project evaluates multiple new designs for post-quantum Key Encapsulation Mechanisms (KEMs). Some of them present challenging tradeoffs between communication bandwidth and computational overheads. An interesting case is the set of QC-MDPC based KEMs. Here, schemes that use the Niederreiter framework require only half the communication bandwidth compared to schemes that use the McEliece framework. However, this requires costly polynomial inversion during the key generation, which is prohibitive when ephemeral keys are used. One example is BIKE, where the BIKE-1 variant uses McEliece and the BIKE-2 variant uses Niederreiter. This paper shows an optimized constant-time polynomial inversion method that makes the computation costs of BIKE-2 key generation tolerable. We report a speedup of \(11.8{\times }\) over the commonly used NTL library, and \(55.5{\times }\) over OpenSSL. We achieve additional speedups by leveraging Intel’s latest Vector- instructions on a laptop machine: \(14.3{\times }\) over NTL and \(96.8{\times }\) over OpenSSL. With this, BIKE-2 becomes a competitive variant of BIKE.
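The abstract does not spell out the inversion algorithm. As an illustration only, one well-known constant-time approach in \(\mathcal {R} = \mathbb{F}_2[x]/(x^r - 1)\), with r prime and 2 primitive modulo r, uses Fermat's little theorem, \(a^{-1} = a^{2^{r-1}-2}\), evaluated with an Itoh-Tsujii chain of k-squarings and multiplications. The toy sketch below assumes r = 11 so that one 64-bit word holds an element; R, r_mul, r_sqr_k and r_inv are hypothetical names, not the authors' code, and r_mul is a slow bit-serial loop.

```c
#include <stdint.h>

/* Toy parameters (hypothetical): R is prime and 2 is primitive mod R,
 * so x^R - 1 = (x - 1) * Phi_R(x) with Phi_R irreducible over F2.
 * Real BIKE parameters need multi-word arithmetic. */
#define R 11u
#define MASK ((UINT64_C(1) << R) - 1)

/* Multiply in F2[x]/(x^R - 1): XOR of cyclic rotations of a, one per set
 * bit of b. The mask m replaces a data-dependent branch. */
static uint64_t r_mul(uint64_t a, uint64_t b) {
    uint64_t res = 0;
    for (unsigned i = 0; i < R; i++) {
        uint64_t m = (uint64_t)0 - ((b >> i) & 1);               /* all-ones if b_i = 1 */
        uint64_t rot = ((a << i) | (a >> ((R - i) % R))) & MASK; /* a * x^i mod x^R - 1 */
        res ^= rot & m;
    }
    return res;
}

/* k successive squarings: returns a^(2^k). */
static uint64_t r_sqr_k(uint64_t a, unsigned k) {
    while (k--) a = r_mul(a, a);
    return a;
}

/* a^(-1) = a^(2^(R-1) - 2) for invertible a. The chain keeps the invariant
 * f = a^(2^j - 1) until j = R - 2, then squares once more. Branches depend
 * only on the public value R - 2, never on a. */
static uint64_t r_inv(uint64_t a) {
    const unsigned n = R - 2;
    int top = 0;
    while ((n >> (top + 1)) != 0) top++;     /* index of the leading bit of n */
    uint64_t f = a;                          /* f = a^(2^1 - 1) */
    unsigned j = 1;
    for (int bit = top - 1; bit >= 0; bit--) {
        f = r_mul(r_sqr_k(f, j), f);         /* f = a^(2^(2j) - 1) */
        j *= 2;
        if ((n >> bit) & 1) {
            f = r_mul(r_sqr_k(f, 1), a);     /* f = a^(2^(j+1) - 1) */
            j += 1;
        }
    }
    return r_sqr_k(f, 1);                    /* a^(2^(R-1) - 2) */
}
```

With these parameters an element is invertible exactly when it has odd weight and is not the all-ones polynomial. For example, r_inv(0x2) (the polynomial x) returns 0x400 (\(x^{10}\)), and r_mul(0xB, r_inv(0xB)) returns 1 for 0xB = \(1 + x + x^3\). The sequence of squarings and multiplications is fixed by r alone, which is what makes this style of inversion constant-time.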

Acknowledgements

This research was partly supported by NSF-BSF Grant 2018640, by the BIU Center for Research in Applied Cryptography and Cyber Security, and by the Center for Cyber Law and Policy at the University of Haifa, the latter two in conjunction with the Israel National Cyber Bureau in the Prime Minister’s Office.

We would also like to thank Thorsten Kleinjung for his valuable comments on this work.

Author information

Corresponding author

Correspondence to Nir Drucker.

Appendices

A Squaring Using and

This appendix describes our C implementation for squaring in \(\mathcal {R}\), using . For brevity, we replace the long names of the C intrinsics with shorter macros as follows.

[Code listing: short macro names for the C intrinsics]

When (and not vector- ) is available, the square function is

[Code listing: square function without the vector instructions]
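Over \(\mathbb{F}_2\), squaring is linear: \((\sum a_i x^i)^2 = \sum a_i x^{2i}\), so a carry-less product of a word with itself only interleaves zeros between the input bits. The portable sketch below (hypothetical helpers spread32, sqr64 and clmul64; not the authors' intrinsic code) computes the 128-bit square of a 64-bit word with shifts and masks and checks it against a schoolbook carry-less multiply. Reducing the double-length result modulo \(x^r - 1\) is left out.

```c
#include <stdint.h>

/* Move bit i of x to bit 2i (interleave zeros): the square of a
 * 32-coefficient binary polynomial. */
static uint64_t spread32(uint32_t x) {
    uint64_t v = x;
    v = (v | (v << 16)) & UINT64_C(0x0000FFFF0000FFFF);
    v = (v | (v << 8))  & UINT64_C(0x00FF00FF00FF00FF);
    v = (v | (v << 4))  & UINT64_C(0x0F0F0F0F0F0F0F0F);
    v = (v | (v << 2))  & UINT64_C(0x3333333333333333);
    v = (v | (v << 1))  & UINT64_C(0x5555555555555555);
    return v;
}

/* Square a 64-coefficient polynomial into a 128-bit result hi:lo. */
static void sqr64(uint64_t a, uint64_t *lo, uint64_t *hi) {
    *lo = spread32((uint32_t)a);          /* coefficients 0..31  -> bits 0..62   */
    *hi = spread32((uint32_t)(a >> 32));  /* coefficients 32..63 -> bits 64..126 */
}

/* Reference 64x64 -> 128-bit carry-less multiply (schoolbook, branch-free on b). */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi) {
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t m = (uint64_t)0 - ((b >> i) & 1);
        l ^= (a << i) & m;
        if (i) h ^= (a >> (64 - i)) & m;  /* bits shifted past bit 63 */
    }
    *lo = l;
    *hi = h;
}
```

For example, sqr64(0x3, ...) yields lo = 0x5, since \((1 + x)^2 = 1 + x^2\) over \(\mathbb{F}_2\).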

When vector- is available, four multiplications can be executed in parallel and the code is

[Code listing: square function with four parallel multiplications]

The permutation and thus some of the flow’s serialization can be removed by using the instruction.

[Code listing: square function variant without the permutation]

However, our experiments show slower results with this instruction.

Table 6 compares squaring and k-squaring in \(\mathcal {R}\) using our code. Our implementation starts with squaring up to the described threshold and then continues with k-squaring. The threshold depends on the platform.

Table 6. Squaring and k-squaring in \(\mathcal {R}\) using our code. Columns 2 and 3 count cycles (lower is better); threshold = \(\lfloor \text{k-square}/\text{square} \rfloor\). The r values correspond to the IND-CCA variants of BIKE for Level-1/3/5.
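Because \((\sum a_i x^i)^{2^k} = \sum a_i x^{i \cdot 2^k}\) over \(\mathbb{F}_2\), raising to the power \(2^k\) in \(\mathcal {R}\) only moves coefficient i to position \(i \cdot 2^k \bmod r\), so a k-squaring can cost one bit permutation regardless of k. The toy sketch below contrasts the two routes (square-then-reduce repeated k times versus one permutation) and switches between them at a threshold. The values r = 11 and THRESHOLD = 4 and the names sqr1, ksqr and pow2k are hypothetical; in this toy both routes cost about the same, so the threshold only illustrates the control flow, whereas on real hardware it is measured per platform.

```c
#include <stdint.h>

#define R 11u                              /* toy prime length (hypothetical) */
#define MASK ((UINT64_C(1) << R) - 1)
#define THRESHOLD 4u                       /* hypothetical platform threshold */

/* One squaring: spread bit i to bit 2i, then fold bits >= R down by R
 * (x^R = 1 in F2[x]/(x^R - 1)). */
static uint64_t sqr1(uint64_t a) {
    uint64_t s = 0;
    for (unsigned i = 0; i < R; i++) s |= ((a >> i) & 1) << (2 * i);
    return (s & MASK) ^ (s >> R);
}

/* k-squaring as a single permutation: coefficient i moves to i * 2^k mod R. */
static uint64_t ksqr(uint64_t a, unsigned k) {
    unsigned step = 1;
    for (unsigned j = 0; j < k; j++) step = (2 * step) % R;   /* 2^k mod R */
    uint64_t out = 0;
    for (unsigned i = 0; i < R; i++) out |= ((a >> i) & 1) << ((i * step) % R);
    return out;
}

/* a^(2^k): repeated squaring below the threshold, one permutation above it. */
static uint64_t pow2k(uint64_t a, unsigned k) {
    if (k < THRESHOLD) {
        while (k--) a = sqr1(a);
        return a;
    }
    return ksqr(a, k);
}
```

For example, ksqr(0x2, 3) maps x to \(x^8\) (0x100).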

B A \(16 \times 16\) Quad-Word Multiplication Using

This appendix describes the C code of our recursive Karatsuba multiplication.

The function performs four 128-bit Karatsuba multiplications in parallel.

[Code listing: four parallel 128-bit Karatsuba multiplications]
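Each 128-bit lane of the routine described above performs one Karatsuba step. A scalar C sketch of that step follows; the names clmul64 and kara128 are hypothetical, and clmul64 is a slow loop standing in for a single 64-bit carry-less multiply instruction. Splitting each operand into 64-bit halves, Karatsuba needs three carry-less products instead of four.

```c
#include <stdint.h>

/* Portable 64x64 -> 128-bit carry-less multiply (one hardware instruction on x86). */
static void clmul64(uint64_t a, uint64_t b, uint64_t *lo, uint64_t *hi) {
    uint64_t l = 0, h = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t m = (uint64_t)0 - ((b >> i) & 1);
        l ^= (a << i) & m;
        if (i) h ^= (a >> (64 - i)) & m;   /* bits shifted past bit 63 */
    }
    *lo = l;
    *hi = h;
}

/* 128x128 -> 256-bit carry-less Karatsuba.
 * a = a[1]:a[0], b = b[1]:b[0], result z = z[3]:z[2]:z[1]:z[0]. */
static void kara128(const uint64_t a[2], const uint64_t b[2], uint64_t z[4]) {
    uint64_t l0, l1, h0, h1, m0, m1;
    clmul64(a[0], b[0], &l0, &l1);                /* low:    a0*b0             */
    clmul64(a[1], b[1], &h0, &h1);                /* high:   a1*b1             */
    clmul64(a[0] ^ a[1], b[0] ^ b[1], &m0, &m1);  /* middle: (a0+a1)*(b0+b1)   */
    m0 ^= l0 ^ h0;                                /* now m = a0*b1 + a1*b0     */
    m1 ^= l1 ^ h1;
    z[0] = l0;                                    /* L + M*x^64 + H*x^128      */
    z[1] = l1 ^ m0;
    z[2] = h0 ^ m1;
    z[3] = h1;
}
```

In the vector version the same three products are applied to four independent 128-bit lanes at once.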

Then, we define the function that receives two 512-bit  registers (a, b) as input, multiplies them and writes the result into the two registers zh||zl. The function performs several permutations to reorganize the quad-words. The relevant masks are:

[Code listing: permutation masks]

The variables that are used in this function are: a) , . These hold the lower and upper parts of the 128-bit Karatsuba sub-multiplications; b) , , , , . These are used for the middle term of the 256-bit Karatsuba sub-multiplications; c) , , , , . These are used for the middle term of the top 512-bit Karatsuba multiplication; d) that holds all the temporary products to of the middle words.

Define

$$\begin{aligned} AX[i] = a[128(i+1)-1 : 128i] \\ BX[i] = b[128(i+1)-1 : 128i] \\ AY[i]= a[256(i+1)-1 : 256i] \\ BY[i]= b[256(i+1)-1 : 256i] \\ \end{aligned}$$

Then set

$$\begin{aligned}&t[0] = AX[1] \oplus AX[3] || AX[2] \oplus AX[3] || AX[0] \oplus AX[2] || AX[0] \oplus AX[1] \\&t[1] = BX[1] \oplus BX[3] || BX[2] \oplus BX[3] || BX[0] \oplus BX[2] || BX[0] \oplus BX[1] \end{aligned}$$

where

$$\begin{aligned}&AX[1] \oplus AX[3] || AX[0] \oplus AX[2] = (AX[1] || AX[0]) \oplus (AX[3] || AX[2]) = AY[0] \oplus AY[1] \\&BX[1] \oplus BX[3] || BX[0] \oplus BX[2] = (BX[1] || BX[0]) \oplus (BX[3] || BX[2]) = BY[0] \oplus BY[1] \end{aligned}$$

and set the lower 128 bits of , to (ignoring the upper bits)

$$\begin{aligned}&t[2][127:0] = AX[1] \oplus AX[3] \oplus AX[0] \oplus AX[2] \\&t[3][127:0] = BX[1] \oplus BX[3] \oplus BX[0] \oplus BX[2] \end{aligned}$$
[Code listing: computation of t[0] to t[3]]

The implementation invokes three times: a) for calculating the lower and the upper 512-bit words; b) for the two middle 256-bit words; c) for the middle 512-bit word in the top-level Karatsuba. The number of invocations of for the entire is only 9.

[Code listing: the three invocations of the four-lane Karatsuba routine]
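A short tally makes the figure of 9 plausible. It assumes, as the four-lane design suggests but the text does not state explicitly, that each call of the four-way 128-bit Karatsuba routine issues three vector carry-less multiplies (low, high and middle 64-bit products), and that call (c) uses only the lowest lane, as the definition of \(t[2][127:0]\) and \(t[3][127:0]\) implies:

$$\begin{aligned} &\underbrace{4}_{(a)} + \underbrace{4}_{(b)} + \underbrace{1}_{(c)} = 9 = 3^2 \ \text{products of 128-bit operands (two Karatsuba levels above 128 bits)} \\ &3 \ \text{calls} \times 3 \ \text{vector multiplies per call} = 9 \ \text{vector multiplies} \\ &3 \times 9 = 27 \ \text{useful 64-bit products out of } 9 \times 4 = 36 \ \text{lane slots} \end{aligned}$$

The \(36 - 27 = 9\) idle lane slots all come from call (c), where three of the four lanes are ignored.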

Finally, we complete the four 128-bit Karatsuba multiplications by

[Code listing: completion of the four 128-bit Karatsuba multiplications]

and the 512-bit result is

[Code listing: assembly of the 512-bit result]

For higher efficiency, our Karatsuba implementation holds the data in  registers in order to save memory operations when invoking .

[Code listing: register-based variant of the Karatsuba code]

C Fast Permutation

The inverted bit permutation (\(a=map(b)\)) of Sect. 4 can be implemented in a straightforward way as follows. We first convert the map to two maps and , where [i] is the byte index of map(b[i]) and [i] is the position of the relevant bit inside this byte.

[Code listing: applying the map with the byte-index and bit-position maps]
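A C sketch of the two-table approach follows. It assumes that output bit i is taken from input bit map[i] and that bits are packed little-endian within each byte; the names build_tables and apply_packed are hypothetical, and the paper's exact convention for the map may differ. When the map depends only on public parameters, as a k-squaring map does, the memory access pattern is independent of the secret bits.

```c
#include <stdint.h>
#include <stddef.h>

/* Split each source index into a byte index and a bit position once,
 * so the per-call loop needs no divisions. */
static void build_tables(const uint32_t *map, size_t n,
                         uint32_t *byte_idx, uint8_t *bit_pos) {
    for (size_t i = 0; i < n; i++) {
        byte_idx[i] = map[i] >> 3;              /* map[i] / 8 */
        bit_pos[i]  = (uint8_t)(map[i] & 7);    /* map[i] % 8 */
    }
}

/* out bit i = in bit map[i]; out must hold (n + 7) / 8 bytes. */
static void apply_packed(uint8_t *out, const uint8_t *in, size_t n,
                         const uint32_t *byte_idx, const uint8_t *bit_pos) {
    for (size_t i = 0; i < (n + 7) / 8; i++) out[i] = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t bit = (uint8_t)((in[byte_idx[i]] >> bit_pos[i]) & 1);
        out[i >> 3] |= (uint8_t)(bit << (i & 7));
    }
}
```

For the squaring map with r = 11 (output bit j taken from input bit \(6j \bmod 11\), since 6 is the inverse of 2 modulo 11), the packed input \(x^6\) (first byte 0x40) yields \(x\) (first byte 0x02), because \(x^{12} = x\) in \(\mathcal {R}\).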

A simpler way to apply the map is possible if we store every bit in a byte (that has the value or ).

[Code listing: applying the map with one bit per byte]
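With one coefficient per byte the permutation becomes a plain gather. The sketch below also builds the gather map for \(a^{2^k}\): output coefficient j comes from input coefficient \(j \cdot 2^{-k} \bmod r\), using \(2^{-1} \equiv (r+1)/2 \pmod r\) for odd r. Both helper names, apply_bytes and ksqr_map, are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* One coefficient per byte: out[i] = in[map[i]]. */
static void apply_bytes(uint8_t *out, const uint8_t *in, const uint32_t *map, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = in[map[i]];
}

/* Gather map for a^(2^k) in F2[x]/(x^r - 1), r odd:
 * map[j] = j * 2^(-k) mod r, with 2^(-1) mod r = (r + 1) / 2. */
static void ksqr_map(uint32_t *map, uint32_t r, unsigned k) {
    uint64_t inv2k = 1;
    for (unsigned t = 0; t < k; t++) inv2k = (inv2k * ((r + 1) / 2)) % r;
    for (uint32_t j = 0; j < r; j++) map[j] = (uint32_t)((j * inv2k) % r);
}
```

For r = 11 and k = 3, the input x (byte 1 set) is mapped to \(x^8\) (byte 8 set).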

This can involve a costly conversion to and from across the representations, but fortunately we can speed it up with AVX512 (when available):

[Code listings: AVX512 conversions between the two representations]
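A portable reference for the two conversions is handy as a correctness baseline for a vectorized version; the names bits_to_bytes and bytes_to_bits are hypothetical. On AVX512BW hardware, the intrinsics _mm512_movm_epi8 and _mm512_movepi8_mask convert between a 64-bit mask and 64 bytes in one instruction each (the former produces 0x00/0xFF bytes, the latter reads each byte's top bit); whether the listings above use exactly these is not recoverable from this text. The sketch assumes bits packed little-endian within bytes and writes 0/1 bytes.

```c
#include <stdint.h>
#include <stddef.h>

/* Unpack n bits (little-endian within bytes) into n bytes holding 0 or 1. */
static void bits_to_bytes(uint8_t *out, const uint8_t *in, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = (uint8_t)((in[i >> 3] >> (i & 7)) & 1);
}

/* Pack n bytes back into bits. Testing bit 0 accepts both 0/1 and
 * 0x00/0xFF byte encodings. out must hold (n + 7) / 8 bytes. */
static void bytes_to_bits(uint8_t *out, const uint8_t *in, size_t n) {
    for (size_t i = 0; i < (n + 7) / 8; i++) out[i] = 0;
    for (size_t i = 0; i < n; i++) out[i >> 3] |= (uint8_t)((in[i] & 1) << (i & 7));
}
```

A round trip through both functions returns the original packed bytes, apart from any unused padding bits in the last byte.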

Note that the instruction can also be used for the latter conversion (see next). This instruction requires the extension while the instruction requires only which is more common.

[Code listing: conversion using the alternative instruction]

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Drucker, N., Gueron, S., Kostic, D. (2020). Fast Polynomial Inversion for Post Quantum QC-MDPC Cryptography. In: Dolev, S., Kolesnikov, V., Lodha, S., Weiss, G. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2020. Lecture Notes in Computer Science, vol 12161. Springer, Cham. https://doi.org/10.1007/978-3-030-49785-9_8

  • DOI: https://doi.org/10.1007/978-3-030-49785-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-49784-2

  • Online ISBN: 978-3-030-49785-9
