Fast Polynomial Inversion for Post Quantum QC-MDPC Cryptography

Drucker, Nir; Gueron, Shay; Kostic, Dusan

doi:10.1007/978-3-030-49785-9_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 12161)

Included in the following conference series:

International Symposium on Cyber Security Cryptography and Machine Learning

Abstract

The NIST PQC standardization project evaluates multiple new designs for post-quantum Key Encapsulation Mechanisms (KEMs). Some of them present challenging tradeoffs between communication bandwidth and computational overheads. An interesting case is the set of QC-MDPC based KEMs. Here, schemes that use the Niederreiter framework require only half the communication bandwidth compared to schemes that use the McEliece framework. However, this requires costly polynomial inversion during the key generation, which is prohibitive when ephemeral keys are used. One example is BIKE, where the BIKE-1 variant uses McEliece and the BIKE-2 variant uses Niederreiter. This paper shows an optimized constant-time polynomial inversion method that makes the computation costs of BIKE-2 key generation tolerable. We report a speedup of $11.8{\times }$ over the commonly used NTL library, and $55.5{\times }$ over OpenSSL. We achieve additional speedups by leveraging Intel’s latest Vector-PCLMULQDQ instructions on a laptop machine: $14.3{\times }$ over NTL and $96.8{\times }$ over OpenSSL. With this, BIKE-2 becomes a competitive variant of BIKE.

Acknowledgements

This research was partly supported by: NSF-BSF Grant 2018640; The BIU Center for Research in Applied Cryptography and Cyber Security, and the Center for Cyber Law and Policy at the University of Haifa, both in conjunction with the Israel National Cyber Bureau in the Prime Minister’s Office.

We would also like to thank Thorsten Kleinjung for his valuable comments on this work.

Author information

Authors and Affiliations

University of Haifa, Haifa, Israel
Nir Drucker & Shay Gueron
Amazon, Seattle, USA
Nir Drucker & Shay Gueron
EPFL Switzerland, Lausanne, Switzerland
Dusan Kostic

Corresponding author

Correspondence to Nir Drucker.

Editor information

Editors and Affiliations

Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
Shlomi Dolev
School of Computer Science, Georgia Institute of Technology, Atlanta, GA, USA
Vladimir Kolesnikov
Tata Consultancy Services, Chennai, Tamil Nadu, India
Sachin Lodha
Department of Computer Science, Ben-Gurion University of the Negev, Be'er Sheva, Israel
Gera Weiss

Appendices

A Squaring Using PCLMULQDQ and VPCLMULQDQ

This appendix describes our C implementation for squaring in $\mathcal {R}$, using PCLMULQDQ. For brevity, we replace the long names of the C intrinsics with shorter macros as follows.

When PCLMULQDQ (and not vector-PCLMULQDQ) is available, the square function is

When vector-PCLMULQDQ is available, four multiplications can be executed in parallel and the code is

The permutation and thus some of the flow’s serialization can be removed by using the instruction.

However, our experiments show slower results with this instruction.
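As background for the squaring routines discussed in this appendix, the following is a minimal portable sketch and not the authors' intrinsics-based code. It relies on the fact that squaring a binary polynomial interleaves its coefficients with zeros, which is also what a carry-less multiplication of a word by itself produces. The names `spread_bits32` and `gf2x_sqr_portable` are hypothetical, and the reduction into $\mathcal {R}$ is deliberately left out.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: move bit i of the low 32 bits of x to bit 2i.
 * The result has zeros in all odd bit positions. */
static uint64_t spread_bits32(uint64_t x)
{
    x &= 0x00000000FFFFFFFFULL;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

/* Hypothetical sketch: square a binary polynomial stored in n 64-bit words
 * (least significant word first) into 2n words.
 * No modular reduction is performed here. */
static void gf2x_sqr_portable(uint64_t *out, const uint64_t *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[2 * i]     = spread_bits32(in[i]);        /* low 32 coefficients  */
        out[2 * i + 1] = spread_bits32(in[i] >> 32);  /* high 32 coefficients */
    }
}
```

For example, the input word `0x3` (the polynomial $1 + x$) yields `0x5` (the polynomial $1 + x^2$). A carry-less multiply instruction computes the same two output words per input word in a single operation, which is why it is attractive for this step.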

Table 6 compares squaring and k-squaring in $\mathcal {R}$ using our code. Our implementation starts with squaring up to the described threshold and then continues with k-squaring. The threshold depends on the platform.

Table 6. Squaring and k-squaring in $\mathcal {R}$ using our code. Columns 2 and 3 count cycles, where lower is better (threshold = floor(k-square/square)). The r values correspond to the IND-CCA variants of BIKE for Level-1/3/5.

B A $16 \times 16$ Quad-Words Multiplication Using VPCLMULQDQ

This appendix describes the C code of our recursive Karatsuba multiplication.

The function performs four 128-bit Karatsuba multiplications in parallel.

Then, we define the function that receives two 512-bit registers (a, b) as input, multiplies them and writes the result into the two registers zh||zl. The function performs several permutations to reorganize the quad-words. The relevant masks are:

The variables that are used in this function fall into four groups: a) two variables that hold the lower and upper parts of the 128-bit Karatsuba sub-multiplications; b) five variables that are used for the middle term of the 256-bit Karatsuba sub-multiplications; c) five variables that are used for the middle term of the top 512-bit Karatsuba multiplication; d) a variable that holds all the temporary products of the middle words.

Define

$$\begin{aligned} AX[i] = a[128(i+1)-1 : 128i] \\ BX[i] = b[128(i+1)-1 : 128i] \\ AY[i]= a[256(i+1)-1 : 256i] \\ BY[i]= b[256(i+1)-1 : 256i] \\ \end{aligned}$$

Then set

$$\begin{aligned}&t[0] = AX1 \oplus AX3|| AX2 \oplus AX3 || AX0 \oplus AX2 || AX0 \oplus AX1 \\&t[1] = BX1 \oplus BX3|| BX2 \oplus BX3 || BX0 \oplus BX2 || BX0 \oplus BX1 \end{aligned}$$

where

$$\begin{aligned}&AX1 \oplus AX3 || AX0 \oplus AX2 = (AX1 || AX0) \oplus (AX3 || AX2) = AY0 \oplus AY1 \\&BX1 \oplus BX3 || BX0 \oplus BX2 = (BX1 || BX0) \oplus (BX3 || BX2) = BY0 \oplus BY1 \end{aligned}$$

and set the lower 128 bits of $t[2]$, $t[3]$ to (ignoring the upper bits)

$$\begin{aligned}&t[2][127:0] = AX1 \oplus AX3 \oplus AX0 \oplus AX2 \\&t[3][127:0] = BX1 \oplus BX3 \oplus BX0 \oplus BX2 \end{aligned}$$
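
One way to read these definitions is through the standard two-term Karatsuba identity over $\mathbb {F}_2[x]$. The symbols $A_0, A_1, B_0, B_1$ and $w$ below are introduced here for illustration only:

$$\begin{aligned} (A_1 x^{w} \oplus A_0)(B_1 x^{w} \oplus B_0) = A_1 B_1 x^{2w} \oplus \big [(A_0 \oplus A_1)(B_0 \oplus B_1) \oplus A_0 B_0 \oplus A_1 B_1\big ] x^{w} \oplus A_0 B_0 \end{aligned}$$

With $w = 256$, the middle term multiplies $AY0 \oplus AY1$ by $BY0 \oplus BY1$. Applying the same identity with $w = 128$ to each of the three resulting 256-bit products needs the sums $AX0 \oplus AX1$ (for $AY0 \cdot BY0$), $AX2 \oplus AX3$ (for $AY1 \cdot BY1$), and $AX0 \oplus AX1 \oplus AX2 \oplus AX3$ (for the middle product, whose 128-bit words are $AX0 \oplus AX2$ and $AX1 \oplus AX3$), and likewise for $b$. Under this reading, $t[0]$ and $t[1]$ collect the four pairwise sums, $t[2]$ and $t[3]$ hold the four-way sums, and the whole multiplication decomposes into nine 128-bit products: four of the form $AXi \cdot BXi$, four on the pairwise sums, and one on the four-way sums.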

The implementation invokes three times: a) for calculating the lower and the upper 512-bit words; b) for the two middle 256-bit words; c) for the middle 512-bit word in the top-level Karatsuba. The number of invocations of for the entire is only 9.

Finally, we complete the four 128-bit Karatsuba by

and the 512-bit result is

For higher efficiency, our Karatsuba implementation holds the data in registers in order to save memory operations when invoking .
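
To illustrate the Karatsuba structure used in this appendix at its smallest level, here is a hedged, portable sketch of a single 128-bit by 128-bit binary polynomial multiplication built from three 64-bit carry-less products. It is not the authors' vectorized code: `clmul64` is a hypothetical bit-serial stand-in for a hardware carry-less multiply, and `karatsuba128` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical stand-in for a hardware carry-less multiply:
 * 64x64 -> 128-bit product of binary polynomials, computed bit by bit.
 * Selection uses masks rather than data-dependent branches. */
static void clmul64(uint64_t *hi, uint64_t *lo, uint64_t a, uint64_t b)
{
    uint64_t h = 0, l = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t m = 0 - ((b >> i) & 1);  /* all-ones if bit i of b is set */
        l ^= (a << i) & m;
        if (i > 0)                        /* avoid an undefined shift by 64 */
            h ^= (a >> (64 - i)) & m;
    }
    *hi = h;
    *lo = l;
}

/* Hypothetical sketch: r = a * b over GF(2)[x], where a and b are 128-bit
 * operands and r is 256 bits, all stored least significant word first. */
static void karatsuba128(uint64_t r[4], const uint64_t a[2], const uint64_t b[2])
{
    uint64_t lo_h, lo_l, hi_h, hi_l, mid_h, mid_l;

    clmul64(&lo_h, &lo_l, a[0], b[0]);                 /* a0 * b0           */
    clmul64(&hi_h, &hi_l, a[1], b[1]);                 /* a1 * b1           */
    clmul64(&mid_h, &mid_l, a[0] ^ a[1], b[0] ^ b[1]); /* (a0^a1) * (b0^b1) */

    /* Middle term: subtract (XOR) the low and high products. */
    mid_l ^= lo_l ^ hi_l;
    mid_h ^= lo_h ^ hi_h;

    r[0] = lo_l;
    r[1] = lo_h ^ mid_l;
    r[2] = hi_l ^ mid_h;
    r[3] = hi_h;
}
```

The three calls to `clmul64` correspond to the three products of the Karatsuba identity; a schoolbook approach would need four. Larger operands can be handled by applying the same pattern recursively.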

C Fast Permutation

The inverted bit permutation ($a=map(b)$) of Sect. 4 can be implemented in a straightforward way as follows. We first convert the map to two maps and , where [i] is the byte index of map(b[i]) and [i] is the position of the relevant bit inside this byte.
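
The straightforward approach described above can be sketched as follows. This is a minimal illustration under assumptions, not the authors' listing: the names `split_map`, `permute_bits`, `byte_idx` and `bit_pos` are hypothetical, bits are assumed to be packed eight per byte with the least significant bit first, and output bit i is assumed to be taken from input bit `map[i]`.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split each map entry into the index of the byte
 * that holds the source bit and the position of that bit inside the byte. */
static void split_map(uint32_t *byte_idx, uint8_t *bit_pos,
                      const uint32_t *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        byte_idx[i] = map[i] >> 3;
        bit_pos[i]  = (uint8_t)(map[i] & 7);
    }
}

/* Hypothetical sketch: out bit i := in bit map[i], using the split maps.
 * Both buffers are bit-packed, least significant bit first. */
static void permute_bits(uint8_t *out, const uint8_t *in,
                         const uint32_t *byte_idx, const uint8_t *bit_pos,
                         size_t nbits)
{
    memset(out, 0, (nbits + 7) / 8);
    for (size_t i = 0; i < nbits; i++) {
        uint8_t bit = (uint8_t)((in[byte_idx[i]] >> bit_pos[i]) & 1);
        out[i >> 3] |= (uint8_t)(bit << (i & 7));
    }
}
```

Precomputing the two maps moves the division and remainder by 8 out of the inner loop, which still has to extract and reinsert one bit per iteration.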

A simpler way to apply the map is possible if we store every bit in a byte (that has the value or ).
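
A minimal portable sketch of this byte-per-bit idea follows. It is illustrative only: the function names are hypothetical, each byte is assumed to hold 0 or 1, and output element i is assumed to be taken from input element `map[i]`.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand a bit-packed buffer (LSB first) so that
 * every bit occupies its own byte with value 0 or 1. */
static void bits_to_bytes(uint8_t *bytes, const uint8_t *bits, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++)
        bytes[i] = (uint8_t)((bits[i >> 3] >> (i & 7)) & 1);
}

/* Hypothetical helper: pack one-bit-per-byte data back into bits. */
static void bytes_to_bits(uint8_t *bits, const uint8_t *bytes, size_t nbits)
{
    memset(bits, 0, (nbits + 7) / 8);
    for (size_t i = 0; i < nbits; i++)
        bits[i >> 3] |= (uint8_t)((bytes[i] & 1) << (i & 7));
}

/* With one byte per bit, applying the map becomes a plain gather:
 * no shifting or masking is needed inside the loop. */
static void permute_bytes(uint8_t *out, const uint8_t *in,
                          const uint32_t *map, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++)
        out[i] = in[map[i]];
}
```

The gather loop is simpler than the bit-packed version, at the price of an eightfold larger working buffer and the two conversion passes.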

This can involve a costly conversion between the two representations but, fortunately, we can speed it up with AVX512 (when available).

Note that the instruction can also be used for the latter conversion (see next). This instruction requires the extension while the instruction requires only which is more common.

About this paper

Cite this paper

Drucker, N., Gueron, S., Kostic, D. (2020). Fast Polynomial Inversion for Post Quantum QC-MDPC Cryptography. In: Dolev, S., Kolesnikov, V., Lodha, S., Weiss, G. (eds) Cyber Security Cryptography and Machine Learning. CSCML 2020. Lecture Notes in Computer Science, vol 12161. Springer, Cham. https://doi.org/10.1007/978-3-030-49785-9_8

DOI: https://doi.org/10.1007/978-3-030-49785-9_8
Published: 25 June 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-49784-2
Online ISBN: 978-3-030-49785-9
eBook Packages: Computer Science, Computer Science (R0)
