1 Introduction

Some classes of encrypted data must remain confidential for a long period of time, often at least a few decades in national security applications. Therefore, high-security cryptography should resist attacks even with projected future technologies. As there are no physical or theoretical barriers preventing progressive development of quantum computing technologies capable of breaking current RSA- and Elliptic Curve based cryptographic standards (using polynomial-time quantum algorithms already known [37, 42]), a need for quantum-resistant algorithms in national security applications has been identified [33].

In December 2016 NIST issued a standardization call for quantum-resistant public key algorithms, together with requirements and evaluation criteria [32]. This has made “Post-Quantum Cryptography” (PQC) central to cryptographic engineers who must now design concrete proposals for standardization. Practical issues such as performance, reliability, message and key sizes, implementation and side-channel security, and compatibility with existing and anticipated applications, protocols, and standards are as relevant as mere theoretical security and asymptotic feasibility when evaluating these proposals.

Ring-LWE lattice primitives offer some of the best performance and key size characteristics among quantum-resistant candidates [16]. These algorithms rely on “random noise” for security and always have some risk of decryption failure. This reliability issue can pose problems when the algorithms are used in non-interactive applications which are not designed to tolerate errors. The issue of decryption failure can be addressed via reconciliation methods, which are the focus of the present work.

Structure of This Paper and Our Contributions. Section 2 provides a practical introduction to Ring-LWE Key Exchange and prior work on reconciliation. Section 3 introduces our new reconciliation techniques, together with detailed analysis. Section 4 discusses design, analysis, and implementation of XE5, a simple constant-time error correction code suitable for Ring-LWE. Section 5 contains the specification and implementation benchmarks for our instantiation HILA5, designed to meet the NIST PQC criteria at high security level. We conclude in Sect. 6. Additional algorithmic listings are provided in Appendix A.

2 Ring-LWE Key Exchange and Key Encapsulation

Notation and Basic Properties. Reduction \(x~\bmod ~q\) puts a number in non-negative range \(0 \le x < q\). We write the rounding function as \(\lfloor x \rceil = \lfloor x + \frac{1}{2} \rfloor \).

Let \(\mathcal {R}\) be a ring with elements \(\mathbf {v} \in \mathbb {Z}_q^n\). Its coefficients \(v_i \in [0, q-1]\) (\(0 \le i < n\)) can be interpreted as a polynomial via \(v(x) = \sum _{i=0}^{n-1} v_i x^i\), or as a zero-indexed vector. Addition, subtraction, and scaling (scalar multiplication with c) follow the basic rules for polynomials or vectors with coefficients in \(\mathbb {Z}_q\).

For multiplication in \(\mathcal {R}\) we use cyclotomic polynomial basis \(\mathbb {Z}_q[x] / (x^n+1)\). Products are reduced modulo q and \(x^n+1\) and results are bound by degree \(n-1\) since \(x^n \equiv q-1\) in \( \mathcal {R}\). We may write a direct wrap-around multiplication rule:

$$\begin{aligned} \mathbf {h} = \mathbf {f} * \mathbf {g} ~\bmod ~( x^n + 1 )~ \iff h_i = \sum _{j=0}^i f_{j}g_{(i-j)}-\sum _{j=i+1}^{n-1} f_{j}g_{(n+i-j)}. \end{aligned}$$
(1)

Algorithmically the multiplication rule of Eq. 1 requires \(O(n^2)\) elementary operations. However, there is an \(O(n \log n)\) method using the Number Theoretic Transform (NTT), originally from Nussbaumer [34]. For efficient NTT implementation n should be a power of two and q a small prime, with \(2n ~|~ q-1\).
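The wrap-around rule of Eq. 1 translates directly into a quadratic-time routine. The following is a minimal Python sketch for illustration only; the name `ring_mul` is ours and is not taken from any HILA5 implementation, and a practical implementation would use the NTT instead.

```python
def ring_mul(f, g, q):
    """Schoolbook product of f and g in Z_q[x]/(x^n + 1), following Eq. 1."""
    n = len(f)
    assert len(g) == n
    h = [0] * n
    for i in range(n):
        acc = 0
        for j in range(i + 1):           # terms that stay below degree n
            acc += f[j] * g[i - j]
        for j in range(i + 1, n):        # terms that wrap around, x^n = -1
            acc -= f[j] * g[n + i - j]
        h[i] = acc % q                   # Python's % gives 0 <= h[i] < q
    return h

# Toy check with n = 4: x^3 * x = x^4 = -1 = q - 1 in the ring.
q = 12289
print(ring_mul([0, 0, 0, 1], [0, 1, 0, 0], q))   # [12288, 0, 0, 0]
```

The two inner loops together touch every index j exactly once, which is where the \(O(n^2)\) cost comes from.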

Definition 1

(Informal). With all distributions and computations in ring \(\mathcal {R}\), let \(\mathbf {s},\mathbf {e}\) be elements randomly chosen from some non-uniform distribution \(\chi \), and \(\mathbf {g}\) be a uniformly random public value. Determining \(\mathbf {s}\) from \((\mathbf {g}, \mathbf {g} * \mathbf {s} + \mathbf {e})\) in ring \(\mathcal {R}\) is the (Normal Form Search) Ring Learning With Errors (\(\text {RLWE}_{\mathcal {R},\chi }\)) problem.

Typically, \(\chi \) is chosen so that each coefficient is a Discrete Gaussian or from some other “Bell-Shaped” distribution that is relatively tightly concentrated around zero. The hardness of the problem is a function of n, q, and \(\chi \).

2.1 Noisy Diffie-Hellman in a Ring

A key exchange method analogous to Diffie-Hellman can be constructed in \(\mathcal {R}\) in a straightforward manner, as first described in [1, 35]. Let \(\mathbf {g} {\mathop {\leftarrow }\limits ^{\$}} \mathcal {R}\) be a uniformly random common parameter (“generator”), and \(\chi \) a non-uniform distribution.

figure a

We see that the way messages \(\mathbf {A}, \mathbf {B}\) are generated makes the security of the scheme equivalent to Definition 1. This commutative scheme “almost” works like Diffie-Hellman because the shared secrets only approximately agree; \(\mathbf {x} \approx \mathbf {y}\). Since the ring \(\mathcal {R}\) is commutative, substituting \(\mathbf {A}\) and \(\mathbf {B}\) gives

$$\begin{aligned} \mathbf {x} = (\mathbf {g} * \mathbf {b} + \mathbf {e'}) * \mathbf {a}&= \mathbf {g} * \mathbf {a} * \mathbf {b} + \mathbf {e'} * \mathbf {a} \end{aligned}$$
(2)
$$\begin{aligned} \mathbf {y} = (\mathbf {g} * \mathbf {a} + \mathbf {e}) * \mathbf {b}&= \mathbf {g} * \mathbf {a} * \mathbf {b} + \mathbf {e} * \mathbf {b}. \end{aligned}$$
(3)

The distance \(\varDelta \) therefore consists only of products of “noise” parameters:

$$\begin{aligned} \varDelta = \mathbf {x} - \mathbf {y} = \mathbf {e'} * \mathbf {a} - \mathbf {e} * \mathbf {b}. \end{aligned}$$
(4)

We observe that each of \(\{\mathbf {a}, \mathbf {b}, \mathbf {e}, \mathbf {e'} \}\) in \(\varDelta \) is picked independently from \(\chi \), which should be relatively “small” and zero-centered. The coefficients of both \(\mathbf {x}\) and \(\mathbf {y}\) are dominated by the common, uniformly distributed factor \(\mathbf {g} * \mathbf {a} * \mathbf {b} \approx \mathbf {x} \approx \mathbf {y}\). Up to n shared bits can be decoded from the coefficients of \(\mathbf {x}\) and \(\mathbf {y}\) by a simple binary classifier such as \(\lfloor \frac{2x_i}{q} \rfloor \approx \lfloor \frac{2y_i}{q} \rfloor \). However, this type of generation will produce some disagreeing bits due to the error \(\varDelta \). Furthermore, the output of the classifier is slightly biased when q is odd. This is why additional steps are required.
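The exchange and the noise identity of Eq. 4 can be reproduced in a few lines. The sketch below is an illustration under stated assumptions: the ring dimension is reduced to n = 64 so that the quadratic multiplication runs quickly, the noise is drawn as the Hamming weight of 32 random bits minus 16, and all function names are ours.

```python
import random

Q, N = 12289, 64   # N reduced from 1024 (assumption) to keep the demo fast

def psi16(rng):
    """One noise sample: Hamming weight of 32 random bits, minus 16."""
    return bin(rng.getrandbits(32)).count("1") - 16

def ring_mul(f, g):
    """Schoolbook product in Z_Q[x]/(x^N + 1), as in Eq. 1."""
    h = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                h[i + j] += f[i] * g[j]
            else:
                h[i + j - N] -= f[i] * g[j]   # x^N = -1 wrap-around
    return [c % Q for c in h]

def ring_add(f, g):
    return [(u + v) % Q for u, v in zip(f, g)]

def noisy_dh(rng):
    """Return both approximate shared secrets and the secret noise terms."""
    noise = lambda: [psi16(rng) for _ in range(N)]
    g = [rng.randrange(Q) for _ in range(N)]    # public "generator"
    a, e = noise(), noise()                     # Alice's secrets
    b, e2 = noise(), noise()                    # Bob's secrets (e2 is e')
    A = ring_add(ring_mul(g, a), e)             # Alice -> Bob
    B = ring_add(ring_mul(g, b), e2)            # Bob -> Alice
    return ring_mul(B, a), ring_mul(A, b), (a, b, e, e2)

def centered(v):
    """Map v mod Q to the signed range around zero."""
    v %= Q
    return v - Q if v > Q // 2 else v

rng = random.Random(1)
x, y, _ = noisy_dh(rng)
delta = [centered(xi - yi) for xi, yi in zip(x, y)]
agree = sum((2 * xi) // Q == (2 * yi) // Q for xi, yi in zip(x, y))
print(max(abs(d) for d in delta), agree, "of", N, "bits agree")
```

With the naive classifier some positions can still disagree whenever a coefficient of \(\mathbf {y}\) lies close to a decision boundary, which is the motivation for reconciliation.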

2.2 Reconciliation

Let \(\mathbf {x} \approx \mathbf {y}\) be two vectors in \(\mathbb {Z}^n_q\) with a relatively small difference in each coefficient; the distribution of the distance \(\delta _i = x_i - y_i\) is strongly centered around zero. In reconciliation, we wish the holders of \(\mathbf {x}\) and \(\mathbf {y}\) (Alice and Bob, respectively) to be able to arrive at exactly the same shared secret (key) \(\mathbf {k}\) with a small amount of communication \(\mathbf {c}\). However, single-message reconciliation can also be described simply as a part of an encryption algorithm (not a protocol).

Peikert’s Reconciliation and BCNS Instantiation. In Peikert’s reconciliation for odd modulus [36], Bob first generates a randomization vector \(\mathbf {r}\) such that each \(r_i \in \{0,\pm 1\}\) is uniform modulo two. Bob can then determine the public reconciliation \(\mathbf {c}\) and shared secret \(\mathbf {k}\) via

$$\begin{aligned} c_i = \left\lfloor \frac{2(2y_i - r_i)}{q} \right\rfloor \bmod 2 ~~ ~~ k_i = \left\lfloor \frac{2y_i - r_i}{q} \right\rceil \bmod 2. \end{aligned}$$
(5)

We define disjoint helper sets \(I_0=[0,\lfloor \frac{q}{2} \rfloor ]\) and \(I_1 = [-\lfloor \frac{q}{2} \rfloor , -1 ]\) and \(E = [-\frac{q}{4}, \frac{q}{4})\). Alice uses \(\mathbf {x}\) to arrive at the shared secret \(\mathbf {k'} = \mathbf {k}\) via

$$\begin{aligned} k'_i = \left\{ \begin{array}{rl} 0, &{} \text {if } 2x_i \in I_{c_i} + E ~\bmod 2q \\ 1, &{} \text {otherwise.} \end{array} \right. \end{aligned}$$
(6)

This mechanism is illustrated in Fig. 1. Peikert’s reconciliation was adopted for the Internet-oriented “BCNS” instantiation [14], which has a vanishingly small failure probability; \(Pr(\mathbf {k'} \ne \mathbf {k}) < 2^{-16384}\).

Fig. 1. Simplified view of Peikert’s original reconciliation mechanism [36], ignoring randomized rounding. Alice and Bob have points \(x \approx y \in \mathbb {Z}_q\) that are close to each other. Bob uses y to choose k and c as shown on the left, and transmits c to Alice. Alice can use x and c to always arrive at the same shared bit \(k'\) if \(|x-y| < \frac{q}{8}\), as shown on the right. Without randomized smoothing the two halves \(k=0\) and \(k=1\) have areas of unequal size (when q is an odd prime) and the resulting key will be slightly biased.

New Hope Variants. “New Hope” is a prominent, more recent instantiation of Peikert’s key exchange scheme [5]. New Hope is parametrized at \(n=1024\), yet produces a 256-bit secret key k. This allowed the designers to develop a relatively complex reconciliation mechanism that uses \(\frac{1024}{256} = 4\) coefficients of \(\mathbf {x}\) and \(2 * 4 = 8\) bits of reconciliation information to reach \(< 2^{-60}\) failure rate.

In a follow-up paper [4] the New Hope authors let Bob unilaterally choose the secret key, and significantly simplified their approach. This version also uses four coefficients, but requires \(3 * 4 = 12\) bits of reconciliation (or “ciphertext”) information per key bit. The total failure probability is the same \(< 2^{-60}\).

Security Level and Failure Probability. Note that despite having a higher failure probability, the security level of New Hope (Sect. 2.2) is higher than that of BCNS (Sect. 2.2). Security of RLWE is closely related to the entropy and deviation of noise distribution \(\chi \) in relation to modulus q. Higher noise ratio increases security against attacks, but also increases failure probability [3]. This is a fundamental trade-off in all Ring-LWE schemes.

2.3 Formalization as a KEM

Following the NIST call [32] and Peikert [36], such a scheme can be formalized as a Key Encapsulation Mechanism (KEM), which consists of three algorithms:

  • \((\mathsf {PK}, \mathsf {SK}) \leftarrow \mathsf {KeyGen}()\). Generate a public key \(\mathsf {PK}\) and a secret key \(\mathsf {SK}\) (pair).

  • \((\mathsf {CT}, \mathsf {K}) \leftarrow \mathsf {Encaps}(\mathsf {PK})\). Encapsulate a (random) key \(\mathsf {K}\) in ciphertext \(\mathsf {CT}\).

  • \(\mathsf {K} \leftarrow \mathsf {Decaps}(\mathsf {SK}, \mathsf {CT})\). Decapsulate shared key \(\mathsf {K}\) from \(\mathsf {CT}\) with \(\mathsf {SK}\).

In this model, reconciliation data is a part of ciphertext produced by \(\mathsf {Encaps}\). The three KEM algorithms constitute a natural single-roundtrip key exchange:

figure b

Even though a KEM cannot encrypt per se, a hybrid set-up that uses a KEM to determine random shared keys for message payload confidentiality (symmetric encryption) and integrity (via a message authentication code) is usually preferable to using asymmetric encryption directly on payload [18].

NIST requires at least IND-CPA [9] security from such a scheme. For a KEM without “plaintext”, this essentially means that valid \((\mathsf {PK}, \mathsf {CT}, \mathsf {K})\) triplets are computationally indistinguishable from \((\mathsf {PK}, \mathsf {CT}, \mathsf {K'})\), where \(\mathsf {K'}\) is random.

3 New Reconciliation Method

We define a simpler, deterministic key and reconciliation bit generation rule from Bob’s share \(\mathbf {y}\) to be

$$\begin{aligned} k_i = \left\lfloor \frac{2 y_i}{q} \right\rfloor ~~ \text {and} ~~ c_i = \left\lfloor \frac{4 y_i}{q} \right\rfloor \bmod 2. \end{aligned}$$
(7)

Input \(y_i\) can be assumed to be uniform in range \([0,q-1]\). If taken in this plain form, the generator is slightly biased towards zero, since the interval for \(k_i=0\), \([0, \lfloor \frac{q}{2} \rfloor ]\) is 1 larger than the interval \([\lceil \frac{q}{2} \rceil , q-1]\) for \(k_i=1\) when q is odd.
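The size of the bias can be confirmed by counting over all of \(\mathbb {Z}_q\). This is a small illustrative check of Eq. 7 for \(q=12289\); the helper names `key_bit` and `rec_bit` are ours.

```python
Q = 12289

def key_bit(y, q=Q):
    return (2 * y) // q            # k_i of Eq. 7

def rec_bit(y, q=Q):
    return ((4 * y) // q) % 2      # c_i of Eq. 7

zeros = sum(1 for y in range(Q) if key_bit(y) == 0)
ones = Q - zeros
print(zeros, ones)                 # 6145 6144
```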

Fig. 2. We use \(k=\lfloor \frac{2y}{q} \rfloor \) (\(k=1\) on left half) instead of signed rounding \(k=\lfloor \frac{2y}{q} + \epsilon \rceil \) (\(k=1\) in lower half) of Peikert (Fig. 1). The illustration on the left gives intuition for the simple key bit selection and SafeBits without reconciliation. Bob uses window parameter b to select “safe” bits \(d=1\) which are farthest away from the negative (\(k=1\))/positive (\(k=0\)) threshold. The bit selection d is sent to Alice, who then chooses the same bits as part of the shared secret \(k'\). On the right, safe bit selection when reconciliation bits c are used; this doubles the SafeBits “area”. Each section constitutes a fraction \(\frac{2b+1}{q}\), so bits are unbiased. However, the number of shared bits is not constant.

Intuition: Selecting Safe Bits (without Reconciliation). Let’s assume that we don’t need all n bits given by the ring dimension. There is a straight-forward strategy for Bob to select m indexes in \(\mathbf {y}\) that are most likely to agree. These safe coefficients are those that are closest to center points of \(k=0\) and \(k=1\) ranges, which in this case are \(\frac{q}{4}\) and \(\frac{3q}{4}\), respectively. Bob may choose a boundary window b, which defines shared bits to be used, and then communicate his binary selection vector \(\mathbf {d}\) to Alice:

$$\begin{aligned} d_i = \left\{ \begin{array}{rl} 1 &{} \text {if } y_i \in \left[ \lfloor \frac{q}{4} \rceil - b, \lfloor \frac{q}{4} \rceil + b \right] ~~\text {or}~~ y_i \in \left[ \lfloor \frac{3q}{4} \rceil - b, \lfloor \frac{3q}{4} \rceil + b \right] \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(8)

This simple case is illustrated on left side of Fig. 2.

Since \(\mathbf {y}\) is uniform in \(\mathbb {Z}^n_q\), the Hamming weight of \(\mathbf {d} = \mathsf {SafeBits}(\mathbf {y})\) satisfies \(\mathsf {Wt}(\mathbf {d}) = \sum _{i=0}^{n-1} d_i \approx \frac{4b+2}{q} n\). Note that if not enough bits for the required payload can be obtained with bound b, Bob should re-randomize \(\mathbf {y}\) rather than raise b, as that can have an unexpected effect on the failure rate. If there are more selected bits than the desired payload needs, the surplus can simply be ignored.

Importantly, both partitions are of equal size \(2b+1\) and therefore k is unbiased if there are no bit failures. If Alice also uses the simple rule \(k'_i = \lfloor \frac{2 x_i}{q} \rfloor \) to derive key bits (without \(c_i\)), a bit error can occur only if the distance between the shares satisfies \(|x_i - y_i| > \frac{q}{4}-b\).
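The selection rule of Eq. 8 and the equal size of the two partitions can be checked by enumeration. In this sketch the window \(b=500\) is an arbitrary illustrative value rather than a HILA5 parameter, and the function names are ours.

```python
Q = 12289

def rnd(num, den):
    """Nearest integer to num/den, computed as floor(num/den + 1/2)."""
    return (2 * num + den) // (2 * den)

def safe_bits_plain(y, b, q=Q):
    """Selection vector d of Eq. 8 (no reconciliation bits)."""
    c0, c1 = rnd(q, 4), rnd(3 * q, 4)     # centers of the k=0 and k=1 ranges
    return [int(abs(yi - c0) <= b or abs(yi - c1) <= b) for yi in y]

b = 500                                    # illustrative window (assumption)
d = safe_bits_plain(range(Q), b)
selected = [y for y in range(Q) if d[y]]
zeros = sum(1 for y in selected if (2 * y) // Q == 0)
print(len(selected), zeros, len(selected) - zeros)   # 2002 1001 1001
```

Out of q possible values, \(4b+2\) are selectable, split evenly between \(k=0\) and \(k=1\).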

3.1 Even Safer Bits via Peikert’s Reconciliation

Let Bob use Eq. 7 to determine his private key bits \(k_i\) and reconciliation bits \(c_i\). Bob also uses a new \(\mathbf {d} = \mathsf {SafeBits}(\mathbf {y}, b)\) function that accounts for Peikert-style reconciliation via

$$\begin{aligned} d_i = \left\{ \begin{array}{rl} 1 &{} \text {if } | (y_i \bmod \lfloor \frac{q}{4} \rceil ) - \lfloor \frac{q}{8} \rfloor | \le b \\ 0 &{} \text {otherwise}. \end{array} \right. \end{aligned}$$
(9)

Note that there are now four “safe zones” (Fig. 2, right side). Bob sends his bit selection vector \(\mathbf {d}\) to Alice, along with reconciliation bits \(c_i\) at selected positions with \(d_i=1\). Alice can then get corresponding \(k'_i\) using \(c_i\) via

$$\begin{aligned} k'_i = \left\lfloor \frac{2}{q} \left( x_i - c_i \left\lfloor \frac{q}{4} \right\rceil + \left\lfloor \frac{q}{8} \right\rceil ~\bmod q\right) \right\rfloor . \end{aligned}$$
(10)

Both parties derive a final key of length \(m \le \mathsf {Wt}(\mathbf {d})\) bits by concatenating the selected bits. Since \(\mathbf {y}\) is uniform, each partition is still of size \(2b+1\), and the expected weight is now \(\mathsf {Wt}(\mathbf {d}) = \sum _{i=0}^{n-1} d_i \approx \frac{8b+4}{q} n\), allowing the selection to be made essentially twice as tight while producing unbiased output.

Note that when selection mechanism is used, one needs to “pack” keys to payload size m by removing \(k_i\) and \(k'_i\) at positions where \(d_i=0\). Algorithms 3 and 4 in Appendix A implement Eqs. 9 and 10 with packing.
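To make Eqs. 7, 9, and 10 concrete, the following Python sketch implements both sides with packing. It is written from the equations only and is not a transcription of Algorithms 3 and 4; in particular, returning the reconciliation bits already packed to the selected positions is our simplification, and the function names are ours.

```python
import random

Q = 12289
Q4 = (2 * Q + 4) // 8        # round(q/4) = 3072
Q8R = (2 * Q + 8) // 16      # round(q/8) = 1536
Q8F = Q // 8                 # floor(q/8) = 1536

def safebits(y, b):
    """Bob: selection vector d, packed key bits k, packed reconciliation bits c."""
    d, k, c = [], [], []
    for yi in y:
        sel = abs(yi % Q4 - Q8F) <= b          # Eq. 9
        d.append(int(sel))
        if sel:
            k.append((2 * yi) // Q)            # Eq. 7, key bit
            c.append(((4 * yi) // Q) % 2)      # Eq. 7, reconciliation bit
    return d, k, c

def select(x, d, c):
    """Alice: packed key bits k' from her share x, using d and c (Eq. 10)."""
    k, c_iter = [], iter(c)
    for xi, di in zip(x, d):
        if di:
            ci = next(c_iter)
            k.append((2 * ((xi - ci * Q4 + Q8R) % Q)) // Q)
    return k

rng = random.Random(5)
y = [rng.randrange(Q) for _ in range(1024)]
x = [(yi + rng.randint(-2000, 2000)) % Q for yi in y]   # artificial bounded noise
d, k, c = safebits(y, b=799)
print(sum(d), select(x, d, c) == k)
```

The demo uses artificial noise bounded by 2000, for which the selected bits always agree when \(b=799\); with the real error distribution agreement is only overwhelmingly likely, which is what the failure analysis quantifies.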

3.2 Instantiation and Failure Analysis

We adopt the well-analyzed and optimized external ring parameters (\(q=12289\), \(n=1024\), and \(\chi =\varPsi _{16}\)) from New Hope [4, 5] in our instantiation.

Definition 2

Let \(\varPsi _k\) be a binomial distribution source

$$\begin{aligned} \varPsi _k = \sum _{i=0}^{k-1} (b_i-b'_i) \text { where } b_i,b'_i {\mathop {\leftarrow }\limits ^{\$}} \{0,1\}. \end{aligned}$$
(11)

For a random variable X from \(\varPsi _k\) we have \(P(X = i) = 2^{-2k} \left( {\begin{array}{c}2k\\ k + i\end{array}}\right) \). Furthermore, \(\varPsi ^n_k\) is a source of \(\mathcal {R}\) elements where each one of n coefficients is independently chosen from \(\varPsi _k\). Since the scheme uses \(k=16\), a typical sampler implementation just computes the Hamming weight of a 32-bit random word and subtracts 16.
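The probability formula and the Hamming-weight sampler can be sketched as follows. The function names are ours, and Python's `random` module is used only for illustration; it is not a cryptographic random source.

```python
import random
from fractions import Fraction
from math import comb

def psi_pmf(k, i):
    """P(X = i) for X from Psi_k, as an exact fraction."""
    return Fraction(comb(2 * k, k + i), 4 ** k)

def sample_psi16(rng):
    """Hamming weight of a 32-bit random word, minus 16."""
    return bin(rng.getrandbits(32)).count("1") - 16

k = 16
total = sum(psi_pmf(k, i) for i in range(-k, k + 1))
variance = sum(psi_pmf(k, i) * i * i for i in range(-k, k + 1))
print(total, variance)    # 1 8
```

The variance \(k/2 = 8\) is the quantity that enters the standard deviation estimate of the product distribution later in this section.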

Lemma 1

Let \(\varepsilon , \varepsilon '\) be vectors of length 2n from \(\varPsi ^{2n}_k\). Individual coefficients \(\delta = \varDelta _i\) of distance Eq. 4 will have distribution equivalent to

$$\begin{aligned} \delta = \sum _{i=1}^{2n} \varepsilon _i \varepsilon '_i. \end{aligned}$$
(12)

Proof

When we investigate the multiplication rule of Eq. 1, we see that each coefficient of the independent polynomials \(\{\mathbf {a}, \mathbf {b}, \mathbf {e}, \mathbf {e'} \}\) (or its negation) in \(\varDelta \) is used in the computation of each \(\varDelta _i=\delta \) exactly once. One may equivalently pick coefficients of \(\varepsilon , \varepsilon '\) from \(\{\pm \mathbf {e}, \pm \mathbf {e'}, \pm \mathbf {a}, \pm \mathbf {b} \}\), without repetition. Therefore the coefficients \(\varepsilon _i, \varepsilon _i'\) are independent and have distribution \(\varPsi _k\).    \(\square \)

Independence Assumption. Even though all of the variables in the sum of individual element \(\delta = \varDelta _i\) are independent in Eq. 12, they are reused in other sums for \(\varDelta _j, i \ne j\). Therefore, while the average-case distribution of each one of the n coefficients of \(\varDelta \) is the same and precisely analyzable, they are not fully independent. In this work we perform error analysis on a single coefficient and then simply expand it to the whole vector. This independence assumption is analogous to our extension of LWE security properties to Ring-LWE with more structure and fewer independent variables.

The assumption is supported by our strictly bounded error distribution \(\varPsi _k\) (when using discrete Gaussian distributions, which are infinite up to a tail bound, a few highly anomalous values would be more likely to cause multiple errors) and the structure of convolutions of signed random vectors (Eq. 1). Our error estimate has a significant safety margin, however.

Estimation via Central Limit Theorem. The distribution of the product from two random variables from \(\varPsi _k\) in Eq. 12 is no longer binomial. Clearly its range is \([-k^2,k^2]\), but not all values are possible; for example, primes \(p > k\) cannot occur in the product. However, it is easy to verify that the product is zero-centered and its standard deviation is exactly

$$\begin{aligned} \sigma = \sqrt{ \sum _{i=-k}^k \sum _{j=-k}^k \frac{\left( {\begin{array}{c}2k\\ k+i\end{array}}\right) \left( {\begin{array}{c}2k\\ k+j\end{array}}\right) }{2^{4k}} (ij)^2 } = \frac{k}{2}. \end{aligned}$$
(13)

Hence, we may estimate \(\delta \) of Eq. 12 using the Central Limit Theorem as a Gaussian distribution with deviation

$$\begin{aligned} \sigma = \frac{k}{2} \sqrt{2n} \end{aligned}$$
(14)

With our parameter selection this yields \(\sigma \approx 362.0386\) (variance \(\sigma ^2 = 2^{17}\)). Figure 3 illustrates this error distribution.
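Eqs. 13 and 14 can be verified with exact rational arithmetic. The sketch below evaluates the double sum of Eq. 13 for \(k=16\) and then applies Eq. 14 with \(n=1024\); the function name is ours.

```python
from fractions import Fraction
from math import comb, sqrt

def product_variance(k):
    """Variance of the product of two independent Psi_k variables (Eq. 13, squared)."""
    total = Fraction(0)
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            weight = Fraction(comb(2 * k, k + i) * comb(2 * k, k + j), 2 ** (4 * k))
            total += weight * (i * j) ** 2
    return total

k, n = 16, 1024
var_x = product_variance(k)             # (k/2)^2
var_delta = var_x * 2 * n               # sum of 2n independent products
sigma = sqrt(float(var_delta))
print(var_x, var_delta == 2 ** 17)      # 64 True
```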

Fig. 3. The error distribution E of \(\delta = x_i - y_i\) (which we compute with high precision) is bell-shaped with variance \(\sigma ^2=2^{17}\). Its statistical distance to corresponding discrete Gaussian (with same \(\sigma \)) is \(\approx 2^{-12.6}\), which has a significant effect on the bit failure rate. This is why we compute the discrete distributions numerically.

More Precise Computation via Convolutions. The distribution of \(X = \varepsilon _i \varepsilon '_i\) in Eq. 12 is far from being “Bell-shaped” – its (total variation) statistical distance to a discrete Gaussian (with the same \(\sigma = 8\)) is \(\approx \) 0.307988.

We observe that since our domain \(\mathbb {Z}_q\) is finite, we may always perform full convolutions between statistical distributions of independent random variables X and Y to arrive at the distribution of \(X + Y\). The distributions can be represented as vectors of q real numbers (which are non-negative and add up to 1).

In order to get the exact shape of the error distribution we start with X, which is a “square” of \(\varPsi _{16}\) and can be computed via binomial coefficients, as is done in Eq. 13. The error distribution (Eq. 12) is a sum \(X + X + \cdots + X\) of 2n independent variables from that distribution. Using the convolution summing rule we can create a general “scalar multiplication algorithm” (analogous to square-and-multiply exponentiation) to quickly arrive at \(E = 2048 \times X\).
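The “scalar multiplication” idea can be illustrated with a double-and-add loop over distributions. This is a toy sketch under our own assumptions: it uses exact `Fraction` arithmetic and a small modulus \(q=17\) instead of the high-precision floating point and \(q=12289\) of the actual computation, and the function names are ours.

```python
from fractions import Fraction

def convolve_mod(p, r):
    """Distribution of X + Y mod q for independent X ~ p and Y ~ r (length-q lists)."""
    q = len(p)
    out = [Fraction(0)] * q
    for i, pi in enumerate(p):
        if pi:
            for j, rj in enumerate(r):
                if rj:
                    out[(i + j) % q] += pi * rj
    return out

def scalar_mul(p, m):
    """Distribution of the sum of m independent copies of X ~ p (double-and-add)."""
    q = len(p)
    acc = [Fraction(0)] * q
    acc[0] = Fraction(1)              # distribution of the constant 0
    base = list(p)
    while m > 0:
        if m & 1:
            acc = convolve_mod(acc, base)
        m >>= 1
        if m:
            base = convolve_mod(base, base)
    return acc

# Toy distribution on Z_17: P(0) = 1/2, P(+1) = P(-1) = 1/4.
p = [Fraction(0)] * 17
p[0], p[1], p[16] = Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)
e = scalar_mul(p, 2048)
print(sum(e))                         # 1
```

Only about \(\log _2 m\) squarings are needed, so \(m=2048\) costs twelve convolutions of the base distribution with itself plus one final combination step.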

We implemented finite distribution evaluation arithmetic in 256-bit floating point precision using the GNU MPFR library. From these computations we know that the statistical distance of E to a discrete Gaussian with (same) \(\sigma ^2 = 2^{17}\) is approximately 0.0001603 or \(2^{-12.6}\).

Table 1. Potential window b sizes for safe bit selection (Eq. 9) for different payload sizes. We target a payload of 496 bits, of which 256 are actual key bits and 240 bits are used to encrypt a five-error correcting code from XE5.

Proposition 1

Bit selection mechanism of Sect. 3.1 yields unbiased shared secret bits \(k=k'\) if \(\mathbf {y}\) is uniform. Discrete failure rate for individual bits \(k \ne k'\) can be computed with high precision in our instance.

Proof

Consider Bob’s k value from Eq. 7, Bob’s c and Alice’s \(k'\) from Eq. 10, and the four equiprobable SafeBits ranges in Eq. 9. With our \(q=12289\) instantiation the four possible \(k \ne k'\) error conditions are:

figure c

We examine each case separately (see Fig. 2). Since the four non-overlapping \(y_i\) ranges are of the same size \(2b+1\) and together constitute all selectable points \(d_i = 1\) (Eq. 9), the distribution of \(k=k'\) is uniform. Furthermore, the bit failure probability \(k \ne k'\) is the average of these four cases. For each case, compute distribution Y which is uniform in the range of \(y_i\). Then convolve it with the error distribution to obtain \(X=Y+E\), the distribution of \(x_i\). The probability of failure is the sum of probabilities in X in the corresponding \(x_i\) failure range.    \(\square \)

Parameter Selection for Instantiation. Based on our experiments, the relationship between window size b and bit failure rate is almost exponential.

Some representative window sizes and payloads are given in Table 1, which also puts our selection \(b=799\) in context. Five-error correction (Sect. 4) lowers the message failure probability to roughly \((2^{-27})^5 \approx 2^{-135}\) or even lower as \(99\%\) of six-bit errors are also corrected. We therefore meet the \(2^{-128}\) message failure requirement with some safety margin.
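As a quick sanity check on the window choice, the expected number of selected bits from Sect. 3.1 can be evaluated for \(b=799\). This is plain arithmetic on the formula \(\frac{8b+4}{q} n\); it gives only the mean and says nothing about how often fewer than 496 bits are obtained.

```python
q, n, b, payload = 12289, 1024, 799, 496

expected = n * (8 * b + 4) / q          # expected Hamming weight of d
print(round(expected, 1), expected > payload)   # 533.0 True
```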

4 Constant-Time Error Correction

We note that in our application the error correction mechanism operates on secret data. As with all other components of the scheme it is highly desirable that decoding can be implemented with an algorithm that requires constant processing time regardless of number of errors present. We are not aware of satisfactory constant-time decoding algorithms for BCH, Reed-Solomon, or other standard block multiple-error correcting codes [31].

We chose to design a linear block code specifically for our application. The design methodology is general, and a similar approach was used by the Author in the Trunc8 Ring-LWE lightweight authentication scheme [41]. However, that work did not provide a detailed justification for the error correction code.

Definition 3

XE5 has a block size of 496 bits, out of which 256 bits are payload bits \(\mathbf {p} = (p_0, p_1, \cdots , p_{255})\) and 240 provide redundancy \(\mathbf {r}\). Redundancy is divided into ten subcodewords \(r_0, r_1, \cdots , r_9\) of varying bit length \(|r_i|=L_i\) with

$$\begin{aligned} (L_0, L_1, \cdots , L_9) = ( 16, 16, 17, 31, 19, 29, 23, 25, 27, 37 ). \end{aligned}$$
(15)

Bits in each \(r_i\) are indexed \(r_{(i,0)}, r_{(i,1)}, \cdots , r_{(i,L_i-1)}\). Each bit \(k \in [0,~L_0-1]\) in first subcodeword \(r_0\) satisfies the parity equation

$$\begin{aligned} r_{0,k} = \sum _{j=0}^{15} p_{(16k+j)} ~~ (\bmod ~2) \end{aligned}$$
(16)

and bits in \(r_1, r_2, \cdots , r_9\) satisfy the parity congruence

$$\begin{aligned} r_{i,k} = \sum _{L_i ~|~ (j - k)} p_j ~~ (\bmod ~2). \end{aligned}$$
(17)

We see that \(r_{0,k}\) in Eq. 16 is the parity of \(k+1\):th block of 16 bits, while the \(r_{i,k}\) in Eq. 17 is parity of all \(p_{j}\) at congruent positions \(j \equiv k ~ (\bmod ~L_i)\).

Definition 4

For each payload bit position \(p_i\) we can assign corresponding integer “weight” \(w_i \in [0,10]\) as a sum

$$\begin{aligned} w_i = r_{(0,\lfloor i/16 \rfloor )} + \sum _{j=1}^9 r_{(j, i \bmod L_j)}. \end{aligned}$$
(18)

Lemma 2

If message payload \(\mathbf {p}\) only has a single nonzero bit \(p_e\), then \(w_e = 10\) and \(w_i \le 1\) for all \(i \ne e\).

Proof

Since each \(L_i \ge \sqrt{|\mathbf {p}|}\) and all \(L_{i \ge 1}\) are coprime (each is a prime power) it follows from the Chinese Remainder Theorem that any nonzero \(i \ne j\) pair can satisfy both \(r_{i,a \bmod L_i} = 1\) and \(r_{j,a \bmod L_j} = 1\) only at \(a=e\). A similar argument can be made for pairing \(r_{0,a}\) with \(r_{i \ge 1}\). Since the residues can be true pairwise only at e, weight \(w_a\) cannot be 2 or above when \(a \ne e\). The \(w_e = 10\) case follows directly from Definition 3.    \(\square \)
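Lemma 2 is easy to confirm by enumeration. The sketch below encodes the parity equations (Eqs. 16 and 17) and the weight of Eq. 18 directly as bit lists; it is written for clarity and is neither bit-sliced nor constant-time, and the function names are ours.

```python
LENS = (16, 16, 17, 31, 19, 29, 23, 25, 27, 37)    # Eq. 15, sums to 240

def xe5_cod(p):
    """Redundancy subcodewords r_0..r_9 for a 256-bit payload p (list of 0/1)."""
    assert len(p) == 256
    r = [[0] * L for L in LENS]
    for j, bit in enumerate(p):
        if bit:
            r[0][j // 16] ^= 1                     # Eq. 16: parity of a 16-bit block
            for i in range(1, 10):
                r[i][j % LENS[i]] ^= 1             # Eq. 17: parity of congruent positions
    return r

def weights(r):
    """Weight w_j of every payload position j (Eq. 18)."""
    return [r[0][j // 16] + sum(r[i][j % LENS[i]] for i in range(1, 10))
            for j in range(256)]

# Single nonzero payload bit at position e = 100.
p = [0] * 256
p[100] = 1
w = weights(xe5_cod(p))
print(w[100], max(w[:100] + w[101:]))              # 10 1
```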

Definition 5

Given XE5 input block \(\mathbf {p} ~|~ \mathbf {r}\), we derive a redundancy check \(\mathbf {r'}\) from \(\mathbf {p}\) via Eqs.  16 and 17. Furthermore we have distance \(\mathbf {r}^\varDelta = \mathbf {r} \oplus \mathbf {r'}\). Payload distance weight vector \(\mathbf {w}^\varDelta \) is derived from \(\mathbf {r}^\varDelta \) via Eq. 18.

Since the code is entirely linear, Lemma 2 implies a direct way to correct a single error in \(\mathbf {p}\) using Definition 5: just flip bit \(p_x\) at position x where \(w^\varDelta _x = 10\). In fact any two redundancy subcodewords \(r_i\) and \(r_j\) would be sufficient to correct a single error in the payload; with only those two subcodewords counted, the error is at the position x where \(w^\varDelta _x \ge 2\). If the single error is in the redundancy part (\(r_i\) or \(r_j\)) instead of the payload, this is not an issue, since in that case \(w^\varDelta _x \le 1\) for all x. This type of reasoning leads to our main error correction strategy, which is valid for up to five errors:

Theorem 1

Let \(\mathbf {p} ~|~ \mathbf {r}\) be an XE5 message block as in Definition 5. Changing each bit \(p_i\) when \(w^\varDelta _i \ge 6\) will correct up to five bit errors in the block.

Proof

We first note that if all five errors are in the redundancy part \(\mathbf {r}\), then \(w^\varDelta _i \le 5\) and no modifications in the payload are done. If there are 4 errors in \(\mathbf {r}\) and one in the payload we still have \(w^\varDelta _x \ge 6\) at the payload error position \(p_x\), etc. For each payload error \(p_x\), each of the ten subcodewords \(\mathbf {r_i}\) will contribute one to weight \(w^\varDelta _x\) unless there is another congruent error \(p_y\), i.e. we have \(\lfloor x/16 \rfloor = \lfloor y/16 \rfloor \) for \(r_0\) or \(x \equiv y~(\bmod ~L_i)\) for \(r_{i \ge 1}\). Four errors cannot generate more than four such congruences (due to properties shown in the proof of Lemma 2), leaving the fifth correctable via the remaining six subcodewords (\(w^\varDelta _i \ge 6\)).    \(\square \)

In order to verify the correctness of our implementation, we also performed a full exhaustive test (search space \(\sum _{i=0}^5 \frac{496!}{i! (496-i)!} \approx 2^{37.8}\)). Experimentally XE5 corrects \(99.4 \%\) of random 6-bit errors and \(97.0 \%\) of random 7-bit errors.
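The decoding rule of Theorem 1 can be sketched in the same plain style. This is an illustration written from Definitions 3 to 5, not the bit-sliced constant-time code of the reference implementation; the function names and the flat block indexing used by `flip` are our assumptions.

```python
import random

LENS = (16, 16, 17, 31, 19, 29, 23, 25, 27, 37)    # Eq. 15

def xe5_cod(p):
    """Redundancy subcodewords for payload p (Eqs. 16 and 17)."""
    r = [[0] * L for L in LENS]
    for j, bit in enumerate(p):
        if bit:
            r[0][j // 16] ^= 1
            for i in range(1, 10):
                r[i][j % LENS[i]] ^= 1
    return r

def xe5_fix(p, r):
    """Return p with every bit flipped whose distance weight is at least 6."""
    r_check = xe5_cod(p)                                        # r' of Definition 5
    rd = [[u ^ v for u, v in zip(a, b)] for a, b in zip(r, r_check)]
    out = []
    for j, bit in enumerate(p):
        w = rd[0][j // 16] + sum(rd[i][j % LENS[i]] for i in range(1, 10))
        out.append(bit ^ int(w >= 6))
    return out

def flip(p, r, pos):
    """Flip block bit pos in place: 0..255 is payload, 256..495 is redundancy."""
    if pos < 256:
        p[pos] ^= 1
        return
    pos -= 256
    for sub in r:
        if pos < len(sub):
            sub[pos] ^= 1
            return
        pos -= len(sub)

rng = random.Random(2017)
p = [rng.getrandbits(1) for _ in range(256)]
r = xe5_cod(p)
p_bad, r_bad = list(p), [list(sub) for sub in r]
for pos in rng.sample(range(496), 5):                           # five distinct bit errors
    flip(p_bad, r_bad, pos)
print(xe5_fix(p_bad, r_bad) == p)                               # True
```

Every payload position is processed with the same sequence of operations in `xe5_fix`, which hints at why the rule lends itself to a branch-free implementation, although this Python version makes no timing guarantees.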

Efficient Constant-Time Implementation. The code generation and error correcting schemes can be implemented in bit-sliced fashion, without conditional clauses or table-lookups on secret data. Please refer to the implementations under https://mjos.fi/hila5 and the full version of this paper at https://eprint.iacr.org/2017/424 for more information about these techniques.

The block is encoded simply as a 496-bit concatenation \(\mathbf {p}~|~\mathbf {r}\). The reason for the ordering of \(L_i\) in Eq. 15 is so that they can be packed into byte boundaries: \(17 + 31 = 48\), \(19 + 29 = 48\), \(23 + 25 = 48\) and \(27 + 37 = 64\).

5 Instantiation and Implementation

Our instantiation, codenamed HILA5, shares core Ring-LWE parameters with various “New Hope” variants, but uses an entirely different error management strategy. Algorithm 1 contains a pseudocode overview of the entire HILA5 Key Encapsulation Mechanism, using a number of auxiliary primitives and functions.

figure d

Notation and Auxiliary Functions. We represent elements of \(\mathcal {R}\) in two different domains; the normal polynomial representation \(\mathbf {v}\) and Number Theoretic Transform representation \(\hat{\mathbf {v}}\). Convolution (polynomial multiplication) in the NTT domain is a linear-complexity operation, written \(\hat{\mathbf {x}} \circledast \hat{\mathbf {y}}\). Addition and subtraction work as in normal representation. The transform and its inverse are denoted \(\mathsf {NTT}(\mathbf {v}) = \hat{\mathbf {v}}\) and \(\mathsf {NTT}^{-1}(\hat{\mathbf {v}}) = \mathbf {v}\), respectively. The transform algorithm is adopted from Longa and Naehrig [28], and not detailed here.

The XE5 error correction functions \(\mathbf {r} = \mathsf {XE5\_Cod}(\mathbf {p})\) and \(\mathbf {p'} = \mathsf {XE5\_Fix}(\mathbf {r} \oplus \mathbf {r'}) \oplus \mathbf {p}\) are discussed in Sect. 4. Here we have “error key” \(\mathbf {k} = \mathbf {p} ~|~ \mathbf {r}\) with the payload key \(\mathbf {p} \in \{0,1\}^{256}\) and redundancy \(\mathbf {r} \in \{0,1\}^{240}\).

The hash h(x) is SHA3-256 [24]. Appendix A contains pseudocode algorithm listings for additional auxiliary functions. Function \(\mathsf {Parse}()\) (Algorithm 2) deterministically samples a uniform \(\hat{\mathbf {g}} \in \mathcal {R}\) based on arbitrary seed s using SHA3’s XOF mode SHAKE-256 [24]. While New Hope uses the slightly faster SHAKE-128 for this purpose, we consistently use SHAKE-256 or SHA3-256 in all parts of HILA5. For sampling modulo q we use the 5q trick suggested by Gueron and Schlieker in [25]. Binomial distribution values \(\varPsi _{16}\) can be computed directly from 32 random bits per Definition 2.

Bob’s reconciliation function SafeBits() (Algorithm 3) captures Eqs. 7 and 9 from Sect. 3. Conversely Alice’s reconciliation function Select() (Algorithm 4) captures Eq. 10.

Encoding – Shorter Messages. Ring elements, whether or not in NTT domain, are encoded into \(|\mathcal {R}| = \lceil \log _2 q \rceil n\) bits \(= 1,792\) bytes. This is the private key size. Alice’s public key \(\mathsf {PK}\) with a 256-bit seed s and \(\hat{\mathsf {A}}\) is 1,824 bytes. Ciphertext \(\mathsf {CT}\) is \(|\mathcal {R}| + n + m + |\mathbf {r}|\) bits or 2,012 bytes; 36 bytes less than New Hope [5], 196 bytes less than the variant of [4], and 1,572 bytes less than LP11 [27].
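The stated sizes follow from simple arithmetic. In the check below we take \(m = 496\) payload bits and \(|\mathbf {r}| = 240\) redundancy bits, which is our reading of the parameters from Sect. 3.2 and Definition 3; it reproduces the byte counts given above.

```python
q, n = 12289, 1024
m, r_bits, seed_bits = 496, 240, 256       # payload bits, XE5 redundancy bits, seed s

ring_bits = (q - 1).bit_length() * n       # ceil(log2 q) = 14 bits per coefficient
sk_bytes = ring_bits // 8
pk_bytes = (seed_bits + ring_bits) // 8
ct_bytes = (ring_bits + n + m + r_bits) // 8
print(sk_bytes, pk_bytes, ct_bytes)        # 1792 1824 2012
```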

5.1 Encryption: From Noisy Diffie-Hellman to Noisy ElGamal

Modification of the scheme for public-key encryption is straightforward. Compared to the more usual “LP11” Ring-LWE Public Key Encryption construction [27], our reconciliation approach saves about 44% in ciphertext size.

For minimal ciphertext expansion with only passive security, one may replace SHA3 at the end of Encaps() and Decaps() with SHAKE-256 and use the output K as keystream to XOR with plaintext to produce ciphertext or vice versa.
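The passively secure variant can be sketched as below. The argument `hash_input` is a hypothetical stand-in for whatever data Encaps() and Decaps() feed into their final hash; its exact composition is defined by Algorithm 1 and is not reproduced here. The function name is likewise an assumption.

```python
import hashlib


def xor_keystream(hash_input: bytes, data: bytes) -> bytes:
    """XOR data with a SHAKE-256 keystream derived from hash_input.

    Encryption and decryption are the same operation. This provides
    confidentiality against passive attackers only: there is no
    integrity protection, and a keystream must never be reused.
    """
    keystream = hashlib.shake_256(hash_input).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


shared = b"placeholder for the final hash input"   # hypothetical value
message = b"attack at dawn"
ciphertext = xor_keystream(shared, message)
recovered = xor_keystream(shared, ciphertext)
```

The ciphertext has the same length as the plaintext, which is the sense in which the expansion is minimal. Because flipping a ciphertext bit flips the corresponding plaintext bit undetected, this form is unsuitable when active attacks are a concern.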

However, for active security we suggest that K be used as keying material for an AEAD (Authenticated Encryption with Associated Data) scheme such as AES256-GCM [22, 23] or Keyak [12] in order to protect message integrity. See Sect. 5 of [36] for details of the formal security argument.

5.2 Security

In Algorithm 1 the error correction data \(\mathbf {r}\) is transmitted encrypted with shared secret bits \(\mathbf {z}\), and therefore does not leak entropy about the actual key data \(\mathbf {p}\), also derived from the shared secret. Shared secret bits are unbiased. The shared key \(\mathsf {K}\) also includes plaintext \(\mathsf {PT}\) and ciphertext \(\mathsf {CT}\) in the final hash to protect against a class of active attacks.

Our reconciliation mechanism has no effect on the security against (quantum) lattice attacks, so estimates in [2, 5] are applicable (\(2^{255}\) quantum security, with \(2^{199}\) attacks plausible). Pre-image security is expected from SHA3 and SHAKE-256 in HILA5. Breaking the construction via these algorithms is expected to require approximately \(2^{166}\) logical-qubit-cycles [7, 19, 45].

This leads us to claim that HILA5 meets NIST’s “Category 5” post-quantum security requirement ([32], Sect. 4.A.5): Compromising key K in a passive attack requires computational resources comparable to or greater than those required for key search on a block cipher with a 256-bit key (e.g. AES 256). The scheme can also be made secure against active attacks with an appropriate AEAD mechanism, as discussed in Sect. 5.1.

Implementation Security. HILA5 has been designed from the ground up to be resistant against timing and side-channel attacks. The sampler \(\varPsi _{16}\) is constant-time, as is our error correction code XE5. Ring arithmetic can also be implemented in constant time, but leakage can be further minimized via blinding [40] (Sect. 6).

Table 2. Performance of HILA5 within the Open Quantum Safe test bench C implementations [43]. The slight (under \(4\%\)) performance difference to New Hope is principally due to our use of error correction and SHAKE-256. Testing was performed on an Ubuntu 17.04 workstation with Core i7-6700 @ 3.40 GHz. For reference and scale we are also including RSA numbers with OpenSSL 1.0.2 (system default) on this target. A single Elliptic Curve DH operation requires \(45.4\,\upmu \mathrm{s}\) for the NIST P-256 curve (highly optimized implementation), and \(331.7\,\upmu \mathrm{s}\) for NIST P-521. Full source code of our implementation is available at https://mjos.fi/hila5/

5.3 Performance

Our main contribution, a new reconciliation mechanism, has a minor effect on performance of the scheme, but a significant impact on failure probability.

We chose to recycle “New Hope” NTT (n, q) and sampler (q, \(\varPsi _{16}\)) parameters as they have been extensively vetted for security against lattice attacks and were originally selected for performance. Several research groups have subsequently dedicated significant effort to the optimization of the NTT and sampler components. There already exist a number of permissively licensed open source implementations and a body of publications detailing specific optimizations for these particular NTT and sampler parameters. New Hope has also been integrated in TLS stacks and cryptographic toolkits in 2016-17 by Google (BoringSSL), the Open Quantum Safe project, Microsoft (MS Lattice Library), ISARA Corporation, and possibly others.

There are at least two very fast AVX2 Intel optimized versions of the NTT core and \(\varPsi _{16}\) sampler – the original [5] and one by Longa and Naehrig [28]. Further sampler optimizations have been suggested in [25]. Implementations have also been reported for ARM Cortex-M microcontrollers [6], ARM NEON SIMD instruction set [44], and for FPGA hardware [26].

Our prototype implementation was integrated into a branch of the Open Quantum Safe (OQS) framework, where it was benchmarked against other quantum-resistant KEM schemes [43]. Table 2 summarizes the performance of our implementation. It is essentially the same as that of the New Hope C implementation, with a slightly smaller message size.

6 Conclusions

With NIST’s ongoing post-quantum standardization effort, the practical performance, implementation security, and reliability of Ring-LWE public key encryption and key exchange implementations have emerged as a major research area.

We have described an improved general reconciliation scheme for Ring-LWE. Our SafeBits selection technique achieves unbiased secret bits without the randomized “blurring” used in the earlier reconciliation schemes of Peikert, Ding, and New Hope, and therefore needs less randomness. We have given detailed, precise arguments for its effectiveness.

The failure probability can also be addressed using error correcting codes. For this purpose we described a class of linear forward-error correcting block codes that can be implemented without branches or table lookups on secret data, guarding against side-channel attacks.

We instantiate the new techniques in “HILA5” with well-studied and efficient “New Hope” Ring-LWE parameters. The new reconciliation methods are shown to have minimal negative performance impact, while significantly improving the failure probability. The failure probability, which is shown to be under \(2^{-128}\), allows the KEM to be used for actively secure public key encryption in addition to interactive key exchange protocols. Furthermore the message sizes are shorter than with previous proposals, especially when used for public key encryption.

We claim that the HILA5 instantiation meets “Category 5” NIST PQC security requirements as a KEM and public key encryption scheme. Furthermore, it has been explicitly designed to be robust against side-channel attacks.