1 Introduction

As years go by, quantum computers become an increasingly concrete threat. The scientific community is already anticipating the changes in the hardness of various problems such a computer would produce. Cryptology is one of the affected disciplines. Indeed, the current state-of-the-art asymmetric primitives would become insecure, and NIST has launched a competition to find new ones.

Symmetric cryptography, essential for enabling secure communications, seemed much less affected at first sight: for a long time, the greatest known threat was Grover’s algorithm, which allows exhaustive key searches in the square root of the normal complexity. It was thus believed that doubling the key lengths would suffice to maintain equivalent security in the post-quantum world.

At the same time, the security proofs in symmetric cryptography often need to make unrealistic assumptions. Therefore, the security of concrete symmetric primitives is mainly based on cryptanalysis: we only gain confidence in their security through extensive and continuous scrutiny. Hence, it is not possible to determine if a symmetric primitive is secure in the quantum world without first understanding how a quantum adversary can attack it. New results in this direction have appeared lately, like quantum generic meet-in-the-middle attacks on iterative block ciphers [28], quantum linear and differential attacks [30], or improved algorithms for collisions or multicollisions [17, 27].

Using Simon’s Algorithm. Some other recent attacks are based on the polynomial-time quantum algorithm of Simon [43]. It began with [34], which presented a distinguisher for 3-round Feistel schemes. It was followed, among other works, by an attack against the classically secure Even-Mansour construction [35], quantum related-key attacks [41] and a key-recovery attack against the CAESAR candidate AEZ [11].

This algorithm was also proven efficient against well-known modes of operation for MACs and authenticated encryption at Crypto 2016 [29], where a quantum slide attack was also demonstrated, with a complexity linear in the block size (see also [42]). An analysis of the FX construction against quantum adversaries was presented at Asiacrypt 2017 [37]: a combination of Grover’s and Simon’s algorithms showed it was much less secure than expected, and, for instance, the PRINCE cipher is broken in the quantum setting. These surprising results were the first to clearly show that doubling the key length of symmetric primitives is not always enough to provide equivalent security against quantum adversaries in the superposition scenario, which we discuss next.

The Attack Model. These last mentioned attacks apply in a scenario of superposition quantum queries. This means that the adversary is not only allowed to perform local computations on a quantum computer, but is also allowed to perform superposition queries to a remote quantum cryptographic oracle, obtaining the superposition of the outputs. These attacks have been described as superposition attacks [20], quantum chosen message attacks [10] or quantum security [50].

This is a strong model for the attacker, but there are very good arguments for studying the security of symmetric primitives in this setting (see for instance [24] or [25] for more detailed justifications of the model):

  1. This model is simple. Using another model would imply artificial and hard-to-enforce restrictions on cryptographic oracles in a world with quantum resources, involving complex manipulations of yet uncertain outcome.

  2. Security in this model implies security in any other scenario, even advanced ones (e.g. obfuscated algorithms).

  3. Though powerful, this model is not trivial: not all primitives are broken in it. Indeed, several resistant constructions have been proposed [4, 10, 20, 25, 44].

All the attacks proposed in this paper fit in this model.

Countering the Attacks [2]. At Eurocrypt 2017, a proposal for countering the attacks from [29] was presented [2]. The authors propose to replace the common addition, vulnerable to Simon’s algorithm, with other operations that lead to a harder underlying problem. The most promising of these operations, because of efficiency and implementation considerations, and already used in several symmetric schemes (e.g. [26, 40, 49]), is addition over \(\mathbb{Z}/2^n\mathbb{Z}\), i.e. modular addition. The authors claim the quantum hardness of the hidden shift problem proves the security of their proposal against quantum chosen-plaintext attacks.

This approach is a priori an interesting direction to analyze and study. However, the authors did not provide a deeper analysis of the impact of the various parameters on security. The attacks are no longer in O(n) (with n the state size) when using modular addition, as Simon’s algorithm does not apply anymore, but attacks based on Kuperberg’s algorithm [32] are still a lot faster than the generic ones, e.g. \(2^{O(\sqrt{n})}\) instead of \(O\left( \sqrt{2^{n}}\right) \).

Indeed, classically, a symmetric primitive is considered secure when no attack better than the generic one exists. While the complexity of the generic quantum exhaustive search is exponential (\(2^{n/2}\) with Grover’s algorithm), the quantum attacks on primitives with modular additions have a sub-exponential complexity. This implies a need for a redefinition of security when building secure primitives with these countermeasures, as the best generic attacks that define the security of the cipher (now based on Kuperberg’s algorithm) will be better than exhaustive search. Also, concrete proposals for the sizes of the primitives needed to provide the typical security requirements (e.g. 128 bits) are missing.

Describing in detail the best new quantum attacks on the proposed constructions is necessary to provide concrete designs for a given target security. To evaluate the interest of such constructions, we should compare these designs, with concrete parameters, to other (quantum-secure) ones, like the Advanced Encryption Standard (AES) [19].

On Kuperberg’s Complexity, Improvements, Applications. Studying Kuperberg’s algorithm in detail, proposing improvements and simulating the complexity for concrete parameters has not been done before and is of general algorithmic interest. Such an analysis is required to determine suitable parameter sizes for a given security level. Hidden shift algorithms have an impact beyond the symmetric variants we just mentioned, and can threaten other primitives, such as Poly1305 [6], which uses modular additions. Hidden shift problems also arise in other cryptographic areas, such as isogenies. They are for example relevant to assess the security of CSIDH [16].

1.1 Our Contributions

1. Kuperberg’s algorithm: improvement, generalization. We study Kuperberg’s quantum algorithm for hidden shifts in the group \(\mathbb{Z}/2^n\mathbb{Z}\) [32] and its applications in symmetric cryptography. We focus on the groups \(\mathbb{Z}/2^n\mathbb{Z}\), which are widely used in symmetric cryptography. The original algorithm retrieves one bit of the secret shift at a time and uses a reducibility property to get the next bit. We propose a variant that performs better by getting all the bits in one step, allowing a drastic cost reduction of the attack on Poly1305. In the extended version of this paper [12], we also propose a generic algorithm to solve the hidden shift problem in non-abelian groups.

2. Simon Meets Kuperberg. We propose a new quantum algorithm for a generalization to products of cyclic groups (\(\left( \mathbb{Z}/2^w\mathbb{Z}\right) ^p\) and its subgroups), commonly used in symmetric primitives. The problem is more easily solvable in these groups than in \(\mathbb{Z}/2^n\mathbb{Z}\). Our complexity analysis shows how it meets Simon’s algorithm at one extreme (\(w=1\)) and Kuperberg’s at the other (\(p=1\)).

3. Simulation of the algorithms. We have implemented the classical part of these algorithms (Kuperberg, improved Kuperberg and Simon-meets-Kuperberg) and simulated them in order to estimate the asymptotic query complexity, and to get values for parameters of interest, verifying the expected complexities.

4. Attack on Poly1305 in the superposition model. We propose a quantum attack on Poly1305 [6], a MAC that has been standardized for TLS 1.2 [36] and 1.3 [1], and is notably used by OpenSSH, Firefox and Chrome. In [8], a classical and quantum security of 128 bits is claimed for Poly1305: “‘Information-theoretic’ MACs such as GMAC and Poly1305 already protect against quantum computers without any modifications: their security analysis already assumes an attacker with unlimited computing power.” Our attack, which works in the superposition model, has a complexity of \(2^{38}\) and uses our improved Kuperberg algorithm. It recovers half of the 234-bit key, allowing forgeries for messages authenticated with the same nonce. The attack is not a direct application of the algorithm and requires some additional techniques.

5. Attack on the FX variants. We answer an open question asked in [37], assessing the quantum security of the FX construction with any group law. If the inner key addition is done with a commutative group law, the security gain of the construction is marginal, and the best we can hope to achieve with a non-abelian group is a gain of around \(n/3\) bits of security for an n-bit inner key.

6. Evaluation of the proposed countermeasures from [2]. The final aim was to determine how to size the symmetric primitives in order to offer a desired security, and to decide whether the proposed countermeasure was sufficient, and efficient enough in practice. Using modular additions instead of xors for key addition in vulnerable constructions increases the complexity of the corresponding quantum key-recovery attack, but we show that the proposal from [2] does not seem practical: it would require an internal state size of a few thousand bits, to be compared with the 128-bit internal state of AES-256.

Organization of the Paper. Section 2 introduces some preliminary material. Section 3 presents our study of Kuperberg’s algorithm and our improvement, several generalizations, our simulations and the inferred complexities. Section 4 describes our new quantum algorithm for parallel additions. Section 5 presents the first quantum attack on Poly1305 in the superposition model, using Kuperberg’s algorithm. Section 6 estimates the strength of the FX construction with new group laws. Section 7 applies our previous results to actual symmetric primitives, deducing the key or internal state size that must be used in those constructions to offer a desired quantum security. The paper ends with a conclusion in Sect. 8.

2 Preliminaries

In this section, we present the quantum symmetric attacks from [29, 37], the proposed solution from [2] and our cost model.

2.1 Quantum Attacks Using Simon’s Algorithm from [29]

In [29], Simon’s quantum algorithm was applied to cryptanalyze several widely used modes of operation and CAESAR candidates. This was possible due to the exponential speedup of Simon’s algorithm, which solves the following problem:

Let \(f: \{0,1\}^n \rightarrow \{0,1\}^n\). Given the promise that there exists \(s \in \{0,1\}^n\) such that, for all \(x, y \in \{0,1\}^n\), \([f(x)=f(y)]\Leftrightarrow [x\oplus y \in \{0^n, s\}]\), find s.

The authors applied Simon’s algorithm to find a secret value in time linear in the block size (O(n) instead of \(O(2^{n/2})\) classically). One implication of the promise was not verified in the attacks: with a small probability, we might have \(f(x)=f(y)\) and \(x\oplus y \notin \{0^n, s\}\). However, they showed that the algorithm is still efficient with a random function in place of a random permutation.
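
For concreteness, here is a minimal classical sketch of the recovery step. It is a toy stand-in: the quantum circuit, which we do not simulate, is replaced by a routine sampling uniform vectors orthogonal to s; in the real attack each such sample costs one superposition query.

```python
import random

def simon_recover(s_secret, n):
    """Classical sketch of Simon's post-processing. Each quantum-circuit call
    is simulated by sampling a uniform nonzero y with <y, s> = 0 over GF(2);
    about n samples determine s by linear algebra."""
    def sample():
        while True:
            y = random.randrange(1, 2**n)
            if bin(y & s_secret).count("1") % 2 == 0:
                return y

    rows = []                              # echelon basis of the space orthogonal to s
    while len(rows) < n - 1:
        y = sample()
        for r in sorted(rows, reverse=True):
            y = min(y, y ^ r)              # reduce by existing pivots
        if y:
            rows.append(y)

    pivots = {r.bit_length() - 1 for r in rows}
    free = next(i for i in range(n) if i not in pivots)
    s = 1 << free                          # the unique nonzero solution has its free bit set
    for r in sorted(rows):                 # back-substitute, lowest pivot first
        if bin(r & s).count("1") % 2:
            s ^= 1 << (r.bit_length() - 1)
    return s

assert simon_recover(0b101101, 6) == 0b101101
```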

2.2 Solution Proposed in [2]

In [2], the authors propose to change the group law in the primitives broken by [29], to prevent the use of Simon’s algorithm. They also propose a security reduction from the primitives to the corresponding hidden shift problem, and claim that they are safe, as no polynomial algorithm for these problems is known. They notably propose \(\mathbb{Z}/2^n\mathbb{Z}\) (for which Kuperberg’s algorithm is, in a sense, not a threat, as it is superpolynomial), or the symmetric group \(\mathcal {S}_n\).

2.3 Cryptanalysis of the FX Construction [37]

The FX construction [31] uses a block cipher \(E_{k_0}\) and two additional keys \(k_1,k_2\), and is defined as \(\text {FX}_{k_0,k_1,k_2}(x) = E_{k_0}(x\oplus k_1)\oplus k_2\). It can be broken by combining Simon’s and Grover’s algorithms: one can perform an exhaustive search on \(k_0\) and then see the FX construction as an Even-Mansour construction with the public permutation \(E_{k_0}\), which can be broken with Simon’s algorithm. The authors left as an open problem the case where the whitening keys are added with modular addition.

2.4 Cost Model

We are interested in the explicit costs of the algorithms we study. These algorithms all have a similar shape: they use a generation circuit that produces some relevant qubits, a combination circuit that uses the produced qubits, and a control circuit that chooses which qubits are to be combined. The generation circuit is a quantum Fourier transform applied to an oracle, and its total cost in time and memory is given by the number of queries. The combination circuit has a fixed cost, and can only be used once per query. The control circuit can be more complex, but it only has to reason about classical values, and hence can be implemented purely classically. Its cost in time and memory will be the query cost, up to a polynomial overhead. As we expect a classical computer to be far more efficient than a quantum computer at applying the same number of gates, we estimate that the bottleneck of our algorithms will be their quantum part, and that the relevant cost unit here is the number of queries.

3 New Results on Kuperberg’s Algorithm

In this section, we study Kuperberg’s quantum algorithm for solving the hidden shift problem. While the final aim is to accurately estimate the complexities of the cryptanalysis of primitives whose security relies on the hidden shift problem, we have also performed a deeper analysis that verifies and helps to better understand Kuperberg’s algorithm and its performance. We propose a new variant of the algorithm that reduces its cost, and that will allow us to build the efficient attack of Sect. 5. We have implemented the classical part of these algorithms and run some simulations in order to get concrete estimates of the asymptotic complexity and values for parameters of interest, which match and refine the theoretical expectations. In the extended version of this paper [12], we also propose a generic algorithm to solve the hidden shift problem in non-abelian groups.

3.1 Hidden Shift Problem and Quantum Algorithms

The hidden shift problem (HSP) is defined as follows:

Let \((\mathbb {G},\cdot )\) be a group and f, g two injective functions defined on \(\mathbb {G}\). Given the promise that there exists \(s \in \mathbb {G}\) such that, for all x, \(f(x) = g(x\cdot s)\), retrieve s.

We say that f is a shifted version of g, the shift being s. To estimate the complexity, we consider \(n = \log _2|\mathbb {G}|\). The hardness of the problem depends on the group law. If it is a bitwise xor, Simon’s algorithm [43] solves it in polynomial time. If the group law is a modular addition, it can be solved with a linear number of queries [21]; this method, however, requires an exponential-time classical post-processing, and as such will not be interesting for us. The first sub-exponential (in quantum query and quantum and classical time) algorithms are presented in [32]. They have a time and space complexity in \(2^{O(\sqrt{n})}\) for a group of size \(2^n\). Other variants were developed later, with an algorithm with quantum polynomial space but slightly worse time complexity, in \(2^{O(\sqrt{n\log (n)})}\) [39], and some algorithms in [33], which generalize the previous ones, allowing some trade-offs between classical and quantum memory and time.

From this point on, we focus on additions modulo a power of 2, as they are very common in symmetric cryptography for implementation reasons.

Single Modular Addition. All these algorithms are in two parts: an oracle that calls f and g to produce some labeled qubits \(|\psi _\ell \rangle \), with \(\ell \) a classical value that we call a label, and a combination circuit that transforms them into more interesting ones. The oracle part uses the quantum oracle \(O: |b\rangle |x\rangle |y\rangle \mapsto |b\rangle |x\rangle |y \oplus f_b(x)\rangle \), with \(f_0 = f\) and \(f_1 = g\).

Generation. The oracle circuit (Fig. 1a) produces the uniform superposition in the registers b and x with Hadamard gates (H), feeds them to the oracle (O), and then measures register y. This measurement gives a result \(y_0\) and collapses the b and x registers in the state \(\frac{1}{\sqrt{2}}\left( |0\rangle |x_0\rangle + |1\rangle |x_0+s\rangle \right) \) for a given (unknown) \(x_0\), thanks to the promise. We then apply a quantum Fourier transform (QFT) on the x register and measure the result. This gives us a uniformly distributed \(\ell \), and collapses the remaining qubit in the state \(|\psi _\ell \rangle = \frac{1}{\sqrt{2}}\left( |0\rangle + e^{2i\pi \frac{\ell s}{2^n}}|1\rangle \right) \).

This qubit depends on s, but is not directly exploitable. The qubit \(|\psi _{2^{n-1}}\rangle \) is very interesting, as it is \(|+\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle + |1\rangle \right) \) if the lowest bit of s is 0, and \(|-\rangle = \frac{1}{\sqrt{2}}\left( |0\rangle - |1\rangle \right) \) otherwise. Hence, if we measure it in the \(\{|+\rangle , |-\rangle \}\) basis, we get one bit of s.

Fig. 1. Quantum circuits for Kuperberg’s algorithm

Combination. We then have a combination part, which uses the produced qubits to generate some more interesting ones. The combination is done with the circuit in Fig. 1b, which consists of one controlled-not and a measurement of the second register. By doing so, we destroy two elements in order to produce one. Before the measurement, the system is in the state
$$\begin{aligned} \frac{1}{2}\left( \left( |0\rangle + e^{2i\pi \frac{(\ell _1+\ell _2)s}{2^n}}|1\rangle \right) |0\rangle + e^{2i\pi \frac{\ell _2 s}{2^n}}\left( |0\rangle + e^{2i\pi \frac{(\ell _1-\ell _2)s}{2^n}}|1\rangle \right) |1\rangle \right) . \end{aligned}$$

If we measure a 0, we get the qubit \(|\psi _{\ell _1+\ell _2}\rangle \), and if we measure a 1, we get \(|\psi _{\ell _1-\ell _2}\rangle \). Both outcomes are equiprobable. If we only look at the labels, the combination routine destroys two labels and produces a new one, which is either their sum or their difference. We want to obtain the label \(2^{n-1}\). This abstract problem would be a subset-sum problem modulo \(2^n\) if the operation at each combination were fixed, and not picked randomly in \(\{+, -\}\), as we would want to find a tuple \((\ell _i)_{i\in I}\) satisfying

$$\begin{aligned} \sum _{i\in I} \ell _i = 2^{n-1} \mod 2^n. \end{aligned}$$

However, in our situation, the problem is closer to finding a tuple satisfying

$$\begin{aligned} \sum _{i\in I} \delta _i\ell _i = 2^{n-1} \mod 2^n, \end{aligned}$$

with \(\delta _i \in \{-1, 1\}\) unknown before the actual destructive computation.

With these quantum tools, we can produce random elements and combine them, but we need an algorithm to choose which elements to combine.

Choosing the Elements to Combine. As a combination produces either \(a+b\) or \(a-b\), we need to find a property preserved in both cases, so as not to lose everything if the wrong outcome occurs. It turns out divisibility by 2 is such a property: if both a and b are multiples of \(2^k\), \(a+b\) and \(a-b\) will also be multiples of \(2^k\). Hence comes naturally the idea of the combination algorithm: from the elements we have, generate elements with a higher divisibility by 2, until we get \(2^{n-1}\). To achieve this, we combine elements such that \(a+b\) or \(a-b\) has a high divisibility by 2 (i.e. whose binary representation ends with a long run of zeros).

An algorithm to find \(2^{n-1}\) is then to separate the elements into pools according to their divisibility by 2, and, beginning with the odd numbers, to combine the two elements that can produce a number with the highest possible divisibility by 2. As this property corresponds to the longest partial collision in the binary representations of the elements, such pairs can be efficiently found with a radix tree. There is however one caveat: we do not want the useless 0 element, so we avoid combining two identical elements, or an element and its opposite.

As the interesting a and b collide on their lowest bits, they have the same divisibility by 2, hence \(a=2^k(2a'+1)\) and \(b=2^k(2b'+1)\). Then, \(a+b = 2^{k+1}(a'+b'+1)\) and \(a-b = 2^{k+1}(a'-b')\). This means that even in the bad case (with a small divisibility by 2), we still get a slightly better divisibility by 2. The algorithm then consists in applying this heuristic until we get \(2^{n-1}\).
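
The control logic can be simulated classically by replacing each destructive measurement with a uniform choice between sum and difference, which is what the simulations of Sect. 3.4 do. Below is a simplified sketch of this heuristic (a plain sort stands in for the radix tree, and the pool bookkeeping is coarser than in Algorithm 1):

```python
import random

def low_bits_reversed(x, n):
    # sort key: labels sharing a long run of low bits become neighbours
    return int(format(x, f"0{n}b")[::-1], 2)

def simulate(labels, n):
    """Greedy simulation of the combination step mod 2^n, with each quantum
    measurement replaced by a coin flip between sum and difference. Returns
    True if the target label 2^(n-1) is produced."""
    target = 1 << (n - 1)
    if any(l % (1 << n) == target for l in labels):
        return True
    pool = [l % (1 << n) for l in labels if l % (1 << n) not in (0, target)]
    while len(pool) >= 2:
        # pair neighbours in low-bit-reversed order: adjacent labels share
        # the longest low-bit collision, so a-b (or a+b) gains many zeros
        pool.sort(key=lambda x: low_bits_reversed(x, n))
        nxt = []
        for a, b in zip(pool[::2], pool[1::2]):
            c = random.choice([a + b, a - b]) % (1 << n)  # simulated measurement
            if c == target:
                return True
            if c:                       # drop the useless label 0
                nxt.append(c)
        if len(pool) % 2:
            nxt.append(pool[-1])
        pool = nxt
    return target in pool

# toy run: estimated success rate for n = 16 starting from 2^10 random labels
n = 16
runs = [simulate([random.randrange(1, 1 << n) for _ in range(1 << 10)], n)
        for _ in range(100)]
print(sum(runs) / len(runs))
```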

This is Algorithm 1, which is Algorithm 3 of [32]. The paper also presents a sketch of proof that its complexity is in \(\widetilde{O}\left( 2^{\sqrt{2\log _2(3)n}}\right) \). As the paper only focuses on the asymptotic exponent complexity, the polynomial part is not well known. We can however deduce from the sketch of proof a complexity in \(O\left( n\sqrt{n}2^{\sqrt{2\log _2(3)n}}\right) \) to retrieve the whole hidden shift, which may not be a tight bound (both for the polynomial and the exponent), due to the way the sketch of proof works.


If this succeeds, we get the value of the least significant bit of the hidden shift, \(s_0\). We then have to retrieve the other bits of s. This can be done using a recursive procedure: with the knowledge of \(s \mod 2 = s_0\), we can construct the functions \(f'(x) = f(2x)\) and \(g'(x) = g(2x+s_0)\), which have the hidden shift \(s' = (s-s_0)/2\) in \(\mathbb{Z}/2^{n-1}\mathbb{Z}\). The second bit of s is the lowest bit of \(s'\), and we can reapply the routine, and so on until we get all the bits.

Quantum Memory Cost. This algorithm has a cost in quantum memory of one qubit per query, plus the memory cost of the quantum oracle, which depends on the concrete instance. This memory is weakly entangled: after the quantum oracle queries, we will only have pairs of entangled qubits. This notably implies that a corrupted qubit will not disrupt the whole computation: one would only need to erase the corresponding label.

Classical Cost. The classical part needs to search for the best colliding pairs. This can be efficiently implemented using a radix tree. Moreover, as the labels x and \(-x\) are equivalent for our purposes, we can normalize them, for example by forcing, in the binary representation, the most significant bit or the bit after the least significant one to be zero. The time and memory cost will be in \(O(N\log (N))\). With \(N = O(2^{\sqrt{2\log _2(3)n}})\), the complexity will be in \(O(\sqrt{n}2^{\sqrt{2\log _2(3)n}})\). This represents a logarithmic overhead compared to the quantum query cost. This part is purely classical, and we consider here that the relevant cost metric is the quantum cost.

3.2 New Variant with Improved Time Complexity

In this section we propose an optimization of the previous algorithm that allows us to perform the attack of Sect. 5. Previously, each bit of the shift was retrieved independently. We have noticed that if some qubits remain once we have found the target qubit, we can reuse them in the rest of the computation. The phase of the element \(\ell \) is \(2\pi \frac{\ell s}{2^n} = 2\pi \frac{\ell (s_0+2s')}{2^n} = 2\pi \frac{\ell s'}{2^{n-1}} + 2\pi \frac{\ell s_0}{2^n}\). If \(s_0 = 0\), we can reuse them directly as elements of \(\mathbb{Z}/2^{n-1}\mathbb{Z}\) to retrieve the next bit (we just have to consider the label modulo \(2^{n-1}\), that is, drop its most significant bit).

If \(s_0 = 1\), we have an additional phase of \(2\pi \frac{\ell }{2^n}\) that prevents us from doing so. We can get rid of it by applying a phase shift gate of angle \(-2\pi \frac{\ell }{2^n}\) (which is the identity on \(|0\rangle \) and changes the phase of \(|1\rangle \) by the given angle) before reusing the qubit. We can apply this trick to reuse the remaining qubits, once we have found \(2^{n-1}\). Moreover, in the second phase, the interesting elements are 010...0 and 110...0, that is, any element of the penultimate pool. Likewise, we can use an element in a pool to retrieve one bit of the shift if we know all the preceding bits.
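
A small numerical check of this phase correction on toy values (the labelled qubits are represented as plain 2-dimensional vectors; all parameter values are illustrative):

```python
import numpy as np

def labelled_qubit(l, s, n):
    # |psi_l> = (|0> + exp(2*pi*i * l*s / 2^n) |1>) / sqrt(2)
    return np.array([1, np.exp(2j * np.pi * l * s / 2**n)]) / np.sqrt(2)

n, s, l = 8, 0b10110011, 137                 # here the lowest bit s0 = 1
s0, s_next = s & 1, s >> 1

psi = labelled_qubit(l, s, n)
fix = np.diag([1, np.exp(-2j * np.pi * l / 2**n)])  # phase shift of angle -2*pi*l/2^n
reused = fix @ psi

# the corrected qubit is a labelled qubit for the shift s' = (s - s0)/2 in
# Z/2^(n-1)Z, with label l taken modulo 2^(n-1)
expected = labelled_qubit(l % 2**(n - 1), s_next, n - 1)
assert np.allclose(reused, expected)
```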

This strategy leads to the improved Algorithm 2, where we ensure that each pool stays non-empty. If we miss one qubit, we will not have the value of the corresponding bit of s, nor, as we will not know which rotations to apply, of the following bits of s.

Differences in Complexity. The elements we keep to retrieve the whole shift are not used in a combination, hence we perform \(n-1\) fewer combinations. The combinations we skip are the least interesting ones, so this has a negligible impact. Another constraint is that all the pools must be non-empty. As the hardest one to fill is the one targeted by the original algorithm, this does not change the cost much. Empirically, we found that the overhead compared to the original algorithm is less than a factor 2, and converges to 1 as n grows. The main difference is that we only need to proceed once and not n times.

The complexity proof of the original algorithm naturally carries over to the new one. The core idea is that by combining \(2^e\) elements, we can expect to produce around \(2^e/3\) elements whose label is a multiple of \(2^e\). Applying this principle multiple times leads to needing an initial pool of size \(O\left( n2^{\sqrt{2\log _2(3)n}}\right) \) in order to produce the wanted qubit with a negligible failure probability. Hence, the complexity is in \(O\left( n2^{\sqrt{2\log _2(3)n}}\right) \). As the only difference between the two algorithms concerns the use of 2n qubits, the same principle is applicable to both. With these complexity estimates, we obtain that the cost to retrieve the whole secret with our algorithm is in \(O\left( n2^{\sqrt{2\log _2(3)n}}\right) \), while the original algorithm would be in \(O\left( n\sqrt{n}2^{\sqrt{2\log _2(3)n}}\right) \). We found out that it is better in practice, as developed in Sect. 3.4.

The memory complexity is negligibly increased, as we have slightly more elements to deal with at once.


3.3 Approximated Promise

In concrete attacks, we may want to use this algorithm on functions that only partially respect the promise. We study various such cases in this section.

Lemma 1

(Unwanted collisions). Let \(f: \mathbb{Z}/2^n\mathbb{Z} \rightarrow \mathbb{Z}/2^n\mathbb{Z}\) be a random function, \(s \in \mathbb{Z}/2^n\mathbb{Z}\), and g such that \(g(x) = f(x+s)\). Given a quantum oracle access to f and g, we can retrieve s in Q quantum queries if we can solve the hidden shift problem in \(\mathbb{Z}/2^n\mathbb{Z}\) with a permutation using Q / e quantum queries.

Proof

This case was studied in Sect. 2.2 of [29] in the context of Simon’s algorithm. It corresponds to the hidden subgroup problem with a non-injective function. The functions still satisfy, for all x, \(f(x) = g(x+s)\) for a secret s.

Let us decompose each step. The measurement of the third register of
$$\begin{aligned} \frac{1}{\sqrt{2^{n+1}}}\sum _{b,x}|b\rangle |x\rangle |f_b(x)\rangle \end{aligned}$$
produces, for a measured image with c preimages \(x_1,\dots ,x_c\), the state
$$\begin{aligned} \frac{1}{\sqrt{2c}}\sum _{j = 1}^{c}\left( |0\rangle |x_j\rangle + |1\rangle |x_j-s\rangle \right) , \end{aligned}$$
and the measurement yields \(f(x_j)\) with probability \(c/2^n\). After the QFT, the measurement will give us a label \(\ell \) and a qubit proportional to
$$\begin{aligned} \left( \sum _{j = 1}^c e^{2i\pi \frac{x_j\ell }{2^n}}\right) \left( |0\rangle + e^{-2i\pi \frac{\ell s}{2^n}}|1\rangle \right) . \end{aligned}$$

As a qubit is invariant by a global phase shift, we still get a valid element. However, it is not uniformly sampled, and the probability of getting a given \(\ell \) is

$$\begin{aligned} p = \frac{1}{c2^n}\left| \sum _{j = 1}^c\exp \left( 2i\pi \frac{x_j\ell }{2^n}\right) \right| ^2. \end{aligned}$$

Notably, the case \(\ell = 0\), which is useless for us, is the most probable.

It is known [23] that for a random function, the expected number of images with r preimages is \(2^n/\left( er!\right) \). The first measurement samples the images, uniformly if the function is a bijection, and proportionally to the number of preimages in the general case. That means we have a probability of \(r/(er!) = 1/(e(r-1)!)\) of getting an image with r preimages. We get a unique preimage with probability 1/e, so with e times the number of samples, we will get enough elements with only one preimage. This is a very rough approximation, as the multiple preimages only induce a bias on the generated elements.
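
This 1/e fraction is easy to check empirically on a toy random function (the parameters below are illustrative):

```python
import random
from collections import Counter

n = 16
f = [random.randrange(2**n) for _ in range(2**n)]   # a random function on n bits
counts = Counter(f)                                  # preimage count of each image
unique = sum(1 for x in range(2**n) if counts[f[x]] == 1)
print(unique / 2**n)  # fraction of inputs with a unique preimage, close to 1/e = 0.368
```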

Remark 1

Alternatively, we can consider the function \(F(x) = (f(x),f(x+1),\dots )\), that has the same shifts as f, but has a smaller probability of unwanted collisions, at the cost of having to query f multiple times for one query of F.

Lemma 2

(Multiple shifts). Let \(s_1,\dots ,s_c \in \mathbb{Z}/2^n\mathbb{Z}\), and let f, g be two permutations of \(\mathbb{Z}/2^n\mathbb{Z}\) such that, for all x and all i, \(f(x) = g(x+s_i)\). The first k bits of the \(s_i\) can be retrieved if and only if they are all equal. They can be retrieved by solving the HSP in \(\mathbb{Z}/2^k\mathbb{Z}\) with the same functions, with \(2^k = \gcd _{i\ne j}(2^n,s_i-s_j)\).

Proof

We can study what happens with two shifts, s and t. We have, for all x, \(f(x) = g(x+s) = g(x+t)\).

From these equalities, we can deduce that for all x and \(\lambda \), \(f(x) = f(x+\lambda (s-t)) = g(x+s+\lambda (s-t))\). The functions have in fact plenty of shifts: \(s+\lambda (s-t)\), the exact number depending on the divisibility by 2 of \(s-t\). The bits of x that are above this level have in fact no impact on the value of f, so this problem is degenerate: if \(s-t = 2^k\mu \), we have an instance of the problem in \(\mathbb{Z}/2^k\mathbb{Z}\), with a hidden shift \(s' = s \mod 2^k = t \mod 2^k\), and we have \(2^k = \gcd (2^n,s-t)\). We cannot get the other bits of s or t, as all the \(s+\lambda (s-t)\) are also valid shifts.

For more shifts, we need to consider the differences that have the smallest divisibility by two, that is, the gcd of all the differences and \(2^n\).

As the divisibility by two of the difference corresponds to an equality in the first bits, the lemma holds.

Remark 2

If we do not know that the functions have multiple shifts, or if the gcd is not known in advance, this is still detectable, as the labels we measure will always be divisible by \(2^{n-k}\).

Proof of the Remark

The formula for the probability of measuring \(\ell \) is \( p(\ell ) = \frac{1}{c2^n}\left| \sum _{j = 1}^c\exp \left( 2i\pi \frac{x_j\ell }{2^n}\right) \right| ^2 \) with c shifts. This reduces to

$$\begin{aligned} \frac{1}{c2^n}\left| \exp \left( 2i\pi \frac{x\ell }{2^n}\right) \sum _\lambda \exp \left( 2i\pi \frac{\lambda 2^k\ell }{2^n}\right) \right| ^2. \end{aligned}$$

This is 0 if \(\exp \left( 2i\pi \frac{2^k\ell }{2^n}\right) \ne 1\), that is, if \(2^{n-k} \not \mid \ell \). This means we’ll only get some \(\ell \)s with at least \(n-k\) trailing zeros.

The following lemma addresses the problem of functions which respect the shift promise only on a subset of their inputs, and shows that it is still solvable if the number of wrong inputs is small enough.

Lemma 3

(Partial shift). Let f, g be two permutations of \(\mathbb{Z}/N\mathbb{Z}\) and \(X \subseteq \mathbb{Z}/N\mathbb{Z}\), such that, for all \(x\in X\), \(f(x) = g(x+s)\). Then, if the hidden subgroup problem in \(\mathbb{Z}/N\mathbb{Z}\) costs Q queries, we can retrieve s given quantum oracle access to f and g in Q queries, with probability \((|X|/N)^Q\).

Proof

If we measure an f(x) whose x is in X, then we have a valid element. This happens with probability \(|X|/N\). If this is not the case, we get a malformed qubit. We can expect the algorithm to succeed only if all the Q queried elements are valid, which happens with probability \((|X|/N)^Q\).

Remark 3

It would also be possible for the algorithm to succeed if we had a way to identify the bad x from the measured value f(x) (or g(x)), as it would allow us to drop the corrupted qubit when we create it. The problem would then only concern the unidentified bad x.

Lemma 4

(Input restriction). Let f, g be two permutations of \(\mathbb{Z}/N\mathbb{Z}\) (with \(N > 2^n\)), such that, for all x, \(f(x) = g(x+s)\). Given a quantum oracle access to f and g restricted to the inputs \(0\le x < 2^n\), if \(0\le s < 2^{n-1}\) and the hidden subgroup problem in \(\mathbb{Z}/2^{n-1}\mathbb{Z}\) can be solved in Q queries, s can be retrieved in \(eQ^2\) queries.

Proof

We are only given access to the interval \([0 ; 2^n)\). We cannot directly see this hidden shift as a hidden shift in \(\mathbb{Z}/2^n\mathbb{Z}\). However, if s is small enough, we have an instance of a partial hidden shift, the valid elements being the ones such that \(0\le x < 2^n\) and \(0\le x+s < 2^n\). The probability to get a bad element is less than \(s/2^n\) in this case. If we need Q queries, and \(s/2^n \simeq 1/Q\), then the success probability will be greater than \((1-1/Q)^Q \simeq 1/e\). This fails for greater s.

However, we can query a subinterval of \([0 ; 2^n)\) for f and g. For \(A\in [0 ; 2^{n-1})\), if we query \([0 ; 2^{n-1})\) to f(x) and \(g(x+A)\), we will retrieve s with probability 1/e if \(0\le s-A < 2^{n-1}/Q'\), where \(Q'\) is the number of queries needed to solve the hidden subgroup problem in \(\mathbb{Z}/2^{n-1}\mathbb{Z}\).

To retrieve s, we can sequentially test all A that are multiples of \(2^{n-1}/Q'\), until we reach \(2^{n-1}\). We then have \(Q'\) intervals to test, and each test costs \(Q'\) queries. Moreover, the algorithm succeeds if the test with the right guess of A succeeds, which can be verified with a few classical queries. As the right guess has a success probability greater than 1/e, we expect to find the shift in \(eQ'^2\) queries.
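
The outer loop of this proof can be sketched as follows; the inner hidden-shift solver and the classical verification are assumed to be given as callables:

```python
def recover_shift_by_intervals(solve_window, verify, n, q_prime):
    """Sketch of the interval scan in the proof above. solve_window(A) runs
    the hidden-shift solver on f(x), g(x + A) for x in [0, 2^(n-1)) and
    returns a candidate for s - A (or None); verify(s) checks a candidate
    with a few classical queries. Both callables are assumptions here."""
    step = 2**(n - 1) // q_prime           # offsets A are multiples of 2^(n-1)/Q'
    for A in range(0, 2**(n - 1), step):
        cand = solve_window(A)             # Q' quantum queries, succeeds w.p. > 1/e
        if cand is not None and verify(A + cand):
            return A + cand
    return None
```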

Remark 4

Here, we do a sequential test of the intervals. We could do a Grover search over them instead, but we would then need a slightly higher number of queries, in order to have a success probability very close to one. Moreover, it would force us to implement the whole control system that chooses which qubits to combine quantumly and not classically.

Remark 5

We can see this method as trying to solve the HSP in \(\mathbb {Z}\). It also shows that considering only the groups \(\mathbb{Z}/2^n\mathbb{Z}\) allows us to solve the problem in any cyclic group in subexponential time, despite a different group structure.

3.4 Simulations

We have simulated the classical part of the algorithm by replacing the quantum measurements by random outcomes. We used this to get an estimate of the query complexity: we generate a certain amount of random numbers, and then combine them in order to get the values we want. We hence get an estimate of the success probability for a given amount of samples (Fig. 2), and deduce from it an asymptotic complexity for a constant success probability. Table 1 shows some results of these simulations for different values of n, for 90% success probability. The code of this implementation is available at https://who.paris.inria.fr/Xavier.Bonnetain/extra/code.tar.gz.

Fig. 2. Estimated success probability in the number of samples, for 64 bits

Figure 2 shows the estimated probability of retrieving the whole secret as a function of the number of initial queries, for a 64-bit secret. We considered this parameter instead of finer ones, such as the number of bits retrieved, because of the dependency between the bits: we have to retrieve them in order, and the first ones are the hardest to get. We can try to guess the missing bits, but as we destroy our qubits when we measure them, we cannot recover from a wrong guess. The figure shows a transition from a negligible probability of success to a negligible probability of failure in less than a factor 2. As the algorithm is collision-based, it performs significantly better if it is run once with a bigger initial pool than many times with smaller pools. It also shows that the gap to reach an arbitrarily small failure probability is small, which is useful if we want to combine it with another quantum algorithm, like a Grover search.

Table 1. Some results of the simulation of Algorithm 2 for 90% success probability

We can then deduce a heuristic query complexity of \(0.7\times 2^{1.8\sqrt{n}}\) for a 90% success probability for Algorithm 2, which matches the exponent of the complexity \(\widetilde{O}\left( 2^{\sqrt{2\log _2(3)n}}\right) \) of the less efficient Algorithm 1, as \(\sqrt{2\log _2(3)} \simeq 1.8\). We also performed a few simulations of Algorithm 1, which gave slightly smaller results (the ratio was less than 2, and decreased as n increased). As Algorithm 1 needs to be repeated n times, it performs worse, both asymptotically and in the ranges we were able to simulate. We see that the polynomial part is in fact a constant close to 1 for Algorithm 2, which hints that the bound in [32] for Algorithm 1 of \(O\left( n2^{\sqrt{2\log _2(3)n}}\right) \) to retrieve the last bit is probably tight for the exponent part, but not for the polynomial part.

4 New Algorithm: Simon Meets Kuperberg

We describe in this section a new quantum algorithm that, for the first time, efficiently solves the HSP when considering a product of cyclic groups, which often appears in symmetric constructions [5, 7, 22, 45]. We also provide a simulation of the algorithm in Sect. 4.3, showing that our complexity estimates are correct.

4.1 Solving the Hidden Shift Problem for Parallel Modular Additions

An interesting generalization for, inter alia, symmetric cryptography is to consider p termwise additions modulo \(2^w\), that is, a modular addition in \(\left( \mathbb{Z}/2^w\mathbb{Z}\right) ^p\). The hidden shift in this case is a vector \(s = (s_1,\dots , s_p)\) of p words of w bits each. The aim of this section is to propose a new algorithm that deals efficiently with that group. The first natural approach was to apply an adapted variant of Kuperberg’s algorithm (as suggested in [32, Thm 7.1]), but its complexity of \(2^{O(\sqrt{n})}\) is far from optimal: we explain in this section how to considerably improve it. We propose a new algorithm whose complexity is close to optimal. It exploits three facts in particular that allow us to considerably improve the complexity. In order to describe our algorithm, we first need to adapt the first part of Kuperberg’s algorithm by considering a quantum Fourier transform compatible with the group law: the original QFT is changed into a termwise variant. The oracle circuit produces the qubits \(|\psi _\ell \rangle \) as before, where the product \(\ell s\) is replaced by an inner product. The combination circuit also works the same way, and produces a termwise sum or difference.

Better Worst-Case Gain. The first fact that allows us to improve the complexity over a basic algorithm is realizing that, though the combination strategy can be quite similar, with a search for partial collisions on the least significant bits of each term, there is a difference in the behavior in the disadvantageous case: while we gained only one 0 in the former situation, here we get a 0 in each term in which we have a collision on the lowest 1 (p zeros), as long as the size of the corresponding list is big enough. We also have more choices in the combinations, and we can have various equivalent and incompatible possibilities, with collisions on different parts of the vector.

With p+1 Equations We Can Always Gain p Zeros. As before, we can separate the elements into pools, depending on the divisibility by 2 of each term. Instead of looking at the position of the first one, we look at the position of the first one in any component of the vector to separate into pools. In each of the w pools, we can restrict ourselves to the bit slice at the corresponding level. This slice corresponds to a vector in \(\mathbb{F}_2^p\). Hence, we can produce a vector that will fit in the next pool if we manage to find some linearly dependent vectors, that is, vectors whose sum (or difference, which is the same in \(\mathbb{F}_2^p\)) is 0.

Recovering the Shift. We realized that the elements with \(\ell _j \in \left\{ 0, 2^{w-1}\right\} \) are of the form \(\frac{1}{\sqrt{2}}\left( |0\rangle \pm |1\rangle \right) \), so measuring them in the \(\{|+\rangle , |-\rangle \}\) basis will give us the parity of \(\sum _j s_j\ell _j/2^{w-1}\), that is, a linear equation in the parity bits of the \(s_j\). In the case \(w = 1\), we get a variant of Simon’s algorithm for hidden shifts.

We describe next how to apply each approach separately, and then describe how our algorithm combines them to obtain an optimized complexity, that will be discussed and analyzed in Sect. 4.2.

First Idea: Kuperberg’s Variant with a Better Worst-Case Gain. A simple strategy, represented in Algorithm 3, is to mimic the former one: we apply the strategy directly to the first term to zero all its bits except the most significant one, and then process the second term, and so on. We can also apply it the other way around: we can see the vector \((s^{w-1}_1\cdots s^0_1, \dots , s^{w-1}_p\cdots s^0_p)\) as the number \(s^{w-1}_ps^{w-1}_{p-1}\cdots s^{w-1}_{1}s^{w-2}_p\cdots s^0_p\cdots s^0_1\), and apply the former strategy directly, until we get enough elements of the form \(s^{w-1}_ps^{w-1}_{p-1}\cdots s^{w-1}_{1}0\cdots 0\) that we can measure. Another approach is to weight all the possible combinations by the expected gain in the total number of trailing zeros, and choose the most favorable one. The first two have the advantage of being classically easy to implement, with a radix tree.


Second Idea: p+1 Dependent Equations Always Gain p Zeros. There is however another way to use the parallel structure of the hidden shift: given \(p+1\) random elements, we can find a subset whose sum (or difference) will always be even on all the components: if we look at the parity vectors of the elements, this corresponds to a linearly dependent subset of the vectors. This approach can be useful if p is big with respect to the size of the pools: with, on average, \(p/2+1\) vectors, we can zero p bits. We can then iterate the technique to set to zero the next row of bits, and so on. This is described in Algorithm 4, and a sketch of the underlying linear algebra is given below.
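
The search for a dependent subset is plain Gaussian elimination over GF(2), and can be done on-the-fly as the elements arrive. A minimal sketch, with parity vectors packed as integers:

```python
def find_zero_sum(vectors):
    """Feed p-bit parity vectors (packed as integers) one by one; as soon as
    some subset XORs to zero, return the indices of that subset. At most
    p + 1 vectors are ever needed."""
    basis = {}                         # pivot bit -> (reduced vector, index set)
    for idx, v in enumerate(vectors):
        combo = {idx}
        while v:
            pivot = v.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (v, combo)
                break
            bv, bc = basis[pivot]
            v ^= bv                    # Gaussian elimination over GF(2)
            combo ^= bc                # track which inputs contribute
        else:
            return combo               # v reduced to 0: this subset sums to 0
    return None

# any 5 vectors of 4 bits contain a dependent subset
print(find_zero_sum([0b1010, 0b0111, 0b1101, 0b0110, 0b0011]))  # {0, 1, 2}
```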

Moreover, seeing the elements in a pool as equations allows us to perform the same optimisation we proposed for the case \(p=1\), to get the whole secret in one pass. Instead of storing one element per pool, we have to store p elements that are linearly independent, that is, a full system of equations. As this optimisation does not depend on what we do to each pool, we can also apply it to improve Algorithm 3.

As, on average, we combine \((p/2+1)\) elements, at each of the w steps we divide the pool by \((p/2+1)\). This algorithm has a complexity in \(O((p/2+1)^w)\). If \(w = 1\), it matches Simon’s complexity (and is, indeed, Simon’s algorithm). It is interesting for big p, as it is polynomial in p, but it quickly becomes costly as w rises, since it is exponential in w.

Our New Algorithm: Combining both Ideas. As the two variants merge the elements to progressively create new elements with a greater number of zeros, we can, to be more efficient, combine both methods. Algorithm 4 is more efficient when we have a small number of qubits to deal with and a large p, but is exponential in w, while Algorithm 3, having the same structure as the original algorithm, is subexponential. Hence, the most efficient way to combine them is to begin with Algorithm 3 until we produce some elements that lie in a subgroup with a sufficiently small \(w'\), where we can use Algorithm 4.


To estimate the complexity, we reasoned backwards: we estimated how many elements in \(\left( \mathbb{Z}/2^w\mathbb{Z}\right) ^p\) we needed to obtain a given number of elements in \(\left( \mathbb{Z}/2^{w'}\mathbb{Z}\right) ^p\), with \(w' < w\), for increasing w.

The point at which we change algorithms is the threshold \(\tau \). The value of this threshold is estimated and studied in the next section. Our new algorithm is described in Algorithm 5, where all the bits are also recovered in one pass thanks to our adapted improvement.

4.2 Complexity Analysis

In this section we provide a complexity analysis of the previously described algorithm, which depends on the relation between the parameters w and p. A summary can be found in Table 4.

We first estimate the complexity of Algorithms 3 and 4, and then combine these costs to compute the best thresholds, and derive the final complexity.

Complexity Using Partial Collisions. To estimate the complexity of partial collisions, we first need to estimate the cost of partial collisions with independent bits (in \(\mathbb{F}_2^p\)), which is a more favourable situation than the collisions in \(\mathbb{Z}/2^n\mathbb{Z}\), as we do not have any uncertainty in the outcome of a combination. We took the same approach as for the original algorithm: we performed simulations. An optimistic approach could estimate that the complexity is \(2^{\sqrt{2p}}\), which would mean that a pool of \(2^e\) elements produces a pool of \(2^{e-1}\) elements that all have e more zeros. In practice, this is not what we observed, and we found a complexity of around \(2^{\sqrt{2.3p}}\), as presented in Table 2. This algorithm is far from the best method to solve this problem, but it can become relevant if we need a huge number of elements that are zeroed on p bits.

Table 2. Some results of the simulations, for 90% success.

Collision Cost. We have two heuristics for the collision cost. When we do not have enough elements to have many collisions, we estimate from our simulations the cost to produce E elements to be around \(2^{\sqrt{2.3p+\log _2(E)^2}}\).

The minimal cost is around 2E, as we need to combine every element with another one. Doing simulations, we found that it is \((2E+2^{p-2})/(1+1/2^p)\), asymptotically in E. The 2E comes from the fact that we do one combination for almost every non-zero element, the \(2^{p-2}\) corresponds to the small number of elements that do not have a total collision with another element, and the \((1+1/2^p)\) corresponds to the zero element that can naturally occur, with probability \(1/2^p\).

Alternative Approach. Instead of considering only one row of bits and trying to zero it, we can consider a bigger chunk. That is, we want to produce E elements that have a certain number (greater than p) of zeros in their first bits. In order to estimate this cost, we refer to Kuperberg’s original algorithm. In practice, the algorithm will be more efficient, but we can approximate it with the original complexity. We can then estimate the cost of the algorithm to zero q bits to be \(2^{\sqrt{2\log _2(3)q + \log _2(E)^2}}.\) As before, this will not hold if we have too many elements to produce, as the minimal cost is 3E. We should never be in this regime, as this would mean that we could obtain the wanted value in one good combination.

Complexity Using Equations

Lemma 5

(Equation cost). An iteration of the outer for loop of Algorithm 4 produces on average \(N/(p/2+1)\) elements with p zeroed bits using N elements, and needs p qubits.

Proof

A step of Algorithm 4 uses random equations to produce a zeroed element. If we have p elements that form a basis of \(\mathbb{F}_2^p\), any other element is a linear combination of, on average, p/2 elements in this basis. If we have a basis, we can hence get an equation that has, on average, \(p/2+1\) elements, and that sums to zero on the p bits. We can construct such a basis by choosing p random elements: if they form a free family, we have a basis; if not, we already have some elements that sum to zero. This allows us to perform the algorithm on-the-fly: each time a new element arrives, we try to add it to the basis. If the new element is linearly independent from the previous ones, we add it to our memory. If it is not, we combine all the elements that sum to zero.

Theorem 1

Algorithm 4 has a complexity in quantum queries and time of around \(2(p/2+1)^w\). It needs \(2p(w-1)\) qubits of quantum memory, plus the oracle cost.

Proof

At each outer for loop iteration, we store p independent elements that will allow us to retrieve p bits, and divide the remaining number of elements by \(p/2+1\) (Lemma 5). At the end, we want p elements (with only p elements, as they would be random, the success probability is only 1/e, but we can get arbitrarily close to 1 with a fixed overhead). The total cost is then

$$\begin{aligned} p(p/2+1)^{w-1}+p(p/2+1)^{w-2}+\dots +p, \end{aligned}$$

which reduces to \(2(p/2+1)^w\). The total cost in quantum memory is then \(p(w-1)\) qubits for the \(w-1\) steps, plus \(p(w-1)\) qubits that will yield an equation in the bits of the shift, but that we cannot measure immediately. This cost in memory is optional, as we could run the algorithm w times, but we would then have to pay the constant overhead at each step and not only at the last one.

Remark 6

We found that the marginal cost of \((p/2+1)\) elements to produce one can be beaten, if the total number of elements is huge, by sorting them before searching for a zero-sum set. As extracting values from a radix tree naturally produces a sorted list, this was observed in our simulations.

Determining the Total Complexity. To determine the complexity, we run the algorithm backwards: we estimate how many elements we need at a point of the algorithm, and then deduce how many elements we need beforehand to obtain this number of elements. More precisely, we consider a fixed p, and estimate what we have to do to get the elements we want as w grows. We considered four cases. First, when w is small enough to use Algorithm 4. Next, when we have to use Algorithm 3 but the number of queries is small enough to have partial collisions on independent bits, such that we never have bad outcomes in our combinations. For bigger w, we considered another approximation, which is that a combination gains at least one zero in each independent component in the worst case. This is not a relevant model when \(p = 1\), as it produces estimates exponential in w, but it is interesting here. Finally, the last approximation model is to neglect the gains due to the parallel additions, and consider Kuperberg’s original complexity estimate.

The Final Steps. The final steps use Algorithm 4. The complexity to process w rows is then \(C_0(p,w) = 2(p/2+1)^w\).

Changing to Collision Finding. With collision finding, we can erase one row and produce E elements at a cost of \(2^{\sqrt{2.3p+\log _2(E)^2}}\). Hence, combining this with the previous step, the cost is \(C_1(p,w) = 2^{\sqrt{2.3p(w-\tau )+\log _2(C_0(p,\tau ))^2}}\), with \(\tau \) rows handled by the other algorithm.

Threshold. We want to change algorithms if \(C_0(p,w) > 2^{\sqrt{2.3p+\log _2(C_0(p,w-1))^2}}\). This means that \(2(p/2+1)^w \ge 2^{\sqrt{2.3p+(1+(w-1)\log _2(p/2+1))^2}}\), which implies

$$\begin{aligned} \tau = \lfloor 1.15p/\log _2(p/2+1)^2+1/\log _2(p/2+1)-1/2\rfloor . \end{aligned}$$

This threshold is the number of steps in which we should use Algorithm 4, and the previous steps are solved using Algorithm 3.

Saturated Regime of Collisions. We saw before that the cost of zeroing one row is asymptotically around 2E, and cannot outperform this bound. We can now estimate when our previous estimate violates this bound. This occurs when \(2^{\sqrt{2.3p+\log _2(E)^2}}\le 2E\), which implies \(E\ge 2^{\frac{2.3p-1}{2}}\simeq 2^{1.15p}\). Applying this constraint to the previous complexity, we get that w must be lower than

$$\begin{aligned} w_1 = \lfloor 2.3p/4+\tau -(1+\tau \log _2(p/2+1))^2/2.3p\rfloor . \end{aligned}$$

We can still use the algorithm in this saturated regime, and estimate that one row can be erased if we divide by 2 the number of elements. Then, the complexity is \(C_2(p,w) = 2^{w-w_1+\sqrt{(1+\tau \log _2(p/2+1))^2 + 2.3p(w_1-\tau )}}\).

Multiple Steps at Once. The complexity obtained at the previous step is unconstrained. It can however become irrelevant, as it is exponential in w, and we have a better approximation when w is big enough. Indeed, we can consider Kuperberg’s original algorithm model and estimate that we can erase \(pw'\) zeros and get E elements at a cost of \(2^{\sqrt{2\log _2(3)pw'+\log _2(E)^2}}\), as the combinations we are doing here are more favorable than the ones done with a cyclic group. This approximation becomes relevant when \(2^{\sqrt{2\log _2(3)p+\log _2(C_2(p,w-1))^2}} \le C_2(p,w)\), which implies

$$\begin{aligned} w\ge w_2 = \lfloor \log _2(3)p-1/2+w_1 -\sqrt{(1+\tau \log _2(p/2+1))^2 + 2.3p(w_1-\tau )}\rfloor . \end{aligned}$$

The total complexity is then \(C_3(p,w) = 2^{\sqrt{2\log _2(3)p(w-w_2) + \left( \log _2(C_2(p,w_2)\right) ^2}}.\)
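
The thresholds and costs above can be evaluated directly; the following sketch is a transcription of these closed forms (it assumes moderate p and the regime \(w \ge w_2\); since the constants are empirical fits and the expressions involve floors, small discrepancies with the tabulated thresholds are to be expected):

```python
from math import log2, sqrt, floor

def algorithm5_cost(p, w):
    """Transcription of the threshold and cost formulas of Sect. 4.2.
    Returns (tau, w1, w2, log2(C3(p, w))); assumes moderate p and w >= w2."""
    lp = log2(p / 2 + 1)
    tau = floor(1.15 * p / lp**2 + 1 / lp - 0.5)
    w1 = floor(2.3 * p / 4 + tau - (1 + tau * lp)**2 / (2.3 * p))
    root = sqrt((1 + tau * lp)**2 + 2.3 * p * (w1 - tau))
    w2 = floor(log2(3) * p - 0.5 + w1 - root)
    log_c2 = lambda w_: (w_ - w1) + root        # log2 of C2(p, w_)
    log_c3 = sqrt(2 * log2(3) * p * (w - w2) + log_c2(w2)**2)
    return tau, w1, w2, log_c3

print(algorithm5_cost(8, 16))
```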

Quantum Memory Cost. The quantum memory cost is the same as in the original algorithm: one qubit per query, plus the oracle cost. Also, as in the original algorithm, we only have pairs of entangled qubits at a time in the combinations.

Classical Cost. The classical cost for Algorithm 3 is the same as for the original algorithm, as the only change in choosing the combinations is the computation of a sum and a difference of labels. The other difference is that we build a basis before looking for combinations. This can be done in time linear in the size of the basis, hence it is negligible.

Algorithm 4 is slightly less memory-consuming, as it can look for combinations on-the-fly, hence performing classically in O(N), with N queries (Table 3).

Table 3. Threshold points for Algorithm 5.
Table 4. Complexity of Algorithm 5.

Simon Meets Kuperberg. From Table 4 we can see how Simon’s complexity is met in the extreme case \(w=1\) and Kuperberg’s complexity is obtained when \(p=1\), as expected. It also shows that even if, asymptotically in w, the complexity becomes closer to the complexity of Kuperberg’s algorithm in a cyclic group of the same size, the last \(w_2\) rows of bits of the state do not provide as much security.

4.3 Simulations of the Algorithm

We have performed various simulations of the algorithm, in order to confirm our models and theoretical complexities. For \(w = 1\), the obtained complexity corresponds to solving an equation system, hence it needs around p queries, and our model holds. For \(p = 1\), the complexity reduces to \(2^{\sqrt{2\log _2(3)w}}\), which corresponds to our previous simulations. We considered two types of simulations in order to confirm the complexity model of our algorithm. First, as before, we simulated the success probability of the algorithm for a given input size. Second, we simulated the number of elements at each step of the algorithm, in order to see more precisely the accuracy of each model.

Table 5. Simulations compared with our model, with a success probability of \(90\%\), 1000 tests per estimation, in log scale, for \(pw = 100\) and 128.

From Table 5, we see that our estimates correspond to the simulations in the ranges we were able to simulate, with a slightly pessimistic estimation when p is not too small and w is bigger than p. In order to estimate the accuracy of our different models, we also simulated the number of elements in each pool at the beginning of each step, as shown for example in Fig. 3.

Fig. 3. Comparison between our model and our simulations for the number of elements in each pool at each step for \(p, w = 8,16\), in \(\log \) scale.

The computed thresholds for Fig. 3 are (2,3,7). As they are counted in reverse, they correspond to (13,12,8) on the graph. The two curves converge at around step 9, which suggests that our models 3 and 4 are slightly pessimistic. This is explained by the fact that model 3 neglects the gains of good combinations and model 4 the gains of bad combinations.

5 Cryptanalysis of Poly1305 in the Superposition Model

We propose in this section the first quantum superposition attack on the Poly1305 primitive, with a complexity of about \(2^{38}\) in time and queries, showing that it is not secure in the superposition model.

5.1 Poly1305 Description

Poly1305 is a MAC designed by Bernstein [6]. It has been standardized for TLS 1.2 [36], is currently part of a recommended cipher suite in the TLS 1.3 draft [1], and is notably supported by OpenSSH, Firefox and Chrome. The designer announced in [8] a classical and quantum security of 128 bits for Poly1305. We describe Poly1305-AES, but our analysis works with any internal block cipher.

Poly1305-AES uses two 128-bit keys (r, k) and a 128-bit nonce n, takes as input a variable-length message m considered as an array of 128-bit blocks, and outputs a 128-bit tag. For efficiency purposes, some bits of r are fixed to 0, which means it can only take \(2^{106}\) different values. The function is

$$\begin{aligned} \text {Poly1305-AES}_{(r,k,n)}((m_i)_{i\le q}) \!=\! \left( \sum _{i = 1}^q (m_{q-i+1} + 2^{128})r^i\mod 2^{130}-5\right) + \text {AES}_k(n). \end{aligned}$$
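
To make this structure concrete, here is a minimal Python sketch of the formula (the AES call is replaced by a stand-in PRF, the final reduction modulo \(2^{128}\) performed by deployed Poly1305 is omitted to match the formula as written, and the helper names are ours):

```python
import hashlib

P = 2**130 - 5   # the Poly1305 prime

def prf(k, nonce):
    # stand-in for AES_k(nonce): any deterministic 128-bit keyed function works here
    return int.from_bytes(hashlib.sha256(k + nonce).digest()[:16], "little")

def poly1305(r, k, nonce, blocks):
    # sum_{i=1}^{q} (m_{q-i+1} + 2^128) * r^i mod 2^130 - 5, plus the PRF term;
    # deployed Poly1305 also truncates the final sum to 128 bits, omitted here
    acc = 0
    for m in blocks:                 # Horner evaluation: first block gets r^q
        acc = (acc + m + 2**128) * r % P
    return acc + prf(k, nonce)
```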

5.2 Quantum Attack in the Superposition Setting

For our quantum attack, we consider having access to the oracle \(\text{Poly}_n: |m\rangle |y\rangle \mapsto |m\rangle |y \oplus \text {Poly1305-AES}_{(r,k,n)}(m)\rangle \).

The nonce is classical, and changes at each query. As we consider the superposition scenario, we consider that the function can be called in superposition. We aim at retrieving r (and not k), as r is sufficient to retrieve \(\text {AES}_k(n)\) for any tag, which allows some forgeries. If one also wants k, one can perform a Grover search on it, with an additional cost of \(2^{64}\). In the long version of this paper [12] we describe a distinguisher on Poly1305 and a simple key-recovery attack, but in this section we propose a more elaborate attack that uses Kuperberg’s algorithm.

Poly1305 uses a polynomial structure for hashing, and the commutative algebra contains many possible shift structures, both in \(\mathbb{Z}/(2^{130}-5)\mathbb{Z}\) with addition and in its multiplicative group. For example, one can consider the two functions \(f(x) = xr + r^2 + 2^{128}(r+r^2)\) and \(g(x) = xr + 2^{128}(r+r^2)\), which satisfy \(f(x) = g(x + r)\). We cannot call them directly, but we can call \(F(x) = \text {Poly1305-AES}_{(r,k,n)}(1,x)\) and \(G(x) = \text {Poly1305-AES}_{(r,k,n)}(0,x)\), which also satisfy \(F(x) = G(x+r)\) if the nonce is the same.
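
Continuing the sketch above (same hypothetical poly1305 and prf helpers), the shift property can be checked numerically. Note that the check evaluates G at \((x+r) \bmod (2^{130}-5)\), while the real oracle only accepts 128-bit inputs — exactly the restriction issue discussed next:

```python
import random

r = random.randrange(2**124)           # r, with its high bits clamped to zero
k, nonce = b"k" * 16, b"n" * 16

F = lambda x: poly1305(r, k, nonce, [1, x])   # Poly1305-AES_(r,k,n)(1, x)
G = lambda x: poly1305(r, k, nonce, [0, x])   # Poly1305-AES_(r,k,n)(0, x)

x = random.randrange(2**128)
assert F(x) == G((x + r) % P)          # the hidden shift is exactly r
```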

There are two issues that prevent the direct application of Kuperberg’s algorithm. First, the nonce changes at each query, which means that in order to have \(F(x) = G(x + r)\), we must compute F and G in only one query to Poly1305. This is feasible, as both are of the form \(\text {Poly1305}(a(x))\), with a(x) a function of x: one can compute \(a_F(x) = (1,x)\) and \(a_G(x) = (0,x)\) in superposition in an auxiliary register, and then call \(\text{Poly}_n\) on it. Second, and more annoyingly, the inputs of Poly1305 are restricted to be between 0 and \(2^{128}-1\), which means we cannot sample all group elements.

This can still be solved by using Lemma 4, as we can query the whole of \([0;2^{128})\). We set the interval size at \(2^{106}\): solving the hidden shift in an interval of this size costs around \(2^{20}\). The shift r can be retrieved if it lies below \(2^{127}\), which is the case here, as the bit constraints on r imply \(r < 2^{124}\); we therefore only need to test \(2^{18}\) intervals. The total cost is then \(2^{20}\times 2^{18} = 2^{38}\), for a success probability better than one half. We can check whether the r we found is the right one by trying to forge some valid messages, or by using the distinguisher presented in Appendix A of [12].

Quantum Memory Use. This attack needs around \(2^{20}\) qubits, plus the cost of implementing the quantum oracle, which would require at least a few hundred more qubits.

Grover Acceleration. As the previous attack involves an exhaustive search for the correct interval among \(2^{18}\) candidates, one might want to use Grover's algorithm in order to gain up to a factor \(2^9\). We automatically lose a factor 2 because of the uncomputation of the algorithm. Moreover, we would need to compute all the qubit choices quantumly, and the success probability of the inner function must be very close to one. Altogether, this makes the attack more efficient in queries (around \(2^{31}\)), with only a small gain in time.

5.3 Impact of Our Improvements

The total cost of the attack depends heavily on the precise cost of Kuperberg's algorithm. The original algorithm, with an estimated complexity of around \(n\sqrt{n}2^{\sqrt{2\log _3(2^n)}}\), would here cost around \(2^{31}\) queries. The total attack would then be more costly, around \(2^{59}\), which is very close to the cost of a simple quantum exhaustive search on the key if AES-128 is used, and exceeds the cost of the simple quantum attack described in [12]. Combining it with a Grover search would lead to a cost estimate of around \(2^{45}\), still higher than both the non-Grover variant of our attack (at \(2^{38}\)) and its Grover variant (at \(2^{31}\)).

6 Attack on the FX Construction

The FX construction, proposed by Kilian and Rogaway [31], is a simple way to extend the key length of a block cipher. It uses a block cipher \(E_{k_0}\) and two additional keys \(k_1, k_2\) whose length is the block size of the cipher, and the new cipher is

$$\begin{aligned} \text {FX}_{k_0,k_1,k_2}(x) = E_{k_0}(x\oplus k_1)\oplus k_2. \end{aligned}$$

We can see it as an Even-Mansour construction, with a block cipher taking the role of the random permutation. The quantum security of this scheme has been studied by Leander and May in [37] in the superposition model. Their conclusion is that this construction is essentially as secure as the inner cipher \(E_{k_0}\).

Their approach is close to the quantum attack against the Even-Mansour construction, with the addition that the key of the inner cipher has to be sought. They consider the function \(f(k,x) = \text {FX}_{k_0,k_1,k_2}(x)\oplus E_k(x)\), which fulfills the promise \(f(k_0, x) = f(k_0,x\oplus k_1)\). They then recover \(k_0\) and \(k_1\) by performing a Grover search on \(k_0\), with a test function that applies Simon's algorithm to the partial function \(x \mapsto f(k_0,x)\). If this function is periodic, then \(k_0\) is correct with very high probability, and the period of the function is \(k_1\). This can moreover be checked efficiently, by testing the periodicity on a few values. Once \(k_0\) and \(k_1\) are known, \(k_2\) can be retrieved with one known plaintext/ciphertext pair. The total cost is around \(2|k_1|2^{|k_0|/2}\).
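The promise behind this test function can be illustrated classically with the following toy Python sketch, where a hypothetical 16-bit keyed function E stands in for the block cipher (this shows only why the periodicity test singles out the right key, not the quantum procedure itself):

```python
import random

W = 16                   # toy block size in bits
MASK = (1 << W) - 1

def E(k: int, x: int) -> int:
    """Toy keyed function standing in for the block cipher; the promise
    below holds for any choice of E."""
    for i in range(4):
        x = (x + (k ^ (i * 0x9E37))) & MASK
        x ^= ((x << 5) | (x >> (W - 5))) & MASK   # 5-bit rotation, xored in
    return x

k0 = random.randrange(1 << W)
k1 = random.randrange(1, 1 << W)   # nonzero whitening key
k2 = random.randrange(1 << W)
FX = lambda x: E(k0, x ^ k1) ^ k2   # the FX construction
f = lambda k, x: FX(x) ^ E(k, x)    # Leander-May test function

x = random.randrange(1 << W)
assert f(k0, x) == f(k0, x ^ k1)    # the promise holds for k = k0

# for a wrong key guess, periodicity with period k1 fails almost surely
k_bad = (k0 + 1) & MASK
assert any(f(k_bad, x) != f(k_bad, x ^ k1) for x in range(256))
```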

This leads to some efficient attacks against the FX-based primitives DESX, PRINCE [14] and PRIDE [3]. For PRINCE and PRIDE, \(|k_0| = |k_1| = 64\), the attack costs around \(2^{39}\) queries and time, whereas for DESX, \(|k_0| = 56\) and \(|k_1| = 64\), the attack costs around \(2^{35}\) queries and timeFootnote 5.

The authors only considered the original construction, which uses xors, and left as an open problem the evaluation of the security when using another group law. We can give an answer here for the most natural variant, which uses modular additions instead of xors, with the cipher

$$\begin{aligned} \text {FX+}_{k_0,k_1,k_2}(x) = E_{k_0}(x+k_1) + k_2. \end{aligned}$$

The function is no longer periodic in this situation, but we can build a hidden shift problem from the two functions \(f(k,x) = \text {FX+}_{k_0,k_1,k_2}(x)+E_k(-x)\) and \(g(k,x) = \text {FX+}_{k_0,k_1,k_2}(-x)+E_k(x)\), which fulfill the promise \(f(k_0,x) = g(k_0,x+k_1)\). These two functions can be computed efficiently in superposition, for a total cost of one query and one encryption. The attack then consists in a Grover search that uses Kuperberg's algorithm as a test function. The Grover search needs the same number of iterations (\(2^{|k_0|/2}\)), but Kuperberg's algorithm needs around \(2^{1.8\sqrt{|k_1|}}\) samples. The total cost is around \(2^{|k_0|/2+1.8\sqrt{|k_1|}}\times 2\) queries (the queries to f and g can be merged into a single query to FX+, and we double to uncompute Kuperberg's algorithm).
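Reusing the toy cipher E from the previous sketch, the hidden-shift promise can again be checked numerically (classical promise only; all additions are taken mod \(2^{W}\)):

```python
# modular-addition variant of the toy experiment above (reuses E, W, MASK
# and the keys k0, k1, k2 from the previous sketch)
FXp = lambda x: (E(k0, (x + k1) & MASK) + k2) & MASK   # FX+ construction
fp  = lambda k, x: (FXp(x) + E(k, (-x) & MASK)) & MASK
gp  = lambda k, x: (FXp((-x) & MASK) + E(k, x)) & MASK

x = random.randrange(1 << W)
assert fp(k0, x) == gp(k0, (x + k1) & MASK)            # hidden-shift promise
```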

Other Group Laws. If the group is abelian, the attack can be straightforwardly applied. If the group law is not abelian, we need a slightly different approach which is developed in the extended version of this paper [12].

Quantum Attack on PRINCE+ and PRIDE+. We can directly attack a variant of PRINCE and PRIDE where the key whitening is done through a modular addition. Concretely, we can attack them in around \(2^{47.4}\) queries and time, which is smaller than the ideal 64 bits of quantum security. We can likewise attack DESX+ in \(2^{43.4}\) queries and time.

7 Concrete Proposals

The most interesting idea from [2] for preventing Simon-based attacks is using modular additions, which are already common in symmetric primitives (see for instance [26, 40, 49]). Based on the complexities of the new algorithms and attacks from the previous sections, we can now correctly size some of the primitives that were broken by Simon-based algorithms, once patched to use modular additions, so that they provide a desired level of post-quantum security.

Let us point out that we use a slightly unconventional definition of security: we consider a cipher to provide Q bits of security when no attack of complexity lower than \(2^{Q}\) exists (the more conventional definition being that no attack better than the generic exhaustive search is known, whose quantum complexity is usually \(2^{Q}=2^{k/2}\)).

7.1 Concrete Parameters and Security of Some Generic Constructions

If we consider the generic Even-Mansour construction with a xor, it provides a security of 8 or 9 bits for a state size of 128 or 256 bits, respectively. When using one or several modular additions, this security increases to 20 or 28.5 bits for states of 128 or 256 bits, respectively, but all these constructions remain far from the ideal \(2^{64}\) security offered by an ideal cipher with a 128-bit key, and even further from the \(2^{128}\) offered quantumly by a classical primitive with a 256-bit key. To the best of our knowledge, the quantum security offered by the AES [19] meets these ideal claims.
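For intuition, these figures are consistent with the \(2^{1.8\sqrt{n}}\) cost of Kuperberg's algorithm on an n-bit state: up to the rounding conventions of the estimates,

$$\begin{aligned} 1.8\sqrt{128} \approx 20.4 \quad \text {and}\quad 1.8\sqrt{256} = 28.8. \end{aligned}$$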

In Table 6 we show the security parameters some popular constructions need in order to resist the corresponding attacks based on Kuperberg's algorithm. As expected, p modular additions on words of size w provide less security than a single modular addition over the full state. We can see that, in all cases, the state size needed to achieve a given security level becomes much larger than for common symmetric primitives (128 bits, for instance), exceeding 5200 bits in some cases. The problem of a bigger state is not limited to implementation issues: designing a secure permutation for such a big state would be a very challenging task. We consider the sizes needed when using non-abelian groups in the extended version [12].
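The order of magnitude of these state sizes can be recovered by inverting the same cost estimate: ignoring polynomial factors, requiring \(2^{1.8\sqrt{n}} \ge 2^{128}\) gives

$$\begin{aligned} n \ge \left( \frac{128}{1.8}\right) ^2 \approx 5057, \end{aligned}$$

that is, more than 5000 bits of state, consistent with the figures above 5200 bits once polynomial factors are accounted for.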

Table 6. Examples of parameters for 128-bit security when using modular additions instead of \(\oplus \). E-M stands for Even-Mansour and O-M for operation modes.

8 Conclusion

Modular Additions Are Not Enough. We have shown that the proposal from [2], even though it is interesting and could provide any desired security level with sufficiently large parameters, does not seem practical. Indeed, using modular additionsFootnote 6 instead of xors, in most constructions vulnerable to Simon's attacks, would increase the complexity of such attacks (which would then require Kuperberg's algorithm), and therefore also the security, but it would need a much larger internal state to provide reasonable security, far beyond the size of typical classical symmetric constructions providing an equivalent degree of security. For instance, a key-alternating cipher with modular addition that provides 128-bit security would need around 5200 bits of internal state and key, to be compared with the 128 bits of internal state and 256 key bits of AES-256. Beyond the obvious efficiency drawback, designing a correspondingly large secure permutation would be a very challenging task: intuitively, more bits need to be mixed, which inevitably implies more rounds and larger (and slower to mix) transformations.

That is why, for constructions vulnerable to Simon's or Kuperberg's attacks, the substitution of modular additions for xors seems hazardous, as can be seen in the previous section, and we instead recommend the use of one of the resistant constructions (many exist).

Kuperberg’s Algorithm Simulation and Verification. We have studied, improved and simulated Kuperberg’s algorithm: the concrete complexity of our tweaked version is \(2^{1.8\sqrt{n}}\), which is small enough for practical use on typical values of n (we have implemented the part that could be simulated and verified this estimate). We have also presented a way to solve the hidden shift problem in various situations (extending to non-abelian groups), and provided an estimate of its complexity.

New Algorithm Representing Simon-Meets-Kuperberg. We have provided a new efficient algorithm that solves the problem when considering parallel modular additions. We have simulated the algorithm and verified that our estimated complexity is met in practice. As in the case of a single modular addition, although the security is increased with respect to the same constructions using xors, the state size must increase accordingly, and this tweak does not seem more promising.

Cryptanalysis of FX Variants and Poly1305. This paper proposes new quantum attacks, mainly based on our generalized and improved version of Kuperberg’s algorithm, that provide an important speed-up over Grover’s generic quantum exhaustive search.

Further Applications. Hidden shift algorithms can be applied in other cryptographic fields. They have in particular been successfully applied to ordinary isogenies [18] and are relevant for assessing the security of some proposed post-quantum asymmetric schemes, such as CSIDH [16]. Indeed, [13] proposes one of the first concrete quantum cryptanalyses of post-quantum primitives, suggesting a significant re-dimensioning of the parameters in order to provide the desired security. This result builds upon our work and simulations.

Open Problems

  1. Prove a tighter bound on the complexity of Kuperberg’s algorithm, or study its success probability.

  2. Study combinations and alternations of xors and modular additions to increase the complexity of the attacks or to counter them.

  3. Find more attacks using the new algorithm, for instance on ARX constructions. Also study applications to other post-quantum primitives, such as lattice-based ones.