1 Introduction

We start off with a simple, clearly undesirable property of a block cipher and generalize it. Suppose there is an n-bit block cipher which allows an attacker, for a particular known or chosen key, to determine a plaintext that equals its ciphertext. For a good block cipher, accomplishing this should be very unlikely with far fewer than \(2^n\) trials. Such a property would, for example, allow preimage attacks in otherwise fully preimage-secure compression function constructions that use this block cipher.

Now, consider an n-bit block cipher where the key is known or chosen by the attacker and let us focus on a single bit at position i of the plaintext \(p_i\) and ciphertext \(c_i\) in this setting. We would expect that the equation \(p_i=c_i\) holds in exactly half the cases. In fact, any statistically significant deviation from this expectation can be interpreted as a sign of non-randomness in the cipher.

Such an attack would be in the so-called key-less model, which covers both the known-key and chosen-key models, and is hence of relevance if the cipher is used as part of a hash function construction. More generally, it allows one to make meaningful statements and differentiate between ciphers beyond what is possible in other models. Should we consider such a cipher a good building block for a compression function? Not if there were an alternative cipher with similar implementation characteristics that does not allow for such a distinguisher!

1.1 Contributions

We discuss the two types of contributions in this paper. One is of a more conceptual/modeling nature, while the other is a concrete cryptanalytic application of the former.

A New Way of Formulating Key-less Distinguishers. The property described in the beginning resembles properties used in linear cryptanalysis to recover secret keys. The problem with the above line of reasoning was that so far there did not exist a meaningful model to properly express the setting. By this, we mean a model which has a proper characterization of the power of generic attackers and a clear distinction as to when a dedicated attack in fact can be considered a valid distinguisher, i.e. outperforms generic attackers. In this paper, after starting off by giving notation and preliminary notions of block ciphers and linear cryptanalysis in Sect. 2, we put in Sect. 3 the above very informal description of a possible demonstration of non-randomness on more rigorous grounds.

The usual requirement for a distinguisher to be valid is that one must compare the cost of satisfying a specific property, which varies from case to case, for a concrete permutation \(\pi \), with the cost of achieving the same property for an ideal permutation. In our model, we expand on this by posing the problem of determining for a concrete permutation \(\pi \): (i) a linear relation over \(\pi \) in the form of an input/output mask pair and (ii) a set of inputs to \(\pi \), such that the number of inputs satisfying the linear relation is expected to deviate by a significant amount from what one expects of an ideal permutation. Such a property should not be attainable for an ideal primitive.

Our proposed key-less linear distinguisher model captures the possibility of distinguishing a cipher using any previous linear cryptanalysis, in the sense that the attacker needs only a linear hull and the probability distribution on the absolute correlation, to perform his analysis. To amplify the distinguisher to either cover more rounds or to need less computation, approaches inspired by message modification [42] and rebound attacks [28, 35] are used.

Application to PRESENT. We can find concrete results in the new model in round-reduced versions of the leading lightweight-cipher PRESENT [10] (used in compression function designs advocated e.g. in [11]). In Sect. 4 we describe the relevant aspects of the PRESENT block cipher and give results on linear hulls and keys pertaining to it. Section 5 details the application of the key-less linear distinguisher to PRESENT. We fix a bit position i, devise an algorithm for determining up to \(2^{61.97}\) key-dependent plaintexts in a very efficient manner, and study the expected number of plaintext and ciphertext pairs where \(p_i=c_i\). What we claim to be able to find is a deviation from the expectation that the equation \(p_i=c_i\) is fulfilled with probability \(\frac{1}{2}\). Depending on the size of the allowable key-set, this will work for up to 27 rounds of PRESENT. Detailed results are summarized in Table 4, before our conclusions and a discussion of open problems in Sect. 6. We confirm the results with experimental verifications (see Appendix C and [29]).

1.2 Related Work

Linear cryptanalysis, a technique to recover keys in ciphers, was pioneered by Matsui from 1992 on [32, 34], with extensions or variants such as multiple linear approximations [5, 20], linear hulls [38], multidimensional variants [16], zero-correlations [12] and considerations of a general statistical framework [3, 30, 37].

The application of linear cryptanalysis to key-less constructions, i.e. in models where the key is either known or chosen by the attacker, is largely an open problem. Sometimes, designs are evaluated with respect to standard linear cryptanalysis [2, 31]. Some designers of SHA-3 candidates state properties with respect to this class of attacks (such as linear probability) without ever mentioning specific models. The reason is that there simply was no model, a situation that we address in this paper.

In all cases of linear cryptanalysis applied in a key-less setting, the analysis done is exactly the same as in a setting with a secret key: a linear approximation with a non-zero correlation is presented. The only exception known to us is a linear analysis of Cubehash by Ashur and Dunkelman [2]. There, an 11-round linear approximation with bias \(2^{-235}\) is used to describe a standard distinguisher with \(2^{470}\) queries. Then, inspired by a chosen-plaintext variant of linear cryptanalysis of DES by Knudsen and Mathiassen [23], the authors fix 80 bits of the plaintext input of modular additions, thereby gaining the first round for free, arriving at a 12-round result with a complexity below \(2^{512}\). This can be seen as a predecessor to our deterministic technique of Sect. 5.2.

The only analysis of PRESENT in a setting without secret keys we know of is by Koyama, Sasaki, and Kunihiro [25]. In their work, differential chosen-key distinguishers (a setting that gives the attacker more freedom than in our known-key model) for up to 18 rounds are obtained.

At its core is a differential rebound attack with an inbound phase of 5 rounds that needs 100 degrees of freedom (see Footnote 1). In the method we propose, we allow the key to be fixed arbitrarily, and out of the remaining 64 degrees of freedom from the plaintext input more than 61 degrees of freedom remain. Hence our results, which cover more rounds and use a deterministic phase over 3 rounds that needs only 3 degrees of freedom, compare favorably to this result.

2 Preliminaries

In this section we introduce our notation, give basic definitions and recall known properties related to our analysis throughout the paper.

Notation. For an n-bit block cipher with key space \({\mathcal {K}}\), let \(E : {\mathbb {F}}_2^n \times {\mathcal {K}}\rightarrow {\mathbb {F}}_2^n\) and \(D : {\mathbb {F}}_2^n \times {\mathcal {K}}\rightarrow {\mathbb {F}}_2^n\) denote encryption and decryption functions, respectively. For convenience, we also use the notation that \(E_K(x) := E(x,K)\) and \(D_K(c) := D(c,K)\). We use \(\sharp X\) to denote the size of a set X. For a real number w, |w| denotes the absolute value of w. We let \({\mathsf{Perm}}(n)\) denote the set of all permutations on n-bit inputs and we let \(x \xleftarrow {\$} X\) denote the assignment of x by an element of X chosen uniformly at random. We use \(\mathcal{N}(\mu ,\sigma ^2)\) and \(\mathcal{B}(n,p)\) to denote the normal- and binomial distributions respectively. For a distribution D we use \({\varPhi }(D,x)\) to denote the cumulative distribution function of D at point x. We use the notation that \(\mathbf {e}_i\) is a binary string with a 1 in position i and zeroes elsewhere.

In this paper, when we talk about the key-less setting, we implicitly mean adversarial assumptions where the key \(K \in {\mathcal {K}}\) is either known or chosen by the attacker.

Trails and Hulls. In the following, let \(F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\) be an iterated function of the form \(F = F_R \circ \cdots \circ F_1\). We borrow to a large extent the notation from Leander’s treatment on linear cryptanalysis [30]. We define a mask as a vector \(\alpha \in {\mathbb {F}}_2^n\). For two masks \(\alpha ,\beta \), we denote by \(\langle \alpha ,\beta \rangle \) the inner product of the two masks:

$$\begin{aligned} \langle (\alpha _0,\ldots ,\alpha _{n-1}),(\beta _0,\ldots ,\beta _{n-1}) \rangle := \bigoplus _{i=0}^{n-1} \alpha _i \beta _i. \end{aligned}$$

We define an R-round trail as an element \((\delta ,\alpha _1,\ldots ,\alpha _{R-1},\gamma ) \in ({\mathbb {F}}_2^n)^{R+1}\), where \(\delta \) and \(\gamma \) are the input and output masks, respectively. The \(\alpha _i\) are called the intermediate masks. For a randomly chosen \(x \in {\mathbb {F}}_2^n\), and for \(i=1,\ldots ,R\) (letting \(\alpha _0=\delta \) and \(\alpha _R = \gamma \)), we have

$$\begin{aligned} \Pr \left[ \langle x,\alpha _{i-1} \rangle = \langle F_i(x),\alpha _{i} \rangle \right] = \frac{1}{2} + \frac{{\mathbf {C}}_{F_i}(\alpha _{i-1},\alpha _i)}{2}, \end{aligned}$$

where \({\mathbf {C}}_{F_i}(\alpha _{i-1},\alpha _i)\) is the correlation over \(F_i\). The trail correlation over F is defined in terms of the \({\mathbf {C}}_{F_i}\) as

$$\begin{aligned} {\mathbf {C}}_F(\delta ,\alpha _1,\ldots ,\alpha _{R-1},\gamma ) = {\mathbf {C}}_{F_1}(\delta ,\alpha _1) \left( \prod _{i=2}^{R-1} {\mathbf {C}}_{F_i}(\alpha _{i-1},\alpha _i) \right) {\mathbf {C}}_{F_R}(\alpha _{R-1},\gamma ). \end{aligned} \tag{1}$$

We say that a trail is valid if and only if each constituent correlation of (1) is non-zero.
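The definitions of the inner product, the per-round correlation and the trail correlation (1) can be made concrete on a small example. The following is a minimal sketch, assuming a round function given as a lookup table on 4 bits; the table holds the values of the PRESENT S-box, but any permutation table would do, and the function names are illustrative only.

```python
# Illustrative 4-bit permutation (values of the PRESENT S-box); any table works.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4


def inner(a: int, b: int) -> int:
    """Inner product <a, b> over F_2: parity of the bitwise AND."""
    return bin(a & b).count("1") & 1


def correlation(f, alpha: int, beta: int, n: int = N_BITS) -> float:
    """C_f(alpha, beta) = 2 * Pr[<x, alpha> = <f(x), beta>] - 1 over all n-bit x."""
    agree = sum(inner(x, alpha) == inner(f[x], beta) for x in range(1 << n))
    return 2 * agree / (1 << n) - 1


def trail_correlation(rounds, masks) -> float:
    """Product of per-round correlations along (delta, alpha_1, ..., gamma)."""
    assert len(masks) == len(rounds) + 1
    c = 1.0
    for f, mask_in, mask_out in zip(rounds, masks, masks[1:]):
        c *= correlation(f, mask_in, mask_out)
    return c


def is_valid_trail(rounds, masks) -> bool:
    """A trail is valid iff every constituent correlation is non-zero."""
    return all(correlation(f, a, b) != 0
               for f, a, b in zip(rounds, masks, masks[1:]))
```

A call such as `trail_correlation([SBOX, SBOX], [0x1, 0x5, 0x3])` evaluates a two-round trail with hypothetical masks; whether that particular trail is valid depends on the table used.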

We define an R-round linear hull \(\text {LH}_R(\delta ,\gamma )\) as the union of all valid linear trails with input mask \(\delta \) and output mask \(\gamma \). As such, we use the notation that \(t \in \text {LH}_R(\delta ,\gamma )\) for an R-round trail t. Note that a linear hull \(\text {LH}_R(\delta ,\gamma )\) defines an R-round linear relation between x and F(x), which we denote \(\mathcal{R}_{\delta ,\gamma }^F : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2\), where

$$\begin{aligned} \mathcal{R}_{\delta ,\gamma }^F(x) = {\left\{ \begin{array}{ll} 1 &{} , \langle x,\delta \rangle = \langle F(x),\gamma \rangle \\ 0 &{} , \langle x,\delta \rangle \ne \langle F(x),\gamma \rangle \end{array}\right. }. \end{aligned}$$

When \(\mathcal{R}_{\delta ,\gamma }^F(x) = 1\) we say the relation is satisfied for input x and otherwise it is not. The linear hull correlation [17, Theorem 7.8.1] is given by

$$\begin{aligned} {\mathbf {C}}_F(\text {LH}_R(\delta ,\gamma ))&= \sum _{t \in \text {LH}_R(\delta ,\gamma )} {\mathbf {C}}_F(t) \\&= \sum _{t \in \text {LH}_R(\delta ,\gamma )} (-1)^{ sgn (t)} \cdot |{\mathbf {C}}_F(t)|, \quad sgn (t) = {\left\{ \begin{array}{ll} 0 &{} , {\mathbf {C}}_F(t) \ge 0 \\ 1 &{} , {\mathbf {C}}_F(t) < 0 \end{array}\right. }. \end{aligned}$$

When the trail or hull is understood, we write \({\mathbf {C}}_F\) for simplicity to mean the correlation of the trail or hull over F. For a block cipher, the value of \( sgn (t)\) for \(t \in \text {LH}_R(\delta ,\gamma )\) depends on the secret key \(K \in {\mathcal {K}}\), and hence the value of \(|{\mathbf {C}}_F(\text {LH}_R(\delta ,\gamma ))|\) depends on the difference between the number of trails with \( sgn (t) = 1\) and those with \( sgn (t) = 0\). In this paper, we use the following assumption.

Assumption 1

For any fixed key \(K \in {\mathcal {K}}\), we assume that for any two trails \(t,t' \in \text {LH}_R(\delta ,\gamma )\), where \(t \ne t'\), the signs \( sgn (t)\) and \( sgn (t')\) are independent Bernoulli random variables with \(p=\frac{1}{2}\).

We note that Assumption 1 has been experimentally verified for PRESENT, see e.g. [13, 30].
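To illustrate what Assumption 1 implies for the hull correlation, the sketch below samples independent uniform signs for a hypothetical hull and compares the empirical mean of the squared hull correlation with the sum of the squared trail correlations, which is the value one obtains when the cross terms vanish under independence. The hull (64 trails of absolute correlation \(2^{-10}\)) is an invented example, not data from PRESENT.

```python
import random


def sample_hull_correlation(abs_trail_corrs, rng) -> float:
    """One hull correlation: every trail gets an independent uniform sign."""
    return sum(c if rng.random() < 0.5 else -c for c in abs_trail_corrs)


def mean_squared_hull_correlation(abs_trail_corrs, samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of E[C^2] over the random signs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        total += sample_hull_correlation(abs_trail_corrs, rng) ** 2
    return total / samples


# Hypothetical hull: 64 trails, each with absolute correlation 2^-10.
trails = [2.0 ** -10] * 64
expected = sum(c * c for c in trails)  # sum of squared trail correlations
estimate = mean_squared_hull_correlation(trails, samples=20000)
```

Under these assumptions `estimate` should be close to `expected`; a fixed key corresponds to a single draw of the signs, so the actual hull correlation varies from key to key.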

For readers familiar with differential-type attacks in the known-key setting, we offer the following loose analogy. We say that \(x \in {\mathbb {F}}_2^n\) follows an R-round trail over F if and only if

$$\begin{aligned} \langle x,\delta \rangle = \langle F_{1}(x), \alpha _1 \rangle = \cdots = \langle (F_{{R-1}} \circ \cdots \circ F_{1})(x),\alpha _{R-1} \rangle = \langle F(x),\gamma \rangle . \end{aligned}$$

This notion will be used in Sect. 5, when we describe how to use a technique similar to message modification, to extend a presented distinguisher in the key-less setting.
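As a small illustration of the notion of an input following a trail, the sketch below checks the chain of equalities round by round. It assumes round functions given as lookup tables; the 4-bit table and the masks are hypothetical choices for demonstration.

```python
# Illustrative 4-bit permutation (values of the PRESENT S-box); any table works.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def inner(a: int, b: int) -> int:
    """Inner product over F_2: parity of the bitwise AND."""
    return bin(a & b).count("1") & 1


def follows_trail(x: int, rounds, masks) -> bool:
    """True iff <x, delta> = <F_1(x), alpha_1> = ... = <F(x), gamma>."""
    target = inner(x, masks[0])
    state = x
    for f, mask in zip(rounds, masks[1:]):
        state = f[state]
        if inner(state, mask) != target:
            return False
    return True


rounds = [SBOX, SBOX]          # F = F_2 o F_1
masks = [0x1, 0x5, 0x3]        # hypothetical (delta, alpha_1, gamma)
followers = [x for x in range(16) if follows_trail(x, rounds, masks)]
```

Every input that follows the trail also satisfies the relation \(\mathcal{R}_{\delta ,\gamma }^F\), while the converse need not hold, since an input may satisfy the relation without agreeing at every intermediate mask.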

3 Key-less Linear Distinguishers for Block Ciphers

Even though block ciphers have already been used for a very long time, either implicitly or explicitly, to construct hash functions, a separate study of the security of block ciphers where the key is either known or under control of the adversary has started only recently. Knudsen and Rijmen proposed so-called known-key distinguishers [24]. Later Biryukov et al. [8] and Lamberger et al. [27] proposed open- or chosen-key models to evaluate the security of block ciphers.

Even though these models often exhibit a rather contrived looking property, and evade a formally rigorous definition (a property they share with collision attacks; see Footnote 2), cryptanalysts largely agree that these distinguishers are useful and interesting. Indeed, techniques developed to improve the original known-key distinguishers from [24], such as the rebound attack, later led to collision attacks on various hash functions [21, 27, 36]. Also, the findings in the open-key model from [8] were later used to find the first related-key key-recovery attacks on AES-256 and AES-192 [6, 7].

3.1 Motivation for Our Distinguisher

Sometimes distinguisher descriptions are merely motivated by the fact that they can be formulated, as e.g. the 7-round known-key distinguisher on AES from [24], where byte-level zero-sums are used as a distinguishing property. Another example is the rotational rebound attack on reduced Skein [22], where the existence of “rotational collisions with errors” is defined as a distinguishing property. Sometimes, however, they are better motivated, e.g. by the construction of near-collisions, or the subspace- and limited-birthday distinguishers [19, 27, 28] that resemble some generalization of the concept of near-collisions.

The distinguisher we propose below comes with a new motivation that stems from preimage attacks on hash functions or compression functions (see Footnote 3). As an example, consider the compression function construction using a single call to a block cipher in Matyas-Meyer-Oseas mode. The ith message block \(m_i\) is compressed by using it as the plaintext input when computing the next chaining value \(H_{i+1}\) using \(H_i\) as the cipher key, i.e. \(H_{i+1} = E_{H_i}(m_i) \oplus m_i\). If an attacker can determine a relation stating that the jth bit of \(m_i\) equals the jth bit of \(E_{H_i}(m_i)\) with a high probability, then it is likely that the jth bit of \(H_{i+1}\) equals zero. In a preimage attack, if the target preimage is zero at position j, this then leads to an advantage over brute-force search.
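The link between the bit relation and the chaining value in Matyas-Meyer-Oseas mode can be checked mechanically. The sketch below uses a hypothetical 16-bit toy cipher (`toy_encrypt`, invented here for illustration and unrelated to PRESENT) and confirms that bit j of \(H_{i+1}\) is zero exactly when bit j of the message block equals bit j of its encryption.

```python
MASK16 = 0xFFFF


def toy_encrypt(key: int, m: int) -> int:
    """Hypothetical 16-bit toy cipher, for illustration only (not PRESENT)."""
    x = m & MASK16
    for r in range(4):
        x ^= (key + r) & MASK16                # round-key addition
        x = (x * 0x9E37) & MASK16              # multiplication by an odd constant
        x = ((x << 5) | (x >> 11)) & MASK16    # rotation by 5 bits
    return x


def mmo_compress(h: int, m: int) -> int:
    """Matyas-Meyer-Oseas step: H_{i+1} = E_{H_i}(m_i) xor m_i."""
    return toy_encrypt(h, m) ^ m


def bit(v: int, j: int) -> int:
    """Bit j of v, counting from the least significant bit."""
    return (v >> j) & 1


h, j = 0x1234, 3  # arbitrary chaining value and bit position
zero_count = sum(bit(mmo_compress(h, m), j) == 0 for m in range(1 << 16))
match_count = sum(bit(m, j) == bit(toy_encrypt(h, m), j) for m in range(1 << 16))
```

The two counts coincide for any cipher, so a bias in the relation \(p_j = c_j\) translates directly into a bias of output bit j towards zero.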

Motivated by this example, we proceed with our new key-less linear distinguisher model for block ciphers that we will use throughout the paper.

3.2 The Key-less Linear Distinguisher Model

In the following, we give our definition of key-less linear distinguishers. Essentially, the model captures the possibility of distinguishing any block cipher in the key-less setting, given that a linear relation (in the form of a linear hull) of sufficiently high absolute correlation for a reasonable fraction of the key space \({\mathcal {K}}\), is available. The notions of Definitions 1 and 2 are largely inspired by the recent work of Gilbert on pushing known-key attacks further on the AES [18].

The following definition of \(\alpha \)-separability formalizes how a linear relation, combined with a set of inputs for a permutation \(\pi : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\), can exhibit a significant deviation from the behavior of a random permutation.

Definition 1

(\(\alpha \)-separability). Let \({\mathcal {P}}\) be a set of permutations from \({\mathbb {F}}_2^n\) to \({\mathbb {F}}_2^n\) and let \(\pi \in {\mathcal {P}}\) denote a particular, fixed permutation from \({\mathcal {P}}\). Let \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) with size \(\mathcal{M}\) and let \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\).

Without checking each input, each \(x_i \in {\mathcal {S}}\) has an (a priori) associated probability \(p_i = \Pr \left[ \mathcal{R}_{\delta ,\gamma }^{\pi }(x_i) = 1\right] \) that the linear relation is satisfied for that particular input. Let \(\mathcal{X}= \sharp \{ x \in {\mathcal {S}}\;|\;\mathcal{R}_{\delta ,\gamma }^{\pi }(x) = 1 \}\), then \({\mathbb {E}}\left[ \mathcal{X}\right] = \sum _{i=1}^{\mathcal{M}} p_i\). We say that the tuple \(({\mathcal {P}}, \pi , {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{\pi })\) is \(\alpha \)-separable if and only if

$$\begin{aligned} \Pr \left[ \left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\right] \ge \alpha , \end{aligned}$$

where the probability is taken over \(\pi \in {\mathcal {P}}\).

Definition 2

(\((T,\mathcal{M},\alpha )\)-intractability). Let \(\mathcal {P}\) be a set of permutations from \({\mathbb {F}}_2^n\) to \({\mathbb {F}}_2^n\) and let \(\pi \in \mathcal {P}\) denote a particular, fixed permutation from \(\mathcal {P}\). Let \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\) and let \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\). We say that the tuple \((\mathcal {P},\pi , {\mathcal {S}},\mathcal{R}_{\delta ,\gamma }^{\pi })\) is \((T,\mathcal{M},\alpha )\)-intractable if and only if it is impossible, for any algorithm \({\mathcal {A}}\), to

  1. Commit to a choice of \(\delta ',\gamma ' \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) and

  2. When given access to a fixed pair \({\varPi },{\varPi }^{-1}\) with \({\varPi }\xleftarrow {\$} {\mathsf{Perm}}(n)\), construct a set \({\mathcal {S}}'\) of size \(\mathcal{M}\) in time T, s.t. the tuple \(({\mathsf{Perm}}(n),{\varPi },{\mathcal {S}}',\mathcal{R}_{\delta ',\gamma '}^{{\varPi }})\) is \(\alpha \)-separable.

Note 1

For our distinguisher model, the notion of one time unit corresponds to a single evaluation of the respective permutation.

With the definition of \(\alpha \)-separability and \((T,\mathcal{M},\alpha )\)-intractability in hand, we are ready to formulate our proposed key-less linear distinguisher.

Definition 3

(Key-less linear distinguisher). Let \(E : {\mathbb {F}}_2^n \times {\mathcal {K}}\rightarrow {\mathbb {F}}_2^n\) be a block cipher and let \(\mathcal {E}\) denote the set of permutations due to choices of the key \(K \in {\mathcal {K}}\). Let \(E_K\) denote some fixed permutation from \(\mathcal {E}\).

Fix \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) and let \({\mathcal {A}}\) be an algorithm producing in time T a set \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\). Then the tuple \(({\mathcal {A}}, \mathcal {E}, E_K, {\mathcal {S}}, T,\mathcal{R}_{\delta ,\gamma }^{E_K}, \alpha )\) is said to be a key-less linear distinguisher if and only if \((\mathcal {E}, E_K, {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{E_K})\) is both \(\alpha \)-separable and \((T, \mathcal{M}, \alpha )\)-intractable.

Note 2

In all of the definitions above, the fixed linear masks \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) are chosen by the algorithm \({\mathcal {A}}\), but the choice must be made before the production of the input set \({\mathcal {S}}\) commences.

In the context of distinguishing a block cipher, the adversary commits to \(\delta \) and \(\gamma \) and then obtains access to \(E_K\) upon which the production of \({\mathcal {S}}\) in time T begins. The parameter \(\alpha \) directly expresses a lower bound on the fraction of the permutations \(\pi \in \mathcal {P}\) for which the key-less linear distinguisher is valid. The time T allowed to construct \({\mathcal {S}}\) is a parameter chosen by the adversary.

Analysis. In the following, we analyze and argue that the key-less linear distinguisher is meaningful. First, informally, the notion of \(\alpha \)-separability expresses that for a concrete permutation \(\pi : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n\), one can provide a linear relation which captures, for some constructed set of inputs, a significant non-random behavior in a permutation which is supposed to behave randomly. The significant part is captured by the requirement that the number of inputs satisfying the relation \(\mathcal{R}_{\delta ,\gamma }^{\pi }\) should deviate from what is expected in the ideal case by at least \(\sqrt{\mathcal{M}}\). This reflects the usual requirement in linear cryptanalysis, that the data complexity is inversely proportional to the squared correlation. Second, on top of that, Definition 2 captures the notion that for a random permutation \({\varPi }\xleftarrow {\$} {\mathsf{Perm}}(n)\), it should not be possible, in the same amount of time, to provide such a relation with a set of inputs which exhibits the same significant non-random behavior.

With respect to Definition 2, one of the components to analyzing our proposed key-less linear distinguisher is to answer the following question: What is the upper bound on the probability \(\alpha '\) that an algorithm \({\mathcal {A}}\), when given access to the fixed pair \({\varPi }\) and \({\varPi }^{-1}\), can produce in time T a set \({\mathcal {S}}' \subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\), together with a pre-determined relation \(\mathcal{R}_{\delta ,\gamma }^{\varPi }\), such that \(({\mathsf{Perm}}(n),{\varPi },{\mathcal {S}}',\mathcal{R}_{\delta ,\gamma }^{{\varPi }})\) is \(\alpha '\)-separable? Our analysis answers this question in the following, and it implicitly provides a lower bound on \(\alpha \) for when a concrete permutation \(\pi : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n \in \mathcal {P}\) (in the notation of Definitions 1 and 2) can be shown to be \((T, \mathcal{M}, \alpha )\)-intractable, for fixed T and \(\mathcal{M}\). We begin our analysis with Lemma 1.

Lemma 1

In the notation of Definition 2, let \(\delta ',\gamma ' \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) be fixed, and let then an algorithm \({\mathcal {A}}\) be given access to \({\varPi },{\varPi }^{-1}\), where \({\varPi }\xleftarrow {\$} {\mathsf{Perm}}(n)\). The optimal way for \({\mathcal {A}}\) to construct \({\mathcal {S}}' \subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\) in time T is the following:

  1. Construct an arbitrarily chosen set \(\mathcal {Q} \subseteq {\mathbb {F}}_2^n\) of size T.

  2. Partition \(\mathcal {Q}\) into \({\mathcal {Q}}_1 = \{ x \in \mathcal {Q} \;|\;\mathcal{R}_{\delta ',\gamma '}^{\varPi }(x) = 1 \}\) and \({\mathcal {Q}}_0 = \{ x \in \mathcal {Q} \;|\;\mathcal{R}_{\delta ',\gamma '}^{\varPi }(x) = 0 \}\) by querying \({\varPi }(x)\) for all \(x \in \mathcal {Q}\).

  3. Set \({\mathcal {S}}'\) equal to the larger of the sets \({\mathcal {Q}}_0\) and \({\mathcal {Q}}_1\).

  4. Fill up \({\mathcal {S}}'\) with arbitrarily chosen inputs from \({\mathbb {F}}_2^n \backslash \mathcal {Q}\) until \(\sharp {\mathcal {S}}' = \mathcal{M}\).

Proof

As \({\varPi }\xleftarrow {\$} {\mathsf{Perm}}(n)\), the particular choice of \(\delta ',\gamma ' \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) does not affect the analysis. The most information \({\mathcal {A}}\) can learn about \({\varPi }\) in time T is to obtain T pairs \((x,{\varPi }(x))\), as is done when determining \(\mathcal {Q}\) and its image under \({\varPi }\). In order to optimally shift the balance of the expected number of inputs of \({\mathcal {S}}'\) satisfying \(\mathcal{R}_{\delta ',\gamma '}^{\varPi }\) away from \(\mathcal{M}/2\), \({\mathcal {A}}\) should take the larger of \({\mathcal {Q}}_1\) and \({\mathcal {Q}}_0\) and pool it with randomly chosen inputs x for which the value of \(\mathcal{R}_{\delta ',\gamma '}^{\varPi }(x)\) is not known. \(\square \)
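The four steps of Lemma 1 can be sketched for a small random permutation as follows. This is an illustrative simulation under assumed toy parameters (a 12-bit permutation, \(T = 50\), \(\mathcal{M} = 400\)); the helper names are hypothetical, and "arbitrarily chosen" is implemented here as a random choice.

```python
import random


def inner(a: int, b: int) -> int:
    """Inner product over F_2: parity of the bitwise AND."""
    return bin(a & b).count("1") & 1


def relation(perm, delta: int, gamma: int, x: int) -> int:
    """R(x) = 1 iff <x, delta> = <perm(x), gamma>."""
    return 1 if inner(x, delta) == inner(perm[x], gamma) else 0


def generic_set(perm, n: int, delta: int, gamma: int, T: int, M: int, rng):
    """Build S' of size M using T queries, following steps 1-4 of Lemma 1."""
    assert T <= M <= (1 << n) - T
    domain = list(range(1 << n))
    rng.shuffle(domain)
    queried, unqueried = domain[:T], domain[T:]                      # step 1
    q1 = [x for x in queried if relation(perm, delta, gamma, x) == 1]  # step 2
    q0 = [x for x in queried if relation(perm, delta, gamma, x) == 0]
    s = list(q1 if len(q1) >= len(q0) else q0)                       # step 3
    s += unqueried[: M - len(s)]                                     # step 4
    return s, len(q1)


def expected_count(M: int, T: int, q1_size: int) -> float:
    """E[X] given the query outcome; unqueried inputs satisfy R with prob. 1/2."""
    if q1_size >= T - q1_size:
        return q1_size + (M - q1_size) / 2
    return (M - (T - q1_size)) / 2


rng = random.Random(1)
n, T, M = 12, 50, 400
perm = list(range(1 << n))
rng.shuffle(perm)                                   # stand-in for a random permutation
s, q1_size = generic_set(perm, n, 0b1, 0b1, T, M, rng)
deviation = abs(expected_count(M, T, q1_size) - M / 2)
```

In this sketch the deviation always equals half the size of the larger of \({\mathcal {Q}}_0\) and \({\mathcal {Q}}_1\), which is why the generic attacker needs that set to contain at least \(2\sqrt{\mathcal{M}}\) inputs.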

Continuing our analysis, assuming an algorithm \({\mathcal {A}}\) constructs \({\mathcal {S}}'\) as in Lemma 1, we determine an upper bound on the value \(\alpha '\) as a function of \(\mathcal{M}\) and T, such that the resulting tuple \(({\mathsf{Perm}}(n), {\varPi }, {\mathcal {S}}', \mathcal{R}_{\delta ',\gamma '}^{\varPi })\) is \(\alpha '\)-separable. We give this result in Theorem 1.

Theorem 1

(Generic success probability). Let \({\mathcal {A}}, {\varPi }, \delta ', \gamma ', {\mathcal {S}}'\) and T be as in Lemma 1, where \(T \le 4\sqrt{\mathcal{M}}\), and let \(\mathcal{X}:= \sharp \{ x \in {\mathcal {S}}' \;|\;\mathcal{R}_{\delta ',\gamma '}^{\varPi }(x) = 1 \}\). Then

$$\begin{aligned} \Pr \left[ \left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\right] = 2^{-T} \cdot \left[ \sum _{k=0}^{T-2\sqrt{\mathcal{M}}} {T \atopwithdelims ()k} + \sum _{k=2\sqrt{\mathcal{M}}}^{T} {T \atopwithdelims ()k} \right] . \end{aligned}$$

Proof

First, note that \(\sharp \mathcal{Q}_1 \sim \mathcal{B}(T,\frac{1}{2})\). We want to determine the probability that we have \(\left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\). The consideration is split into two cases depending on whether or not \(\sharp \mathcal{Q}_1 \ge T/2\).

Case \(\sharp \mathcal{Q}_1 \ge T/2\). In this case, we know that at least \(\sharp \mathcal{Q}_1\) of the \(\mathcal{M}\) inputs satisfy the relation. Thus, \({\mathbb {E}}\left[ \mathcal{X}\right] = {\mathbb {E}}\left[ Z\right] + \sharp \mathcal{Q}_1\) where \(Z \sim \mathcal{B}\left( \mathcal{M}- \sharp \mathcal{Q}_1, \frac{1}{2}\right) \). Thus, \({\mathbb {E}}\left[ \mathcal{X}\right] = \frac{\mathcal{M}+ \sharp \mathcal{Q}_1}{2}\), and the requirement \(\left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\) is equivalent to either \(\sharp \mathcal{Q}_1 \ge 2\sqrt{\mathcal{M}}\) or \(\sharp \mathcal{Q}_1 \le -2\sqrt{\mathcal{M}}\), the latter not being possible as \(\sharp \mathcal{Q}_1\) is non-negative.

Case \(\sharp \mathcal{Q}_1 < T/2\). In this case, we know that there are at least \(T-\sharp \mathcal{Q}_1\) of the \(\mathcal{M}\) inputs that do not satisfy the relation. Thus, \({\mathbb {E}}\left[ \mathcal{X}\right] = {\mathbb {E}}\left[ Z\right] \) where \(Z \sim \mathcal{B}\left( \mathcal{M}- T + \sharp \mathcal{Q}_1, \frac{1}{2}\right) \). Thus, \({\mathbb {E}}\left[ \mathcal{X}\right] = \frac{\mathcal{M}- T + \sharp \mathcal{Q}_1}{2}\), and the requirement \(\left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\) is equivalent to either \(\sharp \mathcal{Q}_1 \ge T + 2\sqrt{\mathcal{M}}\) or \(\sharp \mathcal{Q}_1 \le T - 2\sqrt{\mathcal{M}}\), the former not being possible as \(\sharp \mathcal{Q}_1 \le T\).

In both of the cases considered, there is one event which makes the inequality \(\left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2} \right| \ge \sqrt{\mathcal{M}}\) true. The combined probability of those two events is

$$\begin{aligned}&\Pr \left[ \sharp \mathcal{Q}_1 \ge 2\sqrt{\mathcal{M}}\right] + \Pr \left[ \sharp \mathcal{Q}_1 \le T-2\sqrt{\mathcal{M}}\right] \\&\quad \!\! = 2^{-T} \cdot \left[ \sum _{k=0}^{T-2\sqrt{\mathcal{M}}} {T \atopwithdelims ()k} + \sum _{k=2\sqrt{\mathcal{M}}}^{T} {T \atopwithdelims ()k} \right] . \end{aligned}$$

From this, the result follows. \(\square \)
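The expression of Theorem 1 is easy to evaluate exactly. The sketch below is one possible way to do so; it assumes that non-integer summation bounds are handled by testing the conditions \(k \ge 2\sqrt{\mathcal{M}}\) and \(k \le T - 2\sqrt{\mathcal{M}}\) in squared integer form, and it counts each k at most once, so a term shared by both sums is not added twice.

```python
from fractions import Fraction
from math import comb


def generic_success_probability(T: int, M: int) -> Fraction:
    """Pr[#Q1 >= 2*sqrt(M) or #Q1 <= T - 2*sqrt(M)] for #Q1 ~ B(T, 1/2)."""
    favourable = 0
    for k in range(T + 1):
        upper = k * k >= 4 * M                  # k >= 2*sqrt(M)
        lower = (T - k) * (T - k) >= 4 * M      # k <= T - 2*sqrt(M)
        if upper or lower:
            favourable += comb(T, k)
    return Fraction(favourable, 2 ** T)


# Example with M = 100, i.e. 2*sqrt(M) = 20 and 4*sqrt(M) = 40.
p_small = generic_success_probability(19, 100)  # below 2*sqrt(M)
p_edge = generic_success_probability(20, 100)   # only k = 0 and k = 20 count
p_full = generic_success_probability(40, 100)   # every k counts
```

For these parameters the three values are 0, \(2 \cdot 2^{-20}\) and 1, matching the two boundary cases discussed for \(T < 2\sqrt{\mathcal{M}}\) and \(T = 4\sqrt{\mathcal{M}}\).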

Note 3

The requirement \(T \le 4\sqrt{\mathcal{M}}\) in the statement of Theorem 1 is needed because otherwise the two sums would overlap and count the same terms twice. The probability which is derived as a function of \(\mathcal{M}\) and T provides a lower bound on \(\alpha \) for when, in the notation of Definition 2, a tuple \((\mathcal {P}, \pi , {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{\pi })\) can be \((T,\mathcal{M},\alpha )\)-intractable.

By using the normal approximation of \(\sharp \mathcal{Q}_1\), i.e. \(\sharp \mathcal{Q}_1 \sim \mathcal{N}\left( \frac{T}{2}, \frac{T}{4}\right) \), one obtains a very precise and easily-computable approximation of the probability as

$$\begin{aligned} 1 - {\varPhi }\left( \mathcal{N}\left( \frac{T}{2}, \frac{T}{4}\right) , 2\sqrt{\mathcal{M}} \right) + {\varPhi }\left( \mathcal{N}\left( \frac{T}{2}, \frac{T}{4}\right) , T-2\sqrt{\mathcal{M}} \right) . \end{aligned}$$
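The normal approximation above can be evaluated with the error function from the standard library. The following is a minimal sketch, assuming \(T > 0\) and no continuity correction, as in the displayed formula; the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x: float, mu: float, var: float) -> float:
    """Cumulative distribution function of N(mu, var) at x."""
    return 0.5 * (1.0 + erf((x - mu) / sqrt(2.0 * var)))


def approx_success_probability(T: int, M: int) -> float:
    """1 - Phi(N(T/2, T/4), 2*sqrt(M)) + Phi(N(T/2, T/4), T - 2*sqrt(M))."""
    mu, var = T / 2, T / 4
    threshold = 2 * sqrt(M)
    return 1.0 - normal_cdf(threshold, mu, var) + normal_cdf(T - threshold, mu, var)


# Example with M = 100: the approximation grows with T and reaches 1 at T = 4*sqrt(M).
p30 = approx_success_probability(30, 100)
p36 = approx_success_probability(36, 100)
p40 = approx_success_probability(40, 100)
```

Since the two thresholds are symmetric around the mean \(T/2\), both terms contribute equally, so the value is twice the lower tail probability.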

Corollary 1

Let \({\mathcal {A}}\) be an algorithm which, after a choice of \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) is fixed, is given access to some permutation \(\pi : {\mathbb {F}}_2^n \rightarrow {\mathbb {F}}_2^n \in \mathcal {P}\).

When \(T < 2\sqrt{\mathcal{M}}\) and \(\mathcal {P} = {\mathsf{Perm}}(n)\), it is impossible for \({\mathcal {A}}\) to produce in time T a set \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\) s.t. the tuple \((\mathcal {P}, \pi , {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{\pi })\) is \(\alpha \)-separable for any \(\alpha > 0\).

On the other hand, when \(T \ge 4\sqrt{\mathcal{M}}\) and \(\mathcal {P} = \mathcal {E}\) (in the notation of Definition 3), then it is impossible for \({\mathcal {A}}\) to produce in time T a set \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\) s.t. the tuple \(({\mathcal {A}}, \mathcal {P}, \pi , {\mathcal {S}}, T, \mathcal{R}_{\delta ,\gamma }^{\pi },\alpha )\) is a key-less linear distinguisher for any \(\alpha > 0\).

Proof

The first result follows directly from Theorem 1 when observing that both sums are zero when \(T < 2\sqrt{\mathcal{M}}\). The second result follows from Theorem 1 when observing that the sums equal one when \(T=4\sqrt{\mathcal{M}}\). This makes \((T,\mathcal{M},\alpha )\)-intractability impossible. \(\square \)

Note 4

The key-less linear distinguisher specified in Definition 3 does not require the attacker to provide outputs; hence a valid key-less linear distinguisher without pre-computation, i.e. with \(T=0\), is not ruled out. Indeed, one of the concrete attacks we will later show does not need any computations.

Indeed, from Corollary 1 it follows that when no pre-computation is allowed, i.e. when \(T=0\), any algorithm \({\mathcal {A}}\) producing a set \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) together with any relation \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\) for a permutation \(E_K \in \mathcal {E}\), yields a key-less linear distinguisher \(({\mathcal {A}},\mathcal {E}, E_K, {\mathcal {S}}, T, \mathcal{R}_{\delta ,\gamma }^{E_K}, \alpha )\) for some \(\alpha > 0\). Note, however, that the parameter \(\alpha \) measures how likely such a distinguisher is to work for a specific key. For example, when \(\alpha \) is very small, one might have a valid key-less linear distinguisher for many rounds, but for a tiny fraction of the key space. As such, when \(T=0\), such a key-less linear distinguisher is to be taken with a grain of salt, depending on the value \(\alpha \). In the following sections, we always provide together with our distinguishers the parameter \(\alpha \), to make clear the lower bound on the fraction of the key space for which it is valid.

Having analyzed the generic case, we move on to stating in Theorem 2 a necessary condition for when, for a particular fixed \(\pi \in \mathcal {P}\) and \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\), an algorithm \({\mathcal {A}}\) can construct \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) of size \(\mathcal{M}\) in time T, s.t. the tuple \((\mathcal {P}, \pi , {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{\pi })\) is \(\alpha \)-separable.

Theorem 2

Let \(\pi \in \mathcal {P}\) and fix \(\delta ,\gamma \in {\mathbb {F}}_2^n \backslash \{(0,\ldots ,0)\}\). Let \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) have size \(\mathcal{M}\). Then the tuple \((\mathcal {P}, \pi , {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^\pi )\) can be \(\alpha \)-separable for \(\alpha > 0\) if and only if the absolute correlation \(|{\mathbf {C}}_\pi |\) of \(\mathcal{R}_{\delta ,\gamma }^\pi \) satisfies \(|{\mathbf {C}}_\pi | \ge 2/\sqrt{\mathcal{M}}\). Furthermore, the largest \(\alpha \) for which \(\alpha \)-separability is obtained is given by \(\alpha = \Pr \left[ |{\mathbf {C}}_\pi | \ge 2/\sqrt{\mathcal{M}}\right] \).

Proof

Let \(\mathcal{X}:= \{ x \in {\mathcal {S}}\mid \mathcal{R}_{\delta ,\gamma }^\pi (x) = 1 \}\). Then \(\mathcal{X}\sim \mathcal{B}\left( \mathcal{M}, \frac{1}{2} + \frac{{\mathbf {C}}_\pi }{2}\right) \). We have \(\alpha \)-separability if and only if \(\Pr \left[ \left| {\mathbb {E}}\left[ \mathcal{X}\right] - \frac{\mathcal{M}}{2}\right| \ge \sqrt{\mathcal{M}}\right] \ge \alpha \). Thus, we require either \({\mathbb {E}}\left[ \mathcal{X}\right] \ge \frac{\mathcal{M}}{2} + \sqrt{\mathcal{M}}\) or \({\mathbb {E}}\left[ \mathcal{X}\right] \le \frac{\mathcal{M}}{2} - \sqrt{\mathcal{M}}\). Since \({\mathbb {E}}\left[ \mathcal{X}\right] = \frac{\mathcal{M}}{2} + \mathcal{M}\cdot \frac{{\mathbf {C}}_\pi }{2}\), this happens exactly when \(|{\mathbf {C}}_\pi | \ge 2/\sqrt{\mathcal{M}}\). From this, the results follow. \(\square \)
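The condition of Theorem 2 is easy to evaluate on a logarithmic scale. The following is a minimal sketch, not part of the original analysis; the function names are hypothetical and chosen only for illustration. It computes \(\log _2\) of the bound \(2/\sqrt{\mathcal{M}}\) and checks a given correlation against it.

```python
def separability_threshold_log2(log2_m: float) -> float:
    """Return log2 of the bound 2/sqrt(M) from Theorem 2, given log2(M)."""
    return 1.0 - log2_m / 2.0


def can_be_separable(log2_abs_corr: float, log2_m: float) -> bool:
    """True when |C_pi| >= 2/sqrt(M), i.e. the condition stated in Theorem 2."""
    return log2_abs_corr >= separability_threshold_log2(log2_m)


# For a set of size M = 2^40 the absolute correlation must be at least 2^-19.
print(separability_threshold_log2(40))  # -19.0
```

Working with \(\log _2\) values avoids floating-point underflow for the very small correlations that appear for many rounds.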

4 The Block Cipher PRESENT, Keys and Linear Hulls

PRESENT is a 64-bit iterated block cipher [10] for use in lightweight applications such as RFID tags and wireless sensor networks. Its use in compression function designs is e.g. studied and advocated for in [11]. The key space is \({\mathcal {K}}= {\mathbb {F}}_2^\kappa \) with \(\kappa \) either 80 or 128 bits. The respective block ciphers are denoted PRESENT-80 and PRESENT-128. Both ciphers have 31 rounds. The PRESENT key-schedule (see Appendix A for details) produces 32 \(\kappa \)-bit round keys, but only the 64 most significant bits are used in the key addition of each round. We refer to these 64-bit round keys as \(K_i\) with \(i=0,\ldots ,31\).

The structure of PRESENT is a substitution-permutation network, repeating the round function

$$\begin{aligned} {R}_i(x) = P \circ S(x \oplus K_i), \end{aligned}$$

where x is the 64-bit state input to round i, S is the parallel application of sixteen identical 4-bit S-boxes and P is a fixed bitwise permutation. The full cipher is composed of 31 applications of the round function followed by addition of a post-whitening key, i.e.

$$\begin{aligned} E_K(x) = ({R}_{30} \circ \cdots \circ {R}_0)(x) \oplus K_{31}. \end{aligned}$$

An illustration of a single round of PRESENT is given in Fig. 1. For the specification of the PRESENT S-box and permutation P, see Appendix A.
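As an illustration of the structure just described, the following sketch implements the round function and the composition with a post-whitening key. It assumes the S-box table and the bit permutation \(P(i) = 16i \bmod 63\) (with bit 63 fixed) from the public PRESENT specification, and it assumes that bit i of the Python integer is bit i of the state. The key schedule is deliberately left out, so round keys are passed in directly; all function names are hypothetical.

```python
from typing import Sequence

# S-box of PRESENT as given in the cipher's public specification.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def p_index(i: int) -> int:
    """Destination of state bit i under P: 16*i mod 63, with bit 63 fixed."""
    return 63 if i == 63 else (16 * i) % 63


def sbox_layer(x: int) -> int:
    """Apply the 4-bit S-box to each of the sixteen nibbles of the state."""
    y = 0
    for j in range(16):
        y |= SBOX[(x >> (4 * j)) & 0xF] << (4 * j)
    return y


def p_layer(x: int) -> int:
    """Move every state bit i to position P(i)."""
    y = 0
    for i in range(64):
        y |= ((x >> i) & 1) << p_index(i)
    return y


def round_function(x: int, k: int) -> int:
    """R_i(x) = P(S(x xor K_i))."""
    return p_layer(sbox_layer(x ^ k))


def encrypt(x: int, round_keys: Sequence[int]) -> int:
    """Apply one round per key, using the last key for post-whitening."""
    *keys, whitening = round_keys
    for k in keys:
        x = round_function(x, k)
    return x ^ whitening
```

Note that `p_index(21)` returns 21: bit 21 is a fixed point of the permutation, which is what allows single-bit trails through the same bit position in consecutive rounds.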

Fig. 1. Top-to-bottom illustration of a single round of PRESENT

4.1 Keys and Linear Hulls in PRESENT

One of the first thorough treatments of linear cryptanalysis on PRESENT is by Ohkuma [39]. This work defines optimal linear trails using solely masks of Hamming weight one. Furthermore, 64 optimal hulls using these trails are determined, along with the number of trails in each hull.

The absolute correlation for one of Ohkuma’s R-round optimal trails t is \(|{\mathbf {C}}_{E_K}(t)| = 2^{-2R}\). Considering a particular R-round optimal hull \(\text {LH}_R(\delta ,\gamma )\), let \(T_R^+\) (respectively \(T_R^-\)) denote the number of trails t in the hull for which \( sgn (t)=0\) (respectively \( sgn (t)=1\)). We also let \(T_R := \sharp \text {LH}_R(\delta ,\gamma )\), i.e. \(T_R = T_R^+ + T_R^-\). By Assumption 1, for a fixed key \(K \in {\mathcal {K}}\), we have \(T_R^+ \sim \mathcal{B}\left( T_R, \frac{1}{2}\right) \), which for sufficiently large \(T_R\) is well approximated by \(T_R^+ \sim \mathcal{N}\left( \frac{T_R}{2}, \frac{T_R}{4}\right) \). Let \(Z = T_R^+ - T_R^- = 2T_R^+ - T_R\). Thus, Z is normally distributed with \(\mu = 2 \cdot \frac{T_R}{2} - T_R = 0\) and \(\sigma ^2 = 2^2 \cdot \frac{T_R}{4} = T_R\), so \(Z \sim \mathcal {N}(0, T_R)\). When \(|Z| \ge N\), for some N, where \(0 \le N \le T_R\), the absolute linear hull correlation is

$$\begin{aligned} | {\mathbf {C}}_{E_K} |&\ge N \cdot 2^{-2R}. \end{aligned}$$

Thus, there is a clear trade-off between the lower bound on \(|{\mathbf {C}}_{E_K}|\) and the probability that a randomly chosen \(K \in {\mathcal {K}}\) yields such a lower bound.

For the \(T_R\) values, we refer to [39] or Table 6 in Appendix B. For a fixed number of rounds R, using the analysis above, \(T_R\) can be used directly to determine (i) a lower bound on \(|{\mathbf {C}}_{E_K}|\) and (ii) the probability that for a random \(K \in {\mathcal {K}}\), this bound is obtained. Table 1 gives, for various probabilities \(\alpha \) and number of rounds R the value \(\beta \) such that \(\alpha = \Pr \left[ |{\mathbf {C}}_{E_K}| \ge \beta \right] \). Table 7 in Appendix B gives the same data points for \(R \in \{1,\ldots ,31\}\).

Table 1. Values \(\log _2 \beta \) s.t. \(\alpha = \Pr \left[ |{\mathbf {C}}_{E_K}| \ge \beta \right] \) for R-round PRESENT

Example 1

For \(R=28\), we have \(T_{28} = 45170283840\). Thus, with probability \(\alpha = 0.30\), a randomly chosen \(K \in {\mathcal {K}}\) yields that one of Ohkuma’s optimal hulls has \(|{\mathbf {C}}_{E_K}| \ge 2^{-38.25}\).
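The computation behind Example 1 can be sketched as follows. This is a minimal illustration under the normal approximation \(Z \sim \mathcal {N}(0, T_R)\) used above; the function name is hypothetical, and the trail count must be supplied by the caller (here \(T_{28}\) from Example 1).

```python
import math
from statistics import NormalDist


def log2_beta(num_trails: int, rounds: int, alpha: float) -> float:
    """log2 of beta such that Pr[|C| >= beta] = alpha, assuming Z ~ N(0, T_R)."""
    # Two-sided tail: Pr[|Z| >= N] = alpha  <=>  N = sqrt(T_R) * Phi^{-1}(1 - alpha/2).
    n = math.sqrt(num_trails) * NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Each optimal trail has absolute correlation 2^(-2R).
    return math.log2(n) - 2 * rounds


# Parameters of Example 1, which states the bound 2^{-38.25}.
print(round(log2_beta(45170283840, 28, 0.30), 2))
```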

5 Application to PRESENT

In this section we give key-less linear distinguishers on PRESENT for varying parameters: the number of rounds R, the pre-computation time T, the size \(\mathcal{M}\) of the set \({\mathcal {S}}\) produced, and the lower bound \(\alpha \) on the fraction of the key space for which they are valid.

As already hinted in Sect. 4, PRESENT has received some attention in the context of key-recovery attacks, especially with respect to linear cryptanalysis [13, 15, 30, 39] on which our results build. The attack described is completely independent of the key size used, and hence also of the key-schedule.

5.1 Distinguishers with \(T=0\)

In this section we present key-less linear distinguishers on PRESENT using the model introduced in Sect. 3. We refer to the approach described here as the probabilistic phase, which in Sect. 5.2 is combined with a deterministic phase to extend the distinguishers by three more rounds.

The distinguishers we present here do not use any pre-computation, i.e. in the notation of the model, we have \(T=0\). Corollary 1 implies in this case that when \(|{\mathbf {C}}_{E_K}| > 0\), the tuple produced by any algorithm \({\mathcal {A}}\) is always \((T,\mathcal{M},\alpha )\)-intractable for some \(\alpha > 0\), and hence a valid distinguisher. The results match those of distinguishers used in key-recovery attacks and are as such of limited interest. We hope the discussion below makes it easier to follow (and appreciate) the real use of the model we introduced, namely for the case described in Sect. 5.2 when we do some, albeit very little, pre-computation.

In the following, let \(\mathcal{R}_{\delta ,\gamma }^{E_K}\) be the linear relation used, where \(\delta =\gamma = \mathbf {e}_{21}\), which is one of the optimal linear hulls for PRESENT identified by Ohkuma. Also, let \({\mathcal {A}}\) be an algorithm constructing \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) by picking \(\mathcal{M}\) arbitrary \(x \in {\mathbb {F}}_2^n\). In Table 2 we give, for various \(\mathcal{M}\) and number of rounds R, lower bounds \(\alpha \) on the fraction of the key space, s.t. \(({\mathcal {A}},\mathcal{E}, E_K, {\mathcal {S}}, T=0, \mathcal{R}_{\delta ,\gamma }^{E_K}, \alpha )\) are key-less linear distinguishers.

Table 2. Lower bounds \(\alpha \) on the fraction of the key space \({\mathcal {K}}\) susceptible to key-less linear distinguishers using \(T = 0\) and the specified parameters \(\mathcal{M}\) and number of rounds R. A dash indicates that \(\alpha < 0.00\).

Note that the \(\alpha \) parameter from Table 2 immediately gives the probability that such an R-round key-less linear distinguisher without pre-computation for PRESENT is valid in practice, for a fixed chosen or known key \(K \in {\mathcal {K}}\). As examples, we see that with \(\mathcal{M}= 2^{40}\), the probability of having a valid key-less linear distinguisher for 13-round PRESENT with a fixed key K is at least \(\alpha = 0.41\). Another example is a key-less linear distinguisher on 22-round PRESENT which is valid for a fraction of at least \(\alpha = 0.33\) of the key space, using \(\mathcal{M}= 2^{63}\).
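Entries of this kind combine Theorem 2 with the hull analysis of Sect. 4.1: the bound \(|{\mathbf {C}}_{E_K}| \ge 2/\sqrt{\mathcal{M}}\) holds whenever \(|Z| \ge 2^{2R+1}/\sqrt{\mathcal{M}}\). The sketch below evaluates the resulting lower bound on \(\alpha \) under the normal approximation. The function name is hypothetical, and the trail count \(T_R\) has to be taken from [39] or Table 6, which are not reproduced here.

```python
import math
from statistics import NormalDist


def alpha_for_set_size(num_trails: int, rounds: int, log2_m: float) -> float:
    """Lower bound on Pr[|C| >= 2/sqrt(M)], assuming Z ~ N(0, T_R)."""
    # |C| >= |Z| * 2^(-2R), so it suffices that |Z| >= 2^(2R+1) / sqrt(M).
    log2_n = 2 * rounds + 1 - log2_m / 2.0
    # Express the required |Z| in standard deviations of Z.
    z = 2.0 ** log2_n / math.sqrt(num_trails)
    # Two-sided tail probability of the standard normal distribution.
    return 2.0 * (1.0 - NormalDist().cdf(z))
```

For a fixed number of rounds, the value grows with \(\mathcal{M}\), since a larger set tolerates a smaller correlation.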

5.2 Extension by Deterministic Phase

Next, we describe how one can use pre-computation to extend the key-less linear distinguishers from Sect. 5.1 to cover three more rounds with no degradation to the valid key space fraction \(\alpha \). In the notation of the model, we now have \(T > 0\), which in turn means that \((T,\mathcal{M},\alpha )\)-intractability is no longer granted for free by Corollary 1, unless \(T < 2\sqrt{\mathcal{M}}\). In Appendix D we outline an approach for a deterministic phase over 6 rounds, reminiscent of the rebound approach [28, 35], which, however, has too high a computational complexity to fit into our model.

We describe in the following the algorithm \({\mathcal {A}}\) which will construct the set of inputs \({\mathcal {S}}\). The algorithm we give will construct \({\mathcal {S}}\) such that each \(x \in {\mathcal {S}}\) is guaranteed to follow the linear trail \({\mathcal {T}}= (\mathbf {e}_{21},\mathbf {e}_{21},\mathbf {e}_{21},\mathbf {e}_{21})\) over the first three rounds. We remark that this choice of trail is not unique; several other choices are possible, and this is but one example. We refer to the approach we describe as the deterministic phase.

Fig. 2. Construction of \({\mathcal {S}}\) for 3-round PRESENT using the trail \({\mathcal {T}}= (\mathbf {e}_{21},\mathbf {e}_{21},\mathbf {e}_{21},\mathbf {e}_{21})\)

For notation, in round \(r \in \{0,1,2\}\), let \(S_{r,j}\) denote the jth S-box (counting from right to left) and let \(K_{r,j}\) denote the jth least significant bit of the round key \(K_r\), where all indices start from zero. Consider then \(S_{2,5}\) which is highlighted in Fig. 2. By inspection, the PRESENT S-box has 10 inputs x which satisfy \(\langle x,(0,0,1,0) \rangle = \langle S(x),(0,0,1,0) \rangle \) and hence follow the trail \((\mathbf {e}_{21},\mathbf {e}_{21})\) over \({R}_{2}\), no matter what the inputs on the other S-boxes are. By adding the key bits \((K_{2,23} \Vert \cdots \Vert K_{2,20})\) to each x, we can trace those back through the permutation layer of \({R}_{1}\). For each value of \(x \oplus (K_{2,23} \Vert \cdots \Vert K_{2,20})\), we now have a particular value on output bit 1 of each of the S-boxes \(S_{1,7},\ldots ,S_{1,4}\), as indicated in Fig. 2. By the bijectivity of the S-box, it holds that for each of these S-boxes, half the inputs will give the desired output bit. However, for the S-box \(S_{1,5}\) we have the extra requirement that the input bit on position 1 should equal the output bit on position 1, and only 5 inputs have both properties. As such, we can trace each of the ten values for x back through \({R}_{1}\), adding the key bits \((K_{1,31} \Vert \cdots \Vert K_{1,16})\), to obtain \(10 \cdot 8^3 \cdot 5 = 25600\) inputs to \({R}_{2} \circ {R}_{1}\) which follow the trail \((\mathbf {e}_{21},\mathbf {e}_{21},\mathbf {e}_{21})\) by construction.

By tracing each of these values back through \({R}_0\) the same way, and adding the full round key \(K_0\), algorithm \({\mathcal {A}}\) obtains a set \({\mathcal {S}}\) of inputs which follow \({\mathcal {T}}\) over three rounds with probability 1. Using this approach to constructing \({\mathcal {S}}\), the size of the set can be up to \(\mathcal{M}= 25600 \cdot 8^{15} \cdot 5 = 4503599627370496000 \approx 2^{61.97}\). If one wishes to use a smaller \(\mathcal{M}\) for the key-less linear distinguisher, this is also possible, simply by leaving out elements in the construction of \({\mathcal {S}}\).
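The counts used in this construction can be checked by enumerating the S-box. The sketch below assumes the S-box table from the public PRESENT specification and assumes that the mask \((0,0,1,0)\) selects bit 1 of a nibble (the bit of weight 2); variable names are hypothetical.

```python
import math

# S-box of PRESENT as given in the cipher's public specification.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def bit1(v: int) -> int:
    """Bit 1 of a nibble, assumed to be the bit selected by the mask (0,0,1,0)."""
    return (v >> 1) & 1


# Inputs whose bit 1 is preserved by the S-box (used for S_{2,5}).
followers = [x for x in range(16) if bit1(x) == bit1(SBOX[x])]

# Inputs that preserve bit 1 AND produce a prescribed output bit b
# (used for S_{1,5} and for the corresponding S-box of round 0).
constrained = {b: [x for x in followers if bit1(SBOX[x]) == b] for b in (0, 1)}

# Inputs that merely produce a prescribed output bit b (all other S-boxes).
free = {b: [x for x in range(16) if bit1(SBOX[x]) == b] for b in (0, 1)}

inputs_r1 = len(followers) * 8**3 * 5  # inputs to R_2 after R_1
max_m = inputs_r1 * 8**15 * 5          # maximal size of S

print(len(followers), inputs_r1, max_m, round(math.log2(max_m), 2))
```

The enumeration reproduces the 10 inputs for \(S_{2,5}\), the 5 admissible inputs for the constrained S-box, and the 8 admissible inputs for each unconstrained S-box.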

Table 3. Tight values \(\alpha \) such that \((\mathcal{E}, E_K, {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{E_K})\) is \(\alpha \)-separable, where \(E_K\) is R-round PRESENT for a fixed, known \(K \in {\mathcal {K}}\) (and thus \(E_K \in \mathcal{E}\))

Consider \(E_K\) being R-round PRESENT for a particular fixed \(K \in {\mathcal {K}}\), and thus \(E_K \in \mathcal{E}\). Let \({\mathcal {A}}\) be an algorithm for constructing \({\mathcal {S}}\) using the 3-round deterministic technique described, with \(\mathcal{M}\approx 2^{61.97}\) for one of Ohkuma’s optimal linear hull relations \(\mathcal{R}_{\delta ,\gamma }^{E_K}\). Table 3 gives, for various numbers of rounds R, the highest possible \(\alpha \) s.t. \((\mathcal{E}, E_K, {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{E_K})\) is \(\alpha \)-separable as per Definition 1. Of course, in order for the key-less linear distinguisher \(({\mathcal {A}}, \mathcal{E}, E_K, {\mathcal {S}}, T, \mathcal{R}_{\delta ,\gamma }^{E_K}, \alpha )\) to be valid, it also has to hold that the tuple \((\mathcal{E}, E_K, {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{E_K})\) is \((T,\mathcal{M},\alpha )\)-intractable as per Definition 2, where T is the time required by \({\mathcal {A}}\) to construct the set \({\mathcal {S}}\).

In Sect. 5.3, we show that the time T required to construct \({\mathcal {S}}\) by \({\mathcal {A}}\) is equivalent to \(T = \frac{409641}{16R}\) calls to an R-round PRESENT encryption oracle. As such, we have that \(T < 2\sqrt{\mathcal{M}}\), and from Corollary 1, it follows that \((\mathcal{E}, E_K, {\mathcal {S}}, \mathcal{R}_{\delta ,\gamma }^{E_K})\) is \((T,\mathcal{M},\alpha )\)-intractable.

In Appendix C, we give examples of experimental verification of the key-less linear distinguishers presented on 9-round PRESENT. The code for this experimental verification is available as [29].

5.3 Computational Complexity T

In this section we analyze the computational complexity, i.e. the time T required by \({\mathcal {A}}\) to construct \({\mathcal {S}}\) in the deterministic phase of Sect. 5.2. In the key-less setting, the attacker has white-box access to the encryption oracle. This is what is exploited by \({\mathcal {A}}\). In order to measure the time T spent in this phase, we determine the number of S-box lookups performed by \({\mathcal {A}}\) and then compare this to the number of S-box applications for a full call to the encryption oracle.

Let us consider all S-boxes as being different for generality, as the complexity in this case will certainly upper bound the case where they are all equal. In particular, since the key is known, this allows us to consider the key addition as part of the S-boxes.

The analysis follows the construction of \({\mathcal {S}}\) by \({\mathcal {A}}\) itself, starting from \({R}_2\) and working its way up (referring again to Fig. 2). To determine the 10 inputs to \(S_{2,5}\), \({\mathcal {A}}\) performs one lookup into this S-box. For each of these 10 values, one bit is traced back to an S-box of \({R}_1\), so this adds \(10 \cdot 4\) S-box lookups. Finally, \({\mathcal {A}}\) has 25600 inputs to \({R}_1\) for which it traces one bit back to each of the 16 S-boxes of \({R}_0\), contributing \(25600\cdot 16\) S-box lookups.

In total, the number of lookups is \(1+10\cdot 4+25600\cdot 16 = 409641\). Now, comparing to the number of S-box lookups involved with a call to an R-round PRESENT oracle, the number of lookups would be 16R, not counting key scheduling. As such, we find that the time T spent by \({\mathcal {A}}\) for constructing \({\mathcal {S}}\) is \(T = \frac{409641}{16R}\).
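The arithmetic above can be reproduced with exact rational numbers. This is a small sketch with hypothetical names, using the lookup counts stated in this section and the maximal set size \(\mathcal{M}\approx 2^{61.97}\) from Sect. 5.2.

```python
import math
from fractions import Fraction

# S-box lookups: one for S_{2,5}, 4 per input of R_2, 16 per input of R_1.
lookups = 1 + 10 * 4 + 25600 * 16


def precomputation_time(rounds: int) -> Fraction:
    """T in units of R-round encryptions, at 16 S-box lookups per round."""
    return Fraction(lookups, 16 * rounds)


max_m = 25600 * 8**15 * 5
t_27 = precomputation_time(27)

print(lookups)                           # 409641
print(float(t_27) < 2 * math.sqrt(max_m))  # True
```

For 27 rounds the result is slightly below 949 encryption equivalents, far below \(2\sqrt{\mathcal{M}}\).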

Memory Complexity. The memory complexity, though not a formal part of the key-less linear distinguisher model, is at a practical level. The storage of the set \({\mathcal {S}}\) can be encoded efficiently with two lists. In a first list with 4-bit entries of length 256 (128 bytes), we store all possible input values before the first subkey addition. The second list contains 25600 sets of 16 indices to the first list. Even a naïve encoding of this only needs 400kB.

5.4 Overview of Selected Distinguishers and Discussion

Here, we consider key-less linear distinguishers applying the deterministic phase combined with the probabilistic phase, using \(\mathcal{M}\le 2^{61.97}\). Let \(w_2\) and \(w_1\) denote the number of inputs to \({R}_2\) and \({R}_1\) used by \({\mathcal {A}}\) in the construction of \({\mathcal {S}}\). Then \(w_2 \le 10\) and \(w_1\) is constrained by \(w_2\) since \(w_1 \le 8^3 \cdot 5w_2\). Further, \(\mathcal{M}\le 8^{15}\cdot 5w_1\) and the time T required by \({\mathcal {A}}\) to construct \({\mathcal {S}}\) is \(T = \frac{1 + 4w_2 + 16w_1}{16R}\) for R-round PRESENT. Obviously, for a fixed target size \(\mathcal{M}\), minimizing \(w_1\) yields the lowest time complexity T.
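The parameter choice described here can be sketched as a small helper that, for a target size \(\mathcal{M}\), picks the smallest \(w_1\) and \(w_2\) satisfying the stated constraints and returns the resulting T. The function name and error handling are hypothetical.

```python
def min_parameters(target_m: int, rounds: int):
    """Smallest (w1, w2) allowing |S| >= target_m, and the resulting T."""
    # M <= 8^15 * 5 * w1, so w1 >= ceil(M / (5 * 8^15)).
    w1 = -(-target_m // (5 * 8**15))
    # w1 <= 8^3 * 5 * w2, so w2 >= ceil(w1 / (5 * 8^3)).
    w2 = -(-w1 // (5 * 8**3))
    if w2 > 10:
        raise ValueError("target size exceeds the maximum of about 2^61.97")
    t = (1 + 4 * w2 + 16 * w1) / (16 * rounds)
    return w1, w2, t


print(min_parameters(2**40, 20))  # (1, 1, 0.065625)
```

For small sets a single input per round suffices, which gives \(T < 1\); for the maximal set the helper returns \(w_1 = 25600\) and \(w_2 = 10\).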

Table 4. Overview of parameters for key-less linear distinguishers on PRESENT. The entries give, for each \(\mathcal{M}\) and each total number of rounds R a pair \((\log _2 T, \log _2 (\alpha \cdot 2^{128}))\) s.t. algorithm \({\mathcal {A}}\) can construct \({\mathcal {S}}\) in time T and result in a distinguisher for at least a fraction \(\alpha \) of the key space. Here, we indicate for PRESENT-128 the number of keys supporting the distinguisher. The equivalent number for PRESENT-80 is obtained as \(\alpha \cdot 2^{80}\). A dash indicates that \(\alpha \cdot 2^{128} < 1\), i.e. \(\log _2 (\alpha \cdot 2^{128}) < 0\).

Using these simple observations, we give in Table 4 an overview of selected results for key-less linear distinguishers on R-round PRESENT. We give the size \(\mathcal{M}\) of \({\mathcal {S}}\subseteq {\mathbb {F}}_2^n\) constructed by \({\mathcal {A}}\), the time T required to do so, and the parameter \(\alpha \) (implicitly, as we give \(\alpha \cdot 2^{128}\)) for the distinguisher, i.e. the lower bound on the fraction of the key space for which the distinguisher is valid. As such, the table is representative for PRESENT-128. Numbers for PRESENT-80 can be directly determined with the same T and \(\alpha \cdot 2^{80}\). Note, however, that for 27-round PRESENT-80 using \(\mathcal{M}= 2^{61.97}\), \(\alpha \cdot 2^{80} < 1\), so one can distinguish at most 26 rounds of PRESENT-80.

What is evident from Table 4 is that there is a clear limit to how many rounds can be distinguished using a particular \(\mathcal{M}\). This shows in the diagonal line through the table. Another observation is that for a fixed \(\mathcal{M}\), there is a clear drop in the fraction of the key space \(\alpha \) for which the distinguisher works between R and \(R+1\) rounds. For example, with \(\mathcal{M}= 2^{61}\), we see a drop from \(2^{108.5}\) keys supporting the distinguisher for 26 rounds to just \(2^{21}\) for 27 rounds. What is also apparent is that in all cases, \(T \ll 2\sqrt{\mathcal{M}}\), indeed sometimes \(T < 1\), so by Corollary 1, \((T,\mathcal{M},\alpha )\)-intractability is granted.

One thing worth discussing is the time complexity T. This is the time, converted to equivalent calls to an R-round encryption oracle, required by the key-less linear distinguisher algorithm \({\mathcal {A}}\) to construct the set \({\mathcal {S}}\). In a scenario where one would verify the distinguisher for a concrete block cipher \(E_K\), i.e. for a particular value of K, one would need to determine the value of the random variable \(\mathcal{X}\) of Definition 1. What we denote as the verifying complexity in this case is dominated by \(\mathcal{M}\), because this is the number of inputs to the permutation that need to be evaluated in order to determine \(\mathcal{X}\).

6 Conclusion and Open Problems

In this paper we have formalized the notion of distinguishers for block ciphers using linear cryptanalysis in the key-less setting, i.e. where the block cipher is instantiated with a single known or chosen key.

The introduced key-less statistical distinguisher based on linear cryptanalysis led to a wide variety of results on PRESENT, for example a linear distinguisher of up to 26 and 27 rounds of PRESENT-80 and PRESENT-128, with respective computational complexities of about \(2^{9}\) and \(2^{10}\), and verifying complexities of about \(2^{61}\) and \(2^{61.97}\), for both PRESENT variants. The very low computational complexity made a practical verification possible for a reduced number of rounds, but also leaves room for improvements: Is it possible to extend the deterministic phase to cover more rounds while still keeping the work factor below the allowed \(2^{30}\)?

While PRESENT was chosen because it is a relatively high-profile cryptanalytic target and because relatively long useful linear hulls exist, we point out that the new distinguisher model is not specifically tailored for it. KATAN, a cipher with a very different round transformation and design philosophy, exhibits linear effects as described in [14] that make it another interesting target for an application of the techniques introduced in this paper.

More research is needed on the relations between the use of degrees of freedom and the number of rounds that can be sidestepped, e.g. in our deterministic phase. Even though there is no good theoretical understanding of this yet, the literature already contains many data points for differential properties. The linear counterpart seems different and interesting enough to warrant a separate study, see also Appendix D.

The techniques we developed for the presented distinguisher might also have applications to preimage attacks that are inspired by linear cryptanalysis, or at least to somewhat speed up brute-force preimage search. It will be interesting to see how this approach compares to other such methods [9, 41]. Also, the approach naturally and directly applies to permutations, which are becoming an increasingly important primitive in their own right, also due to the popularization of the Sponge [4] construction.