1 Introduction

The design and analysis of any cipher must match the computing technologies of its period. Fast correlation attacks, introduced by Meier and Staffelbach in 1989 [19], are commonly regarded as classical methods in the cryptanalysis of LFSR-based stream ciphers, which were usually implemented in hardware at that time. Since then, fast correlation attacks have evolved constantly and steadily [4, 5, 11], resulting in ever more powerful decoding methods dedicated to very large linear codes in the presence of a highly noisy channel.

On the other hand, with the development of computing facilities, many word-oriented stream ciphers have been proposed, e.g., SNOW 2.0, SNOW 3G [6, 8] and Sosemanuk [2], aiming to combine the merits of the thoroughly studied LFSR theory with a fast implementation in software. Due to the complex form of the LFSR recursion when reduced from the extension field to \(\mathrm{GF}(2)\) (many taps and a large number of state variables), the previous bitwise fast correlation attacks do not work as well as expected in these cases. This motivates us to study the security of these word-oriented primitives against a new form of fast correlation attacks that works on some larger data unit.

Table 1. Comparison of the attacks on SNOW 2.0

Our Contributions. First, a formal framework for fast correlation attacks over extension fields is constructed, under which the theoretical predictions of the computational complexities for both the offline and online/decoding phase can be reliably derived. This gives an answer to the open problem of Meier in [18] at FSE 2011. We adapt the k-tree algorithm [24] to generate the desirable parity check equations in the pre-computation phase and propose a fast decoding algorithm for the online phase. Second, an efficient algorithm to compute the large-unit distributions of the generalized pseudo-linear functions modulo \(2^n\) (GPLFM), which includes all the previously studied relevant topics [17] in a unified framework, is proposed. This technique, serving as a basis for the first one, generalizes the algorithm in [22] and is of value in its own right. It can compute the noise distributions of the linear approximations of the GPLFM (including the addition modulo \(2^n\)) in a larger alphabet of m-bit \((m > 1)\) size when n is divisible by m with a low complexity, e.g., for \(n=32\), the 2, 4, 8, 16-bit linear approximations can be found efficiently with a slice size depending on the structure of the primitive. Last, we apply our methods to SNOW 2.0, an ISO/IEC 18033-4 standard and a benchmark stream cipher in the European eSTREAM project. We build the byte-wise linear approximation of the FSM by further generalizing the GPLFM to include the S-boxes, and restore the initial state of the LFSR (thus the key) with a fast correlation attack over \(\mathrm{GF}(2^8)\). The time/memory/data/pre-computation complexities of this attack are all below \(2^{186.95}\). Then we further improve our attack by changing the linear mask from \(\mathrm{GF}(2)\) to \(\mathrm{GF}(2^8)\), which results in significantly reduced time/memory/data/pre-computation complexities, all below \(2^{164.15}\). This attack is more than \(2^{49}\) times better than the best published result at Asiacrypt 2008. Table 1 presents a comparison of our attack on SNOW 2.0 with the best previous ones. Our results have been verified on a small-scale version of SNOW 2.0 with 16-bit word size in experiments.

Outline. We present some preliminaries relevant to our work in Sect. 2. In Sect. 3, the framework of fast correlation attacks over extension fields is established with detailed theoretical justifications. The new algorithm to accurately and efficiently compute the large-unit distribution of the GPLFM is provided in Sect. 4. The application of our approaches to SNOW 2.0 is given in Sect. 5. The improved attack using finite field linear masks is described in Sect. 6, together with the experimental results. Finally, Sect. 7 draws some conclusions and points out future work.

2 Preliminaries

In this section, some notations and basic definitions are presented. Denote the set of real numbers by \(\mathbf {R}\). The binary field is denoted by \(\mathrm{GF}(2)\) and the m-dimensional extension field of \(\mathrm{GF}(2)\) is denoted by \(\mathrm{GF}(2^m)\). The modular addition is \(\boxplus \) and the usual XOR operation is \(\oplus \). The inner product of two n-dimensional vectors a and b over \(\mathrm{GF}(2^m)\) is defined as \(\langle a, b\rangle =\langle (a_0,\cdots ,a_{n-1}),\) \((b_0,\cdots ,b_{n-1}) \rangle := \bigoplus ^{n-1}_{i=0}a_i b_i.\) As usual, a function \(f:\mathrm{GF}(2^n)\rightarrow \mathrm{GF}(2)\) is called a Boolean function and a function \(g=(g_1,\cdots ,g_m): \mathrm{GF}(2^n) \rightarrow \mathrm{GF}(2^m)\) with each \(g_i\) \((1\le i\le m)\) being a Boolean function is called an m-dimensional vectorial Boolean function.

Definition 1

Let X be a binary random variable. The correlation between X and zero is defined as \( c(X) = \text{ Pr }\{X=0\} - \text{ Pr }\{X=1\}\). The correlation of a Boolean function \(f: \mathrm{GF}(2^n) \rightarrow \mathrm{GF}(2)\) to zero is defined as \(c(f)=\text{ Pr }\{f(X)=0\}-\text{ Pr }\{f(X)=1\},\) where \(X \in \mathrm{GF}(2^{n})\) is a uniformly distributed random variable.

Given a vectorial Boolean function \(g: \mathrm{GF}(2^n) \rightarrow \mathrm{GF}(2^m)\), define the distribution \(p_g\) of g(X) with X uniformly distributed as \(p_g(a)=\#\{X|g(X)=a\}/2^n\) for all \(a \in \mathrm{GF}(2^m)\).

Definition 2

As in [1], the Squared Euclidean Imbalance (SEI) of \(p_g\) is \(\varDelta (p_{g})=2^m \sum _{a \in \mathrm{GF}(2^m)}(p_g(a)-\frac{1}{2^m})^2\), which measures the distance between the target distribution and the uniform distribution.
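For illustration, \(\varDelta (p)\) can be evaluated directly from this definition; the following minimal Python sketch (ours, for illustration only) does so for a distribution given as a probability table:

```python
def sei(p):
    """Squared Euclidean Imbalance (Definition 2) of a distribution p
    over GF(2^m), given as a list of 2^m probabilities."""
    M = len(p)  # M = 2^m
    return M * sum((pa - 1.0 / M) ** 2 for pa in p)

# A slightly biased distribution over GF(2^2):
print(sei([0.28, 0.24, 0.24, 0.24]))  # 4 * (0.03^2 + 3 * 0.01^2) = 0.0048
```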

The SEI is used to evaluate the efficiency of large-unit linear approximations in this paper. Here by large-unit, we refer to linear approximations whose basic data unit is non-binary. The next definition introduces a powerful tool to compute the correlation of a nonlinear function and to reduce the complexity of the substitution step of a fast correlation attack [5].

Definition 3

Given a function \(f: {GF}(2^n) \rightarrow \mathbf {R}\), for \(\omega \in {GF}(2^n)\), the Walsh Transform of f at point \(\omega \) is defined as \(\hat{f}(\omega )= \sum _{x \in GF(2^n)}f(x)(-1)^{\langle \omega , x \rangle }.\)

The Walsh Transform of f can be computed efficiently with an algorithm called the Fast Walsh Transform (FWT) [25] in \(n2^n\) time and \(2^n\) memory. The preparation of f takes \(2^n\) time, so the total time complexity is \(2^n+n2^n\), a large improvement over the naive \(2^{2n}\). The following fact [21] is used in our analysis.
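A standard in-place radix-2 implementation of the FWT (a textbook sketch, not the authors' code) realizes the \(n2^n\) bound:

```python
def fwt(f):
    """In-place fast Walsh transform of a table f of length 2^n:
    f[w] <- sum_x f[x] * (-1)^<w, x>.  Uses n * 2^n additions."""
    N = len(f)
    h = 1
    while h < N:
        for i in range(0, N, 2 * h):
            for j in range(i, i + h):
                f[j], f[j + h] = f[j] + f[j + h], f[j] - f[j + h]
        h *= 2
    return f

# Example: the Walsh spectrum of f(x) = x0 AND x1, given as a +/-1 table.
print(fwt([(-1) ** ((x & 1) & (x >> 1)) for x in range(4)]))  # [2, 2, 2, -2]
```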

Lemma 4

We consider a vectorial Boolean function \(g:\mathrm{GF}(2^n) \rightarrow \mathrm{GF}(2^m)\) with the probability distribution vector \(p_g\). Then \(\varDelta (p_g) = \sum _{a \in \mathrm{GF}(2^m), a \ne 0}c^2(\langle a, g \rangle )\), where \(c(\langle a, g \rangle )\) is the correlation of the Boolean function \(\langle a, g \rangle \).

Lemma 4 indicates that we can derive the SEI of the distribution \(p_g\) from the correlations \(c(\langle a,g \rangle )\) for nonzero \(a \in \mathrm{GF}(2^m)\). Therefore, computing the SEI of the large data unit distribution can be reduced to the problem of looking for bitwise linear approximations with non-negligible correlations.
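As a toy numerical check of Lemma 4 (our own example, with an arbitrarily chosen g), the SEI from Definition 2 and the sum of squared correlations over the nonzero masks coincide:

```python
n, m = 4, 2
g = lambda x: (x & 3) & ((x >> 2) & 3)   # toy g: GF(2^4) -> GF(2^2)

# SEI of p_g directly from Definition 2.
cnt = [0] * (1 << m)
for x in range(1 << n):
    cnt[g(x)] += 1
sei = (1 << m) * sum((c / (1 << n) - 1 / (1 << m)) ** 2 for c in cnt)

# Sum of squared correlations c(<a, g>) over the nonzero masks a (Lemma 4).
def corr(a):
    return sum((-1) ** bin(a & g(x)).count('1') for x in range(1 << n)) / (1 << n)

print(sei, sum(corr(a) ** 2 for a in range(1, 1 << m)))  # both equal 0.5625
```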

3 Fast Correlation Attacks Over Extension Fields

In this section, we describe a formal framework for fast correlation attacks over \(\mathrm{GF}(2^n)\), which is the first comprehensive answer to the open problem, proposed in [3, 18], of how to mount fast correlation attacks over extension fields. Let us first define the notations used hereafter.

  • N is the number of available output words.

  • l is the word-length of the LFSR over \(\mathrm{GF}(2^n)\).

  • \(l'\) is the number of target words in the decoding phase.

  • G is the \(l \times N\) generator matrix of an \([N, l]\) linear code \(\mathcal {C}_1\) over \(\mathrm{GF}(2^n)\).

  • \(u_i \in \mathrm{GF}(2^n)\) is the i-th output word of the LFSR.

  • \(z_i \in \mathrm{GF}(2^n)\) is the i-th output word of the keystream generator.

  • \(e_i \in \mathrm{GF}(2^n)\) is the i-th noise variable of a Discrete Memoryless Channel (DMC).

Fig. 1. Model for fast correlation attacks over \(\mathrm{GF}(2^n)\)

3.1 Model for Fast Correlation Attacks Over Extension Fields

The fast correlation attack over extension fields is also modelled as a decoding problem, i.e., the keystream segment \(\mathbf {z}=(z_1,z_2,\cdots ,z_N)\) can be seen as the result of transmitting the LFSR sequence \(\mathbf {u}=(u_1,u_2,\cdots ,u_N)\) through a DMC with the noise variables \(\mathbf {e}=(e_1,e_2,\cdots ,e_N)\), as shown in Fig. 1. From this model, we can represent the received symbols \(z_i\) as \(z_i = u_i \oplus e_i,\) where the noise variable \(e_i\) is non-uniformly distributed for \(i=1, \cdots , N\). The capacity of the DMC is \({C_{\textsc {DMC}}=\text{ log }(2^n) + \sum _{e\in \mathrm{GF}(2^{n})}\text{ Pr }\{e_i=e\}\cdot \text{ log }(\text{ Pr }\{e_i=e\})},\) which vanishes when \(\text{ Pr }\{e_i=e\}= 1/2^{n}\) for all \(e \in \mathrm{GF}(2^n)\). The above decoding problem is thus converted into decoding an \([N, l]\) linear code \(\mathcal {C}_1\) over \(\mathrm{GF}(2^n)\), where N is the code length and l is the symbol-length of the information, with the code rate \(R=\text{ log }(2^{n})\cdot l/N\). Using a second-order Taylor expansion, we obtain the following theorem, which theoretically connects the capacity of the DMC with the SEI of the noise distribution.

Theorem 5

Let \(C_{\textsc {DMC}}\) be the capacity of a DMC over \(\text{ GF }(2^n)\) and the noise variable \(e_i \in \text{ GF }(2^n)\), whose distribution is denoted by \(p_{e_i}=(Pr\{e_i=0\}, \) \(\cdots ,Pr\{e_i=2^n-1\})\). Then the theoretical relation between the capacity \(C_{\textsc {DMC}}\) and the SEI of \(p_{e_i}\), i.e., \(\varDelta (p_{e_i})\), is \(C_{\textsc {DMC}} \approx \frac{\varDelta (p_{e_i})}{2\ln (2)}\).

This theorem provides a tool for bridging the analysis based on Shannon theory and that based on the SEI measure. Theorem 5 is the basis of our framework, enabling us to derive a lower bound on the keystream length required for a successful attack. Indeed, an \([N, l]\) linear code over \(\mathrm{GF}(2^n)\) can be successfully decoded only if its code rate does not exceed the capacity of the transmission channel, a principle pioneered in [23].
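Theorem 5 is easy to check numerically; the sketch below (ours) compares the exact capacity with the approximation \(\varDelta (p_{e_i})/(2\ln 2)\) for a mildly biased noise distribution:

```python
import math

n = 4
M = 1 << n
delta = 0.002
# Noise: Pr{e = 0} raised by delta, all other symbols lowered uniformly.
p = [1 / M + delta] + [1 / M - delta / (M - 1)] * (M - 1)

capacity = math.log2(M) + sum(pe * math.log2(pe) for pe in p)
print(capacity, M * sum((pe - 1 / M) ** 2 for pe in p) / (2 * math.log(2)))
# The two printed values agree up to the O(delta^3) Taylor remainder.
```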

Theorem 5 and Shannon's channel coding theorem are combined in our framework to give a theoretical analysis of the new fast correlation attacks over extension fields. Under this theoretical framework, we can assure that the fast correlation attack succeeds with a high probability, i.e., \(0.5 < P_{succ} \le 1\), if \(R < C_{\textsc {DMC}}\).

3.2 General Description of Fast Correlation Attacks Over Extension Fields

Our new algorithm is extracted from the previous work in [10, 12] by addressing some important unsolved problems therein. First, the pre-computation algorithm in [10, 12] uses the straightforward method to find all the possible collisions over extension fields, whose complexity is too high to be applied in cryptanalysis. Second, in Fig. 1, only a DMC with the following property is considered: the distribution of the noise variable \(e_i\) satisfies \(\mathrm{Pr}\{e_i=0\}=1/2^n + \delta \) and \(\mathrm{Pr}\{e_i=e\} = 1/2^n -\delta /(2^n-1),\forall e \in \mathrm{GF}(2^n), e\ne 0\), which is not the general case. In the practice of correlation attacks, the distribution of the noise variable does not necessarily satisfy this condition. Third, the straightforward method is used to identify the correct key in the online phase, i.e., by evaluating the parity checks one by one for each possible codeword, which is inappropriate for cryptanalytic purposes. Last, a comprehensive theoretical justification, which would assure the decoding reliability when simulations are infeasible, is missing.

Preprocessing. As in [4, 10, 12], we convert the original code \(\mathcal {C}_1\) directly derived from the primitive into a new code \(\mathcal {C}_2\), which is expected to be easier to decode by some fast decoding algorithm devised later. Precisely, let the length of the LFSR be l words. Then we have \(\mathbf {u} = (u_1,u_2,\cdots ,u_l)\cdot G\), where \((u_1,u_2,\cdots ,u_l)\) is the initial state of the LFSR. Let \((\cdot ,\cdots ,\cdot )^{T}\) denote the transpose of a vector; we rewrite the matrix G in column vectors as \(G=(\mathbf {g}_1,\mathbf {g}_2,\cdots ,\mathbf {g}_N)\), where \(\mathbf {g}_i=(g_i^1,g_i^2,\cdots ,g_i^l)^{T}\) \((1\le i\le N)\) is the i-th column vector. In order to reduce the decoding complexity, we build a new code \(\mathcal {C}_2\) with a smaller number of information symbols \(\widehat{\mathbf {u}}=(u_1,u_2,\cdots ,u_{l'})\) for a certain \(l' < l\) as follows. We first look for k-tuples of column vectors \((\mathbf {g}_{i_1},\mathbf {g}_{i_2},\cdots ,\mathbf {g}_{i_k})\) satisfying \(\mathbf {g}_{i_1} \oplus \mathbf {g}_{i_2} \oplus \cdots \oplus \mathbf {g}_{i_k} =(c_1,c_2,\cdots ,c_{l'},0,\cdots ,0)^T\). For each k-tuple, we have

$$\begin{aligned} \bigoplus _{j=1}^{k}u_{i_j}=(u_1,u_2,\cdots ,u_l) \bigoplus _{j=1}^{k}\mathbf {g}_{i_j} =c_1u_1 \oplus c_2u_2 \oplus \cdots \oplus c_{l'}u_{l'}. \end{aligned}$$
(1)

This equation is called the parity check for \(u_1, \cdots , u_{l'}\). Since \(z_i=u_i \oplus e_i\), we rewrite it as \(\bigoplus _{j=1}^{k}z_{i_j}=c_1u_1 \oplus c_2u_2 \oplus \cdots \oplus c_{l'}u_{l'} \oplus \bigoplus _{j=1}^{k}e_{i_j}.\) Collect a desirable number of such k-tuples and denote the number of derived equations by \(m_{k}\). Denote the indices of the t-th such tuple of columns by \(\{i_1^{(t)},i_2^{(t)},\cdots ,i_k^{(t)}\}\). Let \(U_t= \bigoplus _{j=1}^{k}u_{i^{(t)}_j}\), \(1 \le t \le m_k\). Thus we have constructed an \([m_k,l']\) code \(\mathcal {C}_2\), i.e., \(\mathbf {U}=(U_1,U_2,\cdots ,\) \(U_{m_k})\).

Processing. Denote the received sequence by \(\mathbf {Z} = (Z_1,Z_2,\cdots ,Z_{m_{k}})\), where \(Z_t=\bigoplus _{j=1}^{k}z_{i^{(t)}_j}\). We first use the keystream words \(z_1,z_2,\cdots ,z_N\) to compute \(\mathbf {Z}\). Then we decode the code \(\mathcal {C}_2\) using the algorithm in the following subsection and output \((u_1,u_2,\cdots ,u_{l'})\). Using the DMC model and assuming that all the \(e_i\)s are independent random variables over \(\mathrm{GF}(2^n)\), the distribution of the folded noise variable \(E_t=\bigoplus _{j=1}^{k}e_{i^{(t)}_j}\) can be computed by the convolution property via the FWT. The new noise sequence can be represented as \(\mathbf {E} = (E_1,E_2,\cdots ,E_{m_{k}})\).
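Concretely, the FWT of the distribution of an XOR of independent variables is the pointwise product of the individual FWTs, so each convolution costs \(O(n2^n)\). A minimal sketch (ours, reusing the fwt routine sketched in Sect. 2):

```python
def xor_convolve(p, q):
    """Distribution of e1 ^ e2 for independent e1 ~ p and e2 ~ q over
    GF(2^n): FWT both, multiply pointwise, transform back (scale by 2^n)."""
    P, Q = fwt(list(p)), fwt(list(q))
    r = fwt([a * b for a, b in zip(P, Q)])
    return [v / len(p) for v in r]

# Folding k = 4 noise variables, as needed for the k-tuple parity checks:
# p_E = xor_convolve(xor_convolve(p_e, p_e), xor_convolve(p_e, p_e))
```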

3.3 Preprocessing Stage: Generating the Parity Checks

Now we present an algorithm to compute the desirable k-tuple parity checks with a relatively low complexity, while the straightforward method in [12] needs a complexity of \(O(N^k)\). First look at the case \(k=2\). Eq. (1) indicates that \((g_{i_1}^{l'+1},g_{i_1}^{l'+2},\cdots ,g_{i_1}^{l})^{T} \oplus (g_{i_2}^{l'+1},g_{i_2}^{l'+2},\cdots ,g_{i_2}^{l})^{T} =(0,\cdots ,0)^{T}\). Thus the construction of parity checks is equivalent to searching for an \(n(l-l')\)-bit collision: just split the vectors \((g_i^{l'+1},g_i^{l'+2},\cdots ,g_i^{l})\) for \(i=1,\cdots ,N\) into two lists \(L_1\) and \(L_2\), and look for \(x_1 \in L_1,x_2 \in L_2\) such that \(x_1 \oplus x_2 = 0\). Hence, by searching for collisions through these two lists, the 2-tuple parity checks in our attack can be constructed.

Note that the crucial difference between \(\mathrm{GF}(2^{n})\) and \(\mathrm{GF}(2)\) is that the length of the partial collision positions cannot be arbitrary and must be a multiple of n. In general, we can split the truncated matrix columns of G into k lists and search for \(x_{i}\in L_{i}\) \((1\le i\le k)\) such that \(\bigoplus _{i=1}^{k}x_{i}=0\). This problem can be transformed into the well-known k-sum problem.

Problem 1. (The k-sum problem) Given k lists \(L_1, \cdots , L_k\), each of length \(\alpha \) and containing elements drawn uniformly and independently at random from \(\{0,1\}^{n(l-l')}\), find \(x_1 \in L_1, \cdots , x_k \in L_k\) such that \(x_1 \oplus x_2 \oplus \cdots \oplus x_k = 0\).

Fortunately, this problem can be efficiently solved by the k-tree algorithm in [24]. It is shown that the k-tree algorithm requires \(O(k2^{n(l-l')/(1+\log k)})\) time and space and uses lists of size \(O(2^{n(l-l')/(1+\log k)})\). The k-tree algorithm can also find many solutions to the k-sum problem. It can find \(\beta ^{1+ \log k}\) solutions to the k-sum problem with \(\beta \) times as much work as finding a single solution, as long as \(\beta \le 2^{n(l-l')/(\log k (1+\log k))}\). Thus the total time/space complexities are \(O(\beta k2^{n(l-l')/(1+\log k)})\) and the size of each list is \(O(\beta 2^{n(l-l')/(1+\log k)})\).

Now we show how to generate the \(m_{k}\) k-tuple parity checks. Precisely, we denote the truncated partial vector of \(\mathbf {g}_i\) by \(\mathbf {x}_i=(g_i^{l'+1},\cdots ,g_i^l)\) for \(i=1,\cdots ,N\). Then we split \(({\mathbf {x}_1,\mathbf {x}_2,\cdots ,\mathbf {x}_N})\) into k lists \(L_1,\cdots ,L_k\), each of length \(\alpha = N/k\). We want to find \(\mathbf {x}_1 \in L_1,\cdots ,\mathbf {x}_k \in L_k\) satisfying \(\mathbf {x}_1 \oplus \mathbf {x}_2 \oplus \cdots \oplus \mathbf {x}_k =0\). This is exactly the k-sum problem, so we can adopt the k-tree algorithm in [24] to find the required number of desirable parity checks.
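For \(k=4\), the k-tree algorithm amounts to one level of merging on the low \(b/3\) bits (with \(b=n(l-l')\)) followed by a full match on the rest; the following sketch (our own illustration over b-bit integers, not the paper's implementation) shows the idea:

```python
from collections import defaultdict

def ktree4(L1, L2, L3, L4, b):
    """One-level 4-tree (Wagner [24]): find xi in Li, b-bit integers, with
    x1 ^ x2 ^ x3 ^ x4 = 0.  Join on the low b/3 bits, then match the rest."""
    low = b // 3
    mask = (1 << low) - 1

    def merge(A, B):
        # All pairs (a, c) whose XOR vanishes on the low bits.
        idx = defaultdict(list)
        for a in A:
            idx[a & mask].append(a)
        return [(a, c) for c in B for a in idx[c & mask]]

    pairs12, pairs34 = merge(L1, L2), merge(L3, L4)
    idx = defaultdict(list)
    for a, c in pairs12:
        idx[a ^ c].append((a, c))        # a ^ c is zero on the low bits
    return [(a, c, d, e) for d, e in pairs34 for (a, c) in idx[d ^ e]]
```

With lists of size about \(2^{b/3}\), each merge keeps about \(2^{b/3}\) pairs and the final join yields a solution on average, in accordance with the \(O(k2^{n(l-l')/(1+\log k)})\) bound above.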

3.4 Processing Stage: Decoding the Code \(\mathcal {C}_2\)

It is well-known that decoding a random linear code over an extension field is an NP-hard problem. Here we present a fast decoding algorithm, which can be seen as a practical solution to this problem.

As shown in [5, 14], FWT can be used to accelerate the decoding process for the linear codes over \(\mathrm{GF}(2)\). Here we derive a method based on Lemma 4 to exploit FWT for decoding linear codes over \(\mathrm{GF}(2^{n})\).

Let us denote the guessed value of the partial initial state \(\widehat{\mathbf {u}}=(u_1,\cdots ,u_{l'})\) by \(\widehat{\mathbf {u}}' = (u'_1,\cdots ,u'_{l'})\). After pre-computation, we construct a distinguisher \(I(\widehat{\mathbf {u}}') = c^{(t)}_1(u_1 \oplus u_1') \oplus \cdots \oplus c^{(t)}_{l'}(u_{l'} \oplus u_{l'}') \oplus E_t =Z_t \oplus c^{(t)}_1u_1' \oplus \cdots \oplus c^{(t)}_{l'}u_{l'}',\) to find the correct partial state \(\widehat{\mathbf {u}}\). If the guessed value \(\widehat{\mathbf {u}}' \) is correct, I is expected to be biased; otherwise its distribution approximates a uniform distribution.

Next, we describe how to compute the SEI of \(I(\widehat{\mathbf {u}}')\), which is the crucial part of our algorithm. We need to substitute the \(z_i\)s into the parity check equations and evaluate the SEI of I for each possible \(\widehat{\mathbf {u}}' \). Combining Lemma 4 in Sect. 2 with the FWT, we have the following method. Precisely, the SEI of \(I(\widehat{\mathbf {u}}')\) can be computed from the correlations \(c(\langle \gamma , I\rangle )\), where \(\langle \gamma , I\rangle \) is a Boolean function and \(\gamma \in \mathrm{GF}(2)^n\). We can divide the vectorial Boolean function I into n linearly independent Boolean functions \(I_1, \cdots , I_n\), and each Boolean function can be expressed as \(I_i= \langle w_i, \widehat{\mathbf {u}}' \rangle \oplus \langle v_i,Z_t\rangle \), where \(w_i \in \mathrm{GF}(2)^{nl'},v_i \in \mathrm{GF}(2)^n\) are two binary coefficient vectors. Let \(Q=\text{ span }\{I_1,\cdots ,I_n\}\), so that Q is the set of approximations generated by these n approximations \(I_i\). Now the advantage is that the FWT can be used to compute the correlation of each approximation \(I_i\) for \(i=1,\cdots ,n\), as described in [14].

Precisely, assume that we have \(m_k\) n-bit parity checks over \(\mathrm{GF}(2^{n})\) with the same distribution. Then for each \(I_i\) there are \(m_k\) bitwise parity checks, denoted by \(I_i^{(t)}\) for \(1 \le t \le m_k\). In order to evaluate these \(m_k\) bitwise parity checks \(I_i^{(t)} = \langle w_i^{(t)}, \widehat{\mathbf {u}}' \rangle \oplus \langle v_i^{(t)},Z_t\rangle \) for each \(\widehat{\mathbf {u}}' \), we introduce an integer-valued function,

$$\begin{aligned} h(\widehat{\mathbf {u}}')=\sum _{1 \le t \le m_k:\, w_i^{(t)}=\widehat{\mathbf {u}}'}(-1)^{\langle v_i^{(t)},Z_t\rangle }, \end{aligned}$$

for all \(\widehat{\mathbf {u}}' \in \mathrm{GF}(2^{nl'})\). We compute the Walsh transform of h and obtain a \(2^{nl'}\)-entry array storing the correlation \(c(I_i)\) indexed by \(\widehat{\mathbf {u}}' \). The total time complexity for computing \(c(I_1), \cdots , c(I_n)\) is \(O(n(m_k+l'n2^{l'n}))\) and the memory complexity is \(O(n2^{l'n})\). In addition, the correlations of the other \(2^n-n-1\) linear approximations can be computed by the Piling-up Lemma [16]. Thus, we obtain all the correlations for the different guessed values of \(\widehat{\mathbf {u}}'\). Again by Lemma 4, we can easily compute \(\varDelta (I(\widehat{\mathbf {u}}'))\) for each possible \(\widehat{\mathbf {u}}'\). Then, we can use a distinguisher as described in [1] to recover the correct initial state. In total, the time complexity of decoding \(\mathcal {C}_2\) in this way is \(O(n(m_k+l'n2^{l'n})+2^n 2^{l'n})\).
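For one component \(I_i\), the whole procedure can be sketched as follows (our own illustration of the technique of [14]; the data layout of the parity checks is an assumption):

```python
def correlations_via_fwt(checks, bits):
    """checks: list of (w, b) pairs for one component I_i, where w is the
    l'n-bit coefficient vector of the guessed state in a parity check and
    b = <v, Z_t> is its keystream contribution.  Returns the empirical
    correlation c(I_i) for every guess u' in GF(2)^{l'n}."""
    h = [0] * (1 << bits)
    for w, b in checks:
        h[w] += (-1) ** b        # build the integer-valued function h
    fwt(h)                       # in place; reuses the FWT sketch of Sect. 2
    return [v / len(checks) for v in h]  # entry u' equals hhat(u') / m_k
```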

Now we give the theoretical justifications of our algorithm. Assume the noise distribution of \(E_t\) over \(\mathrm{GF}(2^n)\) is \(p_{E_t}=(\text{ Pr }\{E_t=0\}, \cdots ,\text{ Pr }\{E_t=2^n-1\})\) and the code length of \(\mathcal {C}_2\) is \(m_k\). According to the k-tree algorithm, using k lists, each of size \(\alpha = \beta 2^{n(l-l')/(1+\log k)}\), we can find \(\beta ^{1+\log k}\) parity checks.

Since the number of pre-computed parity checks is \(m_k\), we have \(m_k= \beta ^{1+\log k}\). Further, for the decoding to succeed, the code rate \(R=l'\cdot \log (2^n)/m_k\) of \(\mathcal {C}_2\) must satisfy \(R < C_{\textsc {DMC}}\). Then by Theorem 5, the value of \(m_k\) can be calculated as \( m_k \approx (2l'n \ln 2) / \varDelta (p_{E_t}). \) The following theorem gives the required length N of the observed keystream segment for successfully decoding the code \(\mathcal {C}_1\).

Theorem 6

Given an \([N, l]\) linear code \(\mathcal {C}_1\) over \(\mathrm{GF}(2^n)\), after applying the pre-computation of our algorithm, we get a new \([m_k,l']\) linear code \(\mathcal {C}_2\), which is transmitted through a \(2^n\)-ary DMC with the noise distribution \(p_{E_t}\). The required length N of the observed keystream segment for the algorithm to succeed is \(N \approx k2^{\frac{n(l-l')}{\theta }}(2 l' n \ln 2)^{\frac{1}{\theta }}\varDelta (p_{E_t})^{-\frac{1}{\theta }},\) where \(\theta = 1 + \log k\).

4 Large-Unit Linear Approximation and Its Distribution

In this section, an efficient algorithm to accurately compute the large-unit distribution of the GPLFM is proposed. This distribution is the input that the decoding framework of Sect. 3 requires.

4.1 Large-Unit Linear Approximations

Most of the previous work only studies how to use bitwise linear approximations to constitute a vector; here we directly focus on non-binary linear approximations whose basic data unit is over \(\mathrm{GF}(2^m)\;(m>1)\), and such non-binary unit linear approximations are called large-unit linear approximations throughout this paper. Let \(H(X_1,X_2,\cdots ,X_d)\) be a non-linear function, where the output and the inputs \(X_i\) are all random variables over \(\mathrm{GF}(2^n)\). Our task is to accurately compute the m-bit large-unit distribution of some linear approximation of H. In practice, the choice of m cannot be arbitrary and is usually determined by the structure of the primitive and the underlying building blocks, e.g., the LFSR structure and the S-box size. When m is fixed, the output of H and each input \(X_i (1\le i\le d)\) can all be regarded as \(\frac{n}{m}\)-dimensional vectors over \(\mathrm{GF}(2^m)\). In this setting, the definition of a binary linear mask is as follows.

Definition 7

Let \(X \in \mathrm{GF}(2^n)\) and let \(\varOmega =(\omega _{\frac{n}{m}},\cdots ,\omega _2,\omega _1)\) be an \(\frac{n}{m}\)-dimensional binary vector; then X can be transformed into an \(\frac{n}{m}\)-dimensional vector \(X=(x_{\frac{n}{m}},\cdots , x_2, x_1)\) over \(\mathrm{GF}(2^{m})\) with \(x_i \in \mathrm{GF}(2^{m})\) for \(1\le i\le \frac{n}{m}\). The inner product between these two vectors is defined as \(\varOmega \cdot X = \omega _{\frac{n}{m}}x_{\frac{n}{m}} \oplus \cdots \oplus \omega _1x_1 \in \mathrm{GF}(2^{m})\), where \(\varOmega \) is called the \(\frac{n}{m}\)-dimensional binary linear mask of X over \(\mathrm{GF}(2^{m})\).
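For instance, with \(n=32\) and \(m=8\) the inner product simply XORs the bytes selected by \(\varOmega \); a small sketch (ours):

```python
def inner(omega, x, n=32, m=8):
    """Omega . X of Definition 7: omega is an (n/m)-bit selection mask and
    x an n-bit word; XOR together the m-bit blocks of x selected by omega."""
    out = 0
    for i in range(n // m):
        if (omega >> i) & 1:
            out ^= (x >> (m * i)) & ((1 << m) - 1)
    return out

# Omega = (1,0,1,0) selects blocks x4 and x2 of X = (x4, x3, x2, x1):
# inner(0b1010, X) == x4 ^ x2.
```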

4.2 The Generalized Pseudo-Linear Function Modulo \(2^n\)

Now we first generalize the pseudo-linear function modulo \(2^{n}\) (PLFM) in [17] to the GPLFM by introducing the binary mask with the inner product of Definition 7. Note that in [17], the distribution of a class of functions called PLFM over \(\mathrm{GF}(2^n)\) is computed; here we consider the similar problem for the GPLFM in a smaller field \(\mathrm{GF}(2^m)\) with \(m<n\).

Assume the large unit is of m-bit size. Let \(\mathcal {X}=\{X_1,X_2,\cdots , X_d\}\) be a set of d uniformly distributed n-bit random variables with \(X_i \in \mathrm{GF}(2^n)\) for \(1\le i\le d\), let \(\mathcal {C}=\{C_1,\cdots , C_g\}\) be a set of n-bit constants, and let \(\mathcal {M}\) be a set of \(\frac{n}{m}\)-dimensional binary masks of \(\mathcal {X}\) and \(\mathcal {C}\). Now each element in \(\mathcal {X}\) and \(\mathcal {C}\) can be regarded as an \(\frac{n}{m}\)-dimensional vector over \(\mathrm{GF}(2^{m})\). We denote by \(T_i\) a symbol or expression over \(\mathcal {X}\) and \(\mathcal {C}\). The following two definitions introduce the GPLFM, which generalizes the definition of PLFM in [17].

Definition 8

Given three sets \(\mathcal {X}\), \(\mathcal {C}\) and \(\mathcal {M}\), we have:

  1. \(\mathcal {A}\) is an arithmetic term, if it has only the operation of arithmetic \(\boxplus \), e.g., \(\mathcal {A}=T_1 \boxplus T_2 \boxplus \cdots \).

  2. \(\mathcal {B}\) is a Boolean term, if it only involves Boolean operations such as OR, AND, XOR, and others, e.g., \( \mathcal {B} = (T_1 \oplus T_2) \; \& \; T_3\).

  3. \(\mathcal {S}\) is a simple term, if it is a symbol either from \(\mathcal {X}\) or \(\mathcal {C}\).

  4. \(\varOmega \cdot X\) for \(X \in \{\mathcal {A},\mathcal {B},\mathcal {S}\}\) is the inner product of the term X with the binary mask \(\varOmega \in \mathcal {M}\).

Definition 9

\(F(X_1,X_2,\cdots ,X_d)\) is called a generalized pseudo-linear function modulo \(2^n\) (GPLFM) on \(\mathcal {X}\), if it can recursively be expressed in \(\varOmega \cdot X\) for \(X \in \{\mathcal {A},\mathcal {B},\mathcal {S}\}\) combined by the Boolean operations.

It can easily be seen that the PLFM studied in [17] forms a subset of the GPLFM, satisfying only conditions 1–3 in Definition 8. In our large-unit linear approximation of SNOW 2.0 in Sects. 5 and 6, we actually further generalize the GPLFM by considering parallel Boolean functions, i.e., the S-boxes and the multiplication over finite fields are included in our framework.

4.3 Algorithm for Computing the Distribution of a GPLFM

Assume the basic large unit is of m-bit size. Let \(F(X_1,\cdots ,X_d)\) be a GPLFM with \(\mathcal {X}\), \(\mathcal {C}\) and \(\varOmega \in \mathcal {M}\), where \(X_i \in \mathrm{GF}(2^n)\) \((1\le i\le d)\) and the binary masks are \(\frac{n}{m}\)-dimensional vectors. We want to calculate the distribution of F in an efficient way for some large n. Note that if \(n\ge 32\) and \(d \ge 2\), the distribution \(p_F\) is impossible to compute in practice with the straightforward method, which needs \(2^{nd}\) operations. Further, the algorithm in [17] cannot be applied to this problem due to the inner product operation inherent in the GPLFM over a smaller field \(\mathrm{GF}(2^m)\). Here we propose Algorithm 1 to fulfill this task.

Our basic idea is as follows. Since each coordinate of the binary mask can only take the value 0 or 1, it actually selects which parts of the data arguments take effect in the approximation. According to the binary mask \(\varOmega \), we can split each variable \(X_i \in \mathrm{GF}(2^n)\) for \(i=1,\cdots ,d\) into \(\frac{n}{m}\) blocks of m bits each, i.e., \(X_i = (X_i^{\frac{n}{m}},\cdots , X_i^2,X_i^1)\), where \(X_i^j \in \mathrm{GF}(2^m)\) for \(1\le j\le \frac{n}{m}\). Since the blocks of the input variables are mutually independent, the function F can be split into \(\frac{n}{m}\) sub-functions \(F_i\) \((1\le i\le \frac{n}{m})\), which can be evaluated over the smaller space \(\mathrm{GF}(2^m)\). Each sub-function \(F_i\) can be seen as a PLFM over \(\mathrm{GF}(2^{m})\), whose distribution can be efficiently calculated by the algorithm in [17].

Algorithm 1

On the other hand, the sub-functions \(F_i\) are connected with each other by the one-directional information propagation from the least significant function \(F_{1}\) to the most significant \(F_{\frac{n}{m}}\), caused by the carry bits introduced by \(\boxplus \) and the output of \(F_i\), as shown in Fig. 2. Therefore, we can use a connection matrix to characterize this propagation process.

Now, we compute the distribution \(p_F\) by calculating the \(F_i\)s one-by-one from 1 to \(\frac{n}{m}\), as depicted in Fig. 2. Here \(B_{i-1} \in \mathrm{GF}(2^{m})\) is the output of the sub-function \(F_{i-1}\) and \(Cr_{i-1}\) is the carry vector of \(F_{i-1}\) that will be propagated to \(F_{i}\), generated by the arithmetic terms in \(F_{i-1}\). If there are s arithmetic terms \(\mathcal {A}_{j}\) \((1\le j\le s)\) in F (thus in each \(F_i\)), then we have \(Cr_i=(cr_i^1,\cdots ,cr_i^s)\), where each \(cr_i^j\) is the local carry value of \(\mathcal {A}_j\) \((1\le j\le s)\) when the inputs are truncated to the i-th block. Note that though each block is of m-bit size, the modular addition is still calculated bit-by-bit; it is proved in [17] that for any arithmetic term \(\mathcal {A}_j\), the maximum local carry value is \(d^{+}_j\), where \(d_j^{+}\) is the number of modular additions in \(\mathcal {A}_{j}\) (the additions of carry values are not counted). We emphasize that \(Cr_i\) contains all the carry information of the \(F_j\)s for \(j<i\), since the carry information is propagated from \(F_1\) to \(F_i\). Further, denote the index of \(Cr_i\) by \( |Cr_i|=((cr_i^{1}\cdot (d_{2}^{+}+1)+cr_{i}^{2})(d_{3}^{+}+1)+cr_{i}^{3})\cdot \ldots , \) which is a one-to-one index mapping from \((cr_i^1,\cdots ,cr_i^s)\) to \([0,\ldots ,|Cr_{\text{ max }}|-1]\), where \(|Cr_{\text{ max }}|=\prod _{j=1}^{s}(d_j^{+}+1)\) is the maximal possible number of values of the carry vector \(Cr_i\). We use a \(2^{m} \times |Cr_{\text{ max }}|\) matrix \(M_i\) to store the information of the \(F_j\)s \((j\le i)\), where the matrix element \(M_i[B_i][|Cr_i|]\), for \(0\le B_i \le 2^{m}-1\) and \(0\le |Cr_i| \le |Cr_{\text{ max }}|-1\), represents the total number of inputs \((X_1^i,X_2^i,\cdots ,X_d^i)\) of \(F_i\) that result in the output \(B_i\) and the carry vector \(Cr_i\). Thus, the evaluation of \(F_{i}\) is converted into the computation of the matrix \(M_{i}\), which stores all the output and carry information of \(F_i\); we call it the connection matrix.

Fig. 2. The basic idea of our algorithm

Now we need to evaluate the function \(F_{i}\) based on the connection matrix \(M_{i-1}\) to obtain the next matrix \(M_{i}\); the evaluation depends on the carry vector \(Cr_{i-1}\) and the output value \(B_{i-1}\) of \(F_{i-1}\). For \(m > 1\), since the sub-function \(F_i\) can be seen as a PLFM over \(\mathrm{GF}(2^{m})\), recursively expressed in \(\mathcal {A},\mathcal {B},\mathcal {S}\), we can use a sub-algorithm called ComputePLFM (Appendix A) to compute the matrix \(M_i\) (\(M_2\) in Algorithm 1) for all the possible values of \(B_{i-1}\) and \(Cr_{i-1}\). Hereafter, when applying Algorithm 1 we always assume that \(m > 1\). The initial values are \(Cr_0=(0,0,\cdots ,0)\) and \(B_0=0\), i.e., the initial matrix \(M_0\) is set to the zero matrix. Our algorithm to compute the full m-dimensional distribution \(p_F=(p_F(0), p_F(1), \cdots , p_F(2^m-1))\) of a GPLFM F over \(\mathrm{GF}(2^n)\) is shown in the Algorithm 1 diagram. Note that in Algorithm 1, only two connection matrices \(M_1\) and \(M_2\) are used, alternately, to store the propagation information. The complexity analysis of Algorithm 1 is as follows. First look at the complexity of Algorithm 2 in Appendix A. Step 1 in Algorithm 2 needs a time complexity of \(O(m \cdot |Cr_{\text{ max }}| \cdot 2^d)\) from [17]. Steps 2 to 8 need a complexity of \(O(2^m \cdot m\cdot |Cr_{\text{ max }}|)\). Thus the complexity of Algorithm 2 is \(O(m\cdot |Cr_{\text{ max }}|\cdot (2^d+2^m))\) and the total time complexity of our algorithm is \(O(n \cdot 2^m \cdot |Cr_{\text{ max }}|^2 \cdot (2^d + 2^m))\).
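To make the connection-matrix computation concrete, the following self-contained sketch (our own toy instance, not Algorithm 1 itself) computes the counts of the single-term GPLFM \(F=\varOmega \cdot (X_1 \boxplus X_2) \oplus \varOmega \cdot X_1 \oplus \varOmega \cdot X_2\), for which \(s=1\), \(d_1^{+}=1\) and hence \(|Cr_{\text{ max }}|=2\), and checks the result against exhaustive enumeration for a small n:

```python
def gplfm_counts(omega, n, m):
    """Counts of F = Omega.(X1 [+] X2) ^ Omega.X1 ^ Omega.X2 over all
    (X1, X2) in GF(2^n)^2, computed block-by-block as in Sect. 4.3.
    M[B][c] counts the low-block inputs giving partial output B in
    GF(2^m) and local carry c (one arithmetic term, so |Cr_max| = 2)."""
    blocks, size = n // m, 1 << m
    M = [[0, 0] for _ in range(size)]
    M[0][0] = 1                              # B_0 = 0, Cr_0 = (0)
    for i in range(blocks):
        sel = (omega >> i) & 1               # mask coordinate for block i
        M2 = [[0, 0] for _ in range(size)]
        for B in range(size):
            for c in range(2):
                if M[B][c] == 0:
                    continue
                for x1 in range(size):       # enumerate block i of X1, X2
                    for x2 in range(size):
                        s = x1 + x2 + c      # block addition with carry-in
                        v, c2 = s & (size - 1), s >> m
                        M2[B ^ sel * (v ^ x1 ^ x2)][c2] += M[B][c]
        M = M2
    return [M[B][0] + M[B][1] for B in range(size)]

# Sanity check against exhaustive enumeration for n = 8, m = 4, Omega = (1,1):
n, m, omega = 8, 4, 0b11
ip = lambda o, x: ((x & 15) if o & 1 else 0) ^ (((x >> 4) & 15) if o & 2 else 0)
brute = [0] * 16
for x1 in range(256):
    for x2 in range(256):
        brute[ip(omega, (x1 + x2) & 255) ^ ip(omega, x1) ^ ip(omega, x2)] += 1
assert gplfm_counts(omega, n, m) == brute
```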

5 A Key Recovery Attack on SNOW 2.0

In this section, we demonstrate a state recovery attack against SNOW 2.0. The description of SNOW 2.0 is detailed in [6]. Our new attack is based on the byte-wise linear approximation and utilizes the fast correlation attack over \(\mathrm{GF}(2^{8})\) to recover the correct initial state with much lower complexities.

Fig. 3. The linear approximation of the FSM in SNOW 2.0

5.1 The Byte-Wise Linear Approximation of SNOW 2.0

In SNOW 2.0, denote the AES S-box and the Mixcolumn matrix in the S transform of the FSM by \(S_R\) and M, respectively. Since \(S_R\) is an 8-bit S-box, we let \(n=32\) and \(m=8\). As SNOW 2.0 has two 32-bit memory registers R1 and R2 in the FSM, it is necessary to consider at least two consecutive steps of the FSM to eliminate these two registers in the approximation. Here we denote the two binary masks by \(\varGamma , \varLambda \in \mathrm{GF}(2)^{4}\), respectively; thus each 32-bit word can be divided into 4 bytes and regarded as a 4-dimensional vector over \(\mathrm{GF}(2^8)\). For example, let the binary mask \(\varGamma = (1,0,1,0)\) and let \(X=(x_4,x_3,x_2,x_1)\) be a 32-bit word of SNOW 2.0 in byte-wise form; then \(\varGamma \cdot X =x_4 \oplus x_2\). Applying \(\varGamma \) and \(\varLambda \) to \(z_t\) and \(z_{t+1}\) respectively, we have

$$\begin{aligned} \varGamma \cdot z_t&= \varGamma \cdot s_t \oplus \varGamma \cdot (R1_t \boxplus s_{t+15}) \oplus \varGamma \cdot R2_t,\\ \varLambda \cdot z_{t+1}&=\varLambda \cdot s_{t+1} \oplus \varLambda \cdot (s_{t+16} \boxplus R2_t \boxplus s_{t+5})\oplus \varLambda \cdot S(R1_t). \end{aligned}$$

Let \(y_t=\mathrm{Sbox}(R1_t)=(S_R(R1^4_t),S_R(R1^3_t),S_R(R1^2_t),S_R(R1^1_t))\) be the output of the S-box layer. Since the Mixcolumn matrix M is a linear transformation over \(\mathrm{GF}(2^8)\), we have \(\varLambda \cdot S(R1_t)= \varLambda \cdot (My_t)=\varLambda ' \cdot y_t\). We can rewrite the above two equations as

$$\begin{aligned} \varGamma \cdot z_t&= \varGamma \cdot s_t \oplus \varGamma \cdot (\mathrm{Sbox}^{-1}(y_t) \boxplus s_{t+15}) \oplus \varGamma \cdot R2_t,\\ \varLambda \cdot z_{t+1}&=\varLambda \cdot s_{t+1} \oplus \varLambda \cdot (s_{t+16} \boxplus R2_t \boxplus s_{t+5}) \oplus \varLambda ' \cdot y_t. \end{aligned}$$

Now we have a new byte-wise linear approximation of SNOW 2.0, depicted in Fig. 3. Note that in Fig. 3, the S transform of the FSM is dissected so as to obtain an efficient approximation. Here we use the two linear approximations

$$\begin{aligned} \varGamma \cdot (\mathrm{Sbox}^{-1}(y_t) \boxplus s_{t+15})&= \varGamma \cdot s_{t+15} \oplus \varLambda ' \cdot y_t \oplus \mathbf {N}_1(t),\end{aligned}$$
(2)
$$\begin{aligned} \varLambda \cdot (s_{t+16} \boxplus s_{t+5}\boxplus R2_t)&= \varLambda \cdot s_{t+16} \oplus \varLambda \cdot s_{t+5} \oplus \varLambda \cdot R2_t \oplus \mathbf {N}_2(t). \end{aligned}$$
(3)

The linear approximation (3) is a GPLFM, so we can adopt Algorithm 1 to compute the distribution of \(\mathbf {N}_2(t)\). The linear approximation (2) is not a GPLFM in the sense of Definitions 8 and 9, so we cannot use Algorithm 1 directly. But note that the four \(S_R\)s do not affect the independence among the bytes of \(y_t\); thus we can revise Algorithm 1 to compute the distribution of \(\mathbf {N}_1(t)\), as shown in Algorithm 3.

Algorithm 3

The time complexity of computing the distribution of \(\mathbf {N}_1(t)\) thus drops from \(2^{64}\) to \(2^{26.58}\), a large improvement over the straightforward method. We have searched over all the different binary masks over \(\mathrm{GF}(2)^4\) and found that when \(\varGamma =\varLambda \), these two linear approximations have larger SEIs. Thus \(\varGamma \cdot (z_t \oplus z_{t+1})\) can be expressed as

$$\begin{aligned} \varGamma \cdot (z_t \oplus z_{t+1}) = \varGamma \cdot s_{t} \oplus \varGamma \cdot s_{t+1} \oplus \varGamma \cdot s_{t+5} \oplus \varGamma \cdot s_{t+15} \oplus \varGamma \cdot s_{t+16} \oplus \mathbf {N}(t), \end{aligned}$$
(4)

where \(\mathbf {N}(t)=\mathbf {N}_1(t) \oplus \mathbf {N}_2(t)\) is the folded noise variable introduced by the above two linear approximations, whose distribution can be computed as the convolution of the two noise distributions. We have searched all the possible \(\varGamma \) and \(\varLambda \) and found the strongest linear approximation of the FSM as follows: when \(\varGamma =\varLambda =(\mathtt{1,0,1,0 })\), the distribution of \(\mathbf {N}(t)\) has an SEI of \(\varDelta (\mathbf {N}(t))=2^{-43.23}\). Observe that given the noise distribution of \(\mathbf {N}(t)\), the SEI can be precisely computed by Definition 2. Now we have constructed the byte-wise linear approximation, i.e., Eq. (4), of SNOW 2.0. Next, we will use this linear approximation to recover the initial state of SNOW 2.0.

5.2 Fast Correlation Attack on SNOW 2.0

Now we apply the fast correlation attack over \(\mathrm{GF}(2^{8})\) to SNOW 2.0 to recover the initial state of the LFSR. Let the LFSR state be \((s_{t+15},\cdots ,s_{t})\in \mathrm{GF}(2^{32})^{16}\); here the LFSR is interpreted as a 64-byte LFSR over \(\mathrm{GF}(2^8)\), i.e., \((s_{t+15}^4,s_{t+15}^3,s_{t+15}^2,\) \(s_{t+15}^1,\cdots ,s_t^4,s_t^3,\) \(s_t^2,s_t^1)\in \mathrm{GF}(2^{8})^{64}\). With the feedback polynomial we can express the linear approximation (4) in initial state form as \(\varGamma \cdot (z_t \oplus z_{t+1})= \varPsi _{t} \cdot (s_{t+15}^4,s_{t+15}^3,s_{t+15}^2,s_{t+15}^1,\cdots ,s_t^4,s_t^3,s_t^2,s_t^1) \oplus \mathbf {N}(t)\), where \(\varPsi _{t} \in \mathrm{GF}(2^8)^{64}\) is the coefficient vector derived from the LFSR recursion.

For the decoding algorithm, we apply the pre-computation algorithm in Sect. 3 to generate the parity checks with the parameters \(l=64,l'=17,k=4\), which are the best parameters we have found in terms of the total complexities. The distribution of the folded noise variable \(\mathbf {N}(t_{i_1}) \oplus \mathbf {N}(t_{i_2}) \oplus \mathbf {N}(t_{i_3}) \oplus \mathbf {N}(t_{i_4})\) can be computed by applying the convolution operation twice; the \(\text{ SEI }\) of this new distribution is found to be \(2^{-177.3}\). Using 4 lists in the k-tree algorithm, we get about \(m_k=\beta ^{1+\log k}=2^{184.86}\) parity check equations. By Theorem 6, the data complexity is \(N=2^{188.95}\) and the time/memory complexities of the preprocessing stage are \(\beta k2^{n(l-l')/(1+\log k)}=2^{188.95}\). Second, we perform the online decoding algorithm on the new code \(\mathcal {C}_2\) of code length \(m_k=2^{184.86}\). With a computational complexity of \(n(m_k+l'n2^{l'n})+2^n 2^{l'n}=2^{187.86}\), we can recover the \(17 \cdot 8=136\) bits of the initial state of the LFSR; the other bits can be recovered with a much lower complexity.
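These figures follow mechanically from Theorem 6 and the k-tree trade-off; the following arithmetic sketch (ours) reproduces them:

```python
import math

n, l, lp, k = 8, 64, 17, 4       # code over GF(2^8), l' = 17 target symbols
theta = 1 + math.log2(k)         # theta = 3 for k = 4
sei = 2 ** -177.3                # SEI of the 4-fold folded noise

m_k = 2 * lp * n * math.log(2) / sei                       # parity checks needed
N = k * 2 ** (n * (l - lp) / theta) * m_k ** (1 / theta)   # Theorem 6
decode = n * (m_k + lp * n * 2 ** (lp * n)) + 2 ** n * 2 ** (lp * n)

print(math.log2(m_k))     # ~184.86
print(math.log2(N))       # ~188.95 (data; also preprocessing time/memory)
print(math.log2(decode))  # ~187.86 (online decoding)
```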

Therefore, the final time/memory/data/pre-computation complexities are all upper bounded by \(2^{186.95}\), which is more than \(2^{25}\) times better than the best previous result at Asiacrypt 2008 [13]. This obviously confirms the superiority of our new techniques.

6 An Improved Key Recovery Attack on SNOW 2.0

Recall that in Sect. 4 we used \(\frac{n}{m}\)-dimensional binary linear masks. Here we generalize this definition by letting each component \(\omega _i \in \mathrm{GF}(2^m)\) rather than \(\mathrm{GF}(2)\), i.e., changing the 0/1 coefficients to finite field coefficients: expressing X as \(X = (x_{\frac{n}{m}}, \cdots , x_1) \in \mathrm{GF}(2^m)^{\frac{n}{m}}\) with \(x_i \in \mathrm{GF}(2^m)\), we define the inner product as \(\varOmega \cdot X = \omega _{\frac{n}{m}} x_{\frac{n}{m}} \oplus \cdots \oplus \omega _1 x_1 \in \mathrm{GF}(2^m),\) where \(\omega _{i} x_{i}\) is the multiplication over \(\mathrm{GF}(2^m)\). \(\varOmega \) is called the linear mask over \(\mathrm{GF}(2^m)\) of X. These new nonlinear functions are no longer GPLFM in the sense of Definitions 8 and 9, for we have changed the linear mask from \(\mathrm{GF}(2)\) to \(\mathrm{GF}(2^m)\). Thus we cannot apply Algorithm 1 directly to compute the distributions of these new functions. Instead, we further revise Algorithm 1 to efficiently compute the distributions of such functions in the following analysis of SNOW 2.0.

6.1 Linear Approximations of SNOW 2.0 Over \(\mathrm{GF}(2^8)\)

The process of finding the linear approximations of SNOW 2.0 is nearly the same as in Sect. 5. In order to find the best linear masks over \(\mathrm{GF}(2^8)\), let us take a closer look at the details of the S permutation in the FSM. Let \(\varLambda '=(\varLambda _4',\varLambda _3',\varLambda _2',\varLambda _1')\) denote the linear mask over \(\mathrm{GF}(2^8)\) of the four byte outputs of the \(\mathrm{Sbox}\), where the multiplication is computed in the field \(\mathrm{GF}(2^8)\) defined by the AES Mixcolumn. Then, we can express \(\varLambda \cdot S(\omega )\) as

$$\begin{aligned} (\varLambda _1,\varLambda _2,\varLambda _3,\varLambda _4)\left( \begin{array}{cccc} 2 & 3 & 1 & 1\\ 1 & 2 & 3 & 1\\ 1 & 1 & 2 & 3\\ 3 & 1 & 1 & 2\\ \end{array} \right) \left( \begin{array}{c} S_R(\omega _1)\\ S_R(\omega _2)\\ S_R(\omega _3)\\ S_R(\omega _4) \end{array} \right) =(\varLambda _1',\varLambda _2',\varLambda _3',\varLambda _4')\left( \begin{array}{c} S_R(\omega _1)\\ S_R(\omega _2)\\ S_R(\omega _3)\\ S_R(\omega _4) \end{array} \right) , \end{aligned}$$

where \(\varLambda _i,\varLambda '_i \in \mathrm{GF}(2^8)\) for \(1\le i\le 4\). We adopt the field \(\mathrm{GF}(2^8)\) defined by the AES Mixcolumn and assume that the linear masks over \(\mathrm{GF}(2^8)\) are also defined in this field. Here we still use the two linear approximations over \(\mathrm{GF}(2^8)\), i.e., Eqs. (2) and (3), but the linear masks \(\varGamma ,\varLambda \) are now 4-dimensional vectors over \(\mathrm{GF}(2^8)\). The algorithm to compute the distribution of Eq. (2) is similar to before, except that \(\varGamma =(\varGamma _4,\varGamma _3,\varGamma _2,\varGamma _1) \in \mathrm{GF}(2^8)^4\) rather than \(\mathrm{GF}(2)^4\), as shown in Algorithm 3. The distribution of \(\mathbf {N}_2(t)\) with the linear mask \(\varLambda \in \mathrm{GF}(2^8)^4\) can be computed by Algorithm 4 in Appendix B. The time complexity is \(3\cdot (2^8)^3\cdot 2^8 \approx 2^{33.58}\), while the straightforward method needs a complexity of \(2^{96}\).
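The mask components now multiply byte values in the field \(\mathrm{GF}(2^8)\) of the AES Mixcolumn, i.e., modulo the polynomial \(x^8+x^4+x^3+x+1\); a standard sketch of this multiplication (ours):

```python
def gf256_mul(a, b):
    """Multiplication in GF(2^8) as defined by the AES Mixcolumn,
    i.e., modulo the polynomial x^8 + x^4 + x^3 + x + 1 (0x11b)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11b
        b >>= 1
    return r

# Example: the mask Lambda = (0x00, 0x01, 0x00, 0x03) found below gives
# Lambda . X = gf256_mul(0x01, x3) ^ gf256_mul(0x03, x1) for X = (x4, x3, x2, x1).
```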

Now we describe how to find linear masks \(\varLambda , \varGamma \) that satisfy Eqs. (2) and (3) with large SEIs. Our strategy is to limit the Hamming weights of the linear masks \(\varLambda \) and \(\varLambda '\) over \(\mathrm{GF}(2^8)\). Denote the Hamming weight of a vector by \(wt(\cdot )\); then \(wt(\varLambda ')\) determines the number of active S-boxes in the S-box ensemble S. In the experiments, we found that the SEI of \(\mathbf {N}_2(t)\) depends on \(wt(\varLambda )\): the lower the value of \(wt(\varLambda )\), the larger \(\varDelta (\mathbf {N}_2(t))\). We have searched all the linear masks \(\varLambda ,\varLambda '\) with \(wt(\varLambda ) \le 3\) and \(wt(\varLambda ') \le 3\) and found 255 different linear masks having the same largest value of \(\varDelta (\mathbf {N}(t))\). For example, when \( \varGamma =\varLambda =(\mathtt{0x00,0x01,0x00,0x03 }), \) we get the best linear approximation, with the noise distribution \(\mathbf {N}(t)\) having an SEI of \(\varDelta (\mathbf {N}(t))=2^{-29.23}\).

See Appendix C for how the two fields are unified before decoding. Then we launch the fast correlation attack over \(\mathrm{GF}(2^8)\) with the parameters \(n=32,l=64,l'=19,k=4\). The data complexity is \(N \approx 2^{163.59}\), while the time/memory complexities of the pre-computation are \(2^{163.59}\). After pre-computation, we can acquire about \(m_k= 2^{124.79}\) parity checks. For the online decoding algorithm, the time complexity is \(2^{162.52}\) with the above parameter configuration. Note that here all the complexities are below \(2^{164.15}\approx 2^{162.52}+2^{163.59}\), which is considerably reduced compared to the binary mask case. The reason is that linear masks with finite field coefficients greatly extend the search space and can well exploit the algebraic structure of the two finite fields (one defined in a tower manner in the LFSR and the other in the Mixcolumn) inherent in SNOW 2.0.

6.2 Experimental Results

We have verified each step of our new techniques in simulations to support the theoretical analysis. We used the GNU Multiple Precision Arithmetic Library on a Linux system to verify the exact distribution of each linear approximation found, so the SEI of our large-unit linear approximation is precisely evaluated without any precision error. We then ran extensive experiments on a small-scale version of SNOW 2.0, called s-SNOW 2.0 and described in Appendix D, which verified our approach.

We have computed the 4-bit linear approximation of s-SNOW 2.0 with Algorithm 1 in theory and verified it in practice. Then we executed the experiments on the decoding algorithm in Sect. 3.4. We randomly fixed the values of 60 bits of the initial state of the LFSR and tried to recover the remaining 20 bits by our method. The chosen parameters are \(l'=20,m_k=2^{15.39}\). We first use s-SNOW 2.0 to generate \(2^{17}\) keystream words \(z_t\) for a randomly generated 80-bit initial state. Then we store \(z_t\) and \(s_t\) in two arrays for \(t=1,\cdots ,2^{17}\). Thus we can construct \(2^{17}\) parity checks \(I^{(t)}=\varGamma \cdot (z_t \oplus z_{t+1}) \oplus \varGamma \cdot s_t \oplus \varGamma \cdot s_{t+1} \oplus \varGamma \cdot s_{t+3} \oplus \varGamma \cdot s_{t+4} \oplus \varGamma \cdot s_{t},\) for \(t=0, \cdots , 2^{17}-1\). Second, for each parity check \(I^{(t)}\), we use the LFSR feedback polynomial to express each \(s_t\) for \(t>4\) as a linear combination of the LFSR initial state variables. Now we get \(2^{17}\) parity checks containing only \((s_0,s_1,s_2,s_3,s_4)\) after fixing 60 bits of the state. Third, we divide the 4-bit linear approximation \(I^{(t)}\) into four bitwise linear approximations, i.e., \(I_1^{(t)}= \langle (0,0,0,1), I^{(t)} \rangle ,I_2^{(t)}=\langle (0,0,1,0), I^{(t)} \rangle ,I_3^{(t)}=\langle (0,1,0,0), I^{(t)} \rangle ,I_4^{(t)}=\langle (1,0,0,0), I^{(t)}\rangle \). For each possible 20-bit initial state, we use the FWT to compute the correlations \(c(I_i^{(t)})\) for \(i=1,\cdots ,4\). Fourth, we apply Lemma 4 to compute the SEI of \(p_I\) for each possible initial state, and use the SEI to distinguish the correct initial state. We ran the experiments 100 times with randomly generated values and with different fixed values at different positions in the LFSR state, and found that the correct key always ranks in the top 10 of the candidates list. These 10 candidates have \(\varDelta (p_I)\) around \(2^{-10.6}\), which verifies the theoretical analysis.

Therefore, the experimental results provide solid support for our decoding algorithm, and we can obtain reliable predictions from our theoretical analysis when simulation is infeasible to perform. Further, our decoding method is essentially the LLR method in linear cryptanalysis, whose validity is guaranteed by the theory of linear cryptanalysis.

7 Conclusions

In this paper, we have developed two new cryptanalytic tools to bridge the gap between the widely used primitives employing word-based LFSRs and the current mainstream bitwise fast correlation attacks. The first, a formal framework for fast correlation attacks over extension fields with a thorough theoretical analysis, is the first comprehensive answer to the corresponding open problem in the field of correlation attacks. The second technique, serving as a basis for the first, allows efficient computation of the bias distributions of large-unit linear approximations of the flexibly defined GPLFM, which includes all the previously studied topics in the open literature in a unified framework. The size of the data unit is usually chosen according to the structure of the underlying primitive and its building blocks, which greatly extends the freedom of the adversary in the cryptanalysis of many symmetric-key primitives. As an application, we adapted these two techniques to SNOW 2.0, an ISO/IEC 18033-4 standard and a benchmark stream cipher in the European eSTREAM project, and achieved the best key recovery attacks known so far. The new methods are generic and applicable to other symmetric-key primitives as well, e.g., SNOW 3G, Sosemanuk, Dragon, and some CAESAR candidates. It is our future work to study the large-unit linear approximations of these primitives and launch various attacks accordingly.