1 Introduction

Dedicated hardware-oriented stream ciphers are widely used in resource-constrained environments with limited storage, gate count, or power supply/consumption, e.g., A5/1 in GSM and the two-level E0 encryption scheme in Bluetooth. These LFSR-based primitives were designed in the 1990s as typical examples of the irregularly clocked generator and the keystream generator with memory.

Bluetooth is a wireless technology standard managed by the Bluetooth Special Interest Group (SIG), whose applications are ubiquitous nowadays, e.g., at home, in hospitals, on assembly lines, in aircraft, and in wearable computers. The Bluetooth standard, standardized as IEEE 802.15.1 [3], adopts the two-level E0 stream cipher to protect the privacy of communication between devices, such as personal computers, laptops and mobile phones, that operate over a short range and at low power. Although it is a long-standing problem in stream ciphers, the security analysis of two-level E0 is still of great practical importance, as pointed out by Prof. Preneel in [27]. In the latest version of the Bluetooth Specification, v4.2 [3], the E0 stream cipher is still used to protect user information all over the world.

The correlation attack [31] is a classical method in the cryptanalysis of stream ciphers, which exploits a statistically biased relation between the produced keystream and the output of some underlying sequence. In the 1990s, the correlation properties of combiners with memory were analyzed theoretically [9, 25]. Based on these identified correlations, for LFSR-based stream ciphers, the initial state of the target LFSR can be recovered by (fast) correlation attacks [4, 5, 12, 13, 24]. Further, in [15, 16], the notion of correlation was extended to conditional correlation, which studies the linear correlation of the inputs conditioned on a given output pattern of some nonlinear function. Later, at Crypto 2005 [19], conditional correlation was given a dual meaning, i.e., the correlation of the output of a function conditioned on some unknown, uniformly distributed input, and was applied to analyze the security of two-level E0. Since a conditional correlation is no smaller than the corresponding unconditional one, it is expected that better attacks can be achieved if such conditional correlations are exploited appropriately. In the special case that holds for two-level E0, the condition vector is determined linearly by some key-related material and the public nonce; the adversary thus gets for free the various condition vectors of the different target functions corresponding to different nonce values, and expects to observe a biased sample sequence for the correct key and unbiased sequences for the wrong candidates. Given a pool of sample sequences derived from the guessed values of the condition vector and some public information, a statistical distinguisher can be mounted accordingly to restore the secret key.

The keystream generator E0 used in Bluetooth is an LFSR-based nonlinear combiner with 4-bit memory, a modification of the summation generator [28]. In practice, the E0 cipher is frequently re-synchronized as a two-level scheme, and the keystream generated for each frame is only 2790 bits. Thus, most of the published attacks [1, 6, 7, 11, 14, 21, 29, 30], which work on one impractically long frame of keystream, remain of academic interest only and have little impact on the practical usage of Bluetooth encryption. Currently, a few attacks [7, 8, 10, 19, 20, 26] apply to the two-level E0. The best known-IV attack in [19] requires \(2^{38}\) on-line computations, \(2^{38}\) off-line computations and \(2^{33}\) memory to restore the original encryption key, given the first 24 bits of \(2^{23.8}\) frames in theory (in experiments, it needs about 19 and 37 h of on-line and off-line computation, respectively, and 64 GB of storage, given the first 24 bits of \(2^{26}\) frames).

Our Contributions In this paper, we first propose a generalized mathematical model that inherits the spirit of the two-level E0 encryption scheme, and study both its unconditional and conditional correlation properties. A fast recursive method, with a justification of its time complexity, is formulated to compute the unconditional correlations in the general core keystream generator. Besides, the conditional correlation properties of the two-level model are derived and analyzed by the condition masking technique: instead of considering the correlations conditioned on the whole condition vector, only a subset of the condition vector is taken into account when investigating the correlations. This generalizes the concept of linear mask by depicting the condition as the value selected according to a mask and studying how to choose the condition to achieve better tradeoffs between the time/memory/data complexities.

It is expected that with a careful selection of the condition mask, better tradeoffs between the attack complexities can be reached than by simply choosing the full condition vector. Based on the new notion, a theoretical framework is established to efficiently restore the secret key in the model, which includes the former framework in [19] as a special case. The subtle difference between the new framework and the previous one in [19] is pointed out, and is demonstrated by the concrete attack on the real two-level E0 later. Based on a dedicated linear approximation of the two-level model, both bitwise and vectorial key recovery attacks are mounted and analyzed. In the process, we prove a necessary and sufficient condition that determines when the adversary gains in correlation by moving from a low-dimensional to a high-dimensional conditional correlation attack in the general model. Furthermore, a novel design criterion for the general model to achieve a desirable security level is proposed as a countermeasure to resist the attack, which is shown to be lightweight and very efficient in practice.

Then, under the above cryptanalytic principles, we systematically study the security of the real two-level Bluetooth encryption scheme. Our main observation is that, with high probability, only a subset of the bits in the whole condition vector determines the magnitude of the bias; e.g., in the E0 combiner, only the latest four LFSR bits entering the FSM play the most important role. Thus, the time/memory complexities of the conditional correlation attack against two-level E0 can be significantly reduced by properly choosing the condition mask.

We start by revisiting the unconditional correlation properties of the Bluetooth combiner. Note that the previous relevant result, Corollary 6 in [21], can only compute a special type of unconditional correlation in the core combiner, i.e., the correlations of the pure FSM output sequence. For the correlations between all the input linear functions and all the output linear functions, only mask lengths up to 6 bits are covered in [10]. Here, we present the complete recursive formula for fast computation of such correlations in the E0 combiner, which goes beyond the time/memory complexity barriers of the Fast Walsh Transform (FWT) [18, 32] and has a reasonable practical complexity for a wide range of linear mask lengths. It is stated in the conclusion of [10] that the complexity of their attack against E0 could be further decreased by exploiting m-bit linear correlations for \(m>6\), if such correlations were feasible to compute. We solve this problem efficiently by using our method to recursively compute and verify all the unconditional correlations up to 14 bits with a low complexity.

Second, we comprehensively investigate the conditional correlations inside the two-level E0 with the tool of condition masking. The target function inherent in E0 used to compute the conditional correlations in [19] is generalized, and a large class of correlations conditioned on both the linear mask and the condition mask is presented. Although the correlation conditioned on the full condition vector is maximal in value, it is not generally optimal with respect to the global time/memory/data complexities. The time/memory complexities are closely associated with the condition. An adversary need not guess the full condition vector; what he has to guess is determined by the condition mask he has chosen. In this way, the time/memory complexities can be considerably reduced.

Third, following the general principles of high-dimensional attacks, the vectorial approach is studied. The vectors used in our attack are carefully constructed and indeed work well to keep the data complexity as low as possible without a penalty in the time/memory complexities. In the process, we point out that the data complexity analyses of the attacks in [19] and [34] are inaccurate. The exact theoretical data complexities of the previous attacks are all above the \(2^{26}\) bound, due to an inaccurate formula used in [19] and [34]. We correct the data complexity and show how to reduce it below the \(2^{26}\) bound by a combination with the list decoding and multi-pass decoding techniques, which brings the data complexity down to \(2^{24}\). As a result of all the above techniques, it is shown that if the first 24 bits of \(2^{24}\) frames are available, the secret key can be reliably found with \(2^{25}\) on-line computations, \(2^{21.1}\) off-line computations and 4 MB of memory in the known-IV scenario. Our attacks have been fully implemented in C on one core of a single PC. Due to the small memory consumption and low time complexity, the attack has been repeated thousands of times with randomly generated keys and IVs, while the attack in [19] was only executed 30 times for a fixed master key with \(2^{26}\) frames. On average, it takes only a few seconds to restore the original encryption key. To our knowledge, this is the best and most threatening known-IV attack on the real Bluetooth encryption scheme so far. Besides, compared to the experimental attack in [34], the success probability of our new attack is improved as well.

Finally, we further convert the above known-IV attack into a ciphertext-only attack against the real two-level E0, based on the fact that in any stretch of written language, certain letters and combinations of letters occur with varying frequencies, i.e., the plaintexts are not random. Thus, we can always find some biases among the plaintext bits. It is then shown that if the first 24 bits of \(2^{26}\) frames are available, the secret key can be reliably found with \(2^{26}\) on-line computations and \(2^{21.1}\) off-line computations in the ciphertext-only scenario, which is the first practical ciphertext-only attack on the two-level Bluetooth encryption scheme so far. The practical implementation of the ciphertext-only attack is provided as well. An efficient countermeasure to improve the security of the two-level E0 encryption scheme is summarized, in order to prolong the life of the Bluetooth standard in practice.

This paper is organized as follows. We first present some preliminaries used in our work in Sect. 2. Then, the generalized mathematical model of the two-level encryption scheme is provided in Sect. 3. The correlation properties of the two-level model, both unconditional and conditional, are studied in Sect. 4, together with the new framework for recovering the secret key in the model. A full description of the real two-level E0 scheme is presented in Sect. 5. In Sect. 6, a brief review of the best previous attack against the two-level E0 is given. Various correlation properties of the E0 combiner, e.g., the unconditional correlations and the conditional correlations based on condition masking, are studied in Sect. 7. Then, both bitwise and vectorial key recovery attacks based on condition masking are developed in Sect. 8 with theoretical analysis. In Sect. 9, the practical implementation of the known-IV attack is described. In Sect. 10, we detail the first ciphertext-only attack on two-level E0, while the practical implementation of the ciphertext-only attack is provided in Sect. 11. Finally, some conclusions are provided in Sect. 12.

2 Preliminaries

In this section, some basic notations and definitions are presented. Denote the binary field by \(\text{ GF }(2)\) and the m-dimensional extension field of \(\text{ GF }(2)\) by \(\text{ GF }(2^{m})\). Similarly, denote the m-dimensional vector space over \(\text{ GF }(2)\) by \(\text{ GF }(2)^{m}\). The set of real numbers is denoted by \(\mathbf {R}\). The inner product of two n-dimensional vectors \(\gamma \) and \(\rho \) over \(\text{ GF }(2^{m})\) \((m\ge 1)\) is \(\gamma \cdot \rho = \langle \gamma ,\rho \rangle = \langle (\gamma _{0},\ldots ,\gamma _{n-1}),(\rho _{0},\ldots ,\rho _{n-1})\rangle = \bigoplus _{i=0}^{n-1}\gamma _{i}\rho _{i}\). The Hamming weight of a vector or a polynomial, i.e., the number of its nonzero components or coefficients, is denoted by \(wt(\cdot )\).

Definition 1

The correlation (or bias) of a random Boolean variable X is \(\epsilon (X) = \text{ Pr }(X=1)-\text{ Pr }(X=0)\).

Note that in some articles, \(\epsilon (X) = \text{ Pr }(X=0)-\text{ Pr }(X=1)\); the only difference is the sign of the correlation. Let \(\xi \) be an arbitrary set. Given a function \(f: \xi \rightarrow \text{ GF }(2)^r\), the distribution \(D_f\) of f(X), with \(X \in \xi \) uniformly distributed, is

$$\begin{aligned}D_f(a)= \frac{1}{|\xi |} \sum _{X \in \xi }\mathbf {1}_{f(X)=a}\end{aligned}$$

for all \(a \in \text{ GF }(2)^r\).

Definition 2

The Squared Euclidean Imbalance (SEI) of a distribution \(D_f\) is defined as \(\Delta (D_f)=2^r\sum _{a \in \text{ GF }(2)^r} (D_f(a)-\frac{1}{2^r})^2\).

\(\Delta (D_f)\) measures the distance between the target distribution \(D_{f}\) and the uniform distribution. In particular, for \(r=1\), we have \(\Delta (D_f)=\epsilon ^2(D_f)\). For brevity, we use \(\epsilon (f)\) and \(\Delta (f)\) to denote \(\epsilon (D_f)\) and \(\Delta (D_f)\), respectively, hereafter. Similarly, \(\text{ E }[\Delta (h_{\mathcal {B}})]\) is used to measure conditional correlations, where the expectation is taken over the uniformly distributed \(\mathcal {B}\).
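To make these measures concrete, the following minimal C sketch (our own illustration; the function names are not from the paper) computes the bias of a Boolean function given as a truth table, and the SEI of an r-bit distribution given as a table of counts.

```c
#include <stdint.h>

/* Bias of a Boolean function over GF(2)^n given as a 2^n-entry truth
   table: eps(f) = Pr(f=1) - Pr(f=0). */
double bias(const uint8_t *truth, int n) {
    long ones = 0, N = 1L << n;
    for (long x = 0; x < N; x++) ones += truth[x];
    return (2.0 * ones - N) / (double)N;
}

/* SEI of an r-bit distribution given as counts over the 2^r outcomes:
   Delta(D) = 2^r * sum_a (D(a) - 2^-r)^2. */
double sei(const uint64_t *count, int r, uint64_t total) {
    double delta = 0.0, u = 1.0 / (1 << r);
    for (int a = 0; a < (1 << r); a++) {
        double d = (double)count[a] / (double)total - u;
        delta += d * d;
    }
    return (1 << r) * delta;
}
```

Next, we give the definitions of the Walsh transform and the convolution transform, respectively.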

Definition 3

Given a function \(f: \text{ GF }(2)^n \rightarrow \mathbf {R}\), for \(\omega \in \text{ GF }(2)^n\), the Walsh Transform of f at point \(\omega \) is defined as \(\hat{f}(\omega )= \sum _{x \in GF(2)^n}f(x)(-1)^{\langle \omega , x \rangle }\).

Definition 4

Given two functions \(f,g: \text{ GF }(2)^n \rightarrow \mathbf {R}\), the convolution transform of f and g is defined as \( (f \otimes g)(x)= \sum _{y \in GF(2)^n} f(y)\cdot g(x \oplus y). \) Further, we have the relation

$$\begin{aligned} \widehat{(f \otimes g)}(x) = \hat{f}(x) \cdot \hat{g}(x), \end{aligned}$$

for all \(x \in GF(2)^{n}\).

It is well known that the Walsh transform of f can be computed efficiently with an algorithm called the Fast Walsh Transform (FWT) [32] in \(n2^n\) time and \(2^n\) memory. The preparation of f takes \(2^n\) time, so the total time complexity is \(2^n+n2^n\). The convolution of f and g can be computed by invoking the FWT algorithm three times, i.e., computing \(\hat{f}\), \(\hat{g}\) and then \(\widehat{\hat{f} \cdot \hat{g}} = 2^{n}(f \otimes g)\).
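The in-place butterfly implementation of the FWT and the resulting convolution are sketched below (again our own illustrative C, relying on the involution property \(\hat{\hat{f}}=2^n f\)); note that both inputs are overwritten by their transforms.

```c
/* In-place Fast Walsh Transform of f over GF(2)^n: n*2^n butterflies. */
void fwt(double *f, int n) {
    for (long h = 1; h < (1L << n); h <<= 1)
        for (long i = 0; i < (1L << n); i += h << 1)
            for (long j = i; j < i + h; j++) {
                double a = f[j], b = f[j + h];
                f[j]     = a + b;   /* sum        */
                f[j + h] = a - b;   /* difference */
            }
}

/* (f (x) g)(x) = sum_y f(y) g(x xor y), via three FWTs: transform both
   inputs, multiply pointwise, transform back and divide by 2^n. */
void convolve(double *f, double *g, double *out, int n) {
    long N = 1L << n;
    fwt(f, n);
    fwt(g, n);
    for (long x = 0; x < N; x++) out[x] = f[x] * g[x];
    fwt(out, n);
    for (long x = 0; x < N; x++) out[x] /= (double)N;
}
```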

Fig. 1 Structure of the two-level model

3 Mathematical Model

Our model of the two-level E0-like encryption scheme is depicted in Fig. 1, which consists of two phases: the payload key generator in the first level and the keystream generator in the second level.

An E0-like keystream generator, as defined in [22], lies at the core of the two-level model. There are n maximum-length LFSRs in the generator, denoted by LFSR\(_{i}\) \((1\le i\le n)\) of length \(L_{i}\) bits, together with a Finite State Machine (FSM) of k memory bits. Without loss of generality, let the LFSR\(_{i}\)s have pairwise distinct lengths \(L_{i}\) satisfying \(L_{1}<L_{2}<\cdots <L_{n}\) and primitive characteristic polynomials \(p_{i}(x)\in \text{ GF }(2)[x]\). Denote the time instant at the first level by t and at the second level by \(t^{\prime }\), respectively. The content of the LFSRs at time t is denoted by \(\zeta _t\). At time t, denote the n output bits of the LFSRs by \(B_t=(b^1_t, \ldots , b^n_t)\), which is also the input to the FSM, and the FSM state by \(\sigma _{t}\in \text{ GF }(2)^{k}\). Then, the next state \(\sigma _{t+1}\) of the FSM can be computed from the current FSM state \(\sigma _t\) and \(B_t\) via \(\sigma _{t+1} = \mathcal {F}(B_t, \sigma _t)\), where \(\mathcal {F}: \sigma _{t}\mapsto \sigma _{t+1}\) is a permutation for any fixed \(B_{t}\). The FSM outputs one bit \(\varPsi _t = \omega _{c} \cdot \sigma _t\), the inner product of its current state \(\sigma _t\) and a constant \(\omega _{c} \in \text{ GF }(2)^k\). The core combiner generates one keystream bit \(z_{t}\) as the xor of the FSM output bit \(\varPsi _{t}\) and the sum of the LFSR outputs, i.e., \(z_{t} = \varPsi _{t} \oplus \xi _{t}\), where \(\xi _{t} = \bigoplus ^n_{i=1}b^{i}_{t}\).
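One clock of this core combiner can be sketched in C as follows. This is purely illustrative: the LFSR feedbacks, the FSM next-state map fsm_next, the output mask OMEGA_C and the sizes N_LFSR, K_MEM are primitive-specific assumptions of ours, not part of the model itself.

```c
#include <stdint.h>

#define N_LFSR 4   /* n LFSRs (assumed value for illustration)  */
#define K_MEM  4   /* k FSM memory bits (assumed value)         */
#define NSTATE (1 << K_MEM)

/* Assumed primitive-specific hooks, left undefined here. */
extern uint8_t lfsr_clock(int i);                             /* next bit of LFSR_i    */
extern uint8_t fsm_next(const uint8_t B[N_LFSR], uint8_t s);  /* F(B_t, sigma_t)       */
extern const uint8_t OMEGA_C;                                 /* output mask omega_c   */

static uint8_t parity(uint8_t x) { x ^= x >> 4; x ^= x >> 2; x ^= x >> 1; return x & 1; }

/* One clock: output z_t = Psi_t xor xi_t and advance the FSM. */
uint8_t combiner_clock(uint8_t *sigma) {
    uint8_t B[N_LFSR], xi = 0;
    for (int i = 0; i < N_LFSR; i++) { B[i] = lfsr_clock(i); xi ^= B[i]; }
    uint8_t psi = parity(OMEGA_C & *sigma);   /* Psi_t = omega_c . sigma_t     */
    *sigma = fsm_next(B, *sigma);             /* sigma_{t+1} = F(B_t, sigma_t) */
    return psi ^ xi;                          /* z_t                           */
}
```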

Next, we provide a formal description of the workflow of the two-level model. At the first level, the secret key and the public nonce \(P^{i}\) (IV) are mixed by two affine transforms \(\mathcal {G}_1\) and \(\mathcal {G}_{2}\), then loaded into the n LFSRs linearly. With the FSM preset to the null state, the core generator runs a certain number of clocks and produces an \(\eta _{1}\)-bit output \((\eta _{1}> \sum _{i=1}^{n}L_{i})\). The last \(L=\sum _{i=1}^{n}L_{i}\) output bits generated at the first level are permuted into the n LFSRs by another affine transform \(\mathcal {G}_3\), while the content of the FSM at the end of the first level is kept. We stress here that, for efficiency reasons, \(\mathcal {G}_3\) only permutes the last \(\sum _{i=1}^{n}L_{i}\) output bits into the LFSRs, without any linear combination among the manipulated bits. From this combined internal state, the core generator produces the \(\eta _{2}\)-bit keystream that encrypts the i-th frame during the second level.

4 Correlation Properties of the Two-Level Model

In this section, both the unconditional correlation properties and the conditional correlation properties based on condition masking of the two-level model are studied, which naturally leads to our new key recovery framework.

4.1 Unconditional Linear Correlations

We first study the unconditional correlation properties of the second level, which are exploited in the linear approximation process of the two-level model. Inspired by [10, 21], we give a general way to efficiently compute the unconditional correlations at the second level.

Let \(\varOmega (a,\langle \omega ,u \rangle )\) be the correlation \(\epsilon (a \cdot \sigma _{t^{\prime }+1} \oplus \omega \cdot \sigma _{t^{\prime }} \oplus u \cdot B_{t^{\prime }})\) of two consecutive steps of the keystream generation, where \(a \in \text{ GF }(2)^k, u \in \text{ GF }(2)^{n}, \omega \in \text{ GF }(2)^k\) are linear masks and \(B_{t^{\prime }}\) represents the n output bits of the LFSRs at time \(t^{\prime }\) of the second level. For brevity, we denote the unconditional correlation over d consecutive time instants by

$$\begin{aligned} \delta ( \langle a_1,u_1 \rangle , \ldots ,\langle a_{d-1},u_{d-1} \rangle ,a_d)&=\epsilon (a_1\cdot \sigma _{t^{\prime }+1}\oplus u_1\cdot B_{t^{\prime }+1}\oplus \cdots \oplus a_{d-1}\nonumber \\&\quad \cdot \sigma _{t^{\prime }+d-1}\oplus u_{d-1}\cdot B_{t^{\prime }+d-1} \oplus a_{d} \cdot \sigma _{t^{\prime }+d}), \end{aligned}$$
(1)

where \(u_1,\ldots ,u_{d-1} \in \text{ GF }(2)^n, a_1,\ldots ,a_{d-1}, a_{d} \in \text{ GF }(2)^k\). The following theorem can be used to compute the correlation for iterative structures [11].

Theorem 5

Given functions \(f:GF(2)^m \times GF(2)^p \rightarrow GF(2)\) and \(g:GF(2)^q \rightarrow GF(2)^p\), let \(X \in GF(2)^m\) and \(Y \in GF(2)^q\) be two independent random variables. Then, for all \(u \in GF(2)^m, v \in GF(2)^q\), we have

$$\begin{aligned} \delta (f(X,g(Y)) \oplus u \cdot X \oplus v \cdot Y)&= \sum _{\omega \in GF(2)^p}\delta (f(X,g(Y))\oplus u \cdot X \oplus \omega \cdot g(Y))\cdot \delta (\omega \cdot g(Y) \oplus v \cdot Y). \end{aligned}$$

Now we can present our general iterative computation method to calculate the unconditional correlation in (1).

Theorem 6

Assume that the initial state \((\zeta _{0},\sigma _{0})\) is random and uniformly distributed in the model; then we have

$$\begin{aligned} \delta (\langle a_1,u_1\rangle ,\ldots , \langle a_{d-1},u_{d-1}\rangle ,a_d) =&\sum _{\omega \in GF(2)^k}\varOmega (a_d,\langle \omega ,u_{d-1}\rangle )\\&\quad \cdot \delta (\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-2},u_{d-2} \rangle ,a_{d-1}\oplus \omega ). \end{aligned}$$

Proof

To apply Theorem 5, we set \(X=B_{t^{\prime }+d-1}\), \(Y=(\langle \sigma _{t^{\prime }+1},B_{t^{\prime }+1}\rangle , \ldots ,\langle \sigma _{t^{\prime }+d-2},B_{t^{\prime }+d-2}\rangle ,\sigma _{t^{\prime }+d-1})\), \(g(Y)=\sigma _{t^{\prime }+d-1}\), \(f(X,g(Y))=a_d \cdot \sigma _{t^{\prime }+d}\), \(u=u_{d-1}\) and \(v=(\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-2},u_{d-2}\rangle ,a_{d-1})\). Thus, we have

$$\begin{aligned}&\delta (\langle a_1,u_1\rangle ,\ldots , \langle a_{d-1},u_{d-1}\rangle ,a_d)\\&\quad =\delta (f(X, g(Y))\oplus u_{d-1} \cdot X \oplus v \cdot Y)\\&\quad =\sum _{\omega \in GF(2)^k}\delta (f(X,g(Y))\oplus u \cdot X \oplus \omega \cdot g(Y)) \cdot \delta (\omega \cdot g(Y) \oplus v \cdot Y)\\&\quad =\sum _{\omega \in GF(2)^k}\delta (a_d \cdot \sigma _{t^{\prime }+d}\oplus u_{d-1} \cdot B_{t^{\prime }+d-1} \oplus \omega \cdot \sigma _{t^{\prime }+d-1}) \\&\qquad \cdot \delta (\omega \cdot \sigma _{t^{\prime }+d-1} \oplus a_1\cdot \sigma _{t^{\prime }+1}\oplus u_1\cdot B_{t^{\prime }+1}\oplus \cdots \oplus a_{d-2}\cdot \sigma _{t^{\prime }+d-2}\oplus u_{d-2}\cdot B_{t^{\prime }+d-2} \oplus a_{d-1}\cdot \sigma _{t^{\prime }+d-1})\\&\quad =\sum _{\omega \in GF(2)^k}\varOmega (a_d,\langle \omega ,u_{d-1}\rangle )\cdot \delta (\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-2},u_{d-2} \rangle ,a_{d-1}\oplus \omega ), \end{aligned}$$

which completes the proof. \(\square \)

Theorem 6 is a generalization of the formulas in [21, 22]. It can compute the unconditional correlations between all the input linear functions and all the output linear functions, without missing any. Some illustrative examples are given for the two-level E0 case later in Sect. 7.1.
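The recursion is straightforward to implement once the one-step table \(\varOmega \) is available. A minimal C sketch of one recursion step follows (our illustration, using the same assumed constants as the combiner sketch above; the table Omega is assumed to be precomputed by enumerating all pairs \((B_{t^{\prime }},\sigma _{t^{\prime }})\), and the base case is \(T[0]=1\), \(T[a]=0\) for \(a\ne 0\), since \(\sigma \) is uniform).

```c
/* Omega[a][w][u] = eps(a.sigma_{t+1} ^ w.sigma_t ^ u.B_t), assumed
   precomputed from the FSM. NSTATE = 2^k as before. */
extern double Omega[NSTATE][NSTATE][1 << N_LFSR];

/* One step of Theorem 6. T_prev[a] holds
   delta(<a_1,u_1>,...,<a_{d-2},u_{d-2}>, a); given the step masks
   a_prev = a_{d-1} and u = u_{d-1}, fill
   T_next[a_d] = sum_w Omega[a_d][w][u] * T_prev[a_prev ^ w]. */
void extend(const double *T_prev, double *T_next, int a_prev, int u) {
    for (int ad = 0; ad < NSTATE; ad++) {
        double s = 0.0;
        for (int w = 0; w < NSTATE; w++)
            s += Omega[ad][w][u] * T_prev[a_prev ^ w];
        T_next[ad] = s;
    }
}
```

Iterating extend once per time instant yields all the \(\delta \) values for a given sequence of step masks, at a cost of \(2^{2k}\) operations per step.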

Denote the m consecutive keystream bits by \(Z^m_{t^{\prime }}\) and the m consecutive LFSR input words by \(B^m_{t^{\prime }}\), where \(v \cdot Z^m_{t^{\prime }}=\bigoplus ^{m-1}_{j=0}v_jz_{t^{\prime }+j}\) and \(W \cdot B^m_{t^{\prime }}=\bigoplus ^{m-1}_{j=0} (\omega _j \cdot B_{t^{\prime }+j})\) are two linear functions defined by an \(n\times m\) matrix \(W=(\omega _0,\ldots ,\omega _{m-1})\) and a vector v. Since the correlations are time-invariant, we can ignore the effect of \(t^{\prime }\). Then, we have the following corollary on the time complexity of the above recursive method.

Corollary 7

The recursive expression in Theorem 6 can compute all the correlation coefficients of the form \(\epsilon (W\cdot B^m \oplus v\cdot Z^m)\), i.e., the unconditional correlations between the LFSR output sequence and the keystream sequence of the general combiner in \(k^m\cdot (n+1)^{m-1}\) iterations.

Proof

We first prove that all the \(\epsilon (W\cdot B^m \oplus v\cdot Z^m)\) can be computed by Theorem 6. Let \(W=(\omega _0,\ldots ,\omega _{m-1})\) and \(v=(v_0,\ldots ,v_{m-1})\), where \(\omega _i\in GF(2)^n\) and \(v_i \in GF(2)\), and let \(\mathbf {1}_n\) denote the all-one vector; then we have

$$\begin{aligned}&\epsilon (W\cdot B^m \oplus v\cdot Z^m) =\epsilon (\omega _{0} \cdot B_{0} \oplus \cdots \oplus \omega _{m-1} \cdot B_{m-1} \oplus v_{0}z_{0} \oplus \cdots \oplus v_{m-1}z_{m-1})\\&\quad =\epsilon (\omega _{0} \cdot B_{0} \oplus \cdots \oplus \omega _{m-1} \cdot B_{m-1} \oplus v_{0}(\mathbf {1}_n \cdot B_{0} \oplus \omega _{c} \cdot \sigma _{0}) \\&\qquad \oplus \cdots \oplus v_{m-1}(\mathbf {1}_n \cdot B_{m-1} \oplus \omega _{c} \cdot \sigma _{m-1}))\\&\quad =\epsilon ((\omega _{0} \oplus v_{0}\mathbf {1}_n)\cdot B_{0} \oplus a_{0}\cdot \sigma _{0}\oplus \cdots \oplus (\omega _{m-1} \oplus v_{m-1}\mathbf {1}_n)\cdot B_{m-1} \oplus a_{m-1}\cdot \sigma _{m-1}), \end{aligned}$$

where \(a_{i}=v_{i}\cdot \omega _{c}\) for \(0\le i\le m-1\). Note that if the linear mask \(\omega _{m-1} \oplus v_{m-1}\mathbf {1}_n\) of \(B_{m-1}\) is nonzero, then, since the variable \(B_{m-1}\) is independent of all the other variables, the total correlation is 0. Hence, we always assume \(\omega _{m-1} \oplus v_{m-1}\mathbf {1}_n=\mathbf {0}_n\). Now, we can compute the above correlation by Theorem 6. For a given m, we have \(a_{0}\ne 0\) and \(a_{m-1}\ne 0\) (otherwise the span of m instants could be shortened). Because of the symmetry of the combiner's output and next-state functions with respect to the n input variables, the correlation depends only on the masks \(a_i\) and the weights \(wt(\omega _i \oplus v_i\mathbf {1}_n)\). Hence, computing all the correlations at the second level only needs about \(k^m\cdot (n+1)^{m-1}\) iterations. \(\square \)

Theorem 6 and Corollary 7 are used in the linear approximation of the second level in the model.

4.2 Conditional Correlations Based on Condition Masking

Now let us look at the conditional correlation properties of the two-level model. Several consecutive steps of the core generator can be regarded as a vectorial Boolean function, and we would like to investigate the conditional correlation properties of this derived function.

Generally, there are two sets of inputs to the FSM at the first level at time t, i.e., the n LFSR output bits \(B_t=(b^1_t, \ldots ,b^n_t)\) and the k memory bits \(\sigma _t=(\sigma ^{k-1}_{t},\ldots ,\sigma ^{0}_{t})\in \text{ GF }(2)^{k}\). Consider l consecutive time instants and let \(\gamma =(\gamma _0,\gamma _1,\ldots ,\gamma _{l-1}) \in \text{ GF }(2)^l\) be a linear mask with \(\gamma _0=\gamma _{l-1}=1\). Define the inputs to the FSM as

$$\begin{aligned} \mathcal {B}_{t}=B_{t}B_{t+1}\cdots B_{t+l-2} \in \text{ GF }(2)^{n(l-1)},\; \; \sigma _{t+1} \in \text{ GF }(2)^k \end{aligned}$$

and the FSM outputs \(C_t=(\omega _{c} \cdot \sigma _t,\ldots ,\omega _{c} \cdot \sigma _{t+l-1})\). Then, the function \(h_{\mathcal {B}_{t}}^{\gamma }:\sigma _{t}\rightarrow \gamma \cdot C_t\) is well defined, as \(\gamma _0=\gamma _{l-1}=1\) is necessary and sufficient to recursively compute \(\gamma \cdot C_{t}\) from \(\mathcal {B}_{t}\) and \(\sigma _{t}\), as shown in Fig. 2. The bias \(\epsilon (h_{\mathcal {B}_{t}}^{\gamma })\) can easily be computed by an exhaustive search over all the possible values of \(\sigma _{t}\). For different values of \(\mathcal {B}_{t}\), the bias \(\epsilon (h_{\mathcal {B}_{t}}^{\gamma })\) may be different, while the mean value \(\text{ E }[\epsilon (h_{\mathcal {B}_{t}}^{\gamma })]\) is a good estimate in the attacks. In general, we may expect to observe the bias for a proper choice of \(\gamma \).
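Computing \(\epsilon (h_{\mathcal {B}_{t}}^{\gamma })\) for one value of \(\mathcal {B}_{t}\) is a direct enumeration of the \(2^k\) FSM states. A sketch, reusing the assumed hooks of the combiner sketch in Sect. 3:

```c
/* Bias of h_B^gamma for a fixed block B[0..l-2] of LFSR input words:
   run the FSM from every initial state, accumulate the masked output
   bits gamma . C_t, and average the +/-1 contributions. */
double cond_bias(const uint8_t B[][N_LFSR], int l, const uint8_t gamma[]) {
    long sum = 0;
    for (int s0 = 0; s0 < NSTATE; s0++) {
        uint8_t sigma = (uint8_t)s0, out = 0;
        for (int t = 0; t < l; t++) {
            if (gamma[t]) out ^= parity(OMEGA_C & sigma); /* omega_c . sigma_t */
            if (t < l - 1) sigma = fsm_next(B[t], sigma);
        }
        sum += out ? 1 : -1;
    }
    return (double)sum / NSTATE;   /* eps = Pr(=1) - Pr(=0) */
}
```

Now, we are ready for the definition of the condition mask.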

Definition 8

Consider a function \(h: GF(2)^u \times GF(2)^v\rightarrow GF(2)^r\) with inputs \(\mathcal {B} \in GF(2)^u\) and \(X \in GF(2)^v\), where \(\mathcal {B}\) is the key-related part and the potential condition vector. Let \(\mathcal {B}=(b_0,\ldots ,b_{u-1}) \in GF(2)^u\) and \(\lambda =(\lambda _{0},\lambda _{1},\ldots ,\lambda _{u-1})\in GF(2)^u\) with \(\textit{supp}(\lambda )=\{0 \le i \le u-1\,|\,\lambda _i=1\}=\{l_1,\ldots ,l_m\}\) \((l_j<l_{j+1})\). Then, the shrunken vector of \(\mathcal {B}\) defined by \(\lambda \) is \(\mathcal {B}^{\prime }= (b_{l_1},\ldots ,b_{l_m}) \in GF(2)^m\). Here, \(\lambda \) is called the condition mask of \(\mathcal {B}\). Further, the other bits of \(\mathcal {B}\) form another vector, denoted by \(\mathcal {B}^{*} \in GF(2)^{u-m}\), which is the complement part of \(\mathcal {B}^{\prime }\). We define an operator \('\setminus '\) to represent the above process and write \(\mathcal {B}^{*}=\mathcal {B} \setminus \mathcal {B}^{\prime }\).

This definition indicates that the adversary need not use the full vector as the condition, but may instead search for correlations conditioned on a subset of \(\mathcal {B}\) defined by a mask \(\lambda \).
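In implementation terms, extracting the shrunken vector is a simple bit-gather; a small C helper (ours, purely illustrative) is:

```c
#include <stdint.h>

/* Shrunken vector B' of B under the condition mask lambda: gather the
   bits of B selected by lambda into the low-order positions, in
   increasing index order (a portable PEXT). */
uint64_t shrink(uint64_t B, uint64_t lambda) {
    uint64_t out = 0;
    int j = 0;
    for (int i = 0; i < 64; i++)
        if ((lambda >> i) & 1)
            out |= ((B >> i) & 1) << j++;
    return out;
}
```

The complement part \(\mathcal {B}^{*}=\mathcal {B}\setminus \mathcal {B}^{\prime }\) is then shrink(B, ~lambda), with the complemented mask restricted to the u-bit width.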

Fig. 2 The computation process of \(C_t\)

Fig. 3 The new computation process of \(C_t\)

For the model, given a condition mask \(\lambda =(\lambda _{t},\lambda _{t+1},\ldots ,\lambda _{t+l-2}) \in \text{ GF }(2)^{n(l-1)}\), where \(\lambda _j \in \text{ GF }(2)^n\) corresponds to \(B_j\) for \(j=t,t+1,\ldots ,t+l-2\), let the condition vector defined by \(\lambda \) be \(\mathcal {B}_{t}^{\prime }\), and let its complement, consisting of the other bits, be \(\mathcal {B}^{*}_{t}\). The target function \(h_{\mathcal {B}_{t}}^{\gamma }\) can now be generalized as

$$\begin{aligned} h^{\varLambda }_{\mathcal {B}_{t}^{\prime }}:\sigma _{t},\mathcal {B}^{*}_{t} \rightarrow \gamma \cdot C_t\oplus \eta \cdot \mathcal {B}^{*}_{t}, \end{aligned}$$
(2)

where \(\varLambda =(\gamma ,\eta )\) and \(|\eta |=|\mathcal {B}^{*}_{t}|\). As we can see, this function induces a large class of correlations based on both the linear mask and the condition mask.

Figure 3 shows that although the straightforward computation process of \(C_t\) is obstructed by a condition mask \(\lambda \ne \mathbf 1 _{u}\), the bias can still be computed. Since \(\mathcal {B}_{t}\) consists of the outputs of the LFSRs, it is the key-related material at the first level. In [19], the attacker guesses the full vector \(\mathcal {B}_{t+1}\), while now he/she only needs to guess \(\mathcal {B}_{t}^{\prime }\), a part of \(\mathcal {B}_{t}\), to mount the attack on the model. This is why the time/memory complexities of the attack can be significantly reduced.

Note that in the initialization phase, \(\mathcal {B}_t\) at the first level can be expressed as

$$\begin{aligned} \mathcal {B}_t^i=L_{t}(K)\oplus L_{t}^{\prime }\left( P^i\right) , \end{aligned}$$
(3)

where \(L_t\) and \(L_t^{\prime }\) are known linear functions dependent on l and t. The knowledge of \(\mathcal {B}_t^i\) directly leads to linear equations in the original encryption key. This motivates us to study the bias \(\epsilon (h_{\mathcal {B}_{t}^{\prime }}^{\varLambda })\) defined by a certain condition mask \(\lambda \).

The following property shows that the more knowledge of the LFSR bits \(\mathcal {B}\) we have, the larger the conditional correlation we obtain, which exactly matches the intuition.

Property 9

Given a function f with a partial input \(\mathcal {B}\) and two condition masks \(\lambda _1, \lambda _2\), let \(\mathcal {B}_1\) be the condition vector defined by \(\lambda _1\) and \(\mathcal {B}_2\) be the condition vector defined by \(\lambda _2\). If \(\textit{supp}(\lambda _2) \subseteq \textit{supp}(\lambda _1)\), then we have \(E[\Delta (f_{\mathcal {B}_1})]\ge E[\Delta (f_{\mathcal {B}_2})],\) where equality holds if and only if \(D_{f_{\mathcal {B}_1}}\) is independent of \(\mathcal {B}_1 \setminus \mathcal {B}_2\).

Proof

By Definition 2, we have \(\text{ E }[\Delta (f_{\mathcal {B}_2})]=2^r \sum _{a \in GF(2)^r} \text{ E }_{\mathcal {B}_2}\big [{(D_{f_{\mathcal {B}_2}}(a)-\frac{1}{2^r})^2}\big ]\), where the expectation is taken over the uniformly distributed \(\mathcal {B}_2\) for each fixed a. Since \(D_{f_{\mathcal {B}_2}}(a)=E_{\mathcal {B}_1 \setminus \mathcal {B}_2}[D_{f_{\mathcal {B}_1}}(a)]\) for any fixed a, we have

$$\begin{aligned} E_{\mathcal {B}_2}[\Delta (f_{\mathcal {B}_2})]&=2^r\sum _{a \in GF(2)^r}E_{\mathcal {B}_2}\left[ \left( E_{\mathcal {B}_1 \setminus \mathcal {B}_2}[D_{f_{\mathcal {B}_1}}(a)]-\frac{1}{2^r}\right) ^2\right] \\&=2^r\sum _{a \in GF(2)^r}E_{\mathcal {B}_2}\left[ E_{\mathcal {B}_1 \setminus \mathcal {B}_2}^2\left[ D_{f_{\mathcal {B}_1}}(a)-\frac{1}{2^r}\right] \right] \\&\le 2^r\sum _{a \in GF(2)^r}E_{\mathcal {B}_2}\left[ E_{\mathcal {B}_1 \setminus \mathcal {B}_2}\left[ \left( D_{f_{\mathcal {B}_1}}(a)-\frac{1}{2^r}\right) ^2\right] \right] \\&=2^r\sum _{a \in GF(2)^r}E_{\mathcal {B}_2,\mathcal {B}_1 \setminus \mathcal {B}_2}\left[ \left( D_{f_{\mathcal {B}_1}}(a)-\frac{1}{2^r}\right) ^2\right] =E_{\mathcal {B}_1}[\Delta (f_{\mathcal {B}_1})]. \end{aligned}$$

The inequality follows from the basic fact that for any fixed \(a\), \(E_{\mathcal {B}_1 \setminus \mathcal {B}_2}^2 [D_{f_{\mathcal {B}_1}}(a)-\frac{1}{2^r}]\le E_{\mathcal {B}_1 \setminus \mathcal {B}_2}[(D_{f_{\mathcal {B}_1}}(a)-\frac{1}{2^r})^2]\), i.e., the square of an expectation never exceeds the expectation of the square, where equality holds if and only if \(D_{f_{\mathcal {B}_1}}\) is independent of the condition vector \(\mathcal {B}_1 \setminus \mathcal {B}_2\). \(\square \)

From this property, given a function \(h: \text{ GF }(2)^u \times \text{ GF }(2)^v\rightarrow \text{ GF }(2)^r\) with \(\mathcal {B} \in \text{ GF }(2)^u, X \in \text{ GF }(2)^v\) and a condition mask \(\lambda \), we have \(\text{ E }[\Delta (h_{\mathcal {B}})] \ge \text{ E }[\Delta (h_{\mathcal {B}^{\prime }})] \ge \Delta (h).\) Moreover, for a fixed condition mask \(\lambda \), its maximum bias \(\text{ max }_{\varLambda }(\text{ E }[\Delta (h^{\varLambda }_{\mathcal {B}_{t}^{\prime }})])\) over all the linear masks \(\varLambda \) is an essential measure of its quality: the larger this maximum bias, the better the condition mask. The best choice of the condition mask can be determined according to the context of the underlying primitive.

4.3 Key Recovery Attacks on the Model

As mentioned before, the essential problem at the core is to distinguish a biased sample sequence from a pool of random-like sample sequences. Since the involved sample sequences are derived from some key-related information, this distinguisher can be used to identify the correct key. Formally, given a function \(f: \text{ GF }(2)^m \times \text{ GF }(2)^{u-m} \times \text{ GF }(2)^v\rightarrow \text{ GF }(2)^r\) and a condition mask \(\lambda \), let

$$\begin{aligned} f_{\mathcal {B}^{\prime }}(\mathcal {B}^*,X)=f(\mathcal {B}^{\prime },\mathcal {B}^*,X), \end{aligned}$$

where \(\mathcal {B}=\mathcal {B}^{\prime }\cup \mathcal {B}^* \in \text{ GF }(2)^u, X \in \text{ GF }(2)^v\). Here, the condition vector defined by \(\lambda \) is \(\mathcal {B}^{\prime } \in \text{ GF }(2)^m\) and \(\mathcal {B}^*=\mathcal {B}\setminus \mathcal {B}^{\prime }\). If \(\mathcal {B}^{\prime }\) is determined by \(\kappa \) bits of key information, denote by \(\mathcal {B}'^{\mathcal {K}}\) the value derived when the guessed value of the key material is \(\mathcal {K}\). The formal description of the problem is then as follows.

Definition 10

There are \(2^{\kappa }\) sequences of \(\mathcal {N}\) samples with the following characteristics: one biased sequence consists of \(\mathcal {N}\) samples \((f_{\mathcal {B}'^{\mathcal {K}}_{i}},\mathcal {B}'^{\mathcal {K}}_{i})\) \((i=1,\ldots ,\mathcal {N})\) for the correct key \(\mathcal {K}\); the other \(2^{\kappa }-1\) sequences consist of \(\mathcal {N}\) independently and uniformly distributed random variables \((Z_{i}^K,\mathcal {B}'^{K}_{i})\) \((i=1,\ldots ,\mathcal {N})\) for the wrong keys. The problem is to efficiently distinguish the biased sequence from the other sequences with the minimum number \(\mathcal {N}\) of samples.

Following [2], the minimum number \(\mathcal {N}\) of samples for an optimal distinguisher using the unconditional correlation to effectively distinguish a sequence of \(\mathcal {N}\) output samples of f from \((2^{\kappa }-1)\) truly random sequences of equal length is

$$\begin{aligned} \mathcal {N}= \frac{4\kappa \log 2}{\Delta (f)}, \end{aligned}$$

while with the smart distinguisher in [19] based on the condition vector \(\mathcal {B}\), the number of samples needed is

$$\begin{aligned} \mathcal {N}_{\mathcal {B}}= \frac{4\kappa \log 2}{\text{ E }[\Delta (f_{\mathcal {B}})]}. \end{aligned}$$

Since \(\text{ E }[\Delta (f_{\mathcal {B}})]\ge \Delta (f)\), we have \(\mathcal {N}_{\mathcal {B}}\le \mathcal {N}\).
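As a purely illustrative numerical example (the figures are made up and not taken from the later sections): for \(\kappa =24\) key bits and \(\text{ E }[\Delta (f_{\mathcal {B}})]=2^{-19}\), one gets \(\mathcal {N}_{\mathcal {B}}= 4\cdot 24\cdot \ln 2\cdot 2^{19} \approx 2^{6.06}\cdot 2^{19} \approx 2^{25}\) samples, so every extra factor of 2 in the expected SEI halves the data complexity. In our condition masking terminology, we have the following theorem on the attack complexities.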

Theorem 11

Given a condition mask \(\lambda \), Algorithm 1 solves the problem in Definition 10 with

$$\begin{aligned} \mathcal {N}_{\mathcal {B}^{\prime }}= \frac{4\kappa \log 2}{\text{ E }[\Delta (f_{\mathcal {B}^{\prime }})]} \end{aligned}$$

samples, and the time complexity is \(O(\mathcal {N}_{\mathcal {B}^{\prime }}\cdot 2^{\kappa })\), where the condition bits \(\mathcal {B}^{\prime }\) are defined by \(\lambda \) and the expectation is taken over the uniformly distributed \(\mathcal {B}^{\prime }\). Further, suppose that \(\mathcal {B}'^K_i\) and \(Z_i^K\) can be expressed as

$$\begin{aligned} \mathcal {B}'^K_i&=L(K)\oplus a_i, \end{aligned}$$
(4)
$$\begin{aligned} Z_i^K&=L^{\prime }(K)\oplus a_i^{\prime } \oplus g\left( \mathcal {B}'^K_i\right) , \end{aligned}$$
(5)

for all \(\kappa \)-bit K and \(i=1,2,\ldots , \mathcal {N}\), where g is an arbitrary function, \(L,L^{\prime }\) are linear functions, and \(a_i, a_i^{\prime }\) are independently and uniformly distributed constants known to the distinguisher. Under these assumptions, we can use the FWT algorithm to achieve the optimal time complexity \(O(\mathcal {N}_{\mathcal {B}^{\prime }}+\kappa 2^{\kappa +1})\) with pre-computation \(O(\kappa 2^\kappa )\) and \(|\mathcal {B}^{\prime }|=\kappa \).

Proof

The case \(\lambda =\mathbf 1 _{u}\) was proved in [19]. When \({\lambda } \ne \mathbf 1 _{u}\), we can make the substitution \(T=\lambda \diamond \mathcal {B}\) and proceed in the same way, where \(\diamond \) represents the action of the condition mask on \(\mathcal {B}\). \(\square \)

Algorithm 1 (pseudocode figure not reproduced)

Remarks

There is a subtle difference between the cases \(\lambda =\mathbf 1 _{u}\) and \({\lambda } \ne \mathbf 1 _{u}\), i.e., our framework differs from the one in [19]. Precisely, in many cases the premises (4) and (5) for \({\lambda } \ne \mathbf 1 _{u}\) can be the same as those for \({\lambda } = \mathbf 1 _{u}\); i.e., even if \({\lambda } \ne \mathbf 1 _{u}\), we can still draw the same conclusion about the complexity reduction under the premises used for \({\lambda } = \mathbf 1 _{u}\):

$$\begin{aligned} \mathcal {B}_i^K&=L(K)\oplus a_i, \end{aligned}$$
(4')
$$\begin{aligned} Z_i^K&=L^{\prime }(K)\oplus a_i^{\prime } \oplus g\left( \mathcal {B}_i^K\right) . \end{aligned}$$
(5')

This fact results from the linear approximation process of the underlying primitive. For the Bluetooth two-level E0, we demonstrate this issue in the following sections. We believe there are other cases in which our arguments hold.

We should not ignore the impact of the cardinality \(|\mathcal {B}^{\prime }|=\kappa \) of the condition vector on the time/memory complexities. It is easy to see that for \(\lambda \ne \mathbf 1 _{u}\), the cardinality \(\kappa \) can be reduced, and the time/memory complexities can be reduced exponentially accordingly. It is expected that with a careful choice of the condition mask, we can get better tradeoffs on the time/memory/data complexity curve compared to the case \(\lambda =\mathbf 1 _{u}\). This is why we introduce the notion of condition masking.

Further, note that not all the bits in the condition vector \(\mathcal {B}\) have the same influence on the correlation. In fact, some are more important than others: with high probability, only a subset of the condition bits determines the magnitude of the correlation. Thus, the crucial task of the adversary is to determine the most important part of the condition vector for each specific primitive.

Next, we build the linear approximations of the two-level model with condition masking. The linear approximation is based on the re-initialization property of the model, detailed in Sect. 3. As previously stated, we make the following assumption.

Assumption 1

The affine transform \(\mathcal {G}_3\) is just a bit permutation of the input variables, i.e., no linear combination among the manipulated bits is introduced when loading the last \(L=\sum _{i=1}^{n}L_{i}\) bits generated at the end of the first level into the n LFSRs at the beginning of the second level.

Throughout this paper, in order to distinguish \(\xi _t,\varPsi _t\) at the first level from \(\xi _{t^{\prime }},\varPsi _{t^{\prime }}\) at the second level, we introduce the following notations. Let \(R_t = \xi _t\), \(V_{t^{\prime }}=\xi _{t^{\prime }}\), \(\alpha _t = \varPsi _t\) and \(\beta _{t^{\prime }}=\varPsi _{t^{\prime }}\). Denote the last L bits generated at the first level of the model by \(S^i_{[-L+1,\ldots ,0]}\), where \(S^i_{[-L+1,\ldots ,0]}=R^i_{[-L+1,\ldots ,0]} \oplus \alpha ^i_{[-L+1,\ldots ,0]}\). We also have

$$\begin{aligned} V^i_{[1,\ldots ,L]}=\mathcal {G}_3(R^i_{[-L+1,\ldots ,0]}) \oplus \mathcal {G}_3(\alpha ^i_{[-L+1,\ldots ,0]}). \end{aligned}$$

For brevity, we define \((U^i_1,\ldots ,U^i_{L})=\mathcal {G}_3(R^i_{[-L+1,\ldots ,0]})\). According to \(\mathcal {G}_3\), \(V^i\) can be expressed as

$$\begin{aligned} V^i_{t^{\prime }}=U^i_{t^{\prime }}\oplus \bigoplus _{j=1}^{n} \alpha ^i_{t_j}, \text { for } t^{\prime }=1,\ldots ,L_1, \end{aligned}$$

where the \(t_j\) are the fixed time instants of \(\alpha ^i\) before the application of \(\mathcal {G}_3\), which depend on the considered primitive.

Note that we have \(U^i_{t^{\prime }}=H_{t^{\prime }}(K) \oplus H^{\prime }_{t^{\prime }}(P^i)\), where \(H_{t^{\prime }},H^{\prime }_{t^{\prime }}\) are public linear functions dependent on \(t^{\prime }\). At the second level, \(z_{t^{\prime }}=V_{t^{\prime }}\oplus \beta _{t^{\prime }}\) holds. Hence, we have

$$\begin{aligned} z_{t^{\prime }}\oplus H_{t^{\prime }}(K)\oplus H^{\prime }_{t^{\prime }}(P^i)= \bigoplus _{j=1}^{n} \alpha ^i_{t_j}\oplus \beta ^i_{t^{\prime }},\text { for } t^{\prime }=1,\ldots ,L_1. \end{aligned}$$
(6)

Given a linear mask \(\gamma \) with \(|\gamma |=l\), let \(Z_{t^{\prime }}^i=(z_{t^{\prime }}^i,\ldots ,z_{t^{\prime }+l-1}^i)\). Since the L bits \(S^i\) generated at the end of the first level are loaded into the n LFSRs according to the bit permutation \(\mathcal {G}_3\) before the second level starts, Eq. (6) can be rewritten in linear mask notation as

$$\begin{aligned} \mathcal {G}_3(\gamma ) \cdot (Z_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))=\bigoplus _{j=1}^{n}(\gamma \cdot C_{t_j}^i)\oplus \mathcal {G}_3(\gamma ) \cdot C_{t^{\prime }}^i, \end{aligned}$$
(7)

for \(i=1,\ldots ,\mathcal {N}\), where \(\mathcal {L}_{t^{\prime }},\mathcal {L}^{\prime }_{t^{\prime }}\) are fixed linear functions that can be derived from \(H_{t^{\prime }},H^{\prime }_{t^{\prime }}\), and \(\mathcal {G}_3(\gamma )\) is the resulting linear mask after the restricted action of the bit permutation \(\mathcal {G}_3\). Equation (7) corresponds to the case \(\lambda = \mathbf {1}_{u}\).

By Eq. (2), we can rewrite this equation as follows:

$$\begin{aligned}&\mathcal {G}_3(\gamma ) \cdot (Z_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))\oplus \bigoplus _{j=1}^n (\eta \cdot \mathcal {B}^{*i}_{t_j})\nonumber \\&\qquad =\bigoplus _{j=1}^n(\gamma \cdot C_{t_j}^i\oplus \eta \cdot \mathcal {B}^{*i}_{t_j})\oplus \mathcal {G}_3(\gamma ) \cdot C_{t^{\prime }}^i\;. \end{aligned}$$
(8)

For brevity, given masks \(\lambda \) and \(\varLambda \), we use the simplified notations \(h^{\varLambda }_{\mathcal {B}'^i_{t}},h^{\mathcal {G}_3(\gamma )}\) for \(h^{\varLambda }_{\mathcal {B}'^i_{t}}(\mathcal {B}^{*i}_{t},\sigma _{t}^i)\) and \(h^{\mathcal {G}_3(\gamma )}(\mathcal {B}^i_{t^{\prime }},\theta ^i_{t^{\prime }})\) hereafter. Besides, Eq. (3) implies that \(\mathcal {B}^{*i}_{t}=\mathcal {B}^i_{t}\setminus \mathcal {B}'^i_{t}\) is a linear combination of K and \(P^i\). Now Eq. (8) becomes

$$\begin{aligned} \mathcal {G}_3(\gamma ) \cdot (Z_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))\oplus \eta \cdot (L_1(K)\oplus L_2(P^i)) =\bigoplus _{j=1}^n h^{\varLambda }_{\mathcal {B}'^i_{t_j}} \oplus h^{\mathcal {G}_3(\gamma )}, \end{aligned}$$
(9)

where \(L_1,L_2\) are public linear functions and \(h^{\mathcal {G}_3(\gamma )}\) is the unconditioned function at the second level. Equation (9) is the hybrid bitwise linear approximation based on condition masking for the two-level model in Fig. 1, where the \(h^{\varLambda }_{\mathcal {B}'^i_{t_j}}\) are derived from the first level and \(h^{\mathcal {G}_3(\gamma )}\) carries the unconditional correlation of the second level.

4.4 Bitwise Key Recovery Attack on the Model

Given the condition mask \({\lambda }\) and the linear masks \(\varLambda =(\gamma ,\eta )\), we define the following sign function to estimate the effective value of \(h^{\varLambda }_{\mathcal {B}'^i_{t}}\) (Eq. (2)):

$$\begin{aligned} g^{\varLambda }(\mathcal {B}'^i_{t})= \left\{ \begin{array}{ll} 1,\ \text {if} \ \epsilon \left( h^{\varLambda }_{\mathcal {B}'^i_{t}}\right) >0 \\ 0,\ \text {if} \ \epsilon \left( h^{\varLambda }_{\mathcal {B}'^i_{t}}\right) <0 \end{array} \right. \end{aligned}$$
(10)

for all \(\mathcal {B}'^i_{t} \in \text{ GF }(2)^{wt(\lambda )}\) such that \(\epsilon (h^{\varLambda }_{\mathcal {B}'^i_{t}})\ne 0\). For brevity, let

$$\begin{aligned} \mathcal {B}^i_{\lambda } =(\mathcal {B}'^i_{t_1}, \mathcal {B}'^i_{t_2}, \ldots , \mathcal {B}'^i_{t_n}),\;\; \mathcal {X}^i=( Y^i_{t_1}, Y^i_{t_2}, \ldots , Y^i_{t_n}, X^i_{t^{\prime }},\mathcal {B}^i_{t^{\prime }} ), \end{aligned}$$

where \(Y^i_{t_j}=(\sigma ^i_{t_j},\mathcal {B}^*_{t_j})\) is the unknown input to \(h^{\varLambda }_{\mathcal {B}'^i_{t_j}}\), and \(X^i_{t^{\prime }},\mathcal {B}^i_{t^{\prime }}\) are the inputs to \(h^{\mathcal {G}_3(\gamma )}\). By Eqs. (3) and (9), the knowledge of the key K is contained in \(\mathcal {B}'^i_{t_j}, \mathcal {L}_{t^{\prime }}(K)\) and \(L_1(K)\). Let \(K_1=(L_{t_1}(K),L_{t_2}(K),\ldots ,L_{t_n}(K))\) be the \(wt(\lambda )n\) bits contained in \(\mathcal {B}^i_{\lambda }\), and let \(K_2=\mathcal {G}_3(\gamma )\cdot \mathcal {L}_{t^{\prime }}(K)\oplus \eta \cdot L_1(K)\); these are the subkeys. Denote by \(\widetilde{\cdot }\) the guessed value of the argument. First, choose an appropriate condition mask \({\lambda }\) and guess the subkeys \(\widetilde{K_1},\widetilde{K_2}\). As \(P^i\) is known for each frame \(i=1,\ldots ,\mathcal {N}\), we can compute the condition vector \(\mathcal {B}^i_{\lambda }\). Second, to distinguish the correct keys from the wrong ones, we define a mapping \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)\) as follows.

$$\begin{aligned} \mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)= \left\{ \begin{array}{ll} &{}\bigoplus _{j=1}^n \left( h^{\varLambda }_{\mathcal {B}'^i_{t_j}}\oplus g^{\varLambda }\left( \widetilde{\mathcal {B}'^i_{t_j}}\right) \right) \oplus h^{\mathcal {G}_3(\gamma )},\quad \text {if}\ \prod ^n_{j=1}\epsilon \left( h^{\varLambda }_{\mathcal {B}'^i_{t_j}}\right) \ne 0\\ &{}\text {a truly random bit},\quad \quad \quad \quad \quad \quad \quad \quad \quad \text {otherwise} \end{array} \right. \end{aligned}$$

With Eq. (10), the value of \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)\) can be computed as

$$\begin{aligned} \mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)= \mathcal {G}_3(\gamma )\cdot \left( Z_{t^{\prime }}^i\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i)\right) \oplus \eta \cdot L_2(P^i) \oplus \widetilde{K_2}\oplus \bigoplus ^n_{j=1} g^{\varLambda }\left( \widetilde{\mathcal {B}'^i_{t_j}}\right) . \end{aligned}$$

If \(\mathcal {N}\) frames are available, we can compute the value of \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)\) for each possible key by the above equation \(\mathcal {N}\) times. With an appropriate choice of \(\varLambda \) and \(\lambda \), if \(K_1,K_2\) are correctly guessed, then \(\text{ E }[\Delta (\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i))] > 0\) and we expect \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)\) to equal one most of the time. Otherwise, \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)\) is estimated by the uniform distribution. Third, we get \(\mathcal {N}\) outputs of the source for every possible key. Submitting these samples to the distinguisher in Algorithm 1, with \(\kappa =wt(\lambda )n+1\), \(u=n(l-1)\), \(m=wt(\lambda )\), \(v=(n+1)k+n(n+1)(l-1)-wt(\lambda )n\) and \(r=1\), we expect to successfully restore the correct keys.

4.5 Vectorial Key Recovery Attack on the Model

Now we enhance the above attack by using multiple linear approximations simultaneously. Since the conditional correlations based on condition masking are not likely to be larger than those based on the whole condition vector, we appeal to the vectorial approach to keep the data complexity as low as possible.

Assume we use s mutually independent linear approximations. Let \(\varGamma =(\varLambda _1,\ldots ,\varLambda _s)\) and \(\varGamma ^{\prime }=(\mathcal {G}_{3}(\gamma _1),\ldots ,\mathcal {G}_{3}(\gamma _s))\) denote the linear masks of these s approximations, where \(\varLambda _i=(\gamma _i,\eta _i)\) and \(|\gamma _1|=\cdots =|\gamma _s|=l\) with \(s<l\). Let

$$\begin{aligned} \mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)= \left( \mathcal {F}^{\varLambda _1}_{\mathcal {B}^i_{\lambda }}, \ldots ,\mathcal {F}^{\varLambda _s}_{\mathcal {B}^i_{\lambda }}\right) , g^{\varGamma }=\left( g^{\varLambda _1}(\mathcal {B}'^i_{t}), \ldots ,g^{\varLambda _s}(\mathcal {B}'^i_{t})\right) \end{aligned}$$

and \(h_{\mathcal {B}'^i_{t}}^{\varGamma }=(h^{\varLambda _1}_{\mathcal {B}'^i_{t}},\ldots ,h^{\varLambda _s}_{\mathcal {B}'^i_{t}}),\; h^{\varGamma ^{\prime }}=(h^{\mathcal {G}_3(\gamma _1)},\ldots ,h^{\mathcal {G}_3(\gamma _s)})\). Here, the first component \(g^{\varLambda _1}(\mathcal {B}'^i_{t})\) of \(g^{\varGamma }\) is determined by Eq. (10). The other bits are determined as follows: for the j-th bit, we let it be a uniformly distributed bit if \(\epsilon (h_{\mathcal {B}'^i_{t_j}}^{\varLambda _1})=0\), and otherwise take 0 or 1 according to the definition in Eq. (10). Since we have found the efficient condition mask \(\lambda \) and linear mask \(\varLambda _1=(\gamma _1,\eta _1)\) in the bitwise attack, we extend \(\mathcal {F}^{\varLambda _1}_{\mathcal {B}^i_{{\lambda }}}\) to an s-dimensional vector, i.e.,

$$\begin{aligned} \mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)= \left\{ \begin{array}{ll} &{}\bigoplus ^n_{j=1}\left( h_{\mathcal {B}'^i_{t_j}}^{\varGamma } \oplus g^{\varGamma }\left( \widetilde{\mathcal {B}'^i_{t_j}}\right) \right) \oplus h^{\varGamma ^{\prime }},\quad \quad \text {if}\ \prod ^n_{j=1}\epsilon \left( h_{\mathcal {B}'^i_{t_j}}^{\varLambda _1}\right) \ne 0\\ &{}\text {a uniformly distributed }s\text {-bit vector},\quad \quad \quad \text {otherwise.} \end{array} \right. \end{aligned}$$

In this way, we have constructed an approximation of the two-level model in the vectorial approach. For the correct guess \(\widetilde{K}=K\), we have \(\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)=\bigoplus ^n_{j=1}(h_{\mathcal {B}'^i_{t_j}}^{\varGamma } \oplus g^{\varGamma }(\mathcal {B}'^i_{t_j}))\oplus h^{\varGamma ^{\prime }}\) and \(\text{ E }[\Delta (\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i))]>0\). For each wrong guess, the components of the s-dimensional vector \(\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}\) are uniformly distributed, and we estimate the distribution \(D_{\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i)}\) as an s-bit uniform distribution for all i, so that \(\text{ E }[\Delta (\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i))]=0\).

With an appropriate choice of \(\varGamma =(\varLambda _1,\ldots ,\varLambda _s)\), we can obtain larger correlation values than in the bitwise case. Thus, the data complexity \(\mathcal {N}_{\mathcal {B}^{\prime }}\) is effectively reduced compared to the bitwise attack. Again, submitting \(2^{\kappa }\) sequences of \(\mathcal {N}_{\mathcal {B}^{\prime }}\) pairs \((\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i),\widetilde{\mathcal {B}^i_{{\lambda }}})\) to Algorithm 1, we can eventually recover the \(\kappa \)-bit K.

Now we study how to choose the linear mask vector \(\varGamma \). We first select a linear mask \(\varLambda _1=(\gamma _1,\eta _1)\) as in the bitwise attack. Under this \(\varLambda _1\), we search for other masks \(\varLambda _j (j\ge 2)\) to maximize the total correlation. The following theorem provides a guideline for an adversary to construct the vector, by giving the criterion for when he/she gains in correlation by moving from an \((s-1)\)-dimensional to an s-dimensional attack.

Theorem 12

Let \(\varGamma _s=(\varLambda _1,\ldots ,\varLambda _s)\) be the linear mask in the s-dimensional attack with condition vector \(\mathcal {B}\) and condition mask \(\lambda \). Denote the joint probabilities by \(P_{a_1\cdots a_s}=P(h^{\varLambda _1}_{\mathcal {B}^{\prime }}=a_1,\ldots ,h^{\varLambda _s}_{\mathcal {B}^{\prime }}=a_s)\), where \(a_i \in GF(2)\) for \(1\le i \le s\), and write \(P_{00\cdots 00}=\frac{1}{2^s}+\xi _{00\cdots 00}, P_{00\cdots 01}=\frac{1}{2^s}+\xi _{00\cdots 01}, \ldots , P_{11\cdots 11}=\frac{1}{2^s}+\xi _{11\cdots 11}\), where \(-\frac{1}{2^s} \le \xi _j \le \frac{1}{2^s}\) for all \(j \in GF(2)^s\) and \(\sum _{j \in GF(2)^s}\xi _j=0\). Then \(\Delta (h^{\varGamma _s}_{\mathcal {B}^{\prime }}) \ge \Delta (h^{\varGamma _{s-1}}_{\mathcal {B}^{\prime }}),\) where equality holds if and only if

$$\begin{aligned} \xi _{00\cdots 00}=\xi _{00\cdots 01}, \xi _{00\cdots 10}=\xi _{00\cdots 11},\ldots ,\xi _{11\cdots 10}=\xi _{11\cdots 11}. \end{aligned}$$

Proof

From the assumption, the \((s-1)\)-dimensional joint probabilities (marginalized over the last bit, indicated by \(*\)) can be computed as \(P_{00\cdots 0*}=\frac{1}{2^{s-1}}+\xi _{00\cdots 00}+\xi _{00\cdots 01}\), \(P_{00\cdots 1*}=\frac{1}{2^{s-1}}+\xi _{00\cdots 10}+\xi _{00\cdots 11}\), \(\ldots \), \(P_{11\cdots 1*}=\frac{1}{2^{s-1}}+\xi _{11\cdots 10}+\xi _{11\cdots 11}\). By the definition of the SEI, we have

$$\begin{aligned} \Delta (h^{\varGamma _s}_{\mathcal {B}^{\prime }})&=2^s(\xi ^2_{00\cdots 00}+\xi ^2_{00\cdots 01}+\cdots +\xi ^2_{11\cdots 11}),\\ \Delta (h^{\varGamma _{s-1}}_{ \mathcal {B}^{\prime }})&=2^{s-1}((\xi _{00\cdots 00}+\xi _{00\cdots 01})^2+ (\xi _{00\cdots 10}+\xi _{00\cdots 11})^2\\&\quad +\cdots +(\xi _{11\cdots 10}+\xi _{11\cdots 11})^2). \end{aligned}$$

We can see that \(\Delta (h^{\varGamma _s}_{\mathcal {B}^{\prime }})-\Delta (h^{\varGamma _{s-1}}_{\mathcal {B}^{\prime }})=2^{s-1}((\xi _{00\cdots 00}-\xi _{00\cdots 01})^2+(\xi _{00\cdots 10}-\xi _{00\cdots 11})^2+\cdots +(\xi _{11\cdots 10}-\xi _{11\cdots 11})^2) \ge 0\), from which we can easily derive the conclusion. \(\square \)
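As an illustrative numerical example (with made-up probabilities), take \(s=2\) and \(\xi _{00}=\frac{1}{8}\), \(\xi _{01}=0\), \(\xi _{10}=\xi _{11}=-\frac{1}{16}\), which indeed sum to 0. Then \(\Delta (h^{\varGamma _2}_{\mathcal {B}^{\prime }})=4\left( \frac{1}{64}+0+\frac{1}{256}+\frac{1}{256}\right) =\frac{3}{32}\), while \(\Delta (h^{\varGamma _1}_{\mathcal {B}^{\prime }})=2\left( \left( \frac{1}{8}\right) ^2+\left( -\frac{1}{8}\right) ^2\right) =\frac{2}{32}\), so moving to dimension 2 gains \(2(\xi _{00}-\xi _{01})^2+2(\xi _{10}-\xi _{11})^2=\frac{1}{32}\) in SEI; had \(\xi _{00}=\xi _{01}\) and \(\xi _{10}=\xi _{11}\) held, there would be no gain.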

This theorem indicates that a high-dimensional attack is always at least as good as a low-dimensional one. Besides, if an adversary chooses the linear masks following the rules in this theorem, he can always gain in correlation. Further, there are some other rules for choosing \(\varGamma \). First, the linear masks \(\gamma _j\) for \(j=1,\ldots ,s\) should be linearly independent, with \(s \le l-2\). Second, when the key is wrong, \(\mathcal {F}^{\varLambda _j}_{\mathcal {B}^i_{{\lambda }}}\) is a uniformly distributed bit for \(1 \le j \le s\) in the bitwise attack; if these bits are independent of each other, \(\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}\) follows an s-bit uniform distribution. Thus, when choosing a new \(\varLambda _j=(\gamma _j,\eta _j)\) \((j>1)\), we should keep the independence among the different components \(\mathcal {F}^{\varLambda _j}_{\mathcal {B}^i_{\lambda }}\) for \(j=1,\ldots ,s\).

4.6 Security Bound of the Two-Level Model

Now we derive the security bound of the two-level model from the above attacks. By the definition of \(g^{\varLambda }\) in Eq. (10), for a given \(\mathcal {B}^i_{\lambda }\), \(g^{\varLambda }(\widetilde{\mathcal {B}'^i_{t_j}})\) is a fixed value not depending on \(\mathcal {X}^i\). Consequently, \(g^{\varGamma }\) has no influence on \(\Delta (\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }})\). Thus, we have the data complexity

$$\begin{aligned} \mathcal {N}_{\mathcal {B}^{\prime }}&=\frac{4\kappa \log {2}}{\text{ E }\left[ \Delta \left( \mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}\right) \right] }\;. \end{aligned}$$
(11)

Now let us look at the time complexity of the attack. From the expression of \(\mathcal {F}^{\varGamma }_{\mathcal {B}^i_{\lambda }}\), it is easily verified that this expression satisfies the premises of Theorem 11, so our attack can also use the FWT to obtain the optimal time complexity. For all the subkeys \(K=(K_1,K_2) \in \text{ GF }(2)^{wt(\lambda )n} \times \text{ GF }(2)\), where \(K_1\) and \(K_2\) are defined in Sect. 4.4, we define \(\mathcal {H},\mathcal {H^{\prime }}\) as follows:

$$\begin{aligned} \begin{aligned} \mathcal {H}(K)=&\sum ^{\mathcal {N}_{\mathcal {B}^{\prime }}}_{i=1}\mathbf {1}_{L^{\prime }_{t_1}(P^i),\ldots ,L^{\prime }_{t_n}(P^i)=K_1 \ \text {and}\ (x_1,\ldots ,x_s)=(K_2,1,\ldots ,1)},\\ \mathcal {H}^{\prime }(K)=&\left\{ \begin{array}{ll} 0,&{} \text {if} \ \prod ^n_{j=1}\epsilon \left( h_{K_{1,j}}^{\varLambda _1}\right) =0\\ \log {2^sD_{\mathcal {F}^{\varGamma }_{K_{1\lambda }}}((K_2,1,\ldots ,1)\oplus (y_1,\ldots ,y_s))},&{}\qquad \text {otherwise} \end{array}\right. \end{aligned} \end{aligned}$$

where \(x_j=\mathcal {G}_3(\gamma _j)\cdot (Z^i_{t^{\prime }}\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))\oplus \omega _j \cdot L_2(P^i)\) and \(y_j=\bigoplus ^n_{i=1}g^{\varLambda _j}(K_{1,i})\) for \(j=1,\ldots ,s\).

In Algorithm 1, the grade G(K) is a simple convolution between \(\mathcal {H}\) and \(\mathcal {H}^{\prime }\) (as in [19]); thus we have \(G(K)=\frac{1}{2^l}\widehat{\mathcal {H}^{\prime \prime }(K)}\) where \(\mathcal {H}^{\prime \prime }(K)=\widehat{\mathcal {H}}(K)\cdot \widehat{\mathcal {H}^{\prime }}(K)\). Therefore, the total time complexity is \(\mathcal {N}_{\mathcal {B}^{\prime }} + \kappa \cdot 2^{\kappa +1}\).
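The convolution can thus be evaluated with three fast Walsh transforms; the following minimal sketch (our own names, writing \(2^{\kappa }\) for the table size) illustrates the computation of the grades and verifies it against the naive convolution on a toy example.

```python
import numpy as np

def wht(v):
    # Unnormalized fast Walsh-Hadamard transform in O(kappa * 2^kappa) time.
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v

def grades(H, Hp):
    # G = (1/2^kappa) * WHT(WHT(H) * WHT(Hp)), the XOR-convolution of H and Hp.
    return wht(wht(H) * wht(Hp)) / len(H)

# Toy usage with kappa = 3 (8 subkey candidates):
H = np.array([3, 1, 4, 1, 5, 9, 2, 6], dtype=float)
Hp = np.array([2, 7, 1, 8, 2, 8, 1, 8], dtype=float)
# Check against the naive O(4^kappa) convolution:
naive = [sum(H[j] * Hp[k ^ j] for j in range(8)) for k in range(8)]
assert np.allclose(grades(H, Hp), naive)
```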

In order to give the security bound of the two-level model, we first consider the bitwise approximation, i.e., Eq. (9). Let the largest conditional correlation at the first level in the model be \(\epsilon _{max,1}\) and the largest unconditional correlation at the second level be \(\epsilon _{max,2}\) following the restricted consistency of the linear mask \(\mathcal {G}_{3}(\gamma )\). The total correlation of the linear approximation can be derived as \(\epsilon _{t} = \epsilon _{max,1}^n\cdot \epsilon _{max,2}\). Hence, the required data complexity is

$$\begin{aligned} \mathcal {N}_{\mathcal {B}^{\prime }}=\frac{4\kappa \log 2}{\epsilon _{max,1}^{2n}\cdot \epsilon _{max,2}^2}\;, \end{aligned}$$
(12)

and the total time complexity is

$$\begin{aligned} T=\frac{4\kappa \log 2}{\epsilon _{max,1}^{2n}\cdot \epsilon _{max,2}^2}+\kappa \cdot 2^{\kappa +1}. \end{aligned}$$
(13)

From Eqs. (12) and (13), to strengthen the security of the two-level model, we can adopt strategies to reduce the conditional correlation \(\epsilon _{max,1}\) and the unconditional correlation \(\epsilon _{max,2}\) to the extent that the resultant \(T > 2^{\kappa }\) and/or \(\mathcal {N}_{\mathcal {B}^{\prime }}> 2^{\kappa }\) for \(\kappa = wt(\lambda )n+1\).
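As a quick illustration with purely hypothetical correlation values (not the actual E0 figures), Eqs. (12) and (13) can be evaluated directly:

```python
from math import log, log2

def complexities(eps1, eps2, n, kappa):
    # Data complexity, Eq. (12), and time complexity, Eq. (13).
    N = 4 * kappa * log(2) / (eps1 ** (2 * n) * eps2 ** 2)
    T = N + kappa * 2 ** (kappa + 1)
    return N, T

# Hypothetical values: eps_max1 = 2^-2, eps_max2 = 2^-6, n = 4, kappa = 17.
N, T = complexities(2 ** -2.0, 2 ** -6.0, 4, 17)
print(log2(N), log2(T))   # the design goal is T > 2^kappa or N > 2^kappa
```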

Remarks

Let us take a closer look at the linear approximation process of the above attack in Sect. 4.3. The key reason that the linear approximations at the first and second levels of the model can be connected and efficiently exploited is that the affine transform \(\mathcal {G}_{3}\) only permutes the keystream bits without any linear combination among them. Thus, each permuted keystream bit is associated with only 1 noise variable from the FSM, and when combined at the second level, there are n noise variables from the FSM at the first level and 1 noise variable from the second level, which ultimately determine the conditional and unconditional bias. Therefore, to reduce the correlations, the most efficient strategy is to increase the number of noise variables from the FSM. What we suggest to reach this aim is as follows.

If we first run the core generator at the second level for a number of ticks without outputting keystream, i.e., drop some prefix of the keystream at the beginning of the second level, then the noise variables from the first level will propagate and increase the number of noise variables associated with each LFSR variable as the n LFSRs run. It is expected that with an appropriate choice of the number of dropped keystream bits at the beginning of the second level, the correlations will be reduced to the extent that the corresponding time/data complexities of Algorithm 1 exceed the security bound. \(\square \)

The following theorem gives the relation between the suggested number of keystream bits dropped off at the beginning of the second level and the correlations.

Theorem 13

Let \(\theta \) be the largest unconditional correlation at the first level in the model and \(\rho _{i}=wt(p_{i}(x))-1\) be the number of tap positions of LFSR \(R_i\) \((1\le i\le n)\) of the core generator in the model. Then, after dropping \(tL_{n}\) \((t\ge 1)\) keystream bits at the beginning of the second level, we have

$$\begin{aligned} \epsilon _{uc} \le \theta ^{t\cdot \sum _{i=1}^{n}\rho _{i}+1}, \end{aligned}$$
(14)

where \(\epsilon _{uc}\) is the unconditional correlation of the linear approximation of the model in Eq. (7) in Sect. 4.3; further, if we take the conditional correlation into account at the first level, we have

$$\begin{aligned} \epsilon _{hybrid} \le \epsilon _{max,1}^{t\cdot \sum _{i=1}^{n}\rho _{i}}\cdot \epsilon _{max,2}, \end{aligned}$$
(15)

for the hybrid linear approximation in Eq. (9) in Sect. 4.3.

Proof

First note that for each LFSR i in the generator, the new variable introduced by the LFSR clocking in the model is the xor of \(\rho _{i}\) initial permuted variables at the end of the first level; thus after one full cycle of the LFSR, i.e., \(L_{i}\) ticks, each variable in the current internal state of LFSR i depends on \(\rho _{i}\) initial permuted variables. Second, after dropping \(tL_{n}\) \((t\ge 1)\) keystream bits at the beginning of the second level, i.e., after t full cycles of the underlying LFSR, each new variable in the current internal state of LFSR i depends on \(t\cdot \rho _{i}\) initial permuted variables in the underlying LFSR.

Besides, from \(z_t = \varPsi _{t} \oplus \bigoplus ^n_{i=1}b^{i}_{t}\) and \(L_{1}<L_{2}<\cdots <L_{n}\), we know that after dropping \(tL_{n}\) \((t\ge 1)\) keystream bits at the beginning of the second level, the quantity \(\bigoplus ^n_{i=1}b^{i}_{t}\) is associated with at least \(t\cdot \sum _{i=1}^{n}\rho _{i}\) initial permuted variables at the beginning of the second level. Then, by Eqs. (7) and (9), we complete the proof. \(\square \)

Let \(|K|=\kappa \) and \(|IV|=\varsigma \). This theorem indicates that to strengthen the security of the two-level model, it suffices to discard some keystream prefix at the beginning of the second level. To frustrate Algorithm 1, it suffices to reduce the involved correlations to the extent that either the data complexity (the number of required frames) in Eq. (12) or the time complexity in Eq. (13) exceeds the security bounds, i.e., \(T> 2^{\kappa }\) or \(\mathcal {N}_{\mathcal {B}^{\prime }}> 2^{\varsigma }\), which leads to the following criterion for the two-level model in Fig. 1.

  • It is necessary to discard the first \(tL_{n}\) keystream bits at the beginning of the second level in the model for some \(t\ge 1\).

Now we are ready to look at the real-world Bluetooth encryption scheme, one instance of the above two-level model, and study its security against the outlined conditional correlation attacks (Fig. 4).

Fig. 4 General basic rate packet format

5 Description of Bluetooth Encryption

Bluetooth is a wireless technology standard for exchanging data over short distances (using short-wavelength UHF radio waves in the ISM band from 2.4 to 2.485 GHz) between fixed and mobile devices, and for building personal area networks (PANs). In order to provide usage protection and information confidentiality, the system applies security measures at both the application layer and the link layer. These measures are designed to be appropriate for a peer environment. Before introducing the encryption in Bluetooth devices, we first describe the packets they exchange. The general format of Basic Rate packets is shown in Fig. 4. The access code is 72 or 68 bits, and the header is 54 bits. The payload ranges from zero to a maximum of 2790 bits. User information can be protected by encryption of the packet payload; the access code and the packet header shall never be encrypted. The security mechanisms in Bluetooth have three phases: Legacy, Secure Simple Pairing, and Secure Connections, shown in Table 1. The encryption of the payload is carried out with a stream cipher, called E0, that shall be re-synchronized for every payload. The description here follows the official specification in [3]. The size of the secret key used in two-level E0 is 128 bits, and the IV consists of 74 bits, 26 of which are derived from a real-time clock, while the remaining 48 address bits depend on the user's device. The core is a modification of the summation generator with 4-bit memory, i.e., \(\sigma _t=X_t=(c_{t-1},c_t)=(c_{t-1}^1,c_{t-1}^0,c_{t}^1,c_t^0)\), as shown in Fig. 5.

The core keystream generation of E0

Processing:

1: \(z_{t}=b_t^1 \oplus b_t^2 \oplus b_t^3 \oplus b_t^4 \oplus c_t^0\)

2: \(s_{t+1}=(s_{t+1}^1,s_{t+1}^0)=\lfloor \frac{b_t^1 + b_t^2 + b_t^3 + b_t^4+2c_t^1+c_t^0}{2} \rfloor \)

3: \(c_{t+1}^0=s_{t+1}^0 \oplus c_{t}^0 \oplus c_{t-1}^1\oplus c_{t-1}^0, c_{t+1}^1=s_{t+1}^1\oplus c_t^1\oplus c_{t-1}^0\)

4: \((c_{t-1},c_{t})\leftarrow (c_{t},c_{t+1})\)

5: update the LFSRs

Precisely, the keystream generator consists of four regularly clocked LFSRs whose lengths are 25, 31, 33 and 39 bits, respectively (128 bits in total). The LFSRs are indexed in order of increasing length. All the feedback polynomials are primitive and have 5 nonzero terms each. Their outputs are combined by a finite state machine (FSM) with 4 bits of memory. At each time t, the steps listed above are executed (Table 1).
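The per-tick processing above translates directly into code; here is a minimal sketch (our own function names), taking the LFSR output bits as given. Step 5 (updating the LFSRs) is left abstract, since only the output bits \(b^j_t\) enter the FSM.

```python
def e0_core_step(B, c_prev, c_cur):
    # One tick of the core generator. B = (b^1, b^2, b^3, b^4) are the LFSR
    # output bits; c_prev = c_{t-1} and c_cur = c_t are (c^1, c^0) pairs.
    z = B[0] ^ B[1] ^ B[2] ^ B[3] ^ c_cur[1]             # step 1: z_t
    s = (sum(B) + 2 * c_cur[0] + c_cur[1]) // 2          # step 2: s_{t+1}
    s1, s0 = (s >> 1) & 1, s & 1
    c1 = s1 ^ c_cur[0] ^ c_prev[1]                       # step 3: c_{t+1}^1
    c0 = s0 ^ c_cur[1] ^ c_prev[0] ^ c_prev[1]           # step 3: c_{t+1}^0
    return z, c_cur, (c1, c0)       # step 4: (c_{t-1}, c_t) <- (c_t, c_{t+1})
```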

Fig. 5 The core keystream generator of E0

Table 1 Security algorithms

Fig. 6 The real two-level Bluetooth encryption scheme

It is easy to see that the four LFSRs are equivalent to a single 128-bit LFSR whose output bit \(R_t\) is obtained by xoring the outputs of the four basic LFSRs, i.e., \(R_t=b_t^1\oplus b_t^2 \oplus b_t^3 \oplus b_t^4\) and \(z_t=R_t \oplus c^0_t\). Next, we introduce the real two-level E0 scheme, as shown in Fig. 6. As before, we use the time instants t and \(t^{\prime }\) in the context of E0 level one and level two, respectively, and denote \(c_t^0,c_{t^{\prime }}^0\) by \(\alpha _t,\beta _{t^{\prime }}\).

  1. (The first level) The LFSRs are preset to zero. Given the secret key K and some IV \(P^{i}\), the LFSRs are initialized linearly as \(R^i_{[-199,\ldots ,-72]}=(R_{-199}^i,\ldots ,R_{-72}^i)=G_1(K)\oplus G_2(P^i)\), where \(G_1\) and \(G_2\) are public affine transformations over \(\text{ GF }(2)^{128}\).

  2. The initial 4 memory bits of the FSM are all set to 0. After clocking E0 200 times, we only keep the last produced 128-bit output \(S^i_{[-127,\ldots ,0]}=R^i_{[-127,\ldots ,0]} \oplus \alpha ^i_{[-127,\ldots ,0]}\). Let M be the state transition matrix of the equivalent LFSR over \(\text{ GF }(2)^{128}\), i.e., \(R^i_{[-127,\ldots ,0]}=M^{72}(R^i_{[-199,\ldots ,-72]})\). Note that because of the linear functions \(G_1,G_2\) and M, the last 128 bits of \(R_t^i\) can be written as \(R_{[-127,\ldots ,0]}^i=(M^{72} \circ G_1)(K)\oplus (M^{72} \circ G_2)(P^i)\).

  3. \(S_{[{-127,\ldots ,0}]}^i\) is used to initialize the four LFSRs via a byte-wise affine transformation \(G_3:\text{ GF }(2)^{128}\rightarrow \text{ GF }(2)^{128}\), detailed in Fig. 7; this process can be expressed as \(V_{[1,\ldots ,128]}^i=G_3(S_{[-127,\ldots ,0]}^i)\).

  4. (The second level) The FSM initial state remains the same as it was at the end of the first level. Then, E0 produces the keystream \(z_{t^{\prime }}^i=V_{t^{\prime }}^i\oplus \beta _{t^{\prime }}^i\) of the i-th frame for \(t^{\prime }=1,\ldots ,2790\).

Fig. 7 Distribution of the last 128 bits in the first level

6 Previous Attacks on Two-Level E0

At Crypto 2005, Lu, Meier and Vaudenay presented a conditional correlation attack on two-level E0 in [19]. They consider several consecutive steps of the generator as a vectorial function and investigate the conditional correlation properties of this function. Based on these properties, if an adversary is given a pool of keystream frames generated with the same key and different IVs,Footnote 10 a statistical distinguisher can be constructed which could distinguish the biased sample sequence from the other sequences consisting of independently and uniformly distributed variables. The biased sample sequence is characterized by some key-related information, which can then be used to identify the correct encryption key.

Note that in the real E0, there is a delay effect in the FSM state, i.e., the current FSM state at time t always contains the previous bit \(c_{t-1}^{0}\). This makes a difference in defining the target function in Eq. (2), increasing the involved time instants by 1, as shown below. Thus, the adversary gains one time instant for free and reduces the guess space by 4 bits. Precisely, there are two sets of inputs to the FSM in the E0 encryption scheme at time t, i.e., the four LFSR output bits \(B_t=(b^1_t,b^2_t,b^3_t,b^4_t)\) and the 4 memory register bits \(X_t=(c_{t-1},c_{t})\in \text{ GF }(2)^{4}\). Consider l consecutive time instants, let \(\gamma =(\gamma _0,\gamma _1,\ldots ,\gamma _{l-1}) \in \text{ GF }(2)^l\) be a linear mask with \(\gamma _0=\gamma _{l-1}=1\), and let \(\bar{\gamma }=(\gamma _{l-1},\gamma _{l-2},\ldots ,\gamma _0)\) be the linear mask in reverse order. Define the inputs as

$$\begin{aligned} \mathcal {B}_{t+1}=B_{t+1}B_{t+2}\cdots B_{t+l-2} \in \text{ GF }(2)^{4(l-2)},\quad X_{t+1}=(c_t,c_{t+1}) \in \text{ GF }(2)^4 \end{aligned}$$

and the FSM outputs \(C_t=(c_t^0,\ldots ,c_{t+l-1}^0)\). Then, the function \(h_{\mathcal {B}_{t+1}}^{\gamma }:X_{t+1}\rightarrow \gamma \cdot C_t\) is also well defined. It is shown in [19] that given \(\mathcal {B}_{t+1}\), \(\gamma \cdot C_{t}\) is heavily biased for a properly chosen linear mask \(\gamma \).

A statistical distinguisher can thus be constructed based on the biased distribution of \(\gamma \cdot C_{t}\). Since \(\mathcal {B}_{t+1}\) is the key-related material, the adversary can guess the involved key information and collect a set of sample sequences from the keystream, IVs and the guessed key value. It is expected that with the correct key, the corresponding sample sequence is biased, while for the wrong guesses, the underlying sequence will behave like a random source. By properly choosing the involved parameters, it is shown that the original encryption key K in Fig. 6 can be retrieved with \(2^{38}\) on-line computations, \(2^{38}\) off-line computations and \(2^{33}\) memory, given the first 24 bits of \(2^{23.8}\) frames in theory, while in practical experiments, the attack needs about 19-hour on-line time, 37-hour pre-computation for each key and 64GB storage, given the first 24 bits of \(2^{26}\) frames.

In [19], it is mentioned that this attack was verified only 30 times for a fixed master key with \(2^{26}\) frames, slightly less than the theoretical estimate of \(2^{26.5}\) frames. Further, for each possible key, there are 256 equivalent keys, which means that when using a distinguisher to rank the possible keys, there are 256 equivalent candidates with the same grade. In the case that the correct key does not have the highest grade, much more time is needed to search over all the possible keys; e.g., if the grade of the right key ranks in the 10th position, then we have to search \(10\cdot 256\approx 2^{11.3}\) possible keys to find the real one. Thus, the success probability of this attack cannot be guaranteed.

In the following sections, we will show that the adversary need not guess all the bits in the condition vector \(\mathcal {B}_{t+1}\); actually, only a few bits determine the magnitude of the biased distribution of \(\gamma \cdot C_{t}\), so we just need to select a condition mask to determine the most important bits in \(\mathcal {B}_{t+1}\). In this way, the time/memory complexities of the above attack can be considerably reduced.

7 Correlations Properties of the Two-Level Bluetooth Encryption

In this section, we will carefully study the correlation properties of the two-level encryption scheme. First, a powerful complete recursive formula to compute the unconditional correlations of E0 is presented. Then, the conditional correlation properties based on condition masking are analyzed and computed.

7.1 Unconditional Correlations in the E0 Keystream Generator

In [21, 22], a recursive formula for the computation of the unconditional correlation in the E0 combiner is presented. However, it only involves the pure FSM variables and cannot cover all the unconditional correlations reported in [10]. For example, the following correlation also attains the largest magnitude,

$$\begin{aligned} \epsilon \left( c_t^0\oplus c_{t+1}^0\oplus c_{t+2}^0\oplus c_{t+3}^0\oplus b_{t+3}^1 \oplus b_{t+3}^2 \oplus c_{t+4}^0\right) =-\frac{25}{256}\;, \end{aligned}$$

but it cannot be found by the previous formula. Another example is

$$\begin{aligned} \epsilon (c_t^0\oplus c_{t+1}^0\oplus c_{t+2}^0\oplus b_{t+3}^1\oplus c_{t+4}^0)= \frac{25}{256}\;. \end{aligned}$$

In general, let \(\varOmega _1(a,\langle \omega ,u \rangle )\) be the correlation \(\epsilon (a \cdot s_{t+1} \oplus \omega \cdot c_t \oplus u \cdot B_t)\) shown in Table 2, where \(a \in \text{ GF }(2)^2, u \in \text{ GF }(2)^4, \omega \in \text{ GF }(2)^2\) and \(B_t\) denotes the output bits of the four LFSRs at time t. Besides, let \(h:(x^1,x^0)\rightarrow (x^0,x^1\oplus x^0)\) be a permutation over \(\text{ GF }(2)^2\) and \(\epsilon ( \langle a_1,u_1 \rangle , \ldots ,\langle a_{d-1},u_{d-1} \rangle ,a_d)=\epsilon (a_1\cdot c_1\oplus u_1\cdot B_1\oplus \cdots \oplus a_{d-1}\cdot c_{d-1}\oplus u_{d-1}\cdot B_{d-1} \oplus a_{d} \cdot c_{d})\), where \(u_1,\ldots ,u_{d-1} \in \text{ GF }(2)^4\). We can derive the above correlation coefficient as follows.Footnote 11

$$\begin{aligned} \epsilon ( \langle 1,0 \rangle ,\langle 1,0 \rangle ,\langle 1,0\rangle ,\langle 0,1\rangle ,1) =&\sum _{\omega }\varOmega _1(1,\langle \omega ,1 \rangle )\cdot \epsilon (\langle 1,0\rangle ,\langle 1,0 \rangle , \langle 2,0 \rangle ,1\oplus \omega )\\ =&-\frac{1}{4}\cdot \epsilon \left( \langle 1,0 \rangle , \langle 1,0 \rangle , \langle 2,0 \rangle , 2\right) \\ =&-\frac{1}{4}\sum _{\omega }\varOmega _1(2,\langle \omega ,0 \rangle )\cdot \epsilon (\langle 1,0 \rangle , \langle 0,0 \rangle ,\omega )\\ =&\frac{1}{16}\cdot \epsilon (\langle 1,0 \rangle ,\langle 0,0 \rangle ,1) +\frac{5}{32}\cdot \epsilon (\langle 1,0 \rangle ,\langle 0,0 \rangle ,2)\\ =&\frac{1}{16}\sum _{\omega _1}\varOmega _1(1,\langle \omega _1,0 \rangle )\cdot \epsilon (\langle 2,0 \rangle ,1\oplus \omega _1)\\&\quad +\frac{5}{32}\sum _{\omega _2}\varOmega _1(2,\langle \omega _2,0 \rangle )\cdot \epsilon (\langle 0,0 \rangle ,2\oplus \omega _2)\\ =&\frac{5}{32}\left( \frac{1}{4}\epsilon (\langle 0,0 \rangle ,3)+\frac{5}{8}\epsilon ( \langle 0,0 \rangle ,0)\right) =\frac{25}{256}\;. \end{aligned}$$

Our complete formula for the computation of the unconditional correlations of E0 is given in the following theorem.

Table 2 Biases of all the linear combinations of \(\varOmega _1(a,\langle u,\omega \rangle )\)

Theorem 14

If the initial states of FSM and LFSR are both uniformly distributed, then we have

$$\begin{aligned}&\epsilon (\langle a_1,u_1\rangle ,\ldots , \langle a_{d-1},u_{d-1}\rangle ,a_d) =\,-\sum _{\omega \in GF(2)^2}\varOmega _1(a_d,\langle \omega ,u_{d-1} \rangle )\\&\quad \cdot \epsilon (\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-2}\oplus h(a_d),u_{d-2} \rangle ,a_{d-1}\oplus a_d\oplus \omega ). \end{aligned}$$

Proof

First note that according to the description of the real two-level E0, we can regard the initial states of the LFSRs and the FSM as uniformly distributed random variables, and thus the premise is met. To simplify the analysis, let \(Z \in \text{ GF }(2)^2\) be a random variable independent of \(B_{d-1}\) with uniform distribution. By the keystream generation of E0 in Sect. 5, define \(f: \text{ GF }(2)^4 \times \text{ GF }(2)^2 \rightarrow \text{ GF }(2)\) and \(g: \text{ GF }(2)^{6(d-2)+2} \rightarrow \text{ GF }(2)^2\) as follows.

$$\begin{aligned} f(X,g(Y))=a_d\cdot s_d \oplus u_{d-1}\cdot B_{d-1}, \end{aligned}$$

where \(g(Y)=c_{d-1}, X=B_{d-1}\) and

$$\begin{aligned} Y&=(\langle c_1,B_1 \rangle ,\ldots ,\langle c_{d-3},B_{d-3} \rangle , \langle c_{d-2},B_{d-2}\rangle ,c_{d-1}),\\ v&=(\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-3},u_{d-3} \rangle ,\langle a_{d-2}\oplus h(a_d),u_{d-2}\rangle ,a_{d-1}\oplus a_d). \end{aligned}$$

With this simplified expression, we have:

$$\begin{aligned}&\sum _{\omega \in GF(2)^2}\varOmega _1(a_d,\langle \omega ,u_{d-1}\rangle )\cdot \epsilon (\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-2}\oplus h(a_d),u_{d-2}\rangle , a_{d-1}\oplus a_d\oplus \omega )\nonumber \\&\quad =\sum _{\omega \in GF(2)^2}\epsilon (f(X,Z) \oplus \omega \cdot Z)\cdot \epsilon (\omega \cdot g(Y) \oplus v\cdot Y) \nonumber \\&\quad =\sum _{\omega ,x,z}P(X=x,Z=z)\cdot (-1)^{f(x,z)\oplus \omega \cdot z}\sum _{y}P(Y=y)\cdot (-1)^{\omega \cdot g(y)\oplus v\cdot y} \nonumber \\&\quad =\sum _{x,z,y}P(X=x,Z=z)P(Y=y)(-1)^{f(x,z)\oplus v\cdot y }\sum _{\omega }(-1)^{\omega \cdot (z \oplus g(y))} \nonumber \\&\quad =\sum _{x,y}P(X=x,Y=y)(-1)^{a_d\cdot s_d \oplus u_{d-1}\cdot B_{d-1}\oplus v\cdot y}\nonumber \\&\quad =\sum _{x,y}P(X=x,Y=y)(-1)^{a_1\cdot c_1 \oplus u_1 \cdot B_1 \oplus \cdots \oplus u_{d-1}\cdot B_{d-1} \oplus a_d\cdot c_d}\nonumber \\&\quad =-\epsilon (\langle a_1,u_1\rangle ,\ldots ,\langle a_{d-1},u_{d-1}\rangle ,a_d). \end{aligned}$$
(16)

Equation (16) is derived according to \(a_d\cdot c_d=a_d\cdot s_d \oplus a_d\cdot c_{d-1} \oplus h(a_d)\cdot c_{d-2}\). \(\square \)

Theorem 14 can compute all the unconditional correlations of the E0 combiner without any omission; e.g., it covers all the results reported in [10]. We can then compute \(\epsilon (v \cdot Z^m_{t^{\prime }} \oplus W \cdot B^m_{t^{\prime }})\) for m up to 14 with a low complexity, which is impossible for the exhaustive search method by FWT due to the large time and memory complexities; see [17] for more on Walsh transforms. With Table 2 and the initial conditions \(\epsilon ((0,0),0)=-1\) and \(\epsilon ((a,b),c)=0\) for \((a,b,c)\ne (0,0,0)\), we can recursively deduce the unconditional correlations. Table 3 gives some search results for the unconditional correlations. From the table, we find that as m increases, the unconditional correlations become smaller and smaller. Even though these correlations cannot be used here to improve the attack on E0, they can be applied to the attack in [10] and may yield better results.
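A memoized implementation of this recursion is straightforward. The sketch below is ours; OMEGA1 stands in for the \(\varOmega _1\) values of Table 2, which are not reproduced here and must be filled in before use.

```python
from functools import lru_cache

# OMEGA1[(a, omega, u)] stands for Omega_1(a, <omega, u>) from Table 2
# (values omitted here; fill them in from the table before use).
OMEGA1 = {}

def h(a):
    # The permutation h: (x^1, x^0) -> (x^0, x^1 xor x^0) over GF(2)^2,
    # encoding 2-bit values as integers (bit 1 = x^1, bit 0 = x^0).
    x1, x0 = (a >> 1) & 1, a & 1
    return (x0 << 1) | (x1 ^ x0)

@lru_cache(maxsize=None)
def eps(terms, a_d):
    # terms = ((a_1,u_1), ..., (a_{d-1},u_{d-1})) with 2-bit a_i, 4-bit u_i;
    # returns eps(<a_1,u_1>, ..., <a_{d-1},u_{d-1}>, a_d) per Theorem 14.
    if len(terms) == 1:  # initial conditions: eps((0,0),0) = -1, else 0
        return -1.0 if terms[0] == (0, 0) and a_d == 0 else 0.0
    a_dm1, u_dm1 = terms[-1]
    a_dm2, u_dm2 = terms[-2]
    total = 0.0
    for w in range(4):   # omega ranges over GF(2)^2
        rest = terms[:-2] + ((a_dm2 ^ h(a_d), u_dm2),)
        total += OMEGA1.get((a_d, w, u_dm1), 0.0) * eps(rest, a_dm1 ^ a_d ^ w)
    return -total
```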

Table 3 The two largest correlations and the number of different linear masks when \(m=7,\ldots ,14\)

Corollary 15

The recursive expression in Theorem 14 can compute all the correlation coefficients \(\epsilon (W\cdot B^m \oplus v\cdot Z^m)\) of the second level of E0 in \(2^{m-2}\cdot 5^{m-2}\) iterations.

Proof

In Theorem 14, \(c_t^1\) is usually not considered, so we only consider \(a_t=0,1\) for \(1 \le t \le m\). For a certain m, we have \(a_1^0=1\) and \(a_m^0=1\). From the initial values, if \(u_1\ne 0\), then the total correlation will be zero, so we always set \(u_1=0\). Now we just search over all the \(m-2\) undetermined coefficients \(a_i^0 \in GF(2)\) and \(u_i \in GF(2)^4\) for \(i=2,\ldots ,m-1\) which only needs \(2^{m-2}\cdot 5^{m-2}\) iterations. \(\square \)

The unconditional linear correlations are used in the linear approximation of the second level in Fig. 6. Since there is no linear relation between the input to the FSM in the second level and the original encryption key, the conditional correlations cannot be used in the approximation of the second level.

7.2 New Conditional Correlations

Condition masking means that the adversary need not use the full vector as the condition, but may instead search for correlations conditioned on a subset of \(\mathcal {B}\) defined by a mask \(\lambda \). In the cryptanalysis of E0, \(\mathcal {B}_{t+1}\) is the key-related input. We are given a condition mask \(\lambda =(\lambda _{t+1},\ldots ,\lambda _{t+l-2}) \in \text{ GF }(2)^{4(l-2)}\), where \(\lambda _j \in \text{ GF }(2)^4\) corresponds to \(B_j\) for \(j=t+1,\ldots ,t+l-2\). According to Eq. (2) and the difference between the model and the real E0 mentioned in Sect. 6, we can construct the target function of E0 as

$$\begin{aligned} h^{\varLambda }_{\mathcal {B}^{\prime }_{t+1}}:X_{t+1},\mathcal {B}^{*}_{t+1} \rightarrow \gamma \cdot C_t\oplus \omega \cdot \mathcal {B}^{*}_{t+1}, \end{aligned}$$
(17)

as shown in Fig. 8. Figure 8 also shows that although the computation process of \(C_t\) is frustrated by the condition mask \(\lambda \ne \mathbf 1 _{u}\), the bias can still be computed. Here is an example to illustrate how to compute the bias in the condition masking setting. Assume \(l=4\) and \({\lambda }={\texttt {0x0f}}\);Footnote 12 then \(\mathcal {B}_{t+1}=B_{t+1}B_{t+2}\), \(\mathcal {B}^{\prime }_{t+1}=B_{t+2}\) and \(\mathcal {B}^{*}_{t+1}=B_{t+1}\). We can guess \(B_{t+2}\) and compute \(h_{B_{t+2}}^{\varLambda }\) for all the possible choices of \(B_{t+1},X_{t+1}\) to get \(\epsilon (h_{B_{t+2}}^{\varLambda })\).
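The following minimal sketch (our own names; FSM update taken from Sect. 5, with \(\omega =0\)) carries out exactly this enumeration for one guessed condition value and then averages the squared biases over the condition:

```python
from itertools import product

def fsm_next(c_prev, c_cur, B):
    # E0 FSM update from Sect. 5: returns c_{t+1} from c_{t-1}, c_t, B_t,
    # where the c's are (c^1, c^0) pairs and B the four LFSR output bits.
    s = (sum(B) + 2 * c_cur[0] + c_cur[1]) // 2
    s1, s0 = (s >> 1) & 1, s & 1
    return (s1 ^ c_cur[0] ^ c_prev[1], s0 ^ c_cur[1] ^ c_prev[0] ^ c_prev[1])

def eps_cond(B_t2, gamma=(1, 1, 1, 1)):
    # eps(h^gamma_{B_{t+2}}) for l = 4, lambda = 0x0f, omega = 0: average
    # (-1)^(gamma . C_t) over the unguessed inputs B*_{t+1} and X_{t+1}.
    acc = 0
    for B_t1 in product((0, 1), repeat=4):
        for c_t in product((0, 1), repeat=2):
            for c_t1 in product((0, 1), repeat=2):
                c_t2 = fsm_next(c_t, c_t1, B_t1)
                c_t3 = fsm_next(c_t1, c_t2, B_t2)
                C = (c_t[1], c_t1[1], c_t2[1], c_t3[1])  # (c_t^0,...,c_{t+3}^0)
                acc += (-1) ** (sum(g & c for g, c in zip(gamma, C)) & 1)
    return acc / 256.0

# E[Delta(h_{B'})] over the uniformly distributed condition B_{t+2}:
print(sum(eps_cond(B) ** 2 for B in product((0, 1), repeat=4)) / 16)
```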

Fig. 8 The new computation process of \(C_t\) in the real Bluetooth encryption

According to the specification of E0, we can construct the linear functions \(L_t\) and \(L^{\prime }_t\) of Eq. (3). With this knowledge, the linear equations on the original encryption key can be acquired. For \(4 \le l \le 6\), we have exhaustively searched the correlations based on condition masking for all the possible condition masks on a PC. All the significant biases obtained were also verified in computer simulations on sufficiently long output sequences. The time complexity of guessing is determined by \(wt(\lambda )\). To get better time/memory complexities, we restrict ourselves to the \(\lambda \)s satisfying \(1\le wt(\lambda ) \le 7\).

Table 4 The bias with \(\varLambda =(\gamma ,\eta )=(0x1f,\mathbf {0}_{|\eta |})\) and \(\lambda =0x00f\)

In the experiments, we have found many important masks, shown in Tables 4 and 5. Table 4 is computed with \(\lambda ={\texttt {0x00f}},\varLambda =(\gamma ,\eta )=({\texttt {0x1f}},\mathbf {0}_{|\eta |})\). We get \(\text{ E }[\Delta (h_{\mathcal {B}^{\prime }_{t+1}})] \approx 2^{-3.7}\), where \(\mathcal {B}^{\prime }_{t+1}=B_{t+3}\). In Table 5, we choose \(\lambda ={\texttt {0x007f}}\) and \(\varLambda =(\gamma ,\eta )=({\texttt {0x21}},\mathbf {0}_{|\eta |})\). From it, we get \(\text{ E }[\Delta (h_{\mathcal {B}^{\prime }_{t+1}})] \approx 2^{-3.5}\), where \(\mathcal {B}^{\prime }_{t+1}=B_{t+3}B_{t+4}\).

Table 5 The bias with \(\varLambda =(\gamma ,\eta )=(0x21,\mathbf {0}_{|\eta |})\) and \(\lambda =0x007f\)

Moreover, as mentioned in Sect. 4.2, for a fixed condition mask \(\lambda \), its maximum bias over all the linear masks \(\varLambda \) is an essential measure of its quality: the larger the maximum bias, the better the condition mask. The following property indicates how to choose the condition mask to make the bias as large as possible. We have verified this property by searching over all the biases of \(h^{\varLambda }_{\mathcal {B}^{\prime }}\) for each combination of \(\lambda ,\gamma \) and \(\omega \).

Property 16

Let \(\mathcal {B}_{t+1}=B_{t+1}\cdots B_{t+l-2} \in GF(2)^{4(l-2)}\), and \(\lambda =(\lambda _{t+1},\ldots ,\) \(\lambda _{t+l-2}), \lambda ^{\prime }=(\lambda ^{\prime }_{t+1},\ldots ,\lambda ^{\prime }_{t+l-2})\) are two condition masks with \(4 \le l \le 6\) and \(wt(\lambda )=wt(\lambda ^{\prime }) \ge 4\), where \(\lambda _i,\lambda ^{\prime }_i \in GF(2)^4\) correspond to \(B_i\). If \(wt(\lambda _{t+l-2})=4\) and \(wt(\lambda ^{\prime }_{t+l-2})<4\), thenFootnote 13 \(max_{\varLambda }(E[\Delta (h^{\varLambda }_{\mathcal {B}^{\prime }_{t+1}})])>max_{\varLambda ^{\prime }}(E[\Delta (h^{\varLambda ^{\prime }}_{\mathcal {B}^{\prime }_{t+1}})])\), except when \(l=4,wt(\lambda _{t+1})=1,wt(\lambda _{t+2})=4\) and \(wt(\lambda ^{\prime }_{t+1})=2,wt(\lambda ^{\prime }_{t+2})=3\), in which case the maximum values are equal.

From Property 16, the weight of the last nibble \(\lambda _{t+l-2}\), which covers \(B_{t+l-2}\) in \(\mathcal {B}_{t+1}\), plays the most important role in the correlation values based on condition masking: it determines the magnitude of the corresponding bias in the condition masking case. For example, given \(l=5,\varLambda =(\gamma ,\eta )=({\texttt {0x1f}},\mathbf {0}_{|\eta |}),\lambda _1={\texttt {0x303}},\lambda _2={\texttt {0x00f}}\) and \(\lambda _3={\texttt {0x113}}\), we find \(\text{ E }[\Delta (h^{\varLambda }_{\mathcal {B}_1})]=0.020325\), \(\text{ E }[\Delta (h^{\varLambda }_{\mathcal {B}_2})]=0.078247\) and \(\text{ E }[\Delta (h^{\varLambda }_{\mathcal {B}_3})]=0.010162\), where \(\mathcal {B}_1\), \(\mathcal {B}_2\) and \(\mathcal {B}_3\) are the condition vectors determined by \(\lambda _1\), \(\lambda _2\) and \(\lambda _3\), respectively. The experimental results are depicted in Figs. 9 and 10Footnote 14 to show the different levels of the correlation magnitude. The fact that the conditional correlation values are distributed at clearly different levels tells us that when selecting the condition masks, we should set the highest four bits of \(\lambda \) to \({\texttt {0xf}}\). All the experimental results obtained so far confirm this claim.

Fig. 9 \({\hbox {max}}_{\omega }(h_{\mathcal {B}}^{\varLambda })\) for different condition masks \(\lambda \) with \(wt(\lambda )=7\) and \(l=6\)

Fig. 10 \({\hbox {max}}_{\omega }(h_{\mathcal {B}}^{\varLambda })\) for different condition masks \(\lambda \) with \(wt(\lambda )=6\) and \(l=6\)

8 Our Attacks with Condition Masking

According to the specification in [3], the last generated 128 bits \(S^i_{[-127,\ldots ,0]}\) in the first level are arranged in octets denoted by \(S[0],\ldots ,S[15]\), e.g., \(S[0]=(S^i_{-127}S^i_{-126}\cdots S^i_{-120})\). According to Sect. 4 and Fig. 7, \(V^i_{[1,\ldots ,24]}\) can be expressed as

$$\begin{aligned} V^i_{t^{\prime }}=U^i_{t^{\prime }}\oplus \alpha ^i_{t_1} \oplus \alpha ^i_{t_2} \oplus \alpha ^i_{t_3} \oplus \alpha ^i_{t_4}, \text { for } t^{\prime }=1,\ldots ,24. \end{aligned}$$

Since at level two (in Fig. 6) the 128-bit keystream \(S^i_t\) is loaded in the reverse order of that at level one, Eq. (7) can be rewritten as

$$\begin{aligned} \bar{\gamma }\cdot (Z_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))=\bigoplus _{j=1}^4(\gamma \cdot C_{t_j}^i)\oplus \bar{\gamma }\cdot C_{t^{\prime }}^i, \end{aligned}$$
(18)

for \(i=1,\ldots ,\mathcal {N}\). Here, we have \(t^{\prime } \in \bigcup ^2_{d=0}\{8d+1,\ldots ,8d+9-l\}\).Footnote 15

When \(t^{\prime } \in \bigcup ^2_{d=0}\{8d+1,\ldots ,8d+9-l\}\), by Eqs. (2) and (17), we can rewrite this equation as follows:

$$\begin{aligned} \bar{\gamma }\cdot (Z_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))\oplus \omega \cdot (L_1(K)\oplus L_2(P^i)) =\bigoplus _{j=1}^4 h^{\varLambda }_{\mathcal {B}'^i_{t_j+1}} \oplus h^{\bar{\gamma }}, \end{aligned}$$
(19)

where \(L_1,L_2\) are public linear functions in E0. Thus, we acquire the linear approximation of two-level E0 based on condition masking.

8.1 Key Recovery Attack with Bitwise Linear Approximation

From Sect. 7, the largest unconditional bias of \(h^{\gamma }\) is \(\frac{25}{256}\), attained with \(\gamma =(1,1,1,1,1)\) or (1, 0, 0, 0, 0, 1). To maximize the bias of Eq. (19), we choose these two \(\gamma \)s in the second-level approximation, and then \(|\gamma |=l=5\) or 6. Due to the high time/memory complexities, the attack in [19] only considered \(l<6\). In our attack, the time/memory complexities do not depend on \(|\gamma |\); they are determined by \(wt(\lambda )\), and thus \(l=6\) can also be used in the condition masking setting.

Given the condition mask \({\lambda }\) and the linear masks \(\varLambda =(\gamma ,\eta )\), in the case of E0 we have

$$\begin{aligned} \mathcal {B}^i_{\lambda }= & {} \left( \mathcal {B}'^i_{t_1+1}, \mathcal {B}'^i_{t_2+1}, \mathcal {B}'^i_{t_3+1}, \mathcal {B}'^i_{t_4+1}\right) ,\\ \mathcal {X}^i= & {} \left( Y^i_{t_1+1}, Y^i_{t_2+1}, Y^i_{t_3+1}, Y^i_{t_4+1}, X^i_{t^{\prime }},\mathcal {B}^i_{t^{\prime }+1} \right) , \end{aligned}$$

where \(Y^i_{t_j+1}=(X^i_{t_j+1},\mathcal {B}^*_{t_j+1})\) is the unknown input to \(h^{\varLambda }_{\mathcal {B}'^i_{t_j+1}}\), and \(X^i_{t^{\prime }},\mathcal {B}^i_{t^{\prime }+1}\) are the inputs to \(h^{\bar{\gamma }}\). With the same notations as in Sect. 4, we let \(K_1=(L_{t_1}(K),L_{t_2}(K),L_{t_3}(K),L_{t_4}(K))\) be the \(4wt(\lambda )\) bits contained in \(\mathcal {B}^i_{\lambda }\) and \(K_2=\bar{\gamma }\cdot \mathcal {L}_{t^{\prime }}(K)\oplus \omega \cdot L_1(K)\) be the subkey. In the case of E0, we have the parameters \(n=4,k=4\). With the distinguisher \(\mathcal {F}^{\varLambda }_{B^i_{\lambda }}(\mathcal {X}^i)\) in Sect. 4 and the keystream sequences of E0, we can restore the correct keys.

8.2 Key Recovery Attack with the Vectorial Approach

Now we look at the vectorial approach. Here, we apply the vectorial key recovery attack in Sect. 4 with \(n=4,k=4\) to the two-level E0. Assume we use s mutually independent linear approximations. Keep the same notations as before. Let \(\varGamma =(\varLambda _1,\ldots ,\varLambda _s)\) and \(\varGamma ^{\prime }=(\bar{\gamma _1},\ldots ,\bar{\gamma _s})\) denote the linear mask of these s approximations. In particular, \(\varLambda _1\) is just the linear mask used in the above bitwise attack.

Following Sect. 4 and noting the difference between the general model and the real E0 pointed out in Sect. 6, we can construct a linear approximation of the real two-level E0 in the vectorial approach accordingly. With an appropriate choice of \(\varGamma =(\varLambda _1,\ldots ,\varLambda _s)\) following the principles in Sect. 4.5, we apply the vectorial key recovery attack in Sect. 4 to the case of E0 and can thus recover the \(\kappa \) secret bits of two-level E0.

8.3 The Data Complexity

The bi-bias analysis, i.e., using two bitwise linear approximations to construct a two-dimensional vector, is used in [19] to reduce the data complexity from \(2^{26.5}\) to \(2^{23.8}\). This method is very similar to multidimensional linear cryptanalysis, but there are two errors in their analysis. First, the linear masks \(\gamma _1 = (1,1,0,1), \gamma _2=(1,0,1,1)\) are chosen there. But note that the unconditional correlation \(\epsilon (h^{\bar{\gamma _2}}) = 0\), so the second dimension of the vectorial distinguisher is always uniformly distributed. Thus, according to multidimensional linear cryptanalysis, this vectorial distinguisher has the same SEI as the bitwise distinguisher, i.e., the bitwise linear approximation in the first dimension. This mainly comes from the fact that there is a 4-bit memory in the E0 combiner, so any 2-bit linear combination has a zero correlation coefficient. Thus, this method cannot improve the data complexity. Second, the formula (31) in [19], i.e., \(\Delta (\mathbf {F}^{\varGamma }_{\mathbf {B}^i}) = \Delta (h^{\bar{\varGamma }})\cdot \text{ E }^4(\Delta (h^{\varGamma }_{\mathbf {B}_{t+1}}))\), is not always true. For the bitwise linear approximation, this formula is just the piling-up lemma, but for the vectorial method it does not always hold. Hence, the data complexity in [19] should be \(2^{26.5}\) rather than \(2^{23.8}\). The same problem also appears in [34].

The correct way to compute the distribution of \(\mathbf {F}^{\varGamma }_{\mathbf {B}^i}\) is to use the convolution operation to combine the sub-distributions efficiently. This process can be expressed as follows:

$$\begin{aligned} \Delta (\mathbf {F}^{\varGamma }_{\mathbf {B}^i})=&\Delta \left( h^{\varGamma }_{\mathbf {B}^1_{t+1}} \otimes h^{\varGamma }_{\mathbf {B}^2_{t+1}} \otimes h^{\varGamma }_{\mathbf {B}^3_{t+1}} \otimes h^{\varGamma }_{\mathbf {B}^4_{t+1}} \otimes h^{\varGamma ^{\prime }}\right) . \end{aligned}$$

Note that the FWT can be used here to accelerate the final computation through the relation between the convolution and the Walsh transform, detailed in Sect. 2. We have used this method to re-compute all the data complexities in [19] and [34] and found that the advanced algorithm in [19] cannot improve the data complexity, i.e., the actual data complexity should be \(2^{26.5}\) rather than \(2^{23.8}\). Besides, the data complexities in [34] are also not so accurate, even though the experiments confirmed the data complexity in practice; the success probability of that attack is not very high, mainly because the same formula as in [19] is used. In the following sections, we will describe a new method to improve the data complexity based on condition masking.
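A minimal sketch of this computation follows (our own names; the component distributions at the end are placeholders that must be replaced by the actual tables of the four \(h^{\varGamma }_{\mathbf {B}^j_{t+1}}\) and \(h^{\varGamma ^{\prime }}\)):

```python
import numpy as np

def wht(v):
    # Fast Walsh-Hadamard transform, as in the sketch of Sect. 4.6.
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a, b = v[i:i + h].copy(), v[i + h:i + 2 * h].copy()
            v[i:i + h], v[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return v

def xor_convolve(dists):
    # The WHT of an XOR-convolution is the product of the individual WHTs.
    prod = np.ones_like(np.asarray(dists[0], dtype=float))
    for d in dists:
        prod = prod * wht(d)
    return wht(prod) / len(dists[0])

def sei(p):
    # Delta(D) = 2^s * sum_x (D(x) - 2^-s)^2.
    p = np.asarray(p, dtype=float)
    return len(p) * np.sum((p - 1.0 / len(p)) ** 2)

# Placeholder s = 2 distributions for the four h^Gamma_{B^j} and h^Gamma':
dists = [np.array([0.3, 0.25, 0.25, 0.2])] * 4 + [np.array([0.27, 0.24, 0.25, 0.24])]
print(sei(xor_convolve(dists)))
```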

8.4 Theoretical Analysis

To get the optimal performance of our attack, we should carefully choose the parameters \(\varGamma \) and \(\lambda \) in the linear approximations. As explained before, each component should not be uniformly distributed. We have searched all the linear masks in [19] and found that there are no other linear approximations with a non-uniform distribution. Thus, the method in [19] cannot be transformed into the vectorial approach. On the other hand, the condition masking method in [34] offers more flexibility, in the sense that many more linear masks are available to construct the vectorial distinguisher. By choosing different condition masks, we can use the multi-pass method and the list decoding method to reduce the data complexities efficiently.

Table 6 Example: \({\lambda }=0x000f\)

The experiments have shown that there are many large correlations based on condition masking that can be used in our attack. For example, for the condition mask \(\lambda ={\texttt {0x000f}}\), we can choose the 2 linear masks listed in Table 6; the experimental results show \(\Delta (h_{\mathcal {B}^{\prime }_{t+1}}^{\varGamma })\approx 2^{-2.7}\), where \(\varGamma =(({\texttt {0x21}},\mathbf {0}),({\texttt {0x23}},\mathbf {0}))\), and \(\Delta (h^{\varGamma ^{\prime }})\approx 2^{-6.7}\). Thus, from Eq. (11) we know that the data complexity is \(\mathcal {N}_{\mathcal {B}^{\prime }}\approx 2^{27}\). In this example, we can recover the involved \(\kappa =17\)-bit subkey, but the data complexity is higher than the real-world Bluetooth bound \(2^{26}\). We will use the list decoding method to reduce the data complexity to \(2^{25}\), which is detailed in the next section. Here, we give the theoretical analysis of why a data complexity of \(2^{25}\) suffices to generate a list of 256 possible candidate keys. According to the LLR method in [2], each possible \(K \in \{0,1\}^{17}\) has the grade \(G_K = \text{ LLR }(\mathcal {F}^{\varGamma }_{K_{\lambda }})\). We assume that for the unknown value \(K = K_0\), each sample \(\mathcal {F}^{\varGamma }_{K_{\lambda }}\) follows the distribution \(\textsf {D}_0\), whereas when \(K \ne K_0\), all the \(\mathcal {F}^{\varGamma }_{K_{\lambda }}\) follow the distribution \(\textsf {D}_1\). For any \(K \ne K_0\), \(G_{K_0}-G_K\) is approximately normally distributed with expected value \(\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)\) and standard deviation \(\sqrt{2\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)}\). Hence, the probability that a wrong key K has a better grade than the right key \(K_0\), i.e., \(G_{K_0} < G_K\), is about \(\varPhi (-\sqrt{\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)/2})\), where \(\varPhi (t)=\frac{1}{\sqrt{2\pi }}\int _{-\infty }^{t}e^{-\frac{1}{2}u^{2}}du\) is the distribution function of the standard normal distribution. Thus, the expected number of keys K having larger grades than the correct key \(K_0\) is \((2^{17}-1)\cdot \varPhi (-\sqrt{\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)/2})\). Requiring this number to be smaller than 256 gives \(\varPhi (-\sqrt{\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)/2}) \le \frac{256}{2^{17}-1}\). By the properties of the standard normal distribution, we obtain \(\varPhi (\sqrt{\mathcal {N}_{\mathcal {B}^{\prime }}\Delta (\textsf {D}_0)/2}) \approx 0.998\). Since \(\Delta (\textsf {D}_0) \approx 2^{-21.44}\), the new value of \(\mathcal {N}_{\mathcal {B}^{\prime }}\) is \(2^{25.49}\). That is why we use the data complexity \(\mathcal {N}_{\mathcal {B}^{\prime }} = 2^{25}\) in the experiments. Furthermore, we can use a new condition mask and repeat the same process to minimize the size of the key candidate list, as presented in Sect. 9. In the experiments, we find that we can always recover the correct key.
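This calculation is easily reproduced numerically; a small sketch using Python's NormalDist and the SEI value above:

```python
from math import log2
from statistics import NormalDist

kappa, list_size = 17, 256
delta0 = 2 ** -21.44                 # Delta(D_0) estimated above
# Require (2^kappa - 1) * Phi(-sqrt(N * delta0 / 2)) <= list_size:
p = list_size / (2 ** kappa - 1)
x = -NormalDist().inv_cdf(p)         # quantile so that Phi(-x) = p
N = 2 * x * x / delta0
print(log2(N))                       # ~ 25.5, matching N_B' = 2^{25.49}
```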

Let us analyze the time complexity of the example discussed above. The pre-computation of \(\widehat{\mathcal {H}^{\prime }}\) is \(17\cdot 2^{17}\), and we need time \(2\cdot 17\cdot 2^{17}\approx 2^{21.1}\) to compute \(\widehat{\mathcal {H}},\widehat{\mathcal {H}^{\prime \prime }}\), and time \(\mathcal {N}_{\mathcal {B}^{\prime }}=2^{27}\) to compute \(\mathcal {H}\), so the total time is \(2^{27}+2^{21.1}\).

9 Practical Implementation of the Known-IV Attack

Our attacks have been fully implemented on one core of a single PC running Windows 7 with an Intel Core 2 Q9400 at 2.66 GHz and 4GB RAM. In general, the experimental results match the theoretical analysis quite well. We present the details as follows.

We choose the condition mask \({\lambda }={\texttt {0x000f}}\) and \(\gamma _1={\texttt {0x21}}, \eta _1=\mathbf {0},\gamma _2={\texttt {0x23}},\eta _2=\mathbf {0},t^{\prime }=1,\mathcal {N}_{\mathcal {B}^{\prime }}=2^{25}\) (reduced by the list decoding and multi-pass methods) in the experiments. We call this process the first step. Here, the list decoding method means that we select a list of key candidates rather than a unique possible key. After the first step, we get a candidate list of size 256. In the second step, we choose another condition mask, \({\lambda } = {\texttt {0x00f}}\) with \(\gamma ={\texttt {0x1f}}, \omega =\mathbf {0}\), and use these parameters to mount a new key recovery attack to further reduce the size of the key list. After that, we can always acquire the correct key.

Precisely, in this configuration we have the condition bits \(\mathcal {B}'^i_{t+1}=B^i_{t+4}\). In the first step, we collect \(\mathcal {N}_{\mathcal {B}^{\prime }}\) frames for a random key and store them in a binary file. It takes about 8 minutes and 160MB to fulfill this task. With these samples, we run Algorithm 1 to recover the possible keys, stored in a list. The pre-computation of \(\mathcal {H}^{\prime }\) and \(\widehat{\mathcal {H}^{\prime }}\) needs about one second, and the results are stored in a 4MB table in RAM, not on the hard disk. Computing \(\mathcal {H},\widehat{\mathcal {H}},\mathcal {H}^{\prime \prime },\widehat{\mathcal {H}^{\prime \prime }}\) takes about 2 seconds in total. Compared with the 37 hours and the 64GB table in [19], our attack can be easily carried out in real time on a single PC.

Our new attack is repeated 100 times with different randomly generated keys and IVs. In the first step of our experiments, the right key ranks first 72 times, and about \(99\%\) of the right keys are within the first 256 key candidates. Thus, after the first step, we have almost always got the right key. Then, we use the new condition mask to recover the right key among these 256 candidates, and in 86 runs we get the right key uniquely. The remaining runs can reduce the size of the possible key list further. Note that in [19], the experiments were only carried out at the basic bitwise level with \(2^{26}\) frames and repeated 30 times for a fixed key.

One run of our attack is as follows. We first use two-level E0 to generate \(2^{25}\) frames for the key \({\texttt {0x8387cb74a2b0cf437ba6995f74de39e0}}\). In the experiments, we use the Mersenne twister to generate the key and the \(2^{25}\) 74-bit IVs, and set the first IV \(\text{ ADR }={\texttt {0x1ad5e266c6fa}}, \text{ CLK }={\texttt {0x6260c9}}\) as the benchmark; the others are obtained by xoring difference values with the benchmark IV. We store the first 24 bits of each frame in a file and compute \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{\lambda }}(\mathcal {X}^i), i=1,\ldots ,2^{25}\). Note that if we choose another IV as the new benchmark IV, the values of the new \(\mathcal {F}^{\varLambda }_{\mathcal {B}^i_{1\lambda }}(\mathcal {X}^i),i=1,\ldots ,2^{25}\) are just a permutation of the former ones. Second, we compute \((B^i_{t_1+4},B^i_{t_2+4},B^i_{t_3+4},B^i_{t_4+4})(i\ge 2)\) as \((B^1_{t_1+4},B^1_{t_2+4},B^1_{t_3+4},B^1_{t_4+4}) \oplus (\Delta _{i,1},\Delta _{i,2},\Delta _{i,3},\Delta _{i,4})\), where \(\Delta _{i,j}\) denotes the difference value of the i-th frame. Third, for each possible key, we use the FWT to compute the grade G(K). In this instance, the grade of the right subkey \((K_1=({\texttt {0x5}},{\texttt {0xf}},{\texttt {0x7}},{\texttt {0x3}}),K_2={\texttt {0x1}})\) is 8.118578, which ranks first. In total, the running time is \(\mathcal {N}_{\mathcal {B}^{\prime }} + \kappa \cdot 2^{\kappa +1}\approx 2^{25}\). In order to recover more key bits, we can increase the time instant and use the same method to recover all the key bits. Table 7 gives a comparison of our attacks with the best previous attacks on two-level E0.

Table 7 Comparison of our attack in Sect. 8 with the previous attacks on two-level E0

10 The Ciphertext-Only Attack

In this section, we convert the attack in Sect. 8 against the real two-level E0 into a ciphertext-only attack, which is much more practical than the above known-IV (known-plaintext) attacks. In the real-world ciphertext-only scenario, the adversary only has access to a set of ciphertext bits \(cp_{t^{\prime }}\) intercepted from the air, rather than the keystream bits \(z_{t^{\prime }}\).

Note that \(z_{t^{\prime }} = cp_{t^{\prime }} \oplus m_{t^{\prime }}\), where \(m_{t^{\prime }}\) is the \(t^{\prime }\)-th real plaintext bit. Let \(CP_{t^{\prime }}=(cp_{t^{\prime }},\ldots ,cp_{t^{\prime }+l-1})\) and \(M_{t^{\prime }}=(m_{t^{\prime }},\ldots ,m_{t^{\prime }+l-1})\), then the core linear approximation Eq. (19) in Sect. 8 becomes

$$\begin{aligned}&\bar{\gamma }\cdot (CP_{t^{\prime }}^i\oplus \mathcal {L}_{t^{\prime }}(K)\oplus \mathcal {L}^{\prime }_{t^{\prime }}(P^i))\oplus \omega \cdot (L_1(K)\oplus L_2(P^i)) \\&\quad =\bigoplus _{j=1}^4 h^{\varLambda }_{\mathcal {B}'^i_{t_j+1}} \oplus h^{\bar{\gamma }} \oplus \bar{\gamma } \cdot M_{t^{\prime }}^i \;. \end{aligned}$$

Further, note that the adversary always has some knowledge of the statistical distribution of the plaintext characters. Here, for brevity, we assume that the plaintexts consist of natural English sentences represented by ASCII codes. The ASCII codes and the statistical properties of these symbols are listed in Table 8.

Table 8 Relative frequencies of letters in English text

In this case, the statistical distribution of the plaintext is usually heavily biased, i.e., \(\epsilon (\bar{\gamma } \cdot M_{t^{\prime }}^i) \ne 0\). Let us denote the corresponding bias by \(\epsilon _{M}\). Besides, assume that the bias of Eq. (19) is \(\epsilon _{Z}\); then, according to the piling-up lemma, the total bias of the above linear approximation in the ciphertext-only attack is \(2\epsilon _{M}\epsilon _{Z}\), and the other parts of the attack are the same as in the previous known-IV attack.

We use the bitwise linear approximation to mount the ciphertext-only attack. The reason that the vectorial approach cannot be applied here is as follows. Assume we use the same parameter configuration as in the known-plaintext attack, i.e., \(\lambda = {\texttt {0x000f}}, \varGamma = (({\texttt {0x21}},\mathbf {0}),({\texttt {0x23}},\mathbf {0}))\). The target distribution in the ciphertext-only attack now becomes \(\mathcal {F}^{\varGamma }_{\mathcal {B}_{\lambda }} \oplus (\gamma _1 \cdot M, \gamma _2 \cdot M)\). By Table 8, we can compute the SEI of the plaintext as \(\Delta ((\gamma _1 \cdot M, \gamma _2 \cdot M))=1.04\), which is much larger than the bitwise bias of the plaintext. But when computing the SEI of the distribution of \(\mathcal {F}^{\varGamma }_{\mathcal {B}_{\lambda }} \oplus (\gamma _1 \cdot M, \gamma _2 \cdot M)\) using the convolution method, the SEI decreases very fast and becomes \(\Delta (\mathcal {F}^{\varGamma }_{\mathcal {B}_{\lambda }} \oplus (\gamma _1 \cdot M, \gamma _2 \cdot M)) \approx 2^{-27.19}\); accordingly, the data complexity becomes \(\mathcal {N}_{\mathcal {B}^{\prime }} \approx 2^{32.75}\), which is well above the upper bound of \(2^{26}\). The decrease of the SEI is mainly caused by the convolution of the two distributions. On the other hand, for the bitwise linear approximation, the distribution of the xor of the two underlying distributions can be computed by the piling-up lemma, as detailed in the next section.

11 Practical Implementation of the Ciphertext-Only Attack

In the experiments of the ciphertext-only attack, we choose the condition mask \(\lambda = {\texttt {0x00f}}\) and \(\gamma = {\texttt {0x1f}}, \eta = \mathbf {0}\). In this configuration, we have the conditional correlation \(\Delta (h^{\varLambda }_{\mathcal {B}^{\prime }_{t+1}}) \approx 2^{-3.67}\) and the unconditional correlation \(\Delta (h^{\gamma }) = 2^{-6.71}\). Assume the plaintexts are represented by ASCII codes; we can then use Table 8 to compute the SEI of the plaintext, i.e., \(\Delta (\bar{\gamma } \cdot M_t) \approx 2^{-1.82}\). Therefore, according to Eq. (11), we can calculate the data complexity of the ciphertext-only attack as \(\mathcal {N}_{\mathcal {B}^{\prime }} \approx 2^{28.79}\), which is larger than the upper bound of \(2^{26}\) in the real Bluetooth system for a fixed key. To compensate for this, we use the same list decoding and multi-pass methods as in the known-IV scenario to assure a high success probability.
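These figures can be checked directly against Eq. (11); a small sketch with the SEI values just stated (n = 4 conditional factors, one unconditional factor and the plaintext factor):

```python
from math import log, log2

kappa = 17
delta_cond = 2 ** -3.67     # Delta(h^Lambda_{B'_{t+1}}), first level
delta_unc = 2 ** -6.71      # Delta(h^gamma), second level
delta_plain = 2 ** -1.82    # Delta(gamma-bar . M_t), ASCII plaintext
delta_total = delta_cond ** 4 * delta_unc * delta_plain
N = 4 * kappa * log(2) / delta_total
print(log2(N))              # ~ 28.8, matching the stated N_B' ~ 2^{28.79}
```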

Fig. 11 The practical attack scenario of Bluetooth encryption

In our experiments, we set \(\mathcal {N}_{\mathcal {B}^{\prime }} \approx 2^{26}\) (less than the theoretical estimate \(2^{28.79}\)). The pre-computation of \(\widehat{\mathcal {H}^{\prime }}\) is \(17 \cdot 2^{17}\), and we need time \(2 \cdot 17 \cdot 2^{17} \approx 2^{21.1}\) to compute \(\widehat{\mathcal {H}}, \widehat{\mathcal {H}^{\prime \prime }}\), and time \(\mathcal {N}_{\mathcal {B}^{\prime }} = 2^{26}\) to compute \(\mathcal {H}\), so the total time is \(2^{26} + 2^{21.1}\). Thus, our ciphertext-only attack can be easily carried out in real time on one core of a single PC, and its cost is nearly the same as that of the known-plaintext attack.

Fig. 12 The distribution of grades

In order to acquire enough plaintext conforming to the relative letter frequencies of natural English, we collected the plaintexts from many famous novels. Then, we encrypted the plaintext with the two-level E0 scheme frame by frame, obtaining \(2^{26}\) frames. As described in Sect. 5, the format of the Bluetooth frame is known; therefore, in a practical application we can use a Bluetooth sniffer to grab the Bluetooth packets from the Bluetooth devices, as described in Fig. 11. We repeated our ciphertext-only attack 100 times with different randomly generated keys and IVs. For each pair of key and IV, we generate \(2^{26}\) Bluetooth frames, which are stored in a binary file. Then, we use an algorithm similar to Algorithm 1 to recover the key. In the experiments, we take the first 256 candidates in the list as the possible keys for each run. We found that in about 69 runs, the correct key ranks among the first 256 candidates. Figure 12 shows the distribution of grades in the 100 experiments; more than \(70\%\) of the grades are larger than 4. We can use some new condition masks, e.g., \(\lambda = {\texttt {0x01f}}, {\texttt {0x000f}}\), to repeat the same experiments; this increases the success probability and decreases the size of the key candidate list. After recovering the partial key bits, the other key bits can be acquired in the same way. The theoretical analysis is the same as in the known-IV attack. One run of our attack is as follows. We first generate \(2^{26}\) frames with the key \({\texttt {0x1c774e7b1626ed02f2b9b6b49afb82a1}}\) and encrypt the plaintexts to generate \(2^{26}\) ciphertexts, which need about 320MB of storage. The flow of the ciphertext-only attack is the same as that of the known-plaintext attack. In this instance, the grade of the right subkey \(K_1 = ({\texttt {0x3}},{\texttt {0x0}},{\texttt {0xc}},{\texttt {0x2}}), K_2={\texttt {0x1}}\) is 9.020190, which ranks first. The complexities of our attack are listed in Table 9.

Table 9 Complexities of our ciphertext-only attack

Countermeasure. Following the design criterion in Sect. 4.6, we recommend discarding the first \(2\cdot 39=78\) keystream bits at the beginning of the second level to resist our attack. In this case, the unconditional correlations can be reduced to below \(2^{-218}\), which frustrates our attack completely.

12 Conclusions

In this paper, we have studied the security of a general two-level E0-like encryption model and the real-world Bluetooth encryption scheme. A fast recursive method with time complexity analysis is formulated to compute the unconditional correlations in the general core keystream generator. Besides, the conditional correlation properties of the two-level model are derived and analyzed with the condition masking technique. A key recovery framework is established to extract the secret key in the model, with more generality than the previous one. Both bitwise and vectorial attacks have been mounted on the model with theoretical analysis, and a novel design criterion is suggested to resist our attack. As a case study, we described more threatening, real-time attacks on two-level E0. Our attacks have been fully implemented in C language on one core of a single PC and repeated hundreds of times with randomly generated keys and IVs. On average, it takes only a few seconds to restore the original encryption key, which clearly demonstrates the superiority of our method. Finally, we converted the attack into a ciphertext-only attack with only small increments in the complexities. This is the first practical ciphertext-only attack against Bluetooth encryption in the real world so far. We suggest discarding the first 78 keystream bits at the beginning of the second level to strengthen the security of Bluetooth encryption.