
1 Introduction

Since the meet-in-the-middle (MITM) attack was applied to KTANTAN [7], many improvements of it have been introduced, such as the splice-and-cut technique [4], the initial structure [24], the biclique cryptanalysis [6, 19], the internal state guess [10, 14], the sieve-in-the-middle technique [9] and the parallel-cut technique [23]. Since the MITM attack basically exploits weaknesses in the key scheduling function, it was believed that a block cipher with a strong key scheduling function has enough immunity against the MITM attack.

Isobe and Shibutani proposed the all-subkeys recovery (ASR) approach at SAC 2012 as an extension of the MITM attack [16], and showed several best attacks on block ciphers having relatively complex key scheduling functions, including CAST-128 [1], SHACAL-2 [13], FOX [18] and KATAN [8]. One of the advantages of the ASR attack compared to the basic MITM attack is that it does not need to take the key scheduling function into account, since it recovers all subkeys instead of the master key. Thus, the MITM attack has been shown to be applicable to a wider range of block ciphers. Moreover, the ASR approach enables us to evaluate lower bounds on the security against key recovery attacks for a block cipher structure, since the ASR attack is applicable independently of the underlying key scheduling function. For Feistel schemes, such lower bounds were shown in [17] by using the ASR attack with a couple of improvements such as the function reduction. The function reduction reduces the number of subkeys required to compute the matching state by exploiting degrees of freedom of plaintext/ciphertext pairs, which increases the number of rounds that the ASR attack can reach. Therefore, in order to evaluate the security of a block cipher against the ASR attack more precisely, the following natural question arises: Are those advanced techniques applicable to other structures such as Lai-Massey and LFSR-type schemes?

Table 1. Summary of attacks on FOX64/128, KATAN32/48/64 and SHACAL-2 (single-key setting)

In this paper, we first apply the function reduction technique to Lai-Massey, LFSR-type and source-heavy generalized Feistel schemes to extend the ASR attacks on those structures. Then, we further improve the attacks on those structures by exploiting structure-dependent properties and optimizing data complexity in the function reduction. For instance, the ASR attack with the function reduction on FOX can be improved by using the keyless one-round relation in the Lai-Massey scheme. Moreover, combined with the repetitive ASR approach, which optimizes the data complexity when using the function reduction, the attack on FOX can be further improved. Those results are summarized in Table 1. As far as we know, all of the results given by this paper are the best single-key attacks with respect to the number of attacked rounds in the literature. We emphasize that our improvements keep the basic concept of the ASR attack, which enables us to evaluate the security of a block cipher without analyzing its key scheduling function. Therefore, our results are considered as not only the best single-key attacks on the specific block ciphers but also the lower bounds on the security of the target block cipher structures independently of key scheduling functions.

The rest of this paper is organized as follows. Section 2 briefly reviews the previously shown techniques including the all-subkeys recovery approach, the function reduction and the repetitive all-subkeys recovery approach. The improved all-subkeys recovery attacks on FOX64/128, KATAN32/48/64 and SHACAL-2 are presented in Sects. 3, 4 and 5, respectively. Finally, we conclude in Sect. 6.

2 Preliminary

2.1 All-Subkeys Recovery Approach [16]

The all-subkeys recovery (ASR) attack was proposed in [16] as an extension of the meet-in-the-middle (MITM) attack. Unlike the basic MITM attack, the ASR attack guesses all subkeys instead of the master key, so that the attack can be constructed independently of the underlying key scheduling function.

Let us briefly review the procedure of the ASR attack. First, an attacker determines an \(s\)-bit matching state \(S\) in a target \(n\)-bit block cipher consisting of \(R\) rounds. The state \(S\) can be computed from a plaintext \(P\) and a set of subkey bits \(\mathcal {K}_{(1)}\) by a function \(\mathcal {F}_{(1)}\) as \(S = \mathcal {F}_{(1)}(P, \mathcal {K}_{(1)})\). Similarly, \(S\) can be computed from the corresponding ciphertext \(C\) and another set of subkey bits \(\mathcal {K}_{(2)}\) by a function \(\mathcal {F}_{(2)}\) as \(S = \mathcal {F}^{-1}_{(2)}(C, \mathcal {K}_{(2)})\). Let \(\mathcal {K}_{(3)}\) be the set of the remaining subkey bits, i.e., \(|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| + |\mathcal {K}_{(3)}| = R \cdot \ell \), where \(\ell \) denotes the size of each subkey. For a plaintext \(P\) and the corresponding ciphertext \(C\), the equation \(\mathcal {F}_{(1)}(P, \mathcal {K}_{(1)}) = \mathcal {F}^{-1}_{(2)}(C, \mathcal {K}_{(2)})\) holds when the guessed subkey bits \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) are correct. Since \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) can be guessed independently, we can efficiently filter out the incorrect subkeys from the key candidates. After this process, it is expected that there will be \(2^{R \cdot \ell - s}\) key candidates. Note that the number of key candidates can be reduced further by performing the matching in parallel with additional plaintext/ciphertext pairs. In fact, using \(N\) plaintext/ciphertext pairs, the number of key candidates is reduced to \(2^{R \cdot \ell - N \cdot s}\), as long as \(N \le (|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|) / s\). Finally, the attacker exhaustively searches for the correct key among the remaining key candidates. The total computational cost \(C_{comp}\) of the attack (i.e., the number of encryption function calls) is estimated as

$$\begin{aligned} C_{comp} = \max (2^{|\mathcal {K}_{(1)}|}, 2^{|\mathcal {K}_{(2)}|}) \times N + 2^{R \cdot \ell - N \cdot s}. \end{aligned} \tag{1}$$

The number of required plaintext/ciphertext pairs is \(\max (N, \lceil (R \cdot \ell - N \cdot s)/n \rceil )\), where \(n\) is the block size of the target cipher. The required memory is about \(\min (2^{|\mathcal {K}_{(1)}|}, 2^{|\mathcal {K}_{(2)}|}) \times N\) blocks, which is the cost of the table used for the matching.
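The estimates above can be evaluated numerically. The following is a minimal Python sketch, not part of the original analysis: the function names `log2_sum` and `asr_cost` and their parameters are hypothetical, and all costs are handled as base-2 logarithms so that numbers such as \(2^{120}\) never have to be formed. The example call uses the parameters quoted for the 6-round FOX64 attack in Sect. 3.3, where 240 key bits are involved.

```python
import math

def log2_sum(*exponents):
    """Return log2(2^e1 + 2^e2 + ...) without forming huge numbers."""
    m = max(exponents)
    return m + math.log2(sum(2.0 ** (e - m) for e in exponents))

def asr_cost(k1_bits, k2_bits, total_bits, s, n_pairs, block_bits):
    """Evaluate Eq. (1) and the data/memory estimates of the basic ASR attack.

    k1_bits, k2_bits : |K_(1)| and |K_(2)|
    total_bits       : number of subkey bits to be recovered (R * l in Eq. (1))
    s                : size of the matching state in bits
    n_pairs          : N, the number of pairs used in the matching
    block_bits       : n, the block size
    Returns (log2 of time, number of pairs, log2 of memory in blocks).
    """
    matching = max(k1_bits, k2_bits) + math.log2(n_pairs)
    remaining = total_bits - n_pairs * s          # log2 of surviving candidates
    time_log2 = log2_sum(matching, remaining)
    data = max(n_pairs, math.ceil(remaining / block_bits))
    memory_log2 = min(k1_bits, k2_bits) + math.log2(n_pairs)
    return time_log2, data, memory_log2

# Parameters of the 6-round FOX64 attack (Sect. 3.3): N = 15, s = 16.
time_log2, data, memory_log2 = asr_cost(120, 120, 240, 16, 15, 64)
print(f"time 2^{time_log2:.1f}, data {data}, memory 2^{memory_log2:.1f}")
```

Under these assumptions the sketch gives a time and memory exponent of about 123.9, which the text rounds to \(2^{124}\), and a data requirement of 15 pairs.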

2.2 Improvements on All-Subkeys Recovery Approach

In the ASR attack, the number of subkeys required to compute the state \(S\) from \(P\) or \(C\), i.e., \(|\mathcal {K}_{(1)}|\) or \(|\mathcal {K}_{(2)}|\), is usually the dominant parameter in the required complexities. Thus, in general, reducing the subkeys \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) makes the ASR attack applicable to more rounds. In the following, we briefly review and introduce a couple of techniques to reduce the subkeys required to compute the matching state.

Function Reduction Technique. For Feistel ciphers, the function reduction technique, which directly reduces the number of involved subkeys, was introduced in [17]. The basic concept of the function reduction is that fixing some plaintext bits, ciphertext bits or both by exploiting degrees of freedom of a plaintext/ciphertext pair allows an attacker to regard a key-dependent variable as a new subkey. As a result, the number of subkeys that effectively have to be guessed to compute the matching state is reduced. By using the function reduction, the lower bounds on the security of several Feistel ciphers against generic key recovery attacks were given in [17]. Note that a similar approach was presented in [11] for directly guessing intermediate state values, while in the function reduction, equivalently transformed key values are guessed.

Suppose that the \(i\)-th round state \(S_{i}\) is computed from the \((i-1)\)-th round state \(S_{i-1}\) XORed with the \(i\)-th round subkey \(K_{i}\) by the \(i\)-th round function \(G_{i}\), i.e., \(S_{i} = G_{i}(K_{i} \oplus S_{i-1})\). For clear understanding, we divide the function reduction into two parts: a key linearization and an equivalent transform as follows.

  • Key Linearization. Since the \(i\)-th round function \(G_{i}\) is a non-linear function, the \(i\)-th round subkey \(K_{i}\) cannot pass through \(G_{i}\) by an equivalent transform. The key linearization technique, which is a part of the function reduction, exploits the degrees of freedom of plaintexts/ciphertexts to express \(S_{i}\) as a linear relation of \(S_{i-1}\) and \(K_{i}\), i.e., \(S_{i} = L_{i}(S_{i-1}, K_{i})\), where \(L_{i}\) is a linear function. Once \(S_{i}\) is represented by a linear relation of \(S_{i-1}\) and \(K_{i}\), \(K_{i}\) can be moved forward to the next non-linear function by an equivalent transform. Note that, if the splice-and-cut technique [4] is used with the key linearization, \(K_{i}\) can be divided into both forward and backward directions.

  • Equivalent Transform. After the key linearization, the \(i\)-th round subkey \(K_{i}\) is replaced with a new subkey \(K'_{i}\) to pass through a non-linear function. However, in order to reduce the involved subkey bits on the trails to the matching state, all subkeys on the trails affected by \(K'_{i}\) are also replaced with new variables by an equivalent transform. Consequently, the number of subkeys required to compute the matching state can be reduced. For the Feistel ciphers, this is easily done by replacing all subkeys \(K_{j}\) in the even-numbered rounds with \(K'_{j} (= K'_{1} \oplus K_{j})\), where \(j\) is even.

The splice-and-cut technique [4], which was originally presented in the attack on two-key triple DES [21], has been widely used in recent meet-in-the-middle attacks [3, 6, 7, 19, 24]. It regards the first and last rounds as consecutive by exploiting degrees of freedom of plaintexts/ciphertexts, and thus any round can be the start point. In general, the splice-and-cut technique is useful for analyzing a specific block cipher whose key dependency varies depending on the chunk separation. However, in the ASR approach, the splice-and-cut technique does not work effectively, since the ASR approach treats all subkeys as independent variables to evaluate the security independently of the key scheduling function. On the other hand, the function reduction exploits degrees of freedom of plaintexts/ciphertexts to reduce the subkey bits required to compute the matching state, and does not use relations among subkeys. Therefore, the function reduction technique is more useful and suitable for the ASR approach than the splice-and-cut technique. However, as mentioned in the description of the key linearization, the combined use of the splice-and-cut and the function reduction in the key linearization is also possible, e.g., the attack on Feistel-1 [17] and the attack on SHACAL-2 in this paper.

Repetitive All-Subkeys Recovery Approach. Since the function reduction exploits the degrees of freedom of plaintexts/ciphertexts, it sometimes renders an attack infeasible due to a lack of available data. For such cases, we introduce a variant of the all-subkeys recovery approach, called the repetitive all-subkeys recovery approach, that repeatedly applies the all-subkeys recovery to detect the correct key. The variant can reduce the required data for each all-subkeys recovery phase, though the total amount of the required data is unchanged. Note that a similar technique, called the inner loop technique, was used in [5, 23] for reducing the memory requirements. The repetitive all-subkeys recovery approach is described as follows.

  1. Mount the ASR attack with \(N\) plaintext/ciphertext pairs, where \(N\) is supposed to be less than \((|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|) / s\), then put the remaining key candidates into a table \(T_1\). The expected number of candidates is \(2^{|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| - N \cdot s}\).

  2. Mount the ASR attack again with \(N\) different plaintext/ciphertext pairs. If a remaining candidate matches one in \(T_1\), it is put into another table \(T_2\). The expected number of candidates is \(2^{|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| - 2 \cdot N \cdot s}\).

  3. Repeat the above process until the correct key is found, i.e., \(M = (|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|)/(N \cdot s)\) times.

When the above procedure is repeated \(M\) \((\ge 2)\) times, the computational costs to detect \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) are estimated as

$$\begin{aligned} C_{comp} ={}&(\max (2^{|\mathcal {K}_{(1)}|}, 2^{|\mathcal {K}_{(2)}|}) \times N) \times M + 2^{|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| - N \cdot s} \\&+ \dots + 2^{|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| - (M -1) \cdot N \cdot s}. \end{aligned}$$

While the required data in total is \((|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|) /s\) \((= ((|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|)/(M \cdot s))~\cdot M )\), each ASR phase is done with \(N = (|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}|) /(M \cdot s)\) data, which is \(M\) times less than that required in the basic ASR attack. The required memory is about \(\max (2^{|\mathcal {K}_{(1)}| + |\mathcal {K}_{(2)}| - N \cdot s}, \min (2^{|\mathcal {K}_{(1)}|}, 2^{|\mathcal {K}_{(2)}|}) \times N)\) blocks, which is the cost of the tables used in the matching. We demonstrate the effectiveness of the proposed variant in the attack on the reduced FOX in Sect. 3.
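The cost formula of the repetitive variant can be checked in the same way. The sketch below is only an illustration under stated assumptions: `repetitive_asr_cost` and its parameter names are hypothetical, and exponents are again kept as base-2 logarithms. The example call uses the values quoted for the 7-round FOX64 attack in Sect. 3.4 (\(N = 8\), \(M = 2\), \(s = 16\)).

```python
import math

def log2_sum(*exponents):
    """Return log2(2^e1 + 2^e2 + ...) without forming huge numbers."""
    m = max(exponents)
    return m + math.log2(sum(2.0 ** (e - m) for e in exponents))

def repetitive_asr_cost(k1_bits, k2_bits, s, n_pairs, repeats):
    """Time and memory (as log2) of the repetitive ASR approach.

    Time:   (max(2^|K1|, 2^|K2|) * N) * M
            + 2^(|K1|+|K2| - N*s) + ... + 2^(|K1|+|K2| - (M-1)*N*s)
    Memory: max(2^(|K1|+|K2| - N*s), min(2^|K1|, 2^|K2|) * N)
    """
    involved = k1_bits + k2_bits
    terms = [max(k1_bits, k2_bits) + math.log2(n_pairs) + math.log2(repeats)]
    # One table-comparison term per repetition after the first one.
    terms += [involved - j * n_pairs * s for j in range(1, repeats)]
    time_log2 = log2_sum(*terms)
    memory_log2 = max(involved - n_pairs * s,
                      min(k1_bits, k2_bits) + math.log2(n_pairs))
    return time_log2, memory_log2

# Values quoted for the 7-round FOX64 attack (Sect. 3.4).
t_log2, m_log2 = repetitive_asr_cost(120, 120, 16, 8, 2)
print(f"time 2^{t_log2:.2f}, memory 2^{m_log2:.0f}")
```

With these inputs the matching term \(2^{124}\) dominates the table term \(2^{112}\), and the memory is determined by the matching table of \(2^{123}\) blocks.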

3 Improved All-Subkeys Recovery Attacks on FOX64 and FOX128

In this section, we present the improved ASR attacks using the function reduction and the repetitive ASR approach on the 6- and 7-round reduced FOX64 and FOX128 block ciphers. After short descriptions of FOX64 and FOX128, the function reduction on FOX64 is presented. Then, we show how to construct the attack on the 6-round FOX64, and how to extend it to the 7-round variant by using the repetitive ASR approach. Similarly, the function reduction on FOX128, the attack on the 6-round FOX128, and the attack on the 7-round FOX128 with the repetitive ASR approach are introduced, respectively.

Fig. 1. Round functions of FOX64 and FOX128

3.1 Descriptions of FOX64 and FOX128

FOX [18], also known as IDEA-NXT, is a family of block ciphers designed by Junod and Vaudenay in 2004. FOX employs a Lai-Massey scheme and includes two variants referred to as FOX64 and FOX128 (see Fig. 1).

FOX64 is a 64-bit block cipher consisting of a 16-round Lai-Massey scheme with a 128-bit key. The \(i\)-th round 64-bit input state is denoted as two 32-bit words (\(L_{i-1}\) \(||\) \(R_{i-1}\)). The \(i\)-th round function updates the input state using the 64-bit \(i\)-th round key \(K_{i}\) as follows:

$$\begin{aligned} (L_{i}||R_{i})= & {} (\mathtt {or}(L_{i-1} \oplus \mathtt {f32}(L_{i-1} \oplus R_{i-1}, K_i)) || R_{i-1} \oplus \mathtt {f32}(L_{i-1} \oplus R_{i-1}, K_i)), \end{aligned}$$

where \(\mathtt {or}(x_{0} || x_{1}) = (x_{1} || (x_{0} \oplus x_{1}))\) for 16-bit \(x_{0}\), \(x_{1}\). \(\mathtt {f32}\) outputs 32-bit data from a 32-bit input \(X\) and two 32-bit subkeys \(LK_{i}\) and \(RK_{i}\) as (\(\mathtt {sigma4}\)(\(\mathtt {mu4}\)(\(\mathtt {sigma4}\)(\(X \oplus LK_{i}\))) \(\oplus \) \(RK_{i}\)) \(\oplus LK_{i}\)), where \(\mathtt {sigma4}\) denotes the S-box layer consisting of four 8-bit S-boxes and \(\mathtt {mu4}\) denotes the \(4 \times 4\) MDS matrix. The two 32-bit subkeys \(LK_{i}\) and \(RK_{i}\) are derived from \(K_{i}\) as \(K_{i}\) = \((LK_{i} || RK_{i})\).

FOX128 is a 128-bit block cipher consisting of a 16-round modified Lai-Massey scheme with a 256-bit key. The \(i\)-th round 128-bit input state is denoted as four 32-bit words (\(LL_{i-1}\) \(||\) \(LR_{i-1}\) \(||\) \(RL_{i-1}\) \(||\) \(RR_{i-1}\)). The \(i\)-th round function updates the input state using the 128-bit \(i\)-th round key \(K_{i}\) as follows:

$$\begin{aligned} (LL_{i}||LR_{i})= & {} (\mathtt {or}(LL_{i-1} \oplus \phi _{L}) || LR_{i-1} \oplus \phi _{L}), \\ (RL_{i}||RR_{i})= & {} (\mathtt {or}(RL_{i-1} \oplus \phi _{R}) || RR_{i-1} \oplus \phi _{R}), \end{aligned}$$

where \((\phi _{L} || \phi _{R}) = \mathtt {f64}((LL_{i-1} \oplus LR_{i-1}) || (RL_{i-1} \oplus RR_{i-1}), K_i)\). \(\mathtt {f64}\) outputs 64-bit data from a 64-bit input \(X\) and two 64-bit subkeys \(LK_{i}\) and \(RK_{i}\) as (\(\mathtt {sigma8}\)(\(\mathtt {mu8}\)(\(\mathtt {sigma8}\)(\(X \oplus LK_{i}\))) \(\oplus \) \(RK_{i}\)) \(\oplus LK_{i}\)), where \(\mathtt {sigma8}\) denotes the S-box layer consisting of eight 8-bit S-boxes and \(\mathtt {mu8}\) denotes the \(8 \times 8\) MDS matrix. The two 64-bit subkeys \(LK_{i}\) and \(RK_{i}\) are derived from \(K_{i}\) as \(K_{i}\) = \((LK_{i} || RK_{i})\).

3.2 Function Reduction on FOX64

Key Linearization (Fig. 2). If the value of \(L_{0} \oplus R_{0}\) is fixed to a constant \(CON_{1}\), the input of \(\mathtt {f32}\) in the first round is fixed, and its output is \(\mathtt {f32}(CON_{1}, K_1)\). By regarding \(\mathtt {f32}(CON_{1}, K_1)\) as a new 32-bit key \(K'_1\), \(K'_1\) is XORed to \(L_{0}\) and \(R_{0}\). Since \(\mathtt {or}\) is a linear operation, the state after the first round is expressed as \((L_{1}||R_{1}) = (\mathtt {or}(L_{0}) \oplus OK'_1) || (R_{0} \oplus K'_1 )\), where \(OK'_1 = \mathtt {or}(K'_1)\) (see Fig. 2). This implies that the first round keys linearly affect \(L_{1}\) and \(R_{1}\).

Fig. 2. Key linearization of FOX64

Equivalent Transform (Fig. 3). In the second round, \(OK'_1\) and \(K'_1\) are XORed with \(LK_2\) in the first and last operations of the \(\mathtt {f32}\) function. Let \(LK'_2 = LK_2 \oplus K'_1 \oplus OK'_1\), \(K''_1 = K'_1 \oplus LK_2\), and \(OK''_1 = \mathtt {or}(OK'_1 \oplus LK_2)\) be new keys. Then the \(\mathtt {f32}\) function contains \(K'_2\) \((= LK'_2 || RK_2)\), and \(K''_1\) and \(OK''_1\) linearly affect the outputs of the second round.

In the third round, \(OK''_1\) and \(K''_1\) are also XORed with \(LK_3\) in the first and last operations of the \(\mathtt {f32}\) function. Let \(LK'_3 = LK_3 \oplus K''_1 \oplus OK''_1\), \(K'''_1 = K''_1 \oplus LK_3\), and \(OK'''_1 = \mathtt {or}(OK''_1 \oplus LK_3)\) be new keys (see Fig. 3).

Note that the same technique can be applied to the inverse of FOX64, because the round function of FOX64 has the involution property.
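The key linearization and the second-round equivalent transform can be checked experimentally on a toy instance. The sketch below is an illustration only: the S-box and the linear layer are random or simplified placeholders (assumptions, not the real FOX components), \(\mathtt {or}\) is named `or_map` because `or` is a Python keyword, and `check_function_reduction` is a hypothetical helper. Only the structure \(\mathtt {f32}(X) = \mathtt {sigma4}(\mathtt {mu4}(\mathtt {sigma4}(X \oplus LK)) \oplus RK) \oplus LK\) and the Lai-Massey round are taken from the text.

```python
import random

MASK32 = 0xFFFFFFFF
rng = random.Random(2014)
SBOX = list(range(256))
rng.shuffle(SBOX)  # placeholder 8-bit S-box, NOT the FOX S-box

def or_map(x):
    """or(x0 || x1) = (x1 || (x0 xor x1)) for 16-bit halves x0, x1."""
    x0, x1 = x >> 16, x & 0xFFFF
    return (x1 << 16) | (x0 ^ x1)

def sigma4(x):
    """Apply the placeholder S-box to each of the four bytes."""
    return sum(SBOX[(x >> (8 * i)) & 0xFF] << (8 * i) for i in range(4))

def mu4(x):
    """Placeholder GF(2)-linear layer (the real mu4 is an MDS matrix)."""
    rot = lambda v, r: ((v << r) | (v >> (32 - r))) & MASK32
    return x ^ rot(x, 8) ^ rot(x, 16)

def core(x, rk):
    """Key-dependent part of f32 after the first XOR with LK."""
    return sigma4(mu4(sigma4(x)) ^ rk)

def f32(x, lk, rk):
    return core(x ^ lk, rk) ^ lk

def fox64_round(l, r, lk, rk):
    phi = f32(l ^ r, lk, rk)
    return or_map(l ^ phi), r ^ phi

def check_function_reduction(trials=200):
    for _ in range(trials):
        con1 = rng.getrandbits(32)
        lk1, rk1, lk2, rk2 = (rng.getrandbits(32) for _ in range(4))
        l0 = rng.getrandbits(32)
        r0 = l0 ^ con1                       # plaintext with L0 xor R0 = CON1
        l1, r1 = fox64_round(l0, r0, lk1, rk1)
        l2, r2 = fox64_round(l1, r1, lk2, rk2)
        # Key linearization: K'_1 = f32(CON1, K_1), OK'_1 = or(K'_1).
        k1p = f32(con1, lk1, rk1)
        ok1p = or_map(k1p)
        assert (l1, r1) == (or_map(l0) ^ ok1p, r0 ^ k1p)
        # Equivalent transform for the second round.
        lk2p = lk2 ^ k1p ^ ok1p              # LK'_2
        k1pp = k1p ^ lk2                     # K''_1
        ok1pp = or_map(ok1p ^ lk2)           # OK''_1
        phi = core(or_map(l0) ^ r0 ^ lk2p, rk2)
        assert (l2, r2) == (or_map(or_map(l0) ^ phi) ^ ok1pp, r0 ^ phi ^ k1pp)
    return True

print(check_function_reduction())
```

The check only relies on the XOR-linearity of \(\mathtt {or}\) and on the position of \(LK_i\) in \(\mathtt {f32}\), so it should carry over unchanged if the placeholders are replaced by the real FOX64 components.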

3.3 Attack on the 6-Round FOX64

In this attack, we use the following one-round keyless linear relation of the Lai-Massey construction.

$$ \mathtt {or}^{-1}(L_{i + 1}) \oplus R_{i + 1} = L_{i} \oplus R_{i}. $$

From this equation, the following 16-bit relation is obtained:

$$ ((L^{(1)}_{4} \oplus L^{(3)}_{4}) || L^{(3)}_{4}) \oplus (R^{(3)}_{4}||R^{(1)}_{4}) = (L^{(3)}_{3} || L^{(1)}_{3}) \oplus (R^{(3)}_{3} || R^{(1)}_{3}), $$

where \(L^{(j)}_{i}\) and \(R^{(j)}_{i}\) are the \(j\)-th bytes of \(L_{i}\) and \(R_{i}\), respectively, and \(L^{(3)}_{i}\) and \(R^{(3)}_{i}\) are the most significant bytes, i.e., \(L_{i} = \{L^{(3)}_{i}||L^{(2)}_{i}||L^{(1)}_{i}||L^{(0)}_{i} \}\) and \(R_{i} = \{R^{(3)}_{i}||R^{(2)}_{i}||R^{(1)}_{i}||R^{(0)}_{i} \}\).
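Both the 32-bit keyless relation and its 16-bit byte form can be verified with a few lines of Python. This is a minimal sketch under stated assumptions: the output of \(\mathtt {f32}\) is modeled by an arbitrary 32-bit value `phi`, since the relation holds for any value added to both halves, and the helper names (`or_map`, `or_inv`, `byte`, `check_keyless_relation`) are hypothetical.

```python
import random

def or_map(x):
    """or(x0 || x1) = (x1 || (x0 xor x1)) for 16-bit halves x0, x1."""
    x0, x1 = x >> 16, x & 0xFFFF
    return (x1 << 16) | (x0 ^ x1)

def or_inv(y):
    """Inverse of or: (y0 || y1) -> ((y0 xor y1) || y0)."""
    y0, y1 = y >> 16, y & 0xFFFF
    return ((y0 ^ y1) << 16) | y0

def byte(x, j):
    """j-th byte of a 32-bit word, byte 3 being the most significant."""
    return (x >> (8 * j)) & 0xFF

def lai_massey_round(l, r, phi):
    """One FOX64-style round; phi stands in for the f32 output."""
    return or_map(l ^ phi), r ^ phi

def check_keyless_relation(trials=1000, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        l3, r3, phi = (rng.getrandbits(32) for _ in range(3))
        l4, r4 = lai_massey_round(l3, r3, phi)
        # 32-bit relation: or^{-1}(L_{i+1}) xor R_{i+1} = L_i xor R_i.
        assert or_inv(l4) ^ r4 == l3 ^ r3
        # 16-bit relation on bytes 3 and 1.
        lhs = (((byte(l4, 1) ^ byte(l4, 3)) << 8) | byte(l4, 3)) \
              ^ ((byte(r4, 3) << 8) | byte(r4, 1))
        rhs = ((byte(l3, 3) << 8) | byte(l3, 1)) \
              ^ ((byte(r3, 3) << 8) | byte(r3, 1))
        assert lhs == rhs
    return True

print(check_keyless_relation())
```

Because `phi` cancels in \(\mathtt {or}^{-1}(L_{i+1}) \oplus R_{i+1}\), the relation does not depend on the round key, which is what allows it to serve as the matching equation.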

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : For given \(\{\) \(K'_{2}\), \(LK'_{3}\), \(RK'^{(3)}_{3}\), \(RK'^{(1)}_{3}\), \(K'''^{(3)}_1\), \(K'''^{(1)}_1\), \(OK'''^{(3)}_1\), \(OK'''^{(1)}_1 \) \(\}\), \((L^{(3)}_{3} || L^{(1)}_{3}) \oplus (R^{(3)}_{3} || R^{(1)}_{3})\) is computable. Since (\(K'''^{(3)}_1\) \(||\) \(K'''^{(1)}_1\)) and (\(OK'''^{(3)}_1 ||OK'''^{(1)}_1 \)) linearly affect \((L^{(3)}_{3} || L^{(1)}_{3})\) and \((R^{(3)}_{3} || R^{(1)}_{3})\), respectively, we can regard \((K'''^{(3)}_1 || K'''^{(1)}_1) \oplus (OK'''^{(3)}_1 ||OK'''^{(1)}_1)\) as a new 16-bit key \(XORK_1\). Then, \((L^{(3)}_{3} || L^{(1)}_{3}) \oplus (R^{(3)}_{3} || R^{(1)}_{3})\) is obtained from \(112 (=64 + 32 + 8 + 8)\) bits of the key \(\{K'_{2}, LK'_{3}, RK'^{(3)}_{3}, RK'^{(1)}_{3} \}\) and linearly-dependent 16-bit key \(XORK_1\).

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : \(((L^{(1)}_{4} \oplus L^{(3)}_{4}) || L^{(3)}_{4}) \oplus (R^{(3)}_{4}||R^{(1)}_{4}) \) is obtained from 112 (=64 + 32 + 16) bits of the key \(\{\) \(K_6\), \(LK_{5}\), \(RK^{(1)}_{5}\), \(RK^{(3)}_{5}\) \(\}\). Using the indirect matching technique [3], 8 bits out of the 16 bits of \(XORK_1\) are moved to the left half of the matching equation. Then, the left and right halves of the equation each contain 120 bits of the key, i.e., \(|\mathcal {K}_{(1)}| = |\mathcal {K}_{(2)}| = 120\).

Evaluation. When the parameter \(N = 15\), the time complexity for finding the involved 240-bit key is estimated as

$$ C_{comp} = \max (2^{120}, 2^{120}) \times 15 + 2^{240 - 15 \cdot 16} = 2^{124}. $$

The required data for the attack is only 15 (=\(\max (15, \lceil (240- 15 \cdot 16)/64 \rceil )\)) chosen plaintext/ciphertext pairs, and the required memory is estimated as about \(2^{124}\) (=\(\min (2^{120}, 2^{120})\) \(\times \) \(15\)) blocks.

Fig. 3. Function reduction of FOX64

3.4 Attack on the 7-Round FOX64

If the function reduction is applied as well in the backward direction, the 7-round attack is feasible, i.e., the relation of \(L_{7} \oplus R_{7}\) is fixed to a constant \(CON_{2}\). Due to the involution property of the FOX64 round function, \(((L^{(1)}_{4} \oplus L^{(3)}_{4})\) \(||\) \(L^{(3)}_{4})\) \(\oplus \) \((R^{(3)}_{4}||R^{(1)}_{4})\) is also obtained from 112 (= 64 + 32 + 8 + 8) bits of the key and linearly-dependent 16-bit key \(XORK_2\). In this attack, we further regard \(XORK_1 \oplus XORK_2\) as a 16-bit new key. Then, similar to the attack on the 6-round FOX64, the left and right halves of the equation contain 120 bits of the key, i.e., \(|\mathcal {K}_{(1)}| = |\mathcal {K}_{(2)}| = 120\).

Repetitive ASR Approach. Recall that plaintexts and ciphertexts need to satisfy the 32-bit relations, \(L_{0} \oplus R_{0} = CON_{1}\) and \(L_{7} \oplus R_{7} = CON_{2}\). The required data for finding such pairs is equivalently estimated as the game that an attacker finds 32-bit multicollisions by 32-bit-restricted inputs. It has been known that an \(n\)-bit \(t\)-multicollision is found in \(t!^{1/t} \cdot 2^{n \cdot (t-1) / t}\) random data with high probability [25].

In the basic ASR approach, at least \(15 (= 240/16)\) multicollisions are necessary to detect the 240-bit involved key. To obtain such pairs with a high probability, it requires \( 2^{32.55} ( = 15!^{1/15} \cdot 2^{32 \cdot (14) / 15})\) plaintext/ciphertext pairs. However, this is infeasible, since the degree of freedom of plaintexts is only 32 bits.

In order to overcome this problem, we utilize the repetitive all-subkeys recovery approach with \(M = 2\). In the two all-subkeys recovery phases, the required data is reduced to \(8\) and \(7\) pairs, respectively. Then, such eight 32-bit multicollisions are obtained from \(2^{29.9}\) plaintext/ciphertext pairs with a high probability. Thus, we can obtain the required data by exploiting the 32 free bits.
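The data estimates in this subsection follow directly from the multicollision bound \(t!^{1/t} \cdot 2^{n \cdot (t-1)/t}\). As a small worked check, the Python sketch below evaluates the base-2 logarithm of that bound; the function name `multicollision_log2_data` is an assumption made for illustration.

```python
import math

def multicollision_log2_data(n_bits, t):
    """log2 of t!^(1/t) * 2^(n*(t-1)/t), the data needed for an
    n-bit t-multicollision with high probability."""
    return math.log2(math.factorial(t)) / t + n_bits * (t - 1) / t

basic = multicollision_log2_data(32, 15)       # basic ASR: 15 pairs at once
repetitive = multicollision_log2_data(32, 8)   # repetitive ASR: 8 pairs per phase
print(f"t = 15: 2^{basic:.2f}, t = 8: 2^{repetitive:.2f}")
```

For \(n = 32\) the bound exceeds \(2^{32}\) when \(t = 15\) but drops below it when \(t = 8\), which is why splitting the attack into two phases brings the data requirement within the 32 available free bits.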

Evaluation. The time complexity for finding the involved 240 bits key is estimated as

$$ C_{comp} = (\max (2^{120}, 2^{120}) \times 8) \times 2 + (2^{240 - 8 \cdot 16}) = 2^{124}. $$

The remaining \(208 (=448 - 240)\) bits are obtained by recursively applying all-subkeys recovery attacks. The time complexity for this phase is roughly estimated as \(2^{106 (=208/2 + 2)}\) using 4 (\( = \lceil 208/64 \rceil \)) plaintext/ciphertext pairs.

The required data is \(2^{30.9} (= 2^{29.9} \times 2)\) plaintext/ciphertext pairs, and the required memory is about \(2^{123}\) (=\( \max (2^{240 - 128}, \min (2^{120}, 2^{120}) \times 8)\)) blocks.

3.5 Function Reduction on FOX128

Key Linearization (Fig. 4). If the two 32-bit relations \(LL_{0} \oplus LR_{0}\) and \(RL_{0} \oplus RR_{0}\) are fixed to \(CON_{1}\) and \(CON_{2}\), respectively, the input of \(\mathtt {f64}\) in the first round is fixed, and its output is \(\mathtt {f64}(CON_{1} || CON_{2}, K_1)\). By regarding \(\mathtt {f64}(CON_{1} || CON_{2}, K_1)\) as a new 64-bit key \(K'_1 = (KL'_1 || KR'_1)\), \(KL'_1\) and \(KR'_1\) are XORed to \(\{LL_{0}, LR_{0}\}\) and \(\{RL_{0}, RR_{0}\}\), respectively. The state after the first round is expressed as follows (see Fig. 4).

$$\begin{aligned} (LL_{1}||LR_{1} ||RL_{1}||RR_{1})&= (\mathtt {or}(LL_{0}) \oplus OKL'_1) || (LR_{0} \oplus KL'_1 ) || (\mathtt {or}(RL_{0}) \oplus \\&\quad OKR'_1) || (RR_{0} \oplus KR'_1 ), \end{aligned}$$

where \(OKL'_1 = \mathtt {or}(KL'_1)\) and \(OKR'_1 = \mathtt {or}(KR'_1)\). This implies that the first round keys linearly affect \(LL_{1}\), \(LR_{1}\), \(RL_{1}\) and \(RR_{1}\).

Equivalent Transform (Fig. 4). The equivalent transform is done similarly to that of FOX64, as shown in Fig. 4.

Fig. 4. Function reduction of FOX128

3.6 Attack on the 6-Round FOX128

We use the following one-round keyless linear relation of the modified Lai-Massey construction,

$$ \mathtt {or}^{-1}(LL_{i + 1}) \oplus LR_{i + 1} = LL_{i} \oplus LR_{i}. $$

From this equation, the 16-bit relation is obtained as follows:

$$ ((LL^{(1)}_{4} \oplus LL^{(3)}_{4}) || LL^{(3)}_{4}) \oplus (LR^{(3)}_{4}||LR^{(1)}_{4}) = (LL^{(3)}_{3} || LL^{(1)}_{3}) \oplus (LR^{(3)}_{3} || LR^{(1)}_{3}). $$

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : For given \(\{\) \(K'_{2}\), \(LK'_{3}\), \(RKL'^{(3)}_{3}\), \(RKL'^{(1)}_{3}\), \(KL'''^{(3)}_1\), \(KL'''^{(1)}_1\), \(OKL'''^{(3)}_1\), \(OKL'''^{(1)}_1 \) \(\}\), \((LL^{(3)}_{3} || LL^{(1)}_{3}) \oplus (LR^{(3)}_{3} || LR^{(1)}_{3})\) is computable. Since (\(KL'''^{(3)}_1 || KL'''^{(1)}_1\)) and (\(OKL'''^{(3)}_1 ||OKL'''^{(1)}_1 \)) linearly affect the matching states \((LL^{(3)}_{3} || LL^{(1)}_{3})\) and \((LR^{(3)}_{3} || LR^{(1)}_{3})\), respectively, we are able to regard \((KL'''^{(3)}_1 || KL'''^{(1)}_1)\) \(\oplus \) \((OKL'''^{(3)}_1 ||OKL'''^{(1)}_1)\) as a new 16-bit key \(XORK_1\). Then, \((LL^{(3)}_{3} || LL^{(1)}_{3}) \oplus (LR^{(3)}_{3} || LR^{(1)}_{3})\) is obtained from \(208 (=128 + 64 +~8 +~8)\) bits of the key \(\{ K'_{2}, LK'_{3}, RKL'^{(3)}_{3}, RKL'^{(1)}_{3}\}\) and the linearly-dependent 16-bit key \(XORK_1\).

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : \(((LL^{(1)}_{4} \oplus LL^{(3)}_{4}) || LL^{(3)}_{4}) \oplus (LR^{(3)}_{4}||LR^{(1)}_{4})\) is obtained from \(208 (=128 + 64 + 16)\) bits of the key \(\{\) \(K_6\), \(LK_{5}\), \(RKL^{(1)}_{5}\), \(RKL^{(3)}_{5}\) \(\}\). Using the indirect matching technique, 8 bits out of 16-bit \(XORK_1\) are moved to the left half of the matching equation. Then, the left and right halves of the equation contain 216 bits of the key, i.e., \(|\mathcal {K}_{(1)}| = |\mathcal {K}_{(2)}| = 216\).

Evaluation. When the parameter \(N = 26\), the time complexity for the involved 432 bits is estimated as

$$ C_{comp} = \max (2^{216}, 2^{216}) \times 26 = 2^{221}. $$

The remaining \(352\) \((= 768 - 416)\) bits are obtained by recursively applying the all-subkeys recovery attack. The time complexity for determining the remaining subkeys is roughly estimated as \(2^{177.6 (=352/2 + 1.6)}\) using 2 (\( = \lceil 352/128 \rceil \)) plaintext/ciphertext pairs.

The required data is only 26 chosen plaintext/ciphertext pairs, and the required memory is about \(2^{221}\) (=\(\min (2^{216}, 2^{216}) \times 26\)) blocks.

3.7 Attack on the 7-Round FOX128

If the function reduction is also used in the backward direction, the 7-round attack is feasible, i.e., the two 32-bit relations \(LL_{7} \oplus LR_{7}\) and \(RL_{7} \oplus RR_{7}\) are fixed to \(CON_{3}\) and \(CON_{4}\), respectively.

Due to the involution property of the FOX128 round function, \(((LL^{(1)}_{4}\) \(\oplus \) \(LL^{(3)}_{4})\) \(||\) \(LL^{(3)}_{4})\) \(\oplus \) \((LR^{(3)}_{4}||LR^{(1)}_{4})\) is also obtained from \(208 (=128 + 64 + 8 + 8)\) bits of the key and the linearly-dependent 16-bit key \(XORK_2\). In this attack, we further regard \(XORK_1 \oplus XORK_2\) as a new 16-bit key. Then, similar to the attack on the 6-round FOX128, the left and right halves of the equation each contain 216 bits of the key, i.e., \(|\mathcal {K}_{(1)}| = |\mathcal {K}_{(2)}| = 216\).

Repetitive ASR Approach. Recall that plaintexts and ciphertexts need to satisfy 64-bit \((32 \times 2)\) relations, \(LL_{0} \oplus LR_{0}\) and \(RL_{0} \oplus RR_{0}\), and \(LL_{7} \oplus LR_{7}\) and \(RL_{7} \oplus RR_{7}\), respectively. The cost is equivalently estimated as the game that an attacker finds 64-bit multicollisions with 64-bit-restricted inputs.

In the basic ASR approach, at least \(27 (=432/16)\) multicollisions are needed to detect the 432-bit involved key. To obtain such pairs with a high probability, it requires \( 2^{65.1} (= 27!^{1/27} \cdot 2^{64 \cdot (26) / 27})\) pairs. However, it is infeasible, since the degree of freedom of plaintexts is only 64 bits.

We utilize the repetitive all-subkeys recovery approach with \(M = 2\). In the two all-subkeys recovery phases, the required data is reduced to \(13\) and \(14\) pairs, respectively. Such fourteen 64-bit multicollisions are obtained from \(2^{62.0}\) plaintext/ciphertext pairs with a high probability.

Evaluation. The time complexity for finding involved 432 bits of the key is estimated as

$$ C_{comp} = (\max (2^{216}, 2^{216}) \times 14) \times 2 + 2^{432 - 16 \cdot 14} = 2^{224}. $$

The remaining \(480 (=896 - 432)\) bits are obtained by recursively applying the all-subkeys recovery attack. The time complexity for this phase is roughly estimated as \(2^{242 (=480/2 + 2)}\) using 4 (\( = \lceil 480/128 \rceil )\) plaintext/ciphertext pairs.

The required data is \(2^{63.0} (= 2^{62.0} \times 2)\) plaintext/ciphertext pairs, and the required memory is about \(2^{242}\) blocks.

4 Improved All-Subkeys Recovery Attacks on KATAN32/48/64

In this section, we show that the function reduction techniques are applicable to KATAN32/48/64, then we improve the ASR attacks on KATAN32/48/64 block ciphers by 9, 5 and 5 rounds, respectively.

After a short description of KATAN, we show how to apply the function reduction to KATAN32 in detail. Then, the detailed explanation of the attack on the 119-round reduced KATAN32 is given. For KATAN48 and KATAN64, the detailed explanations for applying the function reduction are omitted, since the analysis is done similarly to that of KATAN32.

4.1 Description of KATAN

The KATAN family [8] consists of feedback shift register-based block ciphers in three variants: KATAN32, KATAN48 and KATAN64, whose block sizes are 32, 48 and 64 bits, respectively. All of the KATAN ciphers have 254 rounds and use the same key schedule, which accepts an 80-bit key. The plaintext is loaded into two shift registers \(L_{1}\) and \(L_{2}\). In each round, \(L_{1}\) and \(L_{2}\) are shifted by one bit, and the least significant bits of \(L_{1}\) and \(L_{2}\) are updated by \(f_{b}(L_{2})\) and \(f_{a}(L_{1})\), respectively. The bit functions \(f_{a}\) and \(f_{b}\) are defined as follows:

$$\begin{aligned} f_{a}(L_{1})= & {} L_{1}[x_{1}] \oplus L_{1}[x_{2}] \oplus (L_{1}[x_{3}] \cdot L_{1}[x_{4}]) \oplus (L_{1}[x_{5}] \cdot IR) \oplus k_{2i}, \\ f_{b}(L_{2})= & {} L_{2}[y_{1}] \oplus L_{2}[y_{2}] \oplus (L_{2}[y_{3}] \cdot L_{2}[y_{4}]) \oplus (L_{2}[y_{5}] \cdot L_{2}[y_{6}]) \oplus k_{2i+1}, \end{aligned}$$

where \(L[x]\) denotes the \(x\)-th bit of \(L\), \(IR\) denotes the round constant, and \(k_{2i}\) and \(k_{2i+1}\) denote the 2-bit \(i\)-th round key. Note that for the KATAN family, the round number starts from 0 instead of 1, i.e., the KATAN family consists of round functions starting from the 0-th round to the 253rd round. \(L_{1}^{i}\) and \(L_{2}^{i}\) denote the \(i\)-th round registers \(L_{1}\) and \(L_{2}\), respectively. Let \(IR^i\) be the \(i\)-th round constant. For KATAN48 or KATAN64, in each round, the above procedure is iterated twice or three times, respectively. All of the parameters for the KATAN ciphers are listed in Table 2.

Table 2. Parameters of KATAN family

The key scheduling function of KATAN ciphers copies the 80-bit user-provided key to \(k_{0},...,k_{79}\), where \(k_{i} \in \{0,1\}\). Then, the remaining 428 bits of the round keys are generated as follows:

$$ k_{i} = k_{i-80} \oplus k_{i-61} \oplus k_{i-50} \oplus k_{i-13} \quad \ \text{ for } i = 80, ..., 507. $$
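The description above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the tap positions are an assumption read off the equations for \(X[i]\) and \(Y[i]\) in Sect. 4.2 (they are expected to agree with Table 2), the function names are hypothetical, and the round constant \(IR\) is simply passed in by the caller because its generation is not described here.

```python
# Tap positions for KATAN32, inferred from the X[i] and Y[i] equations
# in Sect. 4.2 (assumption: these agree with Table 2).
X_TAPS = (12, 7, 8, 5, 3)        # x1..x5, used by f_a on L1 (13 bits)
Y_TAPS = (18, 7, 12, 10, 8, 3)   # y1..y6, used by f_b on L2 (19 bits)


def katan_round_keys(master_key_bits, rounds=254):
    """Expand the 80 key bits k_0..k_79 into 2*rounds round-key bits."""
    if len(master_key_bits) != 80:
        raise ValueError("expected 80 key bits")
    k = list(master_key_bits)
    for i in range(80, 2 * rounds):
        k.append(k[i - 80] ^ k[i - 61] ^ k[i - 50] ^ k[i - 13])
    return k


def katan32_round(l1, l2, k_a, k_b, ir):
    """One KATAN32 round; l1 and l2 are bit lists indexed by bit position.

    k_a plays the role of k_{2i}, k_b the role of k_{2i+1}.
    """
    x1, x2, x3, x4, x5 = X_TAPS
    y1, y2, y3, y4, y5, y6 = Y_TAPS
    f_a = l1[x1] ^ l1[x2] ^ (l1[x3] & l1[x4]) ^ (l1[x5] & ir) ^ k_a
    f_b = l2[y1] ^ l2[y2] ^ (l2[y3] & l2[y4]) ^ (l2[y5] & l2[y6]) ^ k_b
    # Shift both registers by one position; f_b enters L1 and f_a enters L2.
    return [f_b] + l1[:-1], [f_a] + l2[:-1]
```

With this indexing, the bit entering \(L_1\) in round 0 is \(f_b(L_2^0)\), which contains \(k_1\), and the bit entering \(L_2\) contains \(k_0\).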

4.2 Function Reduction on KATAN32

Key Linearization. In the \(i\)-th round function of KATAN32, two key bits \(k_{2i}\) and \(k_{2i+1}\) are linearly inserted into states \(L^i_1[0]\) and \(L^i_2[0]\), respectively, and these states are not updated in the \(i\)-th round. Thus, the key linearization technique is not necessary.

Equivalent Transform. Let us consider how linearly-inserted key bits are used in the following round functions. For instance, \(k_1\) is linearly inserted to \(L^1_1[0]\), and the updated state \(L^1_1[0]\) is described as (\(X[0] \oplus k_1\)), where \(X[i]\) is defined as

$$ X[i] = L^0_{2}[18 - i] \oplus L^0_{2}[7 - i] \oplus (L^0_{2}[12 - i] \cdot L^0_{2}[10 - i]) \oplus (L^0_{2}[8 - i] \cdot L^0_{2}[3 - i]), $$

where \(L_{2}^{0}[-i] = L_{2}^{i}[0]\). For computing \(f_a(L_1)\), the state \(L^1_1[0]\) = (\(X[0] \oplus k_1\)) is used in the following five positions,

  • \(L^4_{2}[3]: ((X[0] \oplus k_1 ) \cdot IR^4)\) is XORed with \(k_8\). If \(X[0]\) is fixed to a constant \(CON_{1}\), a new key \(k'_8\) is defined as \(((CON_{1} \oplus k_1 ) \cdot IR^4) \oplus k_8\).

  • \(L^6_{2}[5]: ((X[0] \oplus k_1 ) \cdot L^6_1[8])\) = \(((X[0] \oplus k_1 ) \cdot L^0_1[2])\) is XORed with \(k_{12}\). If \(L^0_1[2]\) is also fixed to a constant \(CON_{2}\), a new key \(k'_{12}\) is defined as \(((CON_{1} \oplus k_1) \cdot CON_{2}) \oplus k_{12}\).

  • \(L^8_{2}[7]: (X[0] \oplus k_1 )\) is directly XORed with \(k_{16}\). A new key \(k'_{16}\) is defined as \((CON_{1} \oplus k_1) \oplus k_{16}\).

  • \(L^9_{2}[8]\) : \(((X[0] \oplus k_1 ) \cdot L^9_1[5])\) = \(((X[0] \oplus k_1 ) \cdot (X[3] \oplus k_7 ))\) is XORed with \(k_{18}\). If \(X[3]\) is also fixed to a constant \(CON_{3}\), a new key \(k'_{18}\) is defined as \(((CON_{1} \oplus k_1) \cdot (CON_{3} \oplus k_7 )) \oplus k_{18}\).

  • \(L^{13}_{2}[12]: (X[0] \oplus k_1 )\) is directly XORed with \(k_{26}\). A new key \(k'_{26}\) is defined as \((CON_{1} \oplus k_1) \oplus k_{26}\).

Thus, by fixing \(X[0]\), \(L^0_1[2]\) and \(X[3]\) to constants and defining new key bits \(k'_{8}\), \(k'_{12}\), \(k'_{16}\), \(k'_{18}\) and \(k'_{26}\), we can omit one key bit \(k_1\), i.e., we can compute without \(k_1\) in the forward direction. Note that \(CON_{1}\), \(CON_{2}\) and \(CON_{3}\) are not restricted to constant values. Even if \(CON_{1}\), \(CON_{2}\) and \(CON_{3}\) are expressed by only key bits, we can define new keys in the same manner.
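The effect of the equivalent transform can be checked exhaustively on the five terms listed above. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, and \(k_7\) is held fixed for simplicity. It enumerates all \(2^6\) values of \((k_1, k_8, k_{12}, k_{16}, k_{18}, k_{26})\) and collects the resulting tuples \((k'_8, k'_{12}, k'_{16}, k'_{18}, k'_{26})\).

```python
from itertools import product


def key_dependent_terms(x0, l1_2, x3, ir4, k1, k7, k8, k12, k16, k18, k26):
    """The five values through which L^1_1[0] = X[0] xor k1 meets a later key bit."""
    s = x0 ^ k1
    return (
        (s & ir4) ^ k8,          # becomes k'_8
        (s & l1_2) ^ k12,        # becomes k'_12
        s ^ k16,                 # becomes k'_16
        (s & (x3 ^ k7)) ^ k18,   # becomes k'_18
        s ^ k26,                 # becomes k'_26
    )


def distinct_new_keys(con1, con2, con3, ir4, k7):
    """All new-key tuples reachable when X[0], L^0_1[2], X[3] are fixed constants."""
    return {
        key_dependent_terms(con1, con2, con3, ir4, k1, k7, k8, k12, k16, k18, k26)
        for k1, k8, k12, k16, k18, k26 in product((0, 1), repeat=6)
    }
```

For any choice of the constants, the 64 original key assignments collapse to 32 distinct tuples, so guessing the five new key bits is enough and \(k_1\) no longer needs to be guessed separately.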

Table 3. Conditions for 8-bit function reductions

Conditions for Function Reduction. Table 3 shows conditions for 8-bit function reductions. If these 14 bits of \(L^0_1[0]\), \(L^0_1[1]\), \(L^0_1[2]\), \(X[0], \ldots , X[10]\) are fixed to constants or expressed by only key bits, then we can eliminate 8 bits of the key, \(k_1\), \(k_3\), \(k_5\), \(k_7\), \(k_9\), \(k_{11}\), \(k_{13}\) and \(k_{15}\), in the forward computation of KATAN32.

Let us explain how to control \(X[0]\) to \(X[10]\) by exploiting the degrees of freedom of plaintexts. \(X[0]\) to \(X[10]\) are expressed as follows:

$$ \begin{array}{lll} X[0] &{}=&{} \underline{L^0_{2}[18]} \oplus L^0_{2}[7] \oplus (L^0_{2}[12] \cdot L^0_{2}[10]) \oplus (L^0_{2}[8] \cdot L^0_{2}[3]), \\ X[1] &{}=&{} \underline{L^0_{2}[17]} \oplus L^0_{2}[6] \oplus (L^0_{2}[11] \cdot L^0_{2}[9]) \oplus (L^0_{2}[7] \cdot L^0_{2}[2]), \\ X[2] &{}=&{} \underline{L^0_{2}[16]} \oplus L^0_{2}[5] \oplus (L^0_{2}[10] \cdot L^0_{2}[8]) \oplus (L^0_{2}[6] \cdot L^0_{2}[1]), \\ X[3] &{}=&{} \underline{L^0_{2}[15]} \oplus L^0_{2}[4] \oplus (L^0_{2}[9] \cdot L^0_{2}[7]) \oplus (L^0_{2}[5] \cdot L^0_{2}[0]), \\ X[4] &{}=&{} \underline{L^0_{2}[14]} \oplus L^0_{2}[3] \oplus (L^0_{2}[8] \cdot L^0_{2}[6]) \oplus (\underline{L^0_{2}[4]} \cdot (Y[0] \oplus k_0)), \\ X[5] &{}=&{} \underline{L^0_{2}[13]} \oplus L^0_{2}[2] \oplus (L^0_{2}[7] \cdot L^0_{2}[5]) \oplus (\underline{L^0_{2}[3]} \cdot (Y[1] \oplus k_2)), \\ X[6] &{}=&{} \underline{L^0_{2}[12]} \oplus L^0_{2}[1] \oplus (L^0_{2}[6] \cdot L^0_{2}[4]) \oplus (\underline{L^0_{2}[2]} \cdot (Y[2] \oplus k_4)), \\ X[7] &{}=&{} \underline{L^0_{2}[11]} \oplus L^0_{2}[0] \oplus (L^0_{2}[5] \cdot L^0_{2}[3]) \oplus (\underline{L^0_{2}[1]} \cdot (Y[3] \oplus k_6)), \\ X[8] &{}=&{} \underline{L^0_{2}[10]} \oplus (Y[0] \oplus k_0) \oplus (L^0_{2}[4] \cdot L^0_{2}[2]) \oplus (\underline{L^0_{2}[0]} \cdot (Y[4] \oplus k_8)), \\ X[9] &{}=&{} \underline{L^0_{2}[9]} \oplus (Y[1] \oplus k_2) \oplus (L^0_{2}[3] \cdot L^0_{2}[1]) \oplus ((\underline{Y[0]} \oplus k_0) \cdot (\underline{Y[5]} \oplus k_{10})), \\ X[10] &{}=&{} \underline{L^0_{2}[8]} \oplus (Y[2] \oplus k_4) \oplus (L^0_{2}[2] \cdot L^0_{2}[0]) \oplus ((\underline{Y[1]} \oplus k_2) \cdot (\underline{Y[6]} \oplus k_{12})), \end{array} $$

where

$$ \begin{array}{lll} Y[0] &{}=&{} \underline{L^0_{1}[12]} \oplus L^0_{1}[7] \oplus (L^0_{1}[5] \cdot L^0_{1}[8]) \oplus (L^0_{1}[3] \cdot IR^0), \\ Y[1] &{}=&{} \underline{L^0_{1}[11]} \oplus L^0_{1}[6] \oplus (L^0_{1}[4] \cdot L^0_{1}[7]) \oplus (L^0_{1}[2] \cdot IR^1), \\ Y[2] &{}=&{} L^0_{1}[10] \oplus L^0_{1}[5] \oplus (L^0_{1}[3] \cdot L^0_{1}[6]) \oplus (L^0_{1}[1] \cdot IR^2), \\ Y[3] &{}=&{} L^0_{1}[9] \oplus L^0_{1}[4] \oplus (L^0_{1}[2] \cdot L^0_{1}[5]) \oplus (L^0_{1}[0] \cdot IR^3), \\ Y[4] &{}=&{} L^0_{1}[8] \oplus L^0_{1}[3] \oplus (L^0_{1}[1] \cdot L^0_{1}[4]) \oplus (X[0] \cdot IR^4), \\ Y[5] &{}=&{} \underline{L^0_{1}[7]} \oplus L^0_{1}[2] \oplus (L^0_{1}[0] \cdot L^0_{1}[3]) \oplus (X[1] \cdot IR^5), \\ Y[6] &{}=&{} \underline{L^0_{1}[6]} \oplus L^0_{1}[1] \oplus (X[0] \cdot L^0_{1}[2]) \oplus (X[2] \cdot IR^6). \end{array} $$

\(X[0], \ldots ,X[3]\) are easily fixed to constants by controlling 4 bits of \(L^{0}_{2}[18]\), \(L^{0}_{2}[17]\), \(L^{0}_{2}[16]\) and \(L^{0}_{2}[15]\) (4-bit condition). \(X[4], \ldots ,X[8]\) contain key bits in AND operations. If \(L^{0}_{2}[4] = L^{0}_{2}[3] = L^{0}_{2}[2] = L^{0}_{2}[1] = L^{0}_{2}[0] = 0\), those key bits are omitted from the equations (5-bit condition). Then \(X[4]\) to \(X[7]\) are also fixed to constants by controlling 4 bits of \(L^{0}_{2}[14]\), \(L^{0}_{2}[13]\), \(L^{0}_{2}[12]\) and \(L^{0}_{2}[11]\) (4-bit condition). In \(X[8]\), \(k_0\) is also linearly inserted. If \((L^{0}_{2}[10] \oplus Y[0] \oplus (L^{0}_{2}[4] \cdot L^{0}_{2}[2]))\) is fixed to a constant by controlling \(L^{0}_{2}[10]\) (1-bit condition), then \(X[8]\) is expressed in the form \(CON \oplus k_0\), which depends only on key bits, where \(CON\) is an arbitrary constant.

In \(X[9]\) and \(X[10]\), the 4 bits \(Y[0]\), \(Y[1]\), \(Y[5]\) and \(Y[6]\) need to be fixed. These values are controlled by \(L^{0}_{1}[12]\), \(L^{0}_{1}[11]\), \(L^{0}_{1}[7]\) and \(L^{0}_{1}[6]\) (4-bit condition). If the other bits are fixed by \(L^{0}_{2}[9]\) and \(L^{0}_{2}[8]\) (2-bit condition), \(X[9]\) and \(X[10]\) are expressed by only key bits.

Therefore, if plaintexts satisfy the \(23 (= 3 + 4 + 5 + 4 + 1 + 4 + 2)\) bit conditions described in Table 3, 8 bits of the key can be omitted when mounting the ASR attack.

Procedure for Creating Plaintexts. We show how to create plaintexts satisfying the conditions. By using the equations of \(X[0]\) to \(X[10]\) and \(Y[0]\) to \(Y[6]\), such plaintexts are easily obtained as follows.

  1. Set 18 predetermined values of \(L^{0}_{1}[0]\), \(L^{0}_{1}[1]\), \(L^{0}_{1}[2]\), \(X[0], \ldots , X[10]\), \(Y[0]\), \(Y[1]\), \(Y[5]\) and \(Y[6]\).

  2. Choose values of the 9 free bits \(L^{0}_{2}[5]\), \(L^{0}_{2}[6]\), \(L^{0}_{2}[7]\), \(L^{0}_{1}[3]\), \(L^{0}_{1}[4]\), \(L^{0}_{1}[5]\), \(L^{0}_{1}[8]\), \(L^{0}_{1}[9]\) and \(L^{0}_{1}[10]\).

  3. Obtain \(L^{0}_{2}[8], \ldots , L^{0}_{2}[13]\) from equations of \(X[5], \ldots , X[10]\), and \(L^{0}_{1}[6]\) and \(L^{0}_{1}[7]\) from equations of \(Y[5]\) and \(Y[6]\), respectively.

  4. Obtain \(L^{0}_{2}[14], \ldots , L^{0}_{2}[18]\) from equations of \(X[0], \ldots , X[4]\), and \(L^{0}_{1}[11]\) and \(L^{0}_{1}[12]\) from equations of \(Y[0]\) and \(Y[1]\), respectively.

  5. Repeat steps 2 to 4 until the required number of plaintexts is obtained.
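The procedure can be sketched in Python as follows. This is an illustrative sketch under several assumptions rather than the authors' implementation: the function names are hypothetical, the targets for \(X[8]\), \(X[9]\) and \(X[10]\) are interpreted as targets for their key-independent parts, the round constants \(IR^0, \ldots, IR^6\) are supplied by the caller, and the \(L_1\) bits are solved before the \(L_2\) bits, which is a reordering of steps 3 and 4 that yields the same plaintexts.

```python
from itertools import product


def y_value(j, l1, x_t, ir):
    """Y[j] from the equations above; L1 positions below 0 refer to the fixed X values."""
    def bit(n):
        return l1[n] if n >= 0 else x_t[-n - 1]
    return bit(12 - j) ^ bit(7 - j) ^ (bit(5 - j) & bit(8 - j)) ^ (bit(3 - j) & ir[j])


def x_key_free(i, l1, l2, x_t, ir):
    """Key-independent part of X[i]; valid only when L2[0..4] = 0."""
    linear = l2[7 - i] if i <= 7 else y_value(i - 8, l1, x_t, ir)
    second_and = (l2[8 - i] & l2[3 - i]) if i <= 3 else 0
    return l2[18 - i] ^ linear ^ (l2[12 - i] & l2[10 - i]) ^ second_and


def make_plaintext(free9, l1_low, x_t, y_t, ir):
    """Build (L1, L2) bit lists meeting the 23-bit conditions for one choice of free bits.

    free9  : values for L2[5], L2[6], L2[7], L1[3], L1[4], L1[5], L1[8], L1[9], L1[10]
    l1_low : fixed values for L1[0], L1[1], L1[2]
    x_t    : 11 targets for X[0..10]; y_t: dict with targets for Y[0], Y[1], Y[5], Y[6]
    """
    l1, l2 = [0] * 13, [0] * 19             # L2[0..4] stay 0 (5-bit condition)
    l1[0:3] = l1_low
    l2[5], l2[6], l2[7], l1[3], l1[4], l1[5], l1[8], l1[9], l1[10] = free9
    for j in (6, 5, 1, 0):                  # solve L1[6], L1[7], L1[11], L1[12]
        l1[12 - j] = y_value(j, l1, x_t, ir) ^ y_t[j]
    for i in range(10, -1, -1):             # solve L2[8], ..., L2[18]
        l2[18 - i] = x_key_free(i, l1, l2, x_t, ir) ^ x_t[i]
    return l1, l2
```

Each unknown bit appears linearly in exactly one equation and depends only on bits already determined, so every choice of the 9 free bits gives one valid plaintext; this yields up to \(2^9 = 512\) plaintexts for a fixed set of targets.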

4.3 Attacks on 119-Round KATAN32

Let us consider the 119-round variant of KATAN32 starting from the first (0-th) round. In this attack, \(L^{69}_2[18]\) is chosen as the matching state.

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : \(L^{69}_2[18]\) depends on 83 subkey bits. This implies that \(L^{69}_{2}[18]\) can be computed by a plaintext \(P\) and 83 bits of subkeys. More specifically, \(L^{69}_{2}[18] = \mathcal {F}_{(1)}(P, \mathcal {K}_{(1)})\), where \(\mathcal {K}_{(1)} \in \) \(\{k_{0}, ..., k_{70}\), \(k_{72}\), \(\ldots \), \(k_{76}\), \(k_{80}\), \(k_{83}\), \(k_{84}\), \(k_{85}\), \(k_{89}\), \(k_{93}\), \(k_{100} \}\) and \(|\mathcal {K}_{(1)}| = 83\). If the function reduction technique with the 23-bit condition of plaintexts is used, 8 bits of \(\{k_1\), \(k_3\), \(k_5\), \(k_7\), \(k_9\), \(k_{11}\), \(k_{13}\), \(k_{15}\}\) can be omitted in computations of \(\mathcal {F}_{(1)}\). Thus, \(L^{69}_2[18]\) is computable with \(75(= 83 - 8)\) bits. In addition, since 4 bits of \(\{k_{68}, k_{75}, k_{85}, k_{100} \}\) linearly affect \(L^{69}_{2}[18]\), we can regard \(k_{68} \oplus k_{75} \oplus k_{85} \oplus k_{100}\) as a new key \(k_f\). Thus, \(72 (= 75 - 4 + 1)\) bits are involved in the forward computation.

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : In the backward computation starting from the 118-th round, the matching state \(L^{69}_{2}[18]\) is computed as \(L^{69}_{2}[18] = \mathcal {F}_{(2)}^{-1}(C, \mathcal {K}_{(2)})\), where \(\mathcal {K}_{(2)} \in \) \(\{k_{138}\), \(k_{150}\), \(k_{154}\), \(k_{158}\), \(k_{160}\), \(k_{162}\), \(k_{165}\), \(k_{166}\), \(k_{168}\), \(k_{170}\), \(k_{172}, \ldots , k_{237}\}\), and \(|\mathcal {K}_{(2)}| = 76\). Since 4 bits of \(\{k_{138}, k_{160}, k_{165}, k_{175} \}\) linearly affect \(L^{69}_{2}[18]\), we can regard \(k_{138} \oplus k_{160} \oplus k_{165} \oplus k_{175}\) as a new key \(k_b\). Furthermore, by the indirect matching, \(k_b\) is moved to the forward computation, and \(k_b \oplus k_f\) is regarded as a new key in \(\mathcal {F}_{(1)}\). Thus, \(72 (= 76 - 4)\) bits are involved in the backward computation.
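The bit counts in the forward and backward computations can be re-derived mechanically from the index sets given in the text. The following Python sketch does only this bookkeeping; the helper name `index_set` and the variable names are hypothetical.

```python
def index_set(*parts):
    """Union of single indices and inclusive (first, last) ranges."""
    out = set()
    for p in parts:
        if isinstance(p, tuple):
            out.update(range(p[0], p[1] + 1))
        else:
            out.add(p)
    return out


# Subkey indices involved in the forward and backward computations.
K1 = index_set((0, 70), (72, 76), 80, 83, 84, 85, 89, 93, 100)
K2 = index_set(138, 150, 154, 158, 160, 162, 165, 166, 168, 170, (172, 237))

REDUCED = {1, 3, 5, 7, 9, 11, 13, 15}   # removed by the function reduction
LINEAR_FWD = {68, 75, 85, 100}          # merged into k_f
LINEAR_BWD = {138, 160, 165, 175}       # merged into k_b, moved to the forward side

# Forward: drop the reduced bits, replace four linear bits by one combined bit.
forward_bits = len(K1 - REDUCED) - len(LINEAR_FWD) + 1
# Backward: the four linear bits are handled on the forward side.
backward_bits = len(K2) - len(LINEAR_BWD)
```

Both counts come out to 72, which is what makes the two sides of the matching balanced.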

Evaluation. For the 119-round reduced KATAN32, the matching state \(S\) is chosen as \(L^{69}_{2}[18]\) (1-bit state). When \(N = 144\) \((\le ( 72 + 72)/1)\), the time complexity for finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) is estimated as

$$ C_{comp} = \max (2^{72}, 2^{72}) \times 144 = 2^{79.1}. $$

The required data is only 144 chosen plaintext/ciphertext pairs in which the 23 bits of each plaintext satisfy conditions. The required memory is about \(2^{79.1}\) blocks.

Finally, we need to find the remaining \(94 (= 119 \times 2 - 144)\) bits of subkeys by using the simple MITM approach in the setting where \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) are known. The time complexity and the required memory for this process are roughly estimated as \(2^{49} (= 2^{48}+2^{46})\) and \(2^{46}\) blocks, respectively. These costs are obviously much less than those of finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\).
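The arithmetic of this evaluation can be reproduced with a few lines of Python. This is a sketch with a hypothetical helper name; it takes the time as \(\max(2^{|\mathcal{K}_{(1)}|}, 2^{|\mathcal{K}_{(2)}|}) \times N\) and, as an assumption consistent with the memory figures reported in this paper, the memory as \(\min(2^{|\mathcal{K}_{(1)}|}, 2^{|\mathcal{K}_{(2)}|}) \times N\).

```python
import math


def asr_log2_costs(fwd_bits, bwd_bits, n_pairs):
    """Return (log2 of time, log2 of memory) for the subkey-finding phase."""
    log_n = math.log2(n_pairs)
    return max(fwd_bits, bwd_bits) + log_n, min(fwd_bits, bwd_bits) + log_n


# 119-round KATAN32: 72 bits on each side, N = 144 pairs.
time_log2, memory_log2 = asr_log2_costs(72, 72, 144)

# Subkey bits left for the final simple MITM step (2 key bits per round).
remaining_bits = 119 * 2 - 144
```

Here `time_log2` evaluates to roughly 79.17, which the text reports as \(2^{79.1}\), and `remaining_bits` is 94.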

4.4 Function Reduction on KATAN48

Table 4 shows conditions for 4-bit function reductions, where \(X'[i]\) is defined as:

$$ X'[i] = L^0_{2}[28 - i] \oplus L^0_{2}[19 - i] \oplus (L^0_{2}[21 - i] \cdot L^0_{2}[13 - i]) \oplus (L^0_{2}[15 - i] \cdot L^0_{2}[6 - i]). $$

If these values are fixed to target constants, we can eliminate the key bits in the computation of KATAN48.

Table 4. Conditions for 4-bit function reductions

\(X'[0], \ldots ,X'[6]\) are fixed by controlling 7 bits of \(L^0_{2}[22], \ldots , L^0_{2}[28]\) (7-bit condition). \(X'[7], \ldots ,X'[15]\) contain key bits in AND operations. If \(L^0_{2}[8] = L^0_{2}[7] = \cdots = L^0_{2}[1] = L^{0}_{2}[0] = 0\), these key bits are omitted from these equations (9-bit condition). Then \(X'[7], \ldots ,X'[15]\) are also fixed by controlling 9 bits of \(L^0_{2}[13], \ldots , L^0_{2}[21]\) (9-bit condition).

Therefore, if plaintexts satisfy the 33 (= 8 + 7 + 9 + 9) bit conditions described in Table 4, 4 bits of the key can be omitted when mounting the ASR attack.

4.5 Attacks on 105-Round KATAN48

Let us consider the 105-round variant of KATAN48 starting from the first (0-th) round. In this attack, \(L^{61}_2[28]\) is chosen as the matching state.

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : \(L^{61}_2[28]\) depends on 79 subkey bits. This implies that \(L^{61}_{2}[28]\) can be computed by a plaintext \(P\) and 79 bits of subkeys. More specifically, \(L^{61}_{2}[28] = \mathcal {F}_{(1)}(P, \mathcal {K}_{(1)})\), where \(\mathcal {K}_{(1)} \in \) \(\{k_{0}, ..., k_{68}\), \(k_{70}\), \(k_{71}\), \(k_{72}\), \(k_{75}\), \(k_{77}\), \(k_{78}\), \(k_{81}\), \(k_{85}\), \(k_{87}\), \(k_{92} \}\) and \(|\mathcal {K}_{(1)}| = 79\). If the function reduction technique with the 33-bit condition of plaintexts is used, 4 bits of \(k_1\), \(k_3\), \(k_5\), \(k_7\) can be omitted in computations of \(\mathcal {F}_{(1)}\). Thus, \(L^{61}_2[28]\) is computable with \(75(= 79 - 4)\) bits. In addition, 3 bits of \(\{k_{75}, k_{81}, k_{92} \}\) linearly affect \(L^{61}_{2}[28]\). Thus, we can regard \(k_{75} \oplus k_{81} \oplus k_{92}\) as a new key. By using indirect matching, \(k_f = k_{75} \oplus k_{81} \oplus k_{92}\) is moved to \(\mathcal {F}_{(2)}\). Then, \(72 (= 75 - 3)\) bits are involved in the forward computation.

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : In the backward computation starting from the 104-th round, the matching state \(L^{61}_{2}[28]\) is computed as \(L^{61}_{2}[28]\! =\! \mathcal {F}_{(2)}^{-1}(C, \mathcal {K}_{(2)})\), where \(\mathcal {K}_{(2)} \in \) \(\{k_{122}\), \(k_{128}\), \(k_{130}\), \(k_{134}\), \(k_{136}\), \(k_{138}\), \(k_{140}\), \(k_{141}\), \(k_{142}\), \(k_{144}, \ldots k_{209}\}\), and \(|\mathcal {K}_{(2)}| = 75\). 4 bits of \(\{k_{122}, k_{130}, k_{140}, k_{141} \}\) linearly affect \(L^{61}_{2}[28]\). Thus, we can regard \(k_b = k_{122} \oplus k_{130} \oplus k_{140} \oplus k_{141}\) as a new key. Furthermore, we define \(k_f \oplus k_b\) as a new key. Then, \(72 (= 75 - 4 + 1)\) bits are involved in the backward computation.

Evaluation. For the 105-round reduced KATAN48, the matching state \(S\) is chosen as \(L^{61}_{2}[28]\) (1-bit state). When \(N = 144\) \((\le ( 72 + 72)/1)\), the time complexity for finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) is estimated as

$$ C_{comp} = \max (2^{72}, 2^{72}) \times 144 = 2^{79.1}. $$

The required data is only 144 chosen plaintext/ciphertext pairs. The required memory is about \(2^{79.1}\) blocks.

Finally, we need to find the remaining \(66\) \((= 105 \times 2 - 144)\) bits of subkeys by using the simple MITM approach in the setting where \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) are known. The time complexity and the required memory for this process are roughly estimated as \(2^{34} (= 2^{34} + 2^{32})\) and \(2^{32}\) blocks, respectively. These costs are obviously much less than those of finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\).

4.6 Function Reduction on KATAN64

Table 5 shows conditions for 2-bit function reductions, where \(X''[i]\) is defined as:

$$ X''[i] = L^0_{2}[38 - i] \oplus L^0_{2}[25 - i] \oplus (L^0_{2}[33 - i] \cdot L^0_{2}[21 - i]) \oplus (L^0_{2}[14 - i] \cdot L^0_{2}[9 - i]). $$

If these values are fixed to target constants, we can eliminate the key bits in the computation of KATAN64.

Table 5. Conditions for 2-bit function reductions

\(X''[0], \ldots , X''[9]\) are fixed by controlling 10 bits of \(L^0_{2}[29], \ldots , L^0_{2}[38]\) (10-bit condition). \(X''[10]\), \(\ldots \), \(X''[14]\) contain key bits in AND operations. If \(L^0_{2}[4] = \cdots = L^{0}_{2}[0] = 0\), these key bits are omitted from these equations (5-bit condition). Then \(X''[10], \ldots , X''[14]\) are also fixed by controlling 5 bits \(L^0_{2}[18], ..., L^0_{2}[23]\) (5-bit condition).

Therefore, if plaintexts satisfy the \(29 (= 9 + 10 + 5 + 5)\) bit conditions described in Table 5, 2 bits of the key can be omitted when mounting the ASR attack.

4.7 Attacks on 99-Round KATAN64

Let us consider the 99-round variant of KATAN64 starting from the first (0-th) round. In this attack, \(L^{57}_2[38]\) is chosen as the matching state.

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : \(L^{57}_2[38]\) depends on 74 subkey bits. This implies that \(L^{57}_{2}[38]\) can be computed by a plaintext \(P\) and 74 bits of subkeys. More specifically, \(L^{57}_{2}[38] = \mathcal {F}_{(1)}(P, \mathcal {K}_{(1)})\), where \(\mathcal {K}_{(1)} \in \) \(\{k_{0}, ..., k_{66}\), \(k_{70}\), \(k_{71}\), \(k_{72}\), \(k_{75}\), \(k_{77}\), \(k_{81}\), \(k_{88}\) \(\}\) and \(|\mathcal {K}_{(1)}| = 74\). If the function reduction technique with the 29-bit condition of plaintexts is used, 2 bits of \(k_1\), \(k_3\) can be omitted in computations of \(\mathcal {F}_{(1)}\). Thus, \(L^{57}_2[38]\) is computable with \(72(= 74 - 2)\) bits. In addition, 3 bits of \(\{k_{71}, k_{77}, k_{88} \}\) linearly affect \(L^{57}_{2}[38]\). Thus, we can regard \(k_{71} \oplus k_{77} \oplus k_{88}\) as a new key. Then, \(70 (= 72 - 3 + 1)\) bits are involved in the forward computation.

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : In the backward computation starting from the 98-th round, the matching state \(L^{57}_{2}[38]\) is computed as \(L^{57}_{2}[38] = \mathcal {F}_{(2)}^{-1}(C, \mathcal {K}_{(2)})\), where \(\mathcal {K}_{(2)} \in \) \(\{ k_{114}\), \(k_{116}\), \(k_{120}\), \(k_{122}\), \(k_{124}\), \(k_{126}\), \(k_{128}\), \(k_{130}, \ldots k_{197}\}\), and \(|\mathcal {K}_{(2)}| = 75\). 3 bits of \(\{k_{114}, k_{122}, k_{131}\}\) linearly affect \(L^{57}_{2}[38]\). Thus, we can consider \(k_{114} \oplus k_{122} \oplus k_{131}\) as a new key, and move it to the forward computation by the indirect matching. Then, \(72 (= 75 - 3)\) bits are involved in the backward computation.

Evaluation. For the 99-round reduced KATAN64, the matching state \(S\) is chosen as \(L^{57}_{2}[38]\) (1-bit state).

When \(N = 142\) \((\le ( 72 + 70)/1)\), the time complexity for finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) is estimated as

$$ C_{comp} = \max (2^{72}, 2^{70}) \times 142 = 2^{79.1}. $$

The required data is only 142 chosen plaintext/ciphertext pairs. The required memory is about \(2^{77.1}\) blocks.

Finally, we need to find the remaining \(56 (= 99 \times 2 - 142)\) bits of subkeys by using the simple MITM approach in the setting where \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\) are known. The time complexity and the required memory for this process are roughly estimated as \(2^{28}\) and \(2^{28}\) blocks, respectively. These costs are obviously much less than those of finding \(\mathcal {K}_{(1)}\) and \(\mathcal {K}_{(2)}\).

5 Improved All-Subkeys Recovery Attack on SHACAL-2

This section presents the ASR attacks on SHACAL-2 with the function reduction techniques. Then, we propose a 42-round attack on SHACAL-2, based on the 41-round attack on SHACAL-2 [13].

5.1 Description of SHACAL-2

SHACAL-2 [13] is a 256-bit block cipher based on the compression function of SHA-256 [12]. It was submitted to the NESSIE project and selected in the NESSIE portfolio [22].

SHACAL-2 inputs the plaintext to the compression function as the chaining variable, and inputs the key to the compression function as the message block. First, a 256-bit plaintext is divided into eight 32-bit words \(A_{0}\), \(B_{0}\), \(C_{0}\), \(D_{0}\), \(E_{0}\), \(F_{0}\), \(G_{0}\) and \(H_{0}\). Then, the state update function updates eight 32-bit variables, \(A_{i}\), \(B_{i}\), ..., \(G_{i}\), \(H_{i}\) in 64 steps as follows:

$$\begin{aligned} T_1= & {} H_i \boxplus \varSigma _1(E_i) \boxplus Ch(E_i,F_i,G_i) \boxplus K_i \boxplus W_i, \\ T_2= & {} \varSigma _0(A_i) \boxplus Maj(A_i,B_i,C_i), \\ A_{i+1}= & {} T_1 \boxplus T_2, \ B_{i+1} = A_i, \ C_{i+1} = B_i, \ D_{i+1} = C_i, \\ E_{i+1}= & {} D_{i} \boxplus T_1, \ F_{i+1} = E_i, \ G_{i+1} = F_i, \ H_{i+1} = G_i, \end{aligned}$$

where \(K_i\) is the \(i\)-th step constant, \(W_{i}\) is the \(i\)-th step key (32-bit), and the functions \(Ch\), \(Maj\), \(\varSigma _0\) and \(\varSigma _1\) are given as follows:

$$\begin{aligned} Ch(X,Y,Z)= & {} XY \oplus \overline{X}Z, \\ Maj(X,Y,Z)= & {} XY \oplus YZ \oplus XZ, \\ \varSigma _0 (X)= & {} (X \ggg 2) \oplus (X \ggg 13) \oplus (X \ggg 22),\\ \varSigma _1 (X)= & {} (X \ggg 6) \oplus (X \ggg 11) \oplus (X \ggg 25). \end{aligned}$$

After 64 steps, the function outputs eight 32-bit words \(A_{64}\), \(B_{64}\), \(C_{64}\), \(D_{64}\), \(E_{64}\), \(F_{64}\), \(G_{64}\) and \(H_{64}\) as the 256-bit ciphertext. Hereafter \(p_{i}\) denotes the \(i\)-th step state, i.e., \(p_{i}\) \(=\) \(A_{i} || B_{i} || ... || H_{i}\).
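The state update can be sketched in Python as follows. This is a minimal illustration with hypothetical function names; the step constant \(K_i\) and the step key \(W_i\) are passed in by the caller. The inverse step is included because the backward computation of the attack runs the cipher from the ciphertext side.

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def ch(x, y, z):
    return (x & y) ^ (~x & z & MASK)


def maj(x, y, z):
    return (x & y) ^ (y & z) ^ (x & z)


def shacal2_step(state, k_i, w_i):
    """One step: state is the tuple (A, B, C, D, E, F, G, H) of 32-bit words."""
    a, b, c, d, e, f, g, h = state
    t1 = (h + big_sigma1(e) + ch(e, f, g) + k_i + w_i) & MASK
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    return ((t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g)


def shacal2_step_inverse(next_state, k_i, w_i):
    """Undo one step, recovering the previous state."""
    a1, b1, c1, d1, e1, f1, g1, h1 = next_state
    a, b, c, e, f, g = b1, c1, d1, f1, g1, h1
    t2 = (big_sigma0(a) + maj(a, b, c)) & MASK
    t1 = (a1 - t2) & MASK
    d = (e1 - t1) & MASK
    h = (t1 - big_sigma1(e) - ch(e, f, g) - k_i - w_i) & MASK
    return (a, b, c, d, e, f, g, h)
```

Only \(A_{i+1}\) and \(E_{i+1}\) depend on the step key; the other six words are copies of the previous state.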

The key scheduling function of SHACAL-2 takes a variable length key up to 512 bits as the inputs, then outputs 64 32-bit step keys. First, the 512-bit input key is copied to 16 32-bit words \(W_{0}\), \(W_{1}\), ..., \(W_{15}\). If the size of the input key is shorter than 512 bits, the key is padded with zeros. Then, the key scheduling function generates 48 32-bit step keys (\(W_{16}, ..., W_{63}\)) from the 512-bit key (\(W_{0}, ..., W_{15}\)) as follows:

$$ W_{i} = \sigma _{1}(W_{i-2}) \boxplus W_{i-7} \boxplus \sigma _{0}(W_{i-15}) \boxplus W_{i-16}, (16 \le i < 64), $$

where the functions \(\sigma _0(X)\) and \(\sigma _1(X)\) are defined by

$$\begin{aligned} \sigma _0(X)= & {} (X \ggg 7) \oplus (X \ggg 18) \oplus (X \gg 3),\\ \sigma _1(X)= & {} (X \ggg 17) \oplus (X \ggg 19) \oplus (X \gg 10). \end{aligned}$$
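A minimal Python sketch of this key schedule is given below. The function names are hypothetical, and the big-endian conversion of key bytes to 32-bit words is an assumption (it follows the SHA-256 convention and is not stated in the text above).

```python
MASK = 0xFFFFFFFF


def rotr(x, n):
    """Rotate a 32-bit word right by n positions."""
    return ((x >> n) | (x << (32 - n))) & MASK


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)


def shacal2_step_keys(key):
    """Expand a key of at most 64 bytes into the 64 step keys W_0..W_63."""
    if len(key) > 64:
        raise ValueError("key must be at most 512 bits")
    padded = key + bytes(64 - len(key))          # pad short keys with zeros
    # Assumption: words are read in big-endian byte order.
    w = [int.from_bytes(padded[4 * i:4 * i + 4], "big") for i in range(16)]
    for i in range(16, 64):
        w.append((small_sigma1(w[i - 2]) + w[i - 7]
                  + small_sigma0(w[i - 15]) + w[i - 16]) & MASK)
    return w
```

Since the ASR attack recovers the step keys directly, this schedule is not used in the attack itself; it only determines how many subkey bits a given number of steps involves.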

5.2 Function Reduction on SHACAL-2

In the round function of SHACAL-2, a round key \(W_i\) is inserted into the state \(T_1\) by an arithmetic addition. We show that the splice and cut framework is applicable by using the partial key linearization technique.

The computation of \(T_1\) is expressed as

$$ T_1 = (H_i \boxplus W_i) \boxplus \varSigma _1(E_i) \boxplus Ch(E_i,F_i,G_i) \boxplus K_i. $$

In a straightforward way, the computation of \((H_i \boxplus W_i)\) cannot be divided into two parts as \((HL_i \boxplus WL_i) || (HR_i \boxplus WR_i)\) due to the carry bit between these computations, where \(HL_i\) and \(WL_i\) denote the higher \(x\) bits of \(H_i\) and \(W_i\), respectively, and \(HR_i\) and \(WR_i\) are the lower \((32 - x)\) bits of \(H_i\) and \(W_i\). If \(HR_i\) is fixed to 0, no carry propagates from the lower part, and the computation is equivalent to \((HL_i \boxplus WL_i) || (HR_i \oplus WR_i)\). This allows us to compute these two parts independently without dealing with carry bits. Therefore, by using the splice and cut framework, the 32 key bits of one round are divided into forward and backward computations as shown in Fig. 5.
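The carry argument can be checked numerically. The Python sketch below (hypothetical function names) compares the ordinary 32-bit addition with the split form, in which the higher \(x\) bits are added modulo \(2^x\) and the lower \(32 - x\) bits are XORed, for random words whose lower part of \(H_i\) is zero.

```python
import random

MASK32 = 0xFFFFFFFF


def split_add(h, w, x):
    """(HL + WL mod 2^x) || (HR xor WR), where the top x bits form the high part."""
    low_bits = 32 - x
    low_mask = (1 << low_bits) - 1
    high = ((h >> low_bits) + (w >> low_bits)) & ((1 << x) - 1)
    low = (h & low_mask) ^ (w & low_mask)
    return (high << low_bits) | low


def check_split(x, trials=1000, seed=0):
    """Return True if split_add matches 32-bit addition whenever HR = 0."""
    rng = random.Random(seed)
    low_bits = 32 - x
    for _ in range(trials):
        h = rng.getrandbits(x) << low_bits   # lower 32 - x bits of H are zero
        w = rng.getrandbits(32)
        if split_add(h, w, x) != (h + w) & MASK32:
            return False
    return True
```

With \(x = 15\), the split used in the 42-round attack, `check_split(15)` holds. If the lower part of \(H_i\) is not zero the identity can fail: for example, `split_add(1, 1, 15)` gives 0 while the true sum is 2.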

However, we cannot reduce the number of involved key bits by using an equivalent transform. This is because the involved 32-bit key \(W_i\) is used at least eight times in the forward and backward directions. In order to fully control values in each state, more than \(512 (32 \times 8)\) bits of conditions are required.

Fig. 5. Splice and cut of SHACAL-2

5.3 Attacks on 42-Round SHACAL-2

We show that the splice and cut framework [4] is applicable to SHACAL-2 by using the key linearization technique. Then we extend the 41-round attack [16] by one more round. In particular, the splice and cut technique is applied in the first round: the higher 15 bits are moved to the backward computation, and the lower 17 bits are moved to the forward computation. Then we choose the lowest 1 bit of \(A_{17}\) as the matching point.

Forward Computation in \(\varvec{\mathcal {F}_{(1)}}\) : The lowest 1 bit of \(A_{17}\) can be computed from the 16-th state \(p_{16}\) and the lowest 1 bit of \(W_{16}\), since the other bits of \(W_{16}\) do not affect the lowest 1 bit of \(A_{17}\). Thus, the matching state \(S\) (the lowest 1 bit of \(A_{17}\)) is calculated as \(S = \mathcal {F}_{(1)}(P, \mathcal {K}_{(1)})\), where \(\mathcal {K}_{(1)} \in \) \(\{\)the lower 17 bits of \(W_{0}\), all bits of \(W_{1}, ..., W_{15}\), the lowest 1 bit of \(W_{16}\}\) and \(|\mathcal {K}_{(1)}| = 498 (= 32 \times 15 + 1 + 17)\).

Backward Computation in \(\varvec{\mathcal {F}_{(2)}}\) : We utilize the following observation [16].

Observation 1

The lower \(t\) bits of \(A_{j-10}\) are obtained from the \(j\)-th state \(p_{j}\) and the lower \(t\) bits of three subkeys \(W_{j-1}\), \(W_{j-2}\) and \(W_{j-3}\).

From Observation 1, the matching state \(S\) (the lowest 1 bit of \(A_{17}\)) can be computed as \(S = \mathcal {F}^{-1}_{(2)}(C, \mathcal {K}_{(2)})\), where \(\mathcal {K}_{(2)} \in \) \(\{\)the higher 15 bits of \(W_{0}\), \(W_{27}, ..., W_{41}, \) the lowest \(1\) bits of \(W_{24}\), \(W_{25}\) and \(W_{26}\}\). Thus, \(|\mathcal {K}_{(2)}| = 498 (= 32 \times 15 + 1 \times 3 + 15)\).

Evaluation. The matching state \(S\) is the lowest 1 bit of \(A_{17}\), \(|\mathcal {K}_{(1)}| = 498\) and \(|\mathcal {K}_{(2)}| = 498\). Thus, using 996 chosen plaintext/ciphertext pairs (i.e., \(N = 996 \le (498 + 498) / 1\)), the time complexity for finding all subkeys is estimated as

$$ C_{comp} = \max (2^{498}, 2^{498}) \times 996 + 2^{1344 - 996} = 2^{508}. $$

The required data is \(2^{25}(= 996 \times 2^{15})\) chosen plaintext/ciphertext pairs, since 15 bits of plaintext are not controlled in the backward computation when using the splice and cut technique. The required memory is \(2^{508} (= \min (2^{498}, 2^{498})\) \(\times \) \(996\)) blocks.
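The figures of this evaluation can be recomputed in the log domain. The Python sketch below uses hypothetical variable names and only restates the formulas above; `log2_sum` is a small helper for \(\log_2(2^a + 2^b)\) that avoids forming the huge integers.

```python
import math


def log2_sum(a, b):
    """log2(2^a + 2^b), computed without building the large numbers."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))


k1_bits = 32 * 15 + 1 + 17        # forward:  |K_(1)|
k2_bits = 32 * 15 + 1 * 3 + 15    # backward: |K_(2)|
n_pairs = k1_bits + k2_bits       # N for a 1-bit matching state
total_subkey_bits = 42 * 32       # all step-key bits of 42 rounds

log_n = math.log2(n_pairs)
# Time: matching phase plus exhaustive search over the remaining subkey bits.
time_log2 = log2_sum(max(k1_bits, k2_bits) + log_n, total_subkey_bits - n_pairs)
# Data: N pairs for each value of the 15 plaintext bits not controlled backward.
data_log2 = log_n + 15
memory_log2 = min(k1_bits, k2_bits) + log_n
```

The time evaluates to about \(2^{507.96}\) and the data to about \(2^{24.96}\), matching the rounded values \(2^{508}\) and \(2^{25}\); the \(2^{1344 - 996}\) term is negligible.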

6 Conclusion

The concept of the ASR attack, which recovers all subkeys instead of the master key, is quite simple, yet it is useful for evaluating the security of block cipher structures without analyzing key scheduling functions. Thus, studying its improvements is valuable for designing secure block ciphers. We first examined the function reduction technique, which improved the ASR attack and was originally applied to Feistel schemes. Then, with some improvements such as the repetitive ASR approach, we applied the function reduction to other block cipher structures including Lai-Massey, generalized Lai-Massey, LFSR-type and source-heavy generalized Feistel schemes.

As applications of our approach, we presented improved ASR attacks on the 7-, 7-, 119-, 105-, 99- and 42-round reduced FOX64, FOX128, KATAN32, KATAN48, KATAN64 and SHACAL-2, respectively. All of our results increase the number of attacked rounds compared with the previously best known attacks. We emphasize that our attacks work independently of the structure of the key scheduling function. In other words, strengthening the key scheduling function does not improve the security against our attacks. This implies that our results give lower bounds on the security of the target structures, such as the Lai-Massey scheme, rather than of the specific block ciphers, against generic key recovery attacks. Therefore, we believe that our results are useful for a deeper understanding of the security of block cipher structures.