1 Introduction

The past decade in technology has been marked by the ever-decreasing size of computing devices. This, combined with their increasingly ubiquitous use (e.g. as smart devices, wearable systems and parts of the Internet of Things [12]), has enabled humans to perform everyday activities more efficiently. At the same time, these new technologies have also created new security challenges.

An important problem today is the design of cryptographic algorithms that are both efficient and secure, have a small memory footprint, and are low-cost and easy to implement and deploy on multiple platforms. Finding an optimal compromise between these often conflicting requirements is the difficult problem studied in the field of lightweight cryptography. The applications of lightweight cryptographic algorithms range from mobile devices through RFID tags to electronic locks, and their importance is likely to continue increasing in the future.

To address the persistent need for secure and efficient lightweight primitives, numerous proposals have been made in the past few years. In the area of symmetric-key encryption some of the more prominent block ciphers that were proposed are: Present [6], Piccolo [17], Klein [9], Twine [19], Katan and Ktantan [7], LED [10], HIGHT [11] and CLEFIA [18].

Most recently, in June 2013, two more algorithms were put forth by researchers from the National Security Agency (NSA) of the USA: the block ciphers Simon and Speck [4]. Compared to their predecessors, the latter two have very competitive performance and a small memory footprint, and they beat most existing lightweight ciphers in terms of efficiency and compactness. Furthermore, the two designs are very simple and elegant. They are both built on the ARX philosophy [13, 20], using only basic arithmetic operations such as modular addition, XOR, bitwise AND and bit rotation.

Evidence of the performance and implementation advantages of Simon and Speck exists in the form of extensive results and comparisons to existing lightweight algorithms described in the design document [4]. However the latter does not provide any security evaluation of the two ciphers and no analysis of their cryptographic strength is given. Recently several external cryptanalytic results on Simon and Speck became available:  [13]. The first two in particular analyze the differential properties of the ciphers and describe key-recovery attacks on reduced round variants.

Table 1. Summary of attacks on Simon and Speck. All listed attacks are chosen-plaintext.

Our Contribution. In this paper we further investigate the differential behavior of the block ciphers Simon and Speck. We apply a recently proposed technique for automatic search for differential trails in ARX ciphers called threshold search [5]. We find better differential trails on Simon32 and Simon48 than the ones reported in [2], which were claimed to be the best, and we confirm the trail on Simon64. We also find improved trails on Speck32, Speck48 and Speck64 that cover one round more than the previously reported best trails [1]. We further extend the threshold search technique to search for differentials. With the new tool we improve the differentials on Simon32, Simon48 and Simon64 reported in [2] and present new differentials on Speck32, Speck48 and Speck64. We use these new results to improve the currently best known attacks on several versions of Simon and Speck.

The second major contribution of the paper is an efficient algorithm for the computation of the differential probabilities (DP) of the bitwise AND operation – the single source of non-linearity in the round function of Simon. We describe algorithms for the computation of the exact DP of AND with independent inputs and rotationally dependent inputs (one input is equal to the rotation of the other one) as used in Simon. In addition, methods for computing the maximum DP over all inputs and over all outputs of the AND operation are also proposed. All described algorithms have linear time complexity in the word size. These algorithms are used in the threshold search and in the differential search tool for Simon.

Finally, we briefly comment on the strong differential effect in Simon, a property already noted in [3]. In addition, we provide new insights into the clustering of differential trails that causes this effect. A summary of the main results from the search on trails and differentials is provided in Table 2. Note that the figure for the time complexity of the attacks on Simon32 and Simon64 described in [2] and listed in Table 1 is one that we were not able to verify.

Table 2. Summary of the best found differential trails and differentials in Simon and Speck;

The outline of the paper is as follows. We begin with Sect. 2, where the XOR differential probability of the AND operation is analyzed. Sect. 3 presents techniques for searching for trails and differentials in ARX algorithms. The block ciphers Simon and Speck are briefly described in Sect. 4. Full differential trails are presented in Sect. 5. In Sect. 6 we comment on the strong differential effect of Simon. Section 9 concludes the paper.

A few words on notation: \(x_i\) denotes the \(i\)-th bit of the \(n\)-bit word \(x\) (\(x_0\) is the LSB); \({\overline{x_i}}\) represents the modulo-2 complementation of \(x_i\), i.e. \({\overline{x_i}} \,=\, x_i \,\oplus \, 1\); the symbols \(\wedge \) and \(\vee \) denote the bitwise logical AND and OR operations respectively; the left and right rotations of the bits of \(x\) by \(r\) positions are denoted by \(x\,\lll \, r\) and \(x\,\ggg \, r\) respectively; \(|S|\) represents the cardinality of the set \(S\). The concatenation of the bit strings \(x\) and \(y\) is denoted by \(x | y\).

2 The XOR Differential Probability of AND

2.1 Independent Inputs

Definition 1

( \( {\mathrm {xdp}^{ \& }}\) with independent inputs). Let \(\alpha , \beta \) and \(\gamma \) be fixed \(n\)-bit XOR differences. The XOR differential probability (\(\mathsf{DP }\)) of the logical AND operation (\( \mathrm {xdp}^{ \& }\)) is the probability with which \(\alpha \) and \(\beta \) propagate to \(\gamma \) through the AND operation, computed over all pairs of \(n\)-bit inputs \((x,y)\):

$$ \begin{aligned} {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma }) = 2^{-2n} \cdot |\{(x,y) : ((x \oplus {\alpha }) \wedge (y \oplus {\beta })) \oplus (x \wedge y) = {\gamma }\}|. \end{aligned}$$
(1)

In the remainder of the text the acronym \(\mathsf{DP }\) will be used to denote XOR differential probability unless specified otherwise. When the input differences \(\alpha \) and \(\beta \) are independent, the \(\mathsf{DP }\) \( {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma })\) can be efficiently computed according to the following theorem.

Theorem 1

For fixed \(n\)-bit XOR differences \(\alpha , \beta \) and \(\gamma \) the probability \( {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma })\) is equal to

$$ \begin{aligned} {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma }) = {2^{-n}}\cdot {\prod _{i=0}^{n-1}\left( (2\cdot ({\overline{\alpha }_i} \wedge {\overline{\beta }_i}\wedge {\overline{\gamma }_i})) \vee ({\overline{\overline{\alpha }_i\wedge \overline{\beta }_i}})\right) \wedge \overline{({\overline{\alpha }_i}\wedge {\overline{\beta }_i}\wedge \gamma _i)}}. \end{aligned}$$
(2)

Proof

Note that \( {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma }) = 0 \iff \exists i,~ 0 \le i < n:~ ({\overline{\alpha }_i}\wedge {\overline{\beta }_i}\wedge \gamma _i) = 1\). Therefore, whenever the probability is zero, the term \(\overline{({\overline{\alpha }_i}\wedge {\overline{\beta }_i}\wedge \gamma _i)}\) evaluates to zero for some \(i\) and hence the right-hand side of (2) is also zero. Now let the probability be non-zero. If \(\alpha _i = \beta _i = \gamma _i = 0\) at bit position \(i\), then all \(4\) pairs \((x_i, y_i)\) satisfy the differential at this position (cf. Definition 1); in this case \(({\overline{\alpha }_i}\wedge {\overline{\beta }_i}\wedge {\overline{\gamma }_i}) = 1\) and the \(i\)-th factor of the product in (2) equals \(2\). If \(\alpha _i\) and \(\beta _i\) are not both zero, then exactly two pairs \((x_i,y_i)\) satisfy the differential at bit position \(i\) irrespective of the value of \(\gamma _i\); in this case \(({\overline{\overline{\alpha }_i\wedge \overline{\beta }_i}}) = 1\) and the \(i\)-th factor equals \(1\). Hence every factor in (2) is half the number of valid pairs at its bit position, so the total number of valid pairs \((x, y)\) is \(2^{n}\) times the product on the right-hand side of (2). Multiplying by the term \(2^{-2n}\) of Definition 1 leaves the factor \(2^{-n}\) in (2).\(\square \)

Theorem 1 implies the following corollary.

Corollary 1

Given \(n\)-bit input differences \(\alpha , \beta \) and output difference \(\gamma \), the probability \( {\mathrm {xdp}}^{ \& }(\alpha , \beta \rightarrow \gamma )\) can be computed in \(\mathcal {O}(n)\) time.

Proof

Follows directly from Theorem 1.\(\square \)
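As an illustration of Theorem 1 and Corollary 1, the following minimal Python sketch evaluates the probability with one pass over the bit positions and compares it against a direct evaluation of Definition 1. The function names `xdp_and` and `xdp_and_bruteforce` are hypothetical and chosen only for this example; differences are assumed to be given as non-negative integers with bit \(0\) as the LSB.

```python
from itertools import product


def xdp_and(alpha, beta, gamma, n):
    """DP of AND with independent inputs, one pass over the n bit positions."""
    p = 1.0
    for i in range(n):
        a = (alpha >> i) & 1
        b = (beta >> i) & 1
        c = (gamma >> i) & 1
        if a == 0 and b == 0:
            if c == 1:
                # Inactive inputs cannot produce an active output bit.
                return 0.0
            # All four pairs (x_i, y_i) are valid: per-bit probability 1.
        else:
            # Exactly two of the four pairs (x_i, y_i) are valid.
            p *= 0.5
    return p


def xdp_and_bruteforce(alpha, beta, gamma, n):
    """Definition 1 evaluated directly over all 2^(2n) input pairs."""
    hits = sum(
        1
        for x, y in product(range(1 << n), repeat=2)
        if ((x ^ alpha) & (y ^ beta)) ^ (x & y) == gamma
    )
    return hits / 4 ** n


if __name__ == "__main__":
    n = 3
    for alpha, beta, gamma in product(range(1 << n), repeat=3):
        assert xdp_and(alpha, beta, gamma, n) == xdp_and_bruteforce(alpha, beta, gamma, n)
    print("linear-time formula agrees with Definition 1 for n = 3")
```

The exhaustive comparison is feasible only for very small \(n\); it is meant as a sanity check of the per-bit reasoning in the proof, not as part of the search tools.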

2.2 Rotationally Dependent Inputs

Note that when the inputs to the \(\mathtt{AND}\) operation are dependent on each other, the \(\mathsf{DP }\) computed with Theorem 1 is not accurate. In particular, let the two inputs \(x\), \(y\) to \(\mathtt{AND}\) be such that \(y = (x \,\lll \, r)\). Then an input XOR difference \(\alpha \) applied to \(x\) results in an input difference \((\alpha \,\lll \, r)\) to \(y\). Taking the dependency between the input variables into account, the \(\mathsf{DP }\) in this case is defined as follows:

Definition 2

( \( {\mathrm {xdp}^{ \& }}\) with dependent inputs). For a fixed rotation constant \(r\) and \(n\)-bit input difference \(\alpha \), the \(\mathsf{DP }\) of the bitwise AND operation is defined as

$$ \begin{aligned}&{\mathrm {xdp}}^{ \& }(\alpha , (\alpha \lll r) \rightarrow \gamma ) \nonumber \\&\qquad = 2^{-n} \cdot \left| \{x : \big (x \wedge (x \lll r)\big ) \oplus \big ((x \oplus \alpha ) \wedge ((x \oplus \alpha ) \lll r) \big ) = \gamma \}\right| . \end{aligned}$$
(3)

In the following part of this section we describe a method for the computation of the probability \( {\mathrm {xdp}^{ \& }}({\alpha }, ({\alpha \,\lll \, r}) \rightarrow {\gamma })\) (3) in linear time in the word size \(n\). We begin by stating several necessary definitions and lemmas.

A cycle of length \(t\) is a special subset of the set of indices \(\mathcal {I} = \{0, 1, \ldots , n-1\} (=\mathbf {Z}_n)\) indicating the bit positions of an \(n\)-bit word \(x\) (index \(0\) denotes the LS bit of \(x\)). More formally:

Definition 3

A cycle of length \(t\) is a set of bit indices \(C_i = \{i, i+r, i+2r, \ldots , i+(t-1)r\} \subseteq \mathcal {I}\) (all indices taken modulo \(n\)), where \(t \in \mathbf {N}\) is the smallest positive integer such that \(i + tr \equiv i \pmod {n}\) and \(i\) is the smallest element of \(C_i\).

In a cycle \(C_i\) of length \(t\), \(i + (s-1)r\) is said to be the preceding element of \(i + sr\) (\(1 \le s < t\)). Moreover, \(i + (t-1)r\) is the preceding element of \(i\). Since each \(C_i\) for \(i \in \mathcal {I}\) is an equivalence class, \(\mathcal {I}\) can be partitioned into disjoint cycles:

Lemma 1

For fixed \(r \in \mathcal {I}\), \(C_i \cap C_j = \emptyset \) iff \(i \ne j: ~0 \le i, j < n\) and, \(\mathcal {I} = \bigcup {C_i}\).

Example 1

For \(n = 8\) and \(r = 2\) there are exactly \(2\) cycles each having length \(t = 4\): \(C_0 = \{0, 2, 4, 6\}\) and \(C_1 = \{1, 3, 5, 7\}\).
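The partition of \(\mathcal {I}\) into cycles from Definition 3 and Lemma 1 can be sketched in a few lines of Python. The helper name `cycles` is hypothetical; each cycle is returned in traversal order \(i, i+r, i+2r, \ldots \) (indices modulo \(n\)), starting from its smallest element.

```python
def cycles(n, r):
    """Partition the bit indices 0..n-1 into cycles under i -> i + r (mod n)."""
    seen = set()
    result = []
    for start in range(n):
        if start in seen:
            continue
        cycle = []
        j = start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = (j + r) % n
        result.append(cycle)
    return result


if __name__ == "__main__":
    print(cycles(8, 2))  # the setting of Example 1
    print(cycles(5, 2))  # a single cycle covering all indices
```

For \(n = 8\), \(r = 2\) this reproduces the two cycles of Example 1, and for \(n = 5\), \(r = 2\) it yields a single cycle visited in the order \(0, 2, 4, 1, 3\).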

By Definition 2, computing the probability \( {\mathrm {xdp}^{ \& }}\) is equivalent to counting the number of values \(x\) that satisfy the differential \(({\alpha }, ({\alpha \lll r}) \rightarrow {\gamma })\). For simplicity, let \(r\) be such that the set of bit indices \(\mathcal {I}\) of \(x\) has a single cycle \(C_0 = \{ 0, r, 2r, \ldots ,n-r\}\). Within this cycle the bits of the input and output differences are represented as a sequence of \(3\)-tuples in the following way:

$$\begin{aligned} (\alpha _{0}, \alpha _{(n - r)}, \gamma _{0}), ~(\alpha _{r}, \alpha _{0}, \gamma _{r}), ~(\alpha _{2r}, \alpha _{r}, \gamma _{2r}), \ldots , (\alpha _{(n - r)}, \alpha _{(n - 2r)}, \gamma _{(n - r)}). \end{aligned}$$
(4)

Note that in sequence (4), for each \(3\)-tuple the index of the second element is a preceding index of the index of the first element.

Example 2

Let \(n = 5\) and \(r = 2\). Consider the input differences \(\alpha = \alpha _4 \alpha _3 \alpha _2 \alpha _1 \alpha _0\) and \((\alpha \,\lll \, 2) = \alpha _2 \alpha _1 \alpha _0 \alpha _4 \alpha _3\) and the output difference \(\gamma = \gamma _4 \gamma _3 \gamma _2 \gamma _1 \gamma _0\). In this case there is a single cycle \(C_0\) of length \(t = 5\): \(C_0 = \{0, 1, 2, 3, 4\}\). The corresponding sequence of \(3\)-tuples is:

$$\begin{aligned} (\alpha _0, \alpha _3, \gamma _0), ~(\alpha _2, \alpha _0, \gamma _2), ~(\alpha _4, \alpha _2, \gamma _4), ~(\alpha _1, \alpha _4, \gamma _1), ~(\alpha _3, \alpha _1, \gamma _3). \end{aligned}$$
(5)

The difference \(3\)-tuples in (4) are satisfied by a number of possible bit assignments of \(x\) at the corresponding positions: \((x_{0}, x_{n-r}), ~(x_{r}, x_{0}), ~(x_{2r}, x_{r}), \ldots , (x_{n-r}, x_{n-2r})\). In order to efficiently count the number of such assignments we use a variant of the technique proposed in [15] for the computation of the \(\mathsf{DP }\) of modular addition and XOR.

Any \(2\)-tuple of bits of the form \((x_{sr}, x_{(s-1)r})\), where \(0 \le s \le n-1\), can take \(4\) values \(\{(0,0), (0,1), (1,0), (1,1)\}\). These are viewed as nodes of a graph. In total, for the full word length \(n\) the graph has \(4n\) nodes. A valid assignment of two consecutive \(2\)-tuples \((x_{sr}, x_{(s-1)r})\) and \((x_{(s+1)r}, x_{sr})\), (\(0 \le s < n-1\)) is represented as a directed edge between the corresponding nodes. In this way we can construct a directed acyclic graph (DAG) composed of \((n-1)\) edge-disjoint bipartite subgraphs. Each bipartite subgraph is formed by the nodes from two consecutive \(2\)-tuples of bits of \(x\) and the edges between them. A valid path from an initial node \((x_0, x_{n-r})\) to a final node \((x_{n-r}, x_{n-2r})\) in the DAG corresponds to a value of \(x\) that satisfies the differential \((\alpha ,(\alpha \,\lll \, r) \rightarrow \gamma )\). A path is said to be valid iff the initial and final nodes \((x_{0}, x_{n-r})\) and \((x_{n-r}, x_{n-2r})\) are consistent, i.e. the value assigned to \(x_{n-r}\) in both nodes is the same.

The DAG constructed as explained above is represented as a sequence of \(4 \,\times \, 4\) adjacency matrices, each corresponding to one bipartite subgraph. Computing the probability \( {\mathrm {xdp}^{ \& }}\) is then equivalent to counting the number of valid paths in the DAG. This can be performed in linear time in the word size by a sequence of \((n-1)\) multiplications of adjacency matrices. If the number of such paths is \(N\), the final probability \( {\mathrm {xdp}^{ \& }}({\alpha }, ({\alpha \,\lll \, r}) \rightarrow {\gamma })\) is \(\frac{N}{2^n}\). This process is further illustrated in Example 3.

In the case of more than one cycle, the described process can be performed independently for each cycle \(C^{j}, 0 \le j < m\), due to the fact that all cycles are disjoint (cf. Lemma 1). Let \(N_j\) be the number of valid paths in the DAG for the \(j\)-th cycle. Then the \(\mathsf{DP }\) is given by \(\frac{\prod _{j=0}^{m-1} N_j}{2^{n}}\).

Example 3

Assume the same setting as in Example 2: \(n = 5\), \(r = 2\) and let \(\alpha = 00110_{2}\) and \(\gamma =00000_{2}\). Consider the resulting sequence of \(3\)-tuples (5). In the DAG (Fig. 1), the dependency between the bits of \(x\) corresponding to two consecutive \(3\)-tuples must be satisfied. For example, an edge between \((x_0, x_3) = (0, 1)\) corresponding to \((\alpha _0, \alpha _3, \gamma _0) = (0, 0, 0)\) and \((x_2, x_0) = (1, 0)\) corresponding to \((\alpha _2, \alpha _0, \gamma _2) = (1, 0, 0)\) is drawn, because \(x_0 = 0\) for both nodes. However, there is no edge between \((x_2, x_0) = (0, 0)\) and \((x_4, x_2) = (0, 1)\), since \(x_2\) is not equal for both nodes.

Fig. 1. DAG used in the computation of \( {\mathrm {xdp}}^{ \& }(\alpha , (\alpha \lll r) \rightarrow \gamma )\) for \(n = 5\), \(r = 2\), \(\alpha = 00110_{2}\), \(\gamma =00000_{2}\). Every path composed of thick edges is a valid path and hence a valid assignment of bits of \(x\). The fading nodes denote the bit assignments of \(x\) which do not satisfy the input/output difference.

A valid path from an initial node (corresponding to the first \(3\)-tuple \((\alpha _0, \alpha _3, \gamma _0)\) in the sequence (5)) to a final node (corresponding to the last \(3\)-tuple \((\alpha _3, \alpha _1, \gamma _3)\)) in this graph is equivalent to a value of \(x\) that satisfies the differential. A valid path implies that the initial and final nodes are consistent with each other. For example, no path from the initial node \((x_0, x_3) = (0, 1)\) is valid, because all final nodes have \(x_3 = 0\). Since the total number of valid paths in the graph is \(N = 4\), the \(\mathsf{DP }\) is \(\frac{4}{2^5} = 0.125\).

The method for the computation of the probability \( {\mathrm {xdp}^{ \& }}({\alpha }, {\beta } \rightarrow {\gamma })\) described above supports the following proposition.

Proposition 1

For fixed \(n\)-bit differences \(\alpha \) and \(\gamma \), the probability \( {\mathrm {xdp}^{ \& }}({\alpha }, ({\alpha \lll r}) \rightarrow {\gamma })\) can be computed in \(\mathcal {O}(n)\) time.
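The path-counting idea behind Proposition 1 can be sketched as follows. This is a simplified illustration and not the authors' exact construction: instead of \(4 \times 4\) adjacency matrices over pairs of bits, it propagates a two-entry table indexed by the value of the preceding bit of the cycle, and it closes the cycle by trying both values of the last bit. The function names are hypothetical. The brute-force routine evaluates Definition 2 directly and is included only as a check.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def xdp_and_rot(alpha, gamma, r, n):
    """DP of x & (x <<< r) for input difference alpha and output difference gamma."""
    def bit(w, i):
        return (w >> i) & 1

    def ok(i, xi, xp):
        # Local condition at bit position i; xp is the bit at position i - r.
        j = (i - r) % n
        before = xi & xp
        after = (xi ^ bit(alpha, i)) & (xp ^ bit(alpha, j))
        return (before ^ after) == bit(gamma, i)

    total = 1
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        # Collect the cycle start, start + r, start + 2r, ... (mod n).
        cycle = []
        j = start
        while j not in seen:
            seen.add(j)
            cycle.append(j)
            j = (j + r) % n
        n_cycle = 0
        for closing in (0, 1):
            # closing = guessed value of the last bit of the cycle,
            # which precedes the first bit of the cycle.
            counts = {closing: 1}
            for i in cycle:
                nxt = {0: 0, 1: 0}
                for xp, c in counts.items():
                    for xi in (0, 1):
                        if ok(i, xi, xp):
                            nxt[xi] += c
                counts = nxt
            # Keep only paths whose last bit matches the guess.
            n_cycle += counts[closing]
        total *= n_cycle
    return total / 2 ** n


def xdp_and_rot_bruteforce(alpha, gamma, r, n):
    """Definition 2 evaluated directly over all 2^n values of x."""
    hits = sum(
        1
        for x in range(1 << n)
        if (x & rotl(x, r, n)) ^ ((x ^ alpha) & rotl(x ^ alpha, r, n)) == gamma
    )
    return hits / 2 ** n


if __name__ == "__main__":
    print(xdp_and_rot(0b00110, 0b00000, 2, 5))  # setting of Example 3
    print(xdp_and_rot(0b11111, 0b00000, 2, 5))  # locally valid, globally inconsistent
```

For the setting of Example 3 the sketch returns \(0.125\), in agreement with \(N = 4\). For \(\alpha = 11111_2\), \(\gamma = 00000_2\) it returns \(0\): every local condition can be satisfied, but no assignment closes the cycle consistently.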

Impossible Input-Output Difference. For a given (\(\alpha , \gamma \)) an impossible difference can be of two types. Any input/output difference that leads to a difference 3-tuple (\(\alpha _i, \alpha _{i-r}, \gamma _i\)) = (\(0, 0, 1\)) is an impossible input/output difference. The other type of impossible difference can be detected while computing the probability of the corresponding differential. Note that in the corresponding DAG a path can be invalid even if every bipartite directed subgraph is valid, e.g. for (\(\alpha , \gamma \)) = (\(11111_2, 00000_2\)). The DAG in Fig. 2 shows this case.

Fig. 2. DAG used in the computation of \( {\mathrm {xdp}}^{ \& }(11111_2, (11111_2 \lll 2) \rightarrow 00000_2)\). Both paths, composed of thick edges and dashed edges respectively, are invalid, since there is a contradiction in the bit value \(x_3\). However, each directed bipartite subgraph is independently valid.

Proposition 2

For a fixed \(n\)-bit input difference \(\alpha \) and rotation \(r\),

$$ \begin{aligned} \mathrm {DP}_{\mathrm {max}}(\alpha )=\mathrm {\mathrm {max}}_{\gamma } ~{{\mathrm {xdp}}^{ \& }(\alpha , \alpha \lll r \rightarrow \gamma )} \end{aligned}$$
(6)

can be computed in \({\mathcal {O}}(n)\) time.

Finally, we note that the approach described above bears some similarity to the technique proposed in [15] for the computation of the \(\mathsf{DP }\) of modular addition and XOR. Similarly to [15], we also map the problem of computing differential probabilities to the well-studied graph-theoretic problem of counting the number of paths in a graph. Apart from this similarity, however, we would like to stress that the described method is fundamentally different from the one in [15]. In the latter, the nodes of the graph represent information that is propagated over the bit positions (namely, the carries and borrows resulting from the modular addition) and the edges represent the actual values of the pairs. In our case, the nodes of the graph represent the values of the pairs, while the edges describe the valid connections between the bits of those values so that the correct dependence due to the rotation operation is preserved.

3 Automatic Search for Trails and Differentials

3.1 Threshold Search

In [14] Matsui proposed a practical algorithm for finding the best differential trail for the DES block cipher. Given the best trail on \(i\) rounds and an over-estimation of the best probability for \(i+1\) rounds, the algorithm finds the best trail on \(i+1\) rounds. Starting from \(i = 1\), these steps are repeated recursively until \(i+1 = n\). In the process, the differential probabilities of the non-linear components of the target cipher (the S-boxes in the case of DES) are obtained from a pre-computed difference distribution table (DDT). The differentials for the \(i\)-th round are then processed in sorted order by probability. For each, the recursion proceeds to round \(i+1\) only if the estimated probability of the trail for \(n\) rounds is equal to or greater than the initial estimate.

Recently, in [5] a variant of Matsui’s algorithm was proposed which is applicable to the class of ARX ciphers. What is special about the latter is that they do not have S-boxes. Instead they rely on basic arithmetic operations such as addition modulo \(2^n\) to achieve non-linearity. Computing a full DDT for the modular addition operation would require \(4 \,\times \, 2^{3n}\) bytes of memory and is therefore impractical for \(n > 16\). To address this, in [5] a partial DDT (pDDT) rather than the full DDT is computed. A pDDT contains (a fraction of) all differentials that have probability above a fixed probability threshold (hence the name threshold search).
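To make the notion of a pDDT concrete, the sketch below builds one for the AND operation with independent inputs, using the per-bit factors of Theorem 1. It is an illustrative assumption rather than the procedure of [5], which targets modular addition: the function name `pddt_and` is hypothetical, and the bit-by-bit recursion relies only on the fact that the probability restricted to the low \(k\) bits never increases as \(k\) grows, so a partial difference below the threshold can be discarded.

```python
def pddt_and(n, p_thres):
    """All (alpha, beta, gamma, p) for AND with independent inputs and p >= p_thres."""
    table = []

    def extend(k, alpha, beta, gamma, p):
        if p < p_thres:
            # Extending by more bits can only lower the probability: prune.
            return
        if k == n:
            table.append((alpha, beta, gamma, p))
            return
        for a in (0, 1):
            for b in (0, 1):
                for c in (0, 1):
                    if a == 0 and b == 0:
                        if c == 1:
                            continue  # impossible at this bit position
                        factor = 1.0
                    else:
                        factor = 0.5
                    extend(k + 1,
                           alpha | (a << k),
                           beta | (b << k),
                           gamma | (c << k),
                           p * factor)

    extend(0, 0, 0, 0, 1.0)
    return table


if __name__ == "__main__":
    table = pddt_and(4, 0.5)
    print(len(table), "differentials with probability >= 0.5 for n = 4")
```

For \(n = 4\) and threshold \(2^{-1}\) the table holds the trivial differential plus all differentials with exactly one active bit position. Lowering the threshold enlarges the table quickly, which is the trade-off between table size and search quality discussed in this section.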

Since some (possibly many) differentials are missing from the initial (also called primary) pDDT, at some point during the search it is likely that for a given input difference the algorithm will require a matching differential that is not present in the primary pDDT. Such differentials are computed on-demand and are stored in a secondary pDDT maintained dynamically during the search.

In order to prevent the size of the secondary pDDT from exploding while at the same time keeping the probability of the constructed trails high, [5] further introduce the notion of highways and country roads – resp. high and low probability differentials (w.r.t. the fixed threshold). Every differential from the primary pDDT is a highway while every differential from the secondary pDDT is a country road.

To further control the size of the country roads table, additional restrictions on the considered differences can be added. For example, it may be required that every country road at a given round \(i\) is such that there is at least one transition at round \(i+1\) that is a highway. This reduces the number of possible country roads while at the same time ensuring that the considered paths have relatively high probability. This condition has been applied in the trail search for Simon. Another restriction can be placed on the Hamming weight of the considered differences. Such a restriction has been applied in the differential search on Speck.

Several parameters control the performance of the threshold search technique. The most important ones are the probability threshold, which determines which differentials are considered as highways, and the maximum size of the primary pDDT (note that it may be infeasible to compute and/or store all differentials that have probability above the threshold). The probability threshold influences the probability of the final trail: the lower the threshold, the more paths are considered and hence the more likely it is that a high probability trail is found. At the same time, as the number of explored paths increases, the complexity of the algorithm also grows and hence it takes longer to terminate. The maximum size of the primary pDDT determines the precomputation time and the memory requirements of the algorithm.

3.2 Extension to Differentials

We further extend the method outlined above to the case of differentials. Given the best trail found by the threshold search and the corresponding array of best found probabilities for each round, a differential search proceeds according to the above strategy but always starts from the same input difference (corresponding to the best found trail). At every round, only paths whose estimated probabilities are at most a factor \(\varepsilon \) away from the best probability are explored (e.g. \(\varepsilon = 2^{-15}\)). For example, let \(B_i: 1 \,\le \, i \,\le \, n\) be the probabilities of the best found differentials resp. for \(1, 2, \ldots , n\) rounds computed with the threshold search. Denote by \(p_1, p_2, \ldots , p_{r-1}\) the probabilities of a partially constructed trail up to round \(r - 1\). At round \(r\) the differential search will explore all transitions that have probability \(p_r \ge {(\varepsilon B_{n})}/{(p_1 \ldots p_{r-1} B_{n - r})}\). Pseudocode of this procedure applied to Simon is listed in Algorithm 1.

Algorithm 1. Differential search for Simon.
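The pruning condition \(p_r \ge {(\varepsilon B_{n})}/{(p_1 \ldots p_{r-1} B_{n - r})}\) is easiest to evaluate in the \(\mathrm {log}_2\) domain. The following sketch computes the smallest admissible \(\mathrm {log}_2 p_r\) at round \(r\). The function name, the convention \(B_0 = 1\) for the last round, and the numerical values in the usage example are assumptions made only for illustration.

```python
def min_log2_transition(log2_eps, log2_B, log2_partial):
    """Lower bound on log2(p_r) for the next round r of a partial trail.

    log2_B[i] is log2(B_i) for i = 0..n, with log2_B[0] = 0 (assumed B_0 = 1).
    log2_partial holds log2(p_1), ..., log2(p_{r-1}).
    """
    n = len(log2_B) - 1
    r = len(log2_partial) + 1
    if r > n:
        raise ValueError("partial trail already covers all rounds")
    return log2_eps + log2_B[n] - sum(log2_partial) - log2_B[n - r]


if __name__ == "__main__":
    # Hypothetical values: n = 3 rounds, B_1 = 2^-2, B_2 = 2^-4, B_3 = 2^-6,
    # eps = 2^-15 and a partial trail with p_1 = 2^-2.
    bound = min_log2_transition(-15, [0, -2, -4, -6], [-2])
    print("explore round-2 transitions with log2 p_2 >=", bound)
```

With these hypothetical values the bound is \(-17\): at round \(2\) the search would follow every transition with probability at least \(2^{-17}\) and discard the rest.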

Note that a somewhat similar branch-and-bound approach has been applied in [13] to search for differentials in Simon. The main difference is that the cited technique maintains, at every round, an array of the best differentials encountered so far, ranked by probability. The search proceeds to the next round by considering the top \(N\) such differentials.

In our approach instead of storing intermediate differentials, we prune the search tree by limiting the search to an \(\varepsilon \) region within the best found probability, since the latter is already known from the threshold search.

Note that although the proposed technique searches for differentials starting with best trail found with the threshold search, it can easily be modified to search for multiple input and output differences, while keeping track of the best one. Finally, in order to improve the efficiency, the differential search can be further parametrized by limiting the maximum Hamming weight of the differences.

4 Description of SIMON and SPECK

The Simon and Speck families of lightweight block ciphers are defined for word sizes \(n = 16,24,32,48\) and \(64\) bits. The key is composed of \(m\) \(n\)-bit words for \(m = 2,3,4\) (i.e. the key size \(mn\) varies between \(64\) and \(256\) bits) depending on the word size \(n\). The block cipher instances corresponding to a fixed word size \(n\) (block size \(2n\)) and key size \(mn\) are denoted by Simon \(2n/mn\) and Speck \(2n/mn\).

Fig. 3. SIMON round function

Fig. 4. SPECK round function

The block cipher Simon has a Feistel structure and its round function under a fixed round key \(k\) is defined on inputs \(x\) and \(y\) as:

$$\begin{aligned} R_k(x, y) = ((y \oplus f(x) \oplus k), x). \end{aligned}$$
(7)

The function \(f(\cdot )\) is defined as \(f(x) = ((x \lll 1) \wedge (x \lll 8)) \oplus (x \lll 2)\), where the symbol \(\wedge \) denotes the logical AND operation.
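A direct transcription of the round function (7) into Python may help when experimenting with differences. The sketch assumes the word size \(n = 16\) of Simon32 by default; the function names are hypothetical and the key schedule is deliberately left out.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def simon_f(x, n=16):
    """f(x) = ((x <<< 1) & (x <<< 8)) ^ (x <<< 2)."""
    return (rotl(x, 1, n) & rotl(x, 8, n)) ^ rotl(x, 2, n)


def simon_round(x, y, k, n=16):
    """One Feistel round: (x, y) -> (y ^ f(x) ^ k, x)."""
    return y ^ simon_f(x, n) ^ k, x


def simon_round_inv(x, y, k, n=16):
    """Inverse of simon_round, using the Feistel structure."""
    return y, x ^ simon_f(y, n) ^ k


if __name__ == "__main__":
    state = simon_round(0x0001, 0x0000, 0x0000)
    print([hex(w) for w in state])
    assert simon_round_inv(*state, 0x0000) == (0x0001, 0x0000)
```

Because the round is a Feistel map, the inverse reuses `simon_f` unchanged, so only the AND term needs the differential analysis of Sect. 2.2.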

The block cipher Speck has a structure similar to Threefish, the block cipher used in the hash function Skein [8]. Its round function under a fixed round key \(k\) is defined on inputs \(x\) and \(y\) as:

$$\begin{aligned} R_k(x, y) = (f_k(x,y),~ f_k(x,y) \oplus (y \lll \beta )), \end{aligned}$$
(8)

where the function \(f_k(\cdot ,\cdot )\) is defined as \(f_k(x, y) = ((x \ggg \alpha ) + y) \oplus k\). The rotation constants are \(\alpha = 7, \beta = 2\) for block size \(32\) bits and \(\alpha = 8, \beta = 3\) for all other block sizes. Although Speck is not a Feistel cipher itself, it can be represented as a composition of two Feistel maps as described in [4]. The round functions of Simon and Speck are shown in Figs. 3 and 4 respectively. The number of rounds, block size and key size of the block ciphers are summarized in Tables 3 and 4.
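The round function (8) can be transcribed in the same way. The sketch below defaults to the Speck32 parameters (\(n = 16\), \(\alpha = 7\), \(\beta = 2\)) and interprets \(+\) as addition modulo \(2^n\); the function names are hypothetical and the key schedule is not included.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def rotr(x, r, n):
    """Rotate the n-bit word x right by r positions."""
    return rotl(x, n - (r % n), n)


def speck_round(x, y, k, n=16, alpha=7, beta=2):
    """(x, y) -> (f_k(x, y), f_k(x, y) ^ (y <<< beta))."""
    mask = (1 << n) - 1
    fx = ((rotr(x, alpha, n) + y) & mask) ^ k
    return fx, fx ^ rotl(y, beta, n)


def speck_round_inv(x, y, k, n=16, alpha=7, beta=2):
    """Inverse of speck_round."""
    mask = (1 << n) - 1
    y0 = rotr(x ^ y, beta, n)
    x0 = rotl(((x ^ k) - y0) & mask, alpha, n)
    return x0, y0


if __name__ == "__main__":
    state = speck_round(0x0001, 0x0000, 0x0000)
    print([hex(w) for w in state])
    assert speck_round_inv(*state, 0x0000) == (0x0001, 0x0000)
```

For the other block sizes the same functions apply with the corresponding word size and \(\alpha = 8\), \(\beta = 3\).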

Table 3. Parameters for Simon
Table 4. Parameters for Speck

5 Application to SIMON and SPECK

The trails obtained by using the threshold search technique and the differentials found with the differential search tool (both described in Sect. 3) are presented in this section. The best found trails for Simon and Speck are shown in Tables 5 and 6 respectively. In the tables, \(\sum _{r}\mathrm {log}_2 p_r\) represents the \(\mathrm {log}_2\) probability of a single trail, obtained as the sum of the \(\mathrm {log}_2\) probabilities of its transitions; \(p_{\mathrm {diff}}\) is the probability of the corresponding differential and \(\#{\mathrm {trails}}\) is the number of trails clustered in the differential; \(\mathrm {max~HW}\) is the maximum Hamming weight allowed for the differences during the search; \(p_{\mathrm {thres}}\) is the probability threshold used in the threshold search algorithm and pDDT denotes the number of elements in the partial DDT.

Note that all trails shown in Tables 5 and 6 were found using the technique described in Sect. 3 by starting the search from the top round and proceeding downwards. The only exception is the trail on Speck48. Since this trail begins with a very low probability transition, when starting the search from the first round, it was computationally feasible to construct the shown trail only up to round 6. The full trail on 11 rounds shown in Table 6 was found by starting the search from a middle round (round 6) as has also been done in [1].

6 Differential Effect in SIMON

The clustering of multiple trails satisfying the same input/output difference (differential effect) in Simon can be visualized by the digraph in Fig. 9. It depicts a cluster of more than \(275\,000\) trails satisfying the \(21\)-round differential \((\mathtt{4000000}, \mathtt{11000000}) \xrightarrow {21R} (\mathtt{11000000}, \mathtt{4000000})\). The thickness of an edge in the digraph is proportional to the probability of the corresponding input and output difference connected by this edge.

An interesting property clearly visible in the digraph in Fig. 9 is that it is composed of multiple smaller subgraphs positioned at alternate levels. Each such subgraph represents a biclique. Clearly, the bigger the number and size of such bicliques, the stronger the differential effect would be and hence the larger the probability of the differential. Therefore, the ability to obtain good estimation of the probability of a given differential for Simon is intimately related to the ability to identify and characterize such complete bipartite subgraphs. Thus we take a closer look into those special structures below.

Fig. 5. Example of a bipartite subgraph embedded in the differential (graph) of Simon32.

Table 5. Differential trails for Simon32, Simon48 and Simon64.
Table 6. Differential trails for Speck32, Speck48 and Speck64.

Figure 5 shows an example of a complete bipartite subgraph (biclique) similar to the ones composing the digraph in Fig. 9. Note that each node has the same left input difference \({\varDelta }_L\) due to the Feistel structure of Simon.

Consider the pair of left and right input differences (\({\varDelta }_L^{i}, {\varDelta }_R^{i}\)) = (\(\mathtt{11}, \mathtt{106}\)) (hexadecimal values). Through the non-linear component \(f(x) = (x \lll 1) \wedge (x \lll 8)\) of the round function, the difference \({\varDelta }_L^{i} = \mathtt{11}\) propagates to a set of output differences. This set has the form \(\nabla = \mathtt{000*}~\mathtt{000*}~\mathtt{00*0}~\mathtt{00*0}\), where \(*\) can take values \(0/1\). Note that for some assignments of the \(*\) bits, the resulting difference may have zero probability as was explained in Sect. 2.2, Fig. 2. For \(\nabla = \{\mathtt{0122}, \mathtt{0102}, \mathtt{0120}\}\) three distinct output differences \({\varDelta }_L^{i+1}\) from one round of Simon are produced. They are shown as the three lower level nodes in Fig. 5 and are obtained as \(\nabla \oplus (({\varDelta }_L^{i} \lll 2) \oplus {\varDelta }_R^{i}) \,=\, \nabla \oplus (\mathtt{44}) \oplus {\varDelta }^i_R\).
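The position of the \(*\) bits can be checked with a short experiment. The sketch below assumes \(16\)-bit words (Simon32) and uses hypothetical helper names: `star_mask` marks the bit positions where at least one rotated copy of the input difference is active, and `and_output_differences` enumerates all \(2^{16}\) inputs to collect the output differences of the AND component that actually occur.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def star_mask(delta, n=16):
    """Bit positions where the output difference of the AND may be non-zero."""
    return rotl(delta, 1, n) | rotl(delta, 8, n)


def and_output_differences(delta, n=16):
    """All output differences of (x <<< 1) & (x <<< 8) for input difference delta."""
    out = set()
    for x in range(1 << n):
        before = rotl(x, 1, n) & rotl(x, 8, n)
        after = rotl(x ^ delta, 1, n) & rotl(x ^ delta, 8, n)
        out.add(before ^ after)
    return out


if __name__ == "__main__":
    mask = star_mask(0x0011)
    diffs = and_output_differences(0x0011)
    print(hex(mask), sorted(hex(d) for d in diffs))
    assert all(d & ~mask == 0 for d in diffs)
```

For the input difference \(\mathtt{11}\) the mask is \(\mathtt{1122}\), which matches the pattern \(\mathtt{000*}~\mathtt{000*}~\mathtt{00*0}~\mathtt{00*0}\), and every observed output difference lies inside it.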

Another node with the same input difference \({\varDelta }_L^i\) to the round function, but with a different right difference \({\varDelta }^i_R\), e.g. (\({\varDelta }_L^{i}, {\varDelta }_R^{i}\)) = (\(\mathtt{11}, \mathtt{104}\)) (see Fig. 5), produces a corresponding set of output differences \(\nabla '\), which may or may not have common elements with \(\nabla \) in general. For example, in this case \(\nabla ^\prime = \{\mathtt{0100}, \mathtt{0120}, \mathtt{0122}\}\) is produced by the node (\(\mathtt{11}, \mathtt{104}\)). In either case though, \(\nabla \) and \(\nabla '\) may still produce the same set of output differences (\({\varDelta }_L^{i+1}, {\varDelta }^{i+1}_R\)). When this happens, a biclique is formed. This is shown in Fig. 5, where both \(\nabla \) and \(\nabla '\) result in the same set of output differences \(({\varDelta }_L^{i+1}, {\varDelta }^{i+1}_R) \in \{(\mathtt{4},\mathtt{11}), (\mathtt{26},\mathtt{11}), (\mathtt{6},\mathtt{11})\}\).

In general, when the sets \(\nabla \), \(\nabla ^\prime \) produced from two different pairs of input differences have high (and possibly equal) probabilities, the resulting complete subgraphs have thick edges (corresponding to high probability). Such subgraphs contribute to the clustering of differential trails in Simon.

Note that the described subgraphs may not be formed for all possible elements in \(\nabla \) of an arbitrary node since, as already mentioned, some of them may propagate with probability \(0\) through the non-linear component \(f\). Furthermore, because the complete bipartite subgraphs depend on the input differences, they cannot occur at arbitrary positions in the digraph (Fig. 9). The frequent occurrence of such subgraph structures in Simon is the main cause of the strong differential effect observed experimentally using the tool for differential search.

7 Key Recovery Attack on Simon32

In this section we describe a key recovery attack on Simon32 with a \(64\)-bit key. The input difference to Round-\(r\) is denoted by \({\varDelta }^{r-1}\), and bit positions \(i_1,i_2, \ldots ,i_t\) of \(x\) by \(x[i_1,i_2,\ldots ,i_t]\). Also, \(K^r\) denotes the round key of Round-\((r+1)\) and \(\mathcal {E}_r\) denotes the encryption function with \(r\) rounds.

7.1 Attack on 19 Rounds

To attack 19 rounds of Simon32 we add 2 rounds on top and 4 rounds at the bottom of a set of four 13-round differentials. For this attack consider the following \(13\)-round differentials:

$$\begin{aligned} \mathcal {D}_1:(\mathtt 2000 , \mathtt 8000 ) \rightarrow (\mathtt 2000 , \mathtt 0 )\\ \mathcal {D}_2:(\mathtt 4000 , \mathtt 0001 ) \rightarrow (\mathtt 4000 , \mathtt 0 )\\ \mathcal {D}_3:(\mathtt 0004 , \mathtt 0010 ) \rightarrow (\mathtt 0004 , \mathtt 0 )\\ \mathcal {D}_4:(\mathtt 0008 , \mathtt 0020 ) \rightarrow (\mathtt 0008 , \mathtt 0 ) \end{aligned}$$

each having probability \(\approx 2^{-28.5}\). The truncated differences at the beginning of Round-0 for the above differentials are as follows:

$$\begin{aligned} (\mathtt 00*0 0000 1*00 0000 , \mathtt **00 001* *0*0 0000 )\\ (\mathtt 0*00 0001 *000 0000 , \mathtt *000 01** 0*00 000* )\\ (\mathtt 0001 *000 0000 0*00 , \mathtt 01** 0*00 000* *000 )\\ (\mathtt 001* 0000 0000 *000 , \mathtt 1**0 *000 00** 0000 ) \end{aligned}$$

Observing the active and inactive bit positions of the above truncated differentials, we can construct a set \(\mathcal {P}\) of \(2^{25}\) plaintexts where each \(P = (P_L, P_R) \in \mathcal {P}\) has \(9\) bits, e.g. \(P_L[0, 1, 4, 5, 9, 10, 15], P_R[1, 2]\), fixed to an arbitrary value. We can identify \(2^{25}\) pairs of plaintexts (for each differential) from \(\mathcal {P}\) so that the pairs satisfy the corresponding \(({\varDelta }_L^2, {\varDelta }_R^2)\) after two rounds of encryption. For this we need to guess the following round-key bits: \((\mathcal {D}_1)\ K^0[8,6]\), \((\mathcal {D}_2)\ K^0[9,7]\), \((\mathcal {D}_3)\ K^0[13,11]\), \((\mathcal {D}_4)\ K^0[14,12]\). Hence with \(4\) key guesses, \(4\) sets of \(2^{23}\) pairs of plaintexts, one per differential \(\mathcal {D}_i\), can be identified (where each pair in a set follows the top 2-round differential obtained from \(\mathcal {D}_i\)). Note that by varying some fixed bits of the plaintexts in \(\mathcal {P}\) we can identify \(2^{30.5}\) pairs for each differential and for each (2-bit) key guess.

Each set of identified \(2^{30.5}\) pairs of plaintexts is filtered by verifying the fixed bits of the corresponding truncated difference \({\varDelta }^{18}\). This reduces the number of pairs to \(2^{30.5-18} = 2^{12.5}\) for each differential. In order to partially decrypt each pair of ciphertexts it is necessary to guess the following key bits (and linear combinations of key bits) from the last \(3\) rounds:

$$\begin{aligned} \mathcal {D}_1^K&= \{ K^{18}, K^{17}[3,5-8,12,14], K^{16}[6] \oplus K^{17}[4], K^{16}[4] \oplus K^{17}[2]\}\end{aligned}$$
(9)
$$\begin{aligned} \mathcal {D}_2^K&= \{K^{18}, K^{17}[4,6-9,13,15], K^{16}[7] \oplus K^{17}[5], K^{16}[5] \oplus K^{17}[3]\}\end{aligned}$$
(10)
$$\begin{aligned} \mathcal {D}_3^K&= \{ K^{18}, K^{17}[8,10-13,1,3], K^{16}[11] \oplus K^{17}[9], K^{16}[9] \oplus K^{17}[7]\}\end{aligned}$$
(11)
$$\begin{aligned} \mathcal {D}_4^K&= \{ K^{18}, K^{17}[9,11-14,2,4], K^{16}[12] \oplus K^{17}[10], K^{16}[10] \oplus K^{17}[8]\} \end{aligned}$$
(12)

In each differential we need to guess \(25\) bits (and linear combinations of bits) from the last 3 round keys. So, for any differential \(\mathcal {D}_i\) it is necessary to guess \(25+2\) (from \(K^0\)) \(= 27\) bits. For the key recovery attack let us first consider the two differentials \(\mathcal {D}_1\) and \(\mathcal {D}_2\). Note that \(\mathcal {D}_1^K\) and \(\mathcal {D}_2^K\) have 19 bits in common. To detect the correct key we maintain an array of \(2^{27}\) counters for each of \(\mathcal {D}_1\) and \(\mathcal {D}_2\). A counter is incremented when a partially decrypted pair of ciphertexts matches the corresponding \({\varDelta }^{15}\) under that key guess. For each differential \(\mathcal {D}_1\) and \(\mathcal {D}_2\), we expect \((2^{27} \times 2^{12.5})/2^{14} = 2^{25.5}\) increments. We expect approximately \(4\) correct pairs for each differential, and the probability of a counter being incremented is \(1/2^2\). So we expect \((\frac{1}{4})^4 \times 2^{25.5} = 2^{17.5}\) counters with 4 increments in each case. Let these two sets of counters be \(T_1\) and \(T_2\). Since \(\mathcal {D}_1^K\) and \(\mathcal {D}_2^K\) have \(19\) common key bits, after combining \(T_1\) and \(T_2\) we expect to obtain \(2^{17.5} \times (2^{17.5}/2^{19}) \,=\, 2^{16}\) candidates for \(19+6+6+4 \,=\, 35\) bits. Let us denote this set of counters as \(T^\prime \) (Fig. 6).
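The counter estimates for \(\mathcal {D}_1\) and \(\mathcal {D}_2\) can be reproduced with a few lines of arithmetic on base-2 exponents. The Python sketch below is only a bookkeeping aid: the function name `simon32_counter_estimates` is hypothetical, and the per-counter increment probability \(1/2^2\) is taken as stated in the text rather than derived.

```python
def simon32_counter_estimates():
    """Reproduce the counter bookkeeping for D_1 and D_2 on base-2 exponents."""
    log2_key_guesses = 25 + 2    # 25 bits from the last round keys, 2 from K^0
    log2_filtered_pairs = 12.5   # pairs left after filtering on Delta^18
    log2_condition = 14          # denominator used in the text for the check

    # Expected total number of counter increments per differential.
    log2_increments = log2_key_guesses + log2_filtered_pairs - log2_condition

    # Assumption taken from the text: a counter is incremented with
    # probability 1/2^2, and about 4 right pairs are expected.
    log2_hit_probability = -2
    right_pairs = 4
    log2_counters_with_4_hits = log2_increments + right_pairs * log2_hit_probability

    # Combining T_1 and T_2, which share 19 key bits.
    common_bits = 19
    log2_combined = 2 * log2_counters_with_4_hits - common_bits

    return {
        "log2_increments": log2_increments,
        "log2_counters_with_4_hits": log2_counters_with_4_hits,
        "log2_combined": log2_combined,
    }
```

Working on exponents keeps every quantity small and exact, and makes it easy to rerun the estimate with a different number of filtered pairs or common bits.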

Next we partially decrypt \(2^{12}\) pairs of ciphertexts corresponding to \(\mathcal {D}_3\) to verify the difference \({\varDelta }^{15}\) for each \(27\)-bit key guess. As described previously, we maintain an array of \(2^{27}\) counters. A counter is incremented when it is verified correctly by a pair of ciphertexts. The expected number of counters having value \(4\) is \(2^{17.5}\). Let us denote this set of counters as \(T_3\). \(\mathcal {D}^K_3\) and \(\mathcal {D}^K_1 \cup \mathcal {D}^K_2\) have \(20\) common round-key bits. Hence, combining \(T_3\) and \(T^\prime \) we expect to get \(2^{16} \times (2^{17.5}/2^{20}) = 2^{13.5}\) candidates for \(35+(25+2-20) \,=\, 42\) round-key bits (out of which \(36\) bits correspond to the last 3 round keys).

Using the fourth differential \(\mathcal {D}_4\) in a similar way we obtain \(2^{9}\) candidates for \(42+(25-22)+2 \,=\, 47\) bits of round-keys, from which we can determine \(39\) bits of last \(3\) round-keys.

In order to recover the key we need all of the last 4 round keys. For the remaining \(64-39 = 25\) bits of the last four round keys we use exhaustive search. Hence the total number of key guesses is \(2^{9+25}\,=\, 2^{34}\) (Fig. 7).
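The overlap counts between the guessed key-bit sets (19, 20 and 22 bits) and the resulting candidate counts can be cross-checked by modelling each \(\mathcal {D}_i^K\) as a set of labels. The Python sketch below makes two simplifying assumptions that are not from the text: every linear combination \(K^{16}[a] \oplus K^{17}[b]\) is treated as one independent unknown, and the \(K^0\) bits are left out because they do not overlap between differentials. The names `key_bits` and `combine` are hypothetical.

```python
def key_bits(k17_positions, xor_pairs):
    """Label set for one D_i^K: all of K^18, some bits of K^17, two XOR combinations."""
    bits = {("K18", i) for i in range(16)}
    bits |= {("K17", i) for i in k17_positions}
    # ("xor", a, b) stands for K^16[a] xor K^17[b], counted as a single unknown.
    bits |= {("xor", a, b) for a, b in xor_pairs}
    return bits


D_K = [
    key_bits([3, 5, 6, 7, 8, 12, 14], [(6, 4), (4, 2)]),        # Eq. (9)
    key_bits([4, 6, 7, 8, 9, 13, 15], [(7, 5), (5, 3)]),        # Eq. (10)
    key_bits([8, 10, 11, 12, 13, 1, 3], [(11, 9), (9, 7)]),     # Eq. (11)
    key_bits([9, 11, 12, 13, 14, 2, 4], [(12, 10), (10, 8)]),   # Eq. (12)
]


def combine(sets, log2_survivors=17.5):
    """Merge the counter sets one by one, tracking overlaps and candidate counts."""
    known = set(sets[0])
    log2_candidates = log2_survivors
    overlaps = []
    for s in sets[1:]:
        common = len(known & s)
        overlaps.append(common)
        # Each shared bit halves the number of consistent combinations.
        log2_candidates += log2_survivors - common
        known |= s
    return overlaps, len(known), log2_candidates


overlaps, known_bits, log2_candidates = combine(D_K)
# Remaining bits of the last four round keys are searched exhaustively.
log2_total_guesses = log2_candidates + (64 - known_bits)
```

Under these assumptions the sketch yields the overlaps and bit counts quoted in the text, ending with \(2^{9}\) candidates and \(2^{34}\) key guesses.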

Attack Complexity. The time complexity for encrypting plaintexts is \(2^{31.5}\). In the key guessing phase \(2^{12}\) filtered pairs are decrypted over the last 4 rounds for each of the \(2^{25}\) key guesses. This is done for each differential. The partial decryption of ciphertext pairs (and the increment of the counters) can be done in steps, with a partial key guess at each step of the last four rounds. This is done by filtering (due to the fixed bits of the truncated differences) at the beginning of Round-16 to Round-18. The complexity of this process is given as:

$$\begin{aligned} 4 \cdot 4 \cdot ( 2^{12.5} \cdot 2^{16} + 2^{12.5} \cdot 2^9 \cdot 2^7 + 2^{12.5} \cdot 2^2 \cdot 2^2) \cdot \frac{1}{19} \approx 2^{33} \end{aligned}$$
(13)

Identifying the \(2^{30.5}\) pairs with \(4\) key guesses for each differential requires \(2^{31.5} \cdot 2^{2} = 2^{33.5}\) two-round encryptions. The complexity due to this part is \(2^{33.5} \times 4 \times (2/19) \approx 2^{32}\). Hence the total complexity of the attack is \(\approx 2^{34}\).

We also show attacks on round-reduced Simon48 and Simon64. The details of these attacks are described in Appendices C and D.

Fig. 6. Truncated difference (in binary notation) in the last 4 rounds of the \(18\) and \(19\) round key-recovery attacks on Simon32.

Fig. 7. Top 2 rounds in the attack on Simon32.

Fig. 8. Differential trail for Round-10 and Round-11 in Speck32.

Fig. 9. Clustering of multiple trails satisfying the \(21\) round differential \((\mathtt 4000000 , \mathtt 11000000 ) \rightarrow (\mathtt 11000000 , \mathtt 4000000 )\) in Simon64. The thickness of the edges between two nodes is proportional to the number of right pairs that follow the differential. The graph depicts more than \(275\,000\) differential trails in total.

8 Key Recovery Attack on Speck32

In this section we describe a chosen plaintext (CP) attack on 11 rounds of Speck32 using the same notation as in Sect. 7. To attack Speck32 we use the 9-round differential trail with probability \(2^{-30}\) given in Table 6. We add one round (Round-1) at the top of the trail and one round (Round-11) at the bottom to cover 11 rounds in total. If we encrypt \(2^{30}\) pairs of plaintexts such that (\({\varDelta }_L^1 = \mathtt 8054 ,~ {\varDelta }_R^1 = \mathtt A900 \)), then we expect \(2^{30} \,\times \, \frac{1}{2^{30}} \,=\, 1\) pair of plaintexts satisfying the input/output differences at Round-2 and Round-10, and \(2^{30} \,\times \, \frac{1}{2^{28}} \,=\, 4\) pairs of plaintexts satisfying the input/output differences at Round-2 and Round-9 (Fig. 8).

The key recovery attack is performed according to the following steps:

  1. Filtering: The least significant \(7\) bits of the difference after the modular addition at Round-10 are always \(\mathtt 100 0000 \). This implies that \({\varDelta }^{10}\) should be of the form \(\mathtt **** **** *100 0000 \), where \(\mathtt * \) denotes unknown bit values. Hence the \(2^{30}\) plaintext/ciphertext pairs can be filtered by unrolling the output difference of the ciphertexts and verifying the 7 bits of \({\varDelta }^{10}\). This reduces the number of pairs to \(2^{30-7} \,=\, 2^{23}\).

  2. Partial Key Guessing: For the filtered pairs, we guess all \(16\) bits of \(K^{10}\), \(11\) bits of \(K^9\) (e.g. \(K^9[5-15]\)) and one carry bit at bit position \(5\) (in the modular addition at Round-10). For each of these \(2^{28}\) partial key (and carry bit) guesses we keep a counter. A counter is incremented if, after partially decrypting the last 2 rounds, a pair of ciphertexts satisfies the difference (\({\varDelta }^9_L,{\varDelta }_R^9\)) = (\(\mathtt 8040 , \mathtt 8140 \)). This results in \((2^{28} \times 2^{23})/2^{25} \,=\, 2^{26}\) increments over all the counters. The probability of a counter being incremented is \(2^{26}/2^{28} \,=\, \frac{1}{2^2}\), and 4 pairs are expected to satisfy the condition at the end of Round-9. Hence, the number of counters incremented 4 times is \(2^{26} \times (\frac{1}{2^2})^4 \,=\, 2^{18}\).

  3. Exhaustive Search: For the remaining \(64-27 = 37\) bits from the last round keys \(K^9, K^8, K^7\) we use exhaustive search.
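The partial decryption used in the steps above relies on inverting single Speck rounds. The following Python sketch shows one round of Speck32 and its inverse, assuming the rotation amounts 7 and 2 and the 16-bit word size of the design document [4]. The function names are hypothetical, and the sketch is not the implementation used for the attack. It also shows why part of the filtering needs no key guess: the right-word difference entering the last round depends only on the ciphertexts.

```python
MASK16 = 0xFFFF  # Speck32 operates on 16-bit words


def rotl16(x, r):
    """Rotate a 16-bit word left by r positions."""
    return ((x << r) | (x >> (16 - r))) & MASK16


def rotr16(x, r):
    """Rotate a 16-bit word right by r positions."""
    return ((x >> r) | (x << (16 - r))) & MASK16


def speck32_round(x, y, k):
    """One encryption round: modular addition, key XOR, rotation, XOR."""
    x = ((rotr16(x, 7) + y) & MASK16) ^ k
    y = rotl16(y, 2) ^ x
    return x, y


def speck32_round_inv(x, y, k):
    """Inverse of speck32_round under the same round key k."""
    y = rotr16(x ^ y, 2)
    x = rotl16(((x ^ k) - y) & MASK16, 7)
    return x, y


def unrolled_right_difference(c, c_prime):
    """Right-word difference before the last round, computed without the key."""
    (x, y), (xp, yp) = c, c_prime
    return rotr16((x ^ y) ^ (xp ^ yp), 2)
```

Recovering the left word before a round requires the round key, because the key is XORed in after the modular addition. This is why the key-guessing step has to guess bits of the last round keys, together with a carry bit.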

Attack Complexity. The complexity of decrypting \(2^{23}\) ciphertext pairs for each of the \(2^{28}\) guesses of key bits and carry bit is \((2^{28} \cdot 2^{23}) \cdot \frac{1}{11} \approx 2^{47}\). The total number of key guesses is \(2^{18} \cdot 2^{37} \,=\, 2^{55}\). Hence the total complexity is dominated by \(\approx 2^{55}\).
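The figures of the Speck32 attack can be recomputed on base-2 exponents. The Python sketch below is a bookkeeping aid with a hypothetical function name. It takes the constants (7 filtered bits, 28 guessed bits, the \(2^{25}\) condition, 4 right pairs, 11 rounds) directly from the text.

```python
from math import log2


def speck32_attack_estimates():
    """Recompute the Speck32 attack figures as base-2 exponents."""
    log2_guesses = 16 + 11 + 1     # K^10, 11 bits of K^9, one carry bit
    log2_filtered_pairs = 30 - 7   # pairs left after the 7-bit filter
    log2_condition = 25            # denominator used in the text for the check

    log2_increments = log2_guesses + log2_filtered_pairs - log2_condition
    log2_hit_probability = log2_increments - log2_guesses
    right_pairs = 4
    log2_survivors = log2_increments + right_pairs * log2_hit_probability

    # Partial decryption cost, expressed in 11-round encryptions.
    log2_partial_decryption = log2_guesses + log2_filtered_pairs - log2(11)

    # Surviving counters times the exhaustive search over the remaining key bits.
    log2_key_guesses = log2_survivors + (64 - 27)

    return {
        "log2_increments": log2_increments,
        "log2_hit_probability": log2_hit_probability,
        "log2_survivors": log2_survivors,
        "log2_partial_decryption": log2_partial_decryption,
        "log2_key_guesses": log2_key_guesses,
    }
```

The partial decryption term evaluates to roughly 47.5, which the text rounds to \(2^{47}\). The exhaustive search term of \(2^{55}\) dominates either way.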

With the same attack strategy, we also attack Speck48 and Speck64. The details of those attacks are described in Appendices A and B.

9 Conclusion

This paper presented new results on the differential analysis of the lightweight block ciphers Simon and Speck. In particular, by applying new techniques for the automatic search of trails and differentials in ARX ciphers, several previous results were improved. Those improvements were further used to mount the currently best known attacks on several versions of Simon and Speck. In addition, an efficient algorithm for the computation of the DP of the AND operation was presented. A detailed analysis of the strong differential effect in Simon was given and its cause was identified. The described methods are general and are therefore applicable to other ARX designs.