1 Introduction

Differential cryptanalysis, first published by Biham and Shamir [9] to analyse the DES, has become one of the prime attack vectors which any modern symmetric-key primitive has to be resistant against. The idea behind differential cryptanalysis is to find a correlation between the difference of a pair of plaintexts and the difference of the corresponding ciphertexts which holds with high probability. The challenge for a cryptanalyst consists of finding such a correlation or showing that no such correlation exists. A popular approach is to design a cipher in such a way that one can find a bound on the probability of the best differential characteristics, either directly, e.g., via the wide-trail strategy deployed in AES, or using methods based on Matsui’s algorithm, MILP or SAT.

A differential characteristic specifies all the intermediate differences after each round of the primitive. However, when constructing a differential distinguisher one only cares about the input and output difference. It is often assumed that a single characteristic dominates the probability of such a differential, however this is not true in general and leads to imprecise estimates of the probability in many cases [10, 24].

Lai, Massey and Murphy [33] showed that if an iterated cryptographic primitive has independent round keys, it can be considered as a Markov cipher. As differential cryptanalysis considers just the first and last difference and ignores the intermediate values, the probability of such a differential can then be computed as the sum of the probabilities of all characteristics that share its input and output difference. While this assumes that the rounds are independent, it provides a more precise estimate, and the probability of the most probable differential will always be at least as large as the probability of the most probable characteristic.

Contributions. We provide a broad study covering different design strategies and investigate the differential gap between single characteristics and differentials for the block ciphers LBlock, Midori, Present, Prince, Rectangle, Simon, Skinny, Sparx, Speck and Twine. In order to do this, we use an automated approach for enumerating the characteristics with the highest probability contributing to a differential based on SMT solvers [41], which we adapt to different design strategies. This allows us to efficiently enumerate a large set of characteristics contributing to the probability of a differential, resulting in a precise estimate for the probability of differentials.

For Skinny-64 we present an 8-round differential distinguisher with a probability of \(2^{-56.93}\), while the best single characteristic only suggests a probability of \(2^{-72}\). For Midori-64 we show that the best characteristic for 8 rounds, with a probability of \(2^{-76}\) can be used to find a differential with a probability of \(2^{-60.86}\). Our results show that in the case of many new lightweight ciphers like Midori-64, Skinny-64, and Sparx-64 the probabilities improve significantly and that we can find differential distinguishers which are able to cover more rounds. This suggests that one should be particularly careful with lightweight block ciphers when using simpler approximations like counting the number of active S-boxes.

Our method is generic and can easily be applied to other designs as one only needs to describe the differential behaviour of the round function and can re-use all the components we implemented for doing so. This allows both to find optimal differential characteristics and to enumerate all characteristics contributing to a differential.

Furthermore, we provide experiments to verify that our estimates of the differential probability provide a good approximation. However, we also noticed that the distribution over the choice of keys varies significantly for some design strategies and that commonly made assumptions do not hold for reduced-round versions. While for Skinny-64 the distribution over the keys follows relatively closely what one would expect we noticed that for Midori-64 for a large class of keys there are no pairs following the differential at all, while for very few keys the probability is significantly higher.

Related Work. Daemen and Rijmen first studied the probability of differentials for AES in their work on Plateau Characteristics [20]. In their work, they analysed the distribution of the differential probability of AES over the choice of keys and showed that all 2-round characteristics either have zero probability or have a non-zero probability only for a small subset of keys. However, they only considered AES, but conjectured that other ciphers with 4-uniform S-boxes will show a similar result. In the case of AES and AES-like ciphers, there has also been a lot of research on the maximum expected differential/linear probability (MEDP/MELP) [18, 30], which is used to provably bound the security of a block cipher against differential/linear cryptanalysis.

In recent years, many automated tools were proposed that could help designers to prove bounds against differential/linear attacks. Mouha et al. [42] used Mixed Integer Linear Programming (MILP) to count active S-boxes and compute provable bounds. Furthermore, there have been a few approaches of using automated tools to find optimal characteristics, and to collect many characteristics with the same input/output differences. This idea was first introduced by Sun et al. [46] who used MILP. Likewise, tools based on SAT/SMT solvers have been applied to Salsa-20 [41], Norx [5], and Simon [31].

Moreover, there exist several design and attack papers that study the effect of numerous characteristics contributing to the probability of a differential: Mantis [24], Noekeon [29], Salsa [41], Simon/Speck [11, 31], Rectangle [54] and Twine [10]. Yet, these are often based on truncated differentials or dedicated algorithms for finding large numbers of characteristics. For example in [25], Eichlseder and Kales attack Mantis-6 by finding a large cluster of differential characteristics. Contrary to the attack on Mantis-5 by Dobraunig et al. [24] where the cluster was found manually, in the attack on Mantis-6, Eichlseder and Kales used a tool based on truncated differentials.

Similar effects have also been observed in the case of linear cryptanalysis, where Abdelraheem et al. [1] showed that the security margins based on the distribution of linear biases are not always accurate. Their work has further been studied and improved by Blondeau and Nyberg [13].

Software. All the models for enumerating the differential characteristics are publicly available at https://github.com/TheBananaMan/cryptosmt.

Outline. The remainder of this paper is structured as follows. After briefly revisiting some of the necessary definitions about differential cryptanalysis in Sect. 2, we provide details about the automated tools that we use in Sect. 3 and describe how to efficiently find differential characteristics for various ciphers. In Sect. 4 we present the results of our analysis on the gap between single differential characteristics and differentials for various cryptographic primitives. We also analyze the best differential attacks, that are published on those ciphers so far, and show if the attacks can be improved by considering the aforementioned differential gaps. Moreover, in Sect. 5 we give details about our experiments of the distribution over keys for the probability of differentials.

2 Differentials and Differential Characteristics

Differential cryptanalysis is one of the most powerful techniques in the analysis of symmetric-key primitives. Many extensions to it have been developed and it has found wide applications on block ciphers, stream ciphers and cryptographic hash functions. In the following, we state some definitions and notations that we will use throughout the paper.

A block cipher is a family of permutations parameterised by a key \(K \in \mathbb {F}_2^k\), that maps a set of plaintexts \(P \in \mathbb {F}_2^n\) to a set of ciphertexts \(C \in \mathbb {F}_2^n\)

$$\begin{aligned} \text {E}_K: \mathbb {F}_2^k \times \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n. \end{aligned}$$
(1)

Virtually all currently used block ciphers are iterative block ciphers, i.e., they are composed of applying a simple round function r times

$$\begin{aligned} \text {E}_K(\cdot ) = f_r(\cdot ) \circ \ldots \circ f_1(\cdot ). \end{aligned}$$
(2)

The idea of differential cryptanalysis is to look at pairs of plaintexts \((p_1, p_2)\) and the corresponding ciphertexts \((c_1, c_2)\) and try to find a correlation between the differences \(\alpha \) and \(\beta \), where \(\alpha = p_1 \oplus p_2\) and \(\beta = c_1 \oplus c_2\).

Definition 1

A differential is a pair of differences \((\alpha , \beta ) \in \mathbb {F}_2^n \times \mathbb {F}_2^n\).

If such a correlation holds with high probability, we can use this to distinguish the block cipher from a random permutation and further use this to mount key-recovery attacks.

Definition 2

The differential probability of a differential over a block cipher is

$$\begin{aligned} \text {DP}(\alpha \xrightarrow {\text {E}_K} \beta ) = \Pr _X(\text {E}_K(X) \oplus \text {E}_K(X \oplus \alpha ) = \beta ). \end{aligned}$$
(3)

where X is a random variable that is uniformly distributed over \(\mathbb {F}_2^n\).

For ease of notation we define the weight of a differential as \(-\log _2(\text {DP}(\cdot ))\). Any non-zero differential for a random permutation \(F_{\$}: \mathbb {F}_2^n \rightarrow \mathbb {F}_2^n\) will have a differential probability close to \(2^{-n}\). Therefore one is interested in finding any differential with \(\text {DP}(\alpha \xrightarrow {\text {E}_K} \beta ) \gg 2^{-n}\). In general, it is computationally infeasible to compute the exact value of the \(\text {DP}\) as this would require an exhaustive search through the whole space of all possible plaintexts. One can use the structure of a block cipher to obtain a good approximation of the actual \(\text {DP}\) with less computational effort by tracking the differences through the round functions.
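For a toy block size, Definition 2 can be evaluated exhaustively. The following is a minimal sketch, assuming a hypothetical one-round cipher on \(n = 4\) bits built from the 4-bit S-box of Present; the function names are illustrative and are not part of the tool described later.

```python
from fractions import Fraction
from math import log2

# 4-bit S-box of Present, used here only as a small example permutation.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def toy_encrypt(x, key):
    """Hypothetical toy cipher on 4 bits: key addition followed by the S-box."""
    return SBOX[x ^ key]


def differential_probability(alpha, beta, key, n=4):
    """Exact DP(alpha -> beta) for a fixed key, by trying all 2^n values of X."""
    hits = sum(
        1 for x in range(1 << n)
        if toy_encrypt(x, key) ^ toy_encrypt(x ^ alpha, key) == beta
    )
    return Fraction(hits, 1 << n)


def weight(probability):
    """Weight -log2(DP) of a differential; None if the differential is impossible."""
    return None if probability == 0 else -log2(probability)
```

The loop visits all \(2^n\) inputs, which is exactly the cost that becomes prohibitive for realistic block sizes such as \(n = 64\).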

Definition 3

A differential characteristic is a sequence of differences

$$\begin{aligned} Q = (\alpha _1 \xrightarrow {f_1} \alpha _2 \xrightarrow {f_2} \ldots \xrightarrow {f_{r-1}} \alpha _r). \end{aligned}$$
(4)

Yet, it is still computationally infeasible to compute the exact value of \(\text {DP}(Q)\) and the general approach is to assume independence of the rounds. For most designs it is feasible to compute the exact probability of a differential for a single round. One can therefore compute

$$\begin{aligned} \text {DP}(Q) \approx \prod _{i=1}^{r-1} \Pr _X(\alpha _i \xrightarrow [X]{f_i} \alpha _{i+1}). \end{aligned}$$
(5)

While this assumption of independent rounds is not true in general, it has been shown to serve as a good approximation in practice. However, if an adversary wants to construct a distinguisher, she actually does not care about any intermediate differences and is only interested in the probability of the differential. The adversary can therefore collect all differential characteristics sharing the same input and output difference to get a better estimate

$$\begin{aligned} \Pr (\alpha _1 \xrightarrow {E} \alpha _r) = \sum _{\alpha _2, \ldots , \alpha _{r-1}} \Pr _X(\alpha _1 \xrightarrow [X]{f_1} \alpha _2 \xrightarrow [f_1(X)]{f_2} \cdots \alpha _{r-1} \xrightarrow [f_{r-1} \circ \ldots \circ f_1(X)]{f_{r-1}} \alpha _r). \end{aligned}$$
(6)

It is often assumed that the probability of the differential is close to the probability of the best single characteristic. While this might hold for some ciphers this assumption has been shown to be inaccurate in several cases and does not hold for many modern block ciphers [10, 24]. We will show later in Sect. 4 that this assumption fails particularly often for some recently designed lightweight block ciphers.
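The difference between Eqs. 5 and 6 can be made concrete on a toy example. The sketch below assumes a hypothetical 4-bit cipher whose round consists of a single application of the Present S-box with independent round keys, so that the one-round transition probabilities are given by the normalised DDT. Replacing the sum over intermediate differences by a maximum yields the best characteristic, while keeping the sum yields the differential.

```python
from fractions import Fraction

# 4-bit S-box of Present, used as the round function of a toy cipher.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def transition_matrix(sbox):
    """P[a][b] = probability of the one-round transition a -> b (normalised DDT)."""
    size = len(sbox)
    counts = [[0] * size for _ in range(size)]
    for x in range(size):
        for a in range(size):
            counts[a][sbox[x] ^ sbox[x ^ a]] += 1
    return [[Fraction(c, size) for c in row] for row in counts]


def combine(left, right, reduce_fn):
    """Matrix-style product in which the reduction over the intermediate
    difference g is either sum (differentials) or max (characteristics)."""
    size = len(left)
    return [[reduce_fn(left[a][g] * right[g][b] for g in range(size))
             for b in range(size)] for a in range(size)]


def best_characteristic(P, rounds):
    """Probability of the best single characteristic for every (alpha, beta)."""
    result = P
    for _ in range(rounds - 1):
        result = combine(result, P, max)
    return result


def differential(P, rounds):
    """Sum over all characteristics sharing (alpha, beta), assuming independent rounds."""
    result = P
    for _ in range(rounds - 1):
        result = combine(result, P, sum)
    return result
```

For this toy cipher the whole matrix fits in memory; for a 64-bit block cipher the sum has to be approximated by enumerating the most probable characteristics, which is the approach taken in Sect. 3.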

We consider two different criteria for a design: differential characteristic resistant (DCR), which means that no single differential characteristic exists with a probability larger than \(2^{-n}\) and differential resistant (DR) which means that it should be difficult to find a differential with a probability larger than \(2^{-n}\). Note that we typically can not avoid that there are differentials with \(\text {DP}\ge 2^{-n}\), as if we fix the input difference to \(\alpha _1\) then \(\sum _{\alpha _r \ne 0} \Pr (\alpha _1 \xrightarrow {E} \alpha _r) = 1\). This implies that there exists at least one differential with a probability \(\text {DP}\ge 2^{-n}\). In the Wide-Trail Strategy which was used to design the AES and subsequently many other ciphers, Daemen and Rijmen suggest that it is a sound design strategy to restrict the probability of difference propagation [19]. Nevertheless, this does not result in a proof for security.

Note that in the definitions so far the influence of the keys was ignored. However, the \(\text {DP}\) for a specific differential strongly depends on the choice of the secret key and it is therefore of interest how this distribution looks like. To solve this problem we could compute the probabilities of a differential over the whole key space, however this is again practically infeasible which leads one to use the expected differential probability.

Definition 4

The expected differential probability of a block cipher \(\text {E}_k\) of an r-round differential \((\alpha , \beta )\), with a key-size of \(\kappa \)-bits is defined as

$$\begin{aligned} \mathrm{EDP}(\alpha \xrightarrow {\text {E}} \beta ) = 2^{-\kappa } \sum _{k\in \mathbb {F}_2^{\kappa }} \Pr _X(\alpha \xrightarrow [X]{\text {E}_k} \beta ). \end{aligned}$$
(7)

In order to derive some sort of security proof against differential cryptanalysis, the Hypothesis of Stochastic Equivalence [33] is often used, which states that for all differential characteristics Q and for most keys K the differential probability of the characteristic is close to the expected differential probability, \(\text {DP}_K(Q) \approx \text {EDP}(Q)\). In practice this hypothesis does not always hold [16], which we will also see later in Sect. 5.
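The expected differential probability and its relation to the fixed-key probabilities can be illustrated on a cipher small enough to exhaust the key space. The sketch below assumes a hypothetical two-round cipher on 4 bits with two independent 4-bit round keys (\(\kappa = 8\)) and the Present S-box; none of these names belong to the tool used in this paper.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# 4-bit S-box of Present, used here only as a small example permutation.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def encrypt(x, k0, k1):
    """Hypothetical two-round toy cipher with independent round keys k0, k1."""
    return SBOX[SBOX[x ^ k0] ^ k1]


def dp_fixed_key(alpha, beta, k0, k1):
    """Exact differential probability for one fixed key (k0, k1)."""
    hits = sum(1 for x in range(16)
               if encrypt(x, k0, k1) ^ encrypt(x ^ alpha, k0, k1) == beta)
    return Fraction(hits, 16)


def key_distribution(alpha, beta):
    """Histogram mapping each fixed-key DP value to the number of keys attaining it."""
    return Counter(dp_fixed_key(alpha, beta, k0, k1)
                   for k0, k1 in product(range(16), repeat=2))


def edp(alpha, beta):
    """Expected DP: the fixed-key DP averaged over all 256 keys."""
    histogram = key_distribution(alpha, beta)
    return sum(value * count for value, count in histogram.items()) / 256
```

Comparing `key_distribution(alpha, beta)` with `edp(alpha, beta)` shows how far individual keys can deviate from the average, which is the effect studied experimentally in Sect. 5.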

3 Finding Differential Characteristics Efficiently

While there are many methods based on SAT, MILP or Matsui’s algorithm to find differential characteristics and even prove an upper bound on the probability of the best single characteristic, it remains a hard problem to find a good estimate on the probability of the best differential. Even finding those differential characteristics remains a difficult problem for some design strategies and cryptanalysts had to search manually for differentials in some attacks [53]. Nowadays a variety of automated tools [12, 35, 45] is available which are constantly improved and help cryptanalysts in finding good differential characteristics.

3.1 SAT/SMT Solvers

SAT solvers are used to solve the Boolean satisfiability problem (SAT) and are based on heuristic algorithms. A solver starts from an initial assignment for the literals and then builds a search tree by using systematic backtracking until all conflicting clauses are resolved and either an assignment of variables for a satisfiable set of clauses is returned or the solver decides that this instance is unsatisfiable. The most commonly used algorithms in SAT solvers are based on the original idea of DPLL [21].

SMT solvers are more powerful than SAT solvers in the sense that they can express constraints on a higher abstraction layer and allow simple first-order logic. In general, SMT solvers often translate the problem to SAT and then use an improved version of the DPLL algorithm and backtracking to infer when theory conflicts arise. Moreover, the solver checks the feasibility of conjunctions from the first-order logic predicates as it interacts with the Boolean formulas that are returned by the SAT solver.

There exist a few SAT/SMT solvers that are suitable for our use cases. STP [50] is an SMT solver that accepts constraints in the CVC and SMT-LIB2 languages and then invokes a SAT solver to check for satisfiability of the model. CryptoMiniSat [40] is an advanced SAT solver that supports features like XOR recovery\(^{1}\) to simplify clauses. As XOR operations are commonly used in cryptography this can be an advantage and potentially reduces the solving time. We also considered other solvers like Boolector [43], which provide better performance for some instances. However, in general this only yields an improvement by a small constant factor and it is hard to identify for which instances one obtains any advantage.

3.2 From Differential Cryptanalysis to Satisfiability Modulo Theories

When using automated tools like SAT/SMT solvers, one can simplify the search for differential characteristics and differentials by modeling the differential behavior of the block cipher. For this we represent all intermediate states of our block cipher as variables which correspond to the differences and encode the transitions of differences through the round functions as constraints that can be processed by the SMT/SAT solver. An advantage of using SMT over SAT for the modeling is that most SMT solvers support reasoning over bit-vectors which are commonly used in block cipher designs, especially when considering word-oriented ciphers. This both simplifies the modeling of the constraints and can lead to an improved time for solving the given problem instances compared to an encoding in SAT.

Constructing an SMT Model. In this paper, we focus on a tool that uses the CVC language\(^{2}\) for encoding the differential behavior of block ciphers. Therefore, we encode the constraints imposed by the round function for each round of the block cipher and the probability of the resulting differential transitions. Our main goal here is to construct an SMT model which decides whether

$$\begin{aligned} \exists Q: \text {DP}(Q) = 2^{-t}, \end{aligned}$$
(8)

which allows us to find the best differential characteristic Q for a cipher by finding the minimum value t for which the model is satisfiable.

In order to represent the differential behaviour of a cipher we consider any operation in the cipher, e.g., the application of an S-box, matrix multiplication, word-wise operation or bit operation, and add constraints for a valid transition from an input to an output difference such that any valid assignment to the variables corresponds to a valid differential characteristic in the actual operation. For any non-linear component we introduce additional variables \(w^j\) which represent the weight, i.e., the negated \(\log _2\) probability, of the differential transition. The weight of Q is then given by \(t = \sum w^j\). This means that a valid assignment for all these variables directly gives us the differential characteristic Q with all intermediate differences and \(\text {DP}(Q) = 2^{-t}\).

In the following we give an overview on how the different components of the ciphers can be modeled in the SMT model. The algorithms to find the optimal differential characteristics and consequently good estimates for the differentials are described in Sect. 3.3.

S-Boxes. Substitution Permutation Network (SPN) ciphers typically use S-boxes, which are non-linear functions operating on a small number of bits. These are often 4- or 8-bit functions and therefore we can compute the differential probability by simply constructing the Difference Distribution Table (DDT), which is a full lookup table of all possible pairs of input/output differences, for each S-box. In our SMT model we represent the input difference to an n-bit S-box as \(\alpha = \alpha _1, \ldots , \alpha _n\) and the output difference as \(\beta = \beta _1, \ldots , \beta _n\). These variables correspond to the input/output difference to this S-box and we want to constrain them to only allow non-zero probability combinations of input and output differences. We further introduce additional variables \(w = w_1, \ldots , w_n\) which are used to represent the probability of the transition. The probability of the transition is encoded as \(2^{-wt(w)}\), where \(wt(\cdot )\) denotes the Hamming weight of w.

In order to construct the constraints on the variables, we first find all valid transitions and their corresponding probability. We want to construct a CNF which is satisfiable if and only if the assignment corresponds to such a valid characteristic. One simple way to do this is to exclude all assignments which are impossible. If the combination of a transition \((a \xrightarrow {S} b)\) and an encoded probability c is impossible, then we add the following clause

$$\begin{aligned} \begin{aligned} T =~&N(a_1,\alpha _1) \vee \ldots \vee N(a_n,\alpha _n) \vee \\&N(b_1, \beta _1) \vee \ldots \vee N(b_n,\beta _n) \vee \\&N(c_1, w_1) \vee \ldots \vee N(c_n, w_n) \end{aligned} \end{aligned}$$
(9)

where

$$\begin{aligned} N(x_i, y_i) = {\left\{ \begin{array}{ll} y_i, \text {if}\ x_i = 0\\ {\lnot }y_i, \text {if}\ x_i = 1\\ \end{array}\right. }. \end{aligned}$$
(10)

This clause is only satisfiable if the variables of the corresponding S-box are not set to the invalid assignment. For example let \(a = (1, 0, 1, 1)\), \(b = (0, 0, 0, 0)\) and \(c = (0, 0, 0, 0)\) then we add the clause

$$\begin{aligned} (\lnot \alpha _1 \vee \alpha _2 \vee \lnot \alpha _3 \vee \lnot \alpha _4 \vee \beta _1 \vee \beta _2 \vee \beta _3 \vee \beta _4 \vee w_1 \vee w_2 \vee w_3 \vee w_4). \end{aligned}$$
(11)

We implemented this approach to generate the SMT models for 4- and 8-bit S-boxes, where most of the lightweight ciphers actually use 4-bit S-boxes which allows a very compact description (e.g., to represent the 4-bit S-box of Skinny we need 12 variables and about 3999 clauses in CNF). Note that our method is limited to S-boxes which have a DDT with entries that are a power of 2. For other S-boxes a similar method could be used by using l additional variables for encoding probabilities of the form \(2^{-0.5}, 2^{-0.25}, \ldots \) to get an approximation of the actual probability.
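The clause generation described above can be sketched in a few lines. The example below is illustrative only: it uses the 4-bit S-box of Present, numbers the 12 variables \(\alpha , \beta , w\) from 1 to 12 in DIMACS style, and assumes that a weight of value v is encoded as the word w whose v lowest-order bits are set. This particular choice of w is an assumption of the sketch; the text only requires \(wt(w)\) to equal the weight.

```python
from itertools import product

# 4-bit S-box of Present; its DDT entries are all powers of two.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N_BITS = 4


def ddt(sbox):
    """Difference distribution table: table[a][b] = #{x : S(x) ^ S(x ^ a) = b}."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for x in range(size):
        for a in range(size):
            table[a][sbox[x] ^ sbox[x ^ a]] += 1
    return table


def bits(value, n=N_BITS):
    """Bit tuple of an n-bit value, most significant bit first."""
    return tuple((value >> (n - 1 - i)) & 1 for i in range(n))


def weight_bits(count, n=N_BITS):
    """Encode the probability count / 2^n = 2^-wt(w) as a word w (assumed encoding)."""
    assert count & (count - 1) == 0, "DDT entry must be a power of two"
    weight = n - (count.bit_length() - 1)
    return tuple(1 if i >= n - weight else 0 for i in range(n))


def valid_assignments(sbox):
    """All assignments (alpha, beta, w) that describe a possible transition."""
    table = ddt(sbox)
    return {bits(a) + bits(b) + weight_bits(table[a][b])
            for a in range(len(sbox)) for b in range(len(sbox)) if table[a][b]}


def blocking_clauses(sbox):
    """One clause per impossible assignment: a variable appears negated
    if the impossible assignment sets it to 1 and unnegated otherwise."""
    valid = valid_assignments(sbox)
    return [[-(i + 1) if bit else (i + 1) for i, bit in enumerate(assignment)]
            for assignment in product((0, 1), repeat=3 * N_BITS)
            if assignment not in valid]


def satisfies(assignment, clause):
    """True if the 0/1 assignment makes at least one literal of the clause true."""
    return any((literal > 0) == bool(assignment[abs(literal) - 1])
               for literal in clause)
```

Each clause rules out exactly one of the \(2^{12}\) assignments, so the number of clauses plus the number of non-zero DDT entries equals 4096.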

Linear Layers. The diffusion layers of Substitution Permutation Networks in lightweight ciphers are often constructed with simple bit-permutations (e.g., Present) or by multiplication with matrices having only binary coefficients (e.g., Midori, Skinny). ARX-based ciphers (e.g., Speck) use the diffusion properties of XOR combined with rotations. Feistel networks (e.g., Simon, LBlock, Twine) also mix the state by switching parts of the states on every Feistel switch.

For modeling rotations and bit-permutations in an SAT/SMT solver, we simply have to re-index the variables accordingly before they are input to another function. This can be achieved using SMT predicates (ASSERT and equality) in the CVC language. Rotations can be realized using predicates for shifting words and the word-wise or function that are available in the CVC language. The multiplication by a binary matrix can be modeled using the xor predicate at the word-level.
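Since these layers are linear, a difference propagates through them deterministically and no weight variables are needed. The following sketch shows the three operations on concrete difference values rather than on solver variables. The helper names are hypothetical, the binary matrix is the one with zero diagonal and ones elsewhere (the form of the near-MDS matrix used in Midori), and the bit permutation is the one of Present.

```python
def rotate_left(word, shift, n):
    """Rotate an n-bit difference word to the left; a pure re-indexing of bits."""
    shift %= n
    mask = (1 << n) - 1
    return ((word << shift) | (word >> (n - shift))) & mask


def permute_bits(word, permutation):
    """Move bit i of the difference to position permutation[i]."""
    result = 0
    for i, target in enumerate(permutation):
        result |= ((word >> i) & 1) << target
    return result


def binary_mix_column(cells):
    """Multiply a column of four cells by the binary matrix with zero diagonal
    and ones elsewhere: each output cell is the XOR of the other three cells."""
    total = cells[0] ^ cells[1] ^ cells[2] ^ cells[3]
    return [total ^ cell for cell in cells]


# Bit permutation of Present: bit i moves to 16 * i mod 63, bit 63 stays fixed.
PRESENT_PERMUTATION = [(16 * i) % 63 if i != 63 else 63 for i in range(64)]
```

In the SMT model the same relations are stated as equalities between the bit-vector variables of consecutive states instead of being evaluated.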

ARX Designs. ARX designs use modular additions (modulo \(2^n\)), XOR and rotations. As modular addition is the only non-linear component, that is not already available in the SMT solver, we use an algorithm proposed by Lipmaa and Moriai [36] to efficiently compute the differential probability of modular addition. Let \(xdp^+(\alpha , \beta \rightarrow \gamma )\) be the XOR differential probability of modular addition, where \(\alpha , \beta \) are input differences and \(\gamma \) is the output difference, then it holds that a differential is valid if and only if:

$$\begin{aligned} \text {eq}(\alpha \ll 1, \beta \ll 1, \gamma \ll 1) \wedge (\alpha \oplus \beta \oplus \gamma \oplus (\beta \ll 1)) = 0 \end{aligned}$$
(12)

where

$$\begin{aligned} \text {eq}(x,y,z) := (\lnot x \oplus y) \wedge (\lnot x \oplus z). \end{aligned}$$
(13)

The weight of a valid differential is defined as:

$$\begin{aligned} w(\alpha ,\beta ,\gamma ) := -\log _2{(xdp^+(\alpha , \beta \rightarrow \gamma ))} = wt'(\lnot eq(\alpha , \beta , \gamma )). \end{aligned}$$
(14)

where \(wt'(\cdot )\) denotes the Hamming weight omitting the most significant bit. We implemented this algorithm to calculate the differential probability of modular additions.
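Eqs. 12 to 14 translate directly into bit-vector operations. The sketch below evaluates them on integers for n-bit words and includes a brute-force reference that is only feasible for small n. The function names are illustrative and do not refer to the implementation in our tool.

```python
from fractions import Fraction


def eq(x, y, z, mask):
    """Bitwise indicator of positions where x, y and z agree (Eq. 13)."""
    return (~x ^ y) & (~x ^ z) & mask


def xdp_add_weight(alpha, beta, gamma, n):
    """Weight of the XOR differential (alpha, beta -> gamma) of addition
    modulo 2^n, or None if the differential is impossible (Eqs. 12 and 14)."""
    mask = (1 << n) - 1
    agree_shifted = eq(alpha << 1, beta << 1, gamma << 1, mask)
    if agree_shifted & (alpha ^ beta ^ gamma ^ (beta << 1)) & mask:
        return None
    # Hamming weight of not-eq, ignoring the most significant bit.
    return bin(~eq(alpha, beta, gamma, mask) & (mask >> 1)).count("1")


def xdp_add_bruteforce(alpha, beta, gamma, n):
    """Reference value obtained by trying all 2^(2n) input pairs (x, y)."""
    mask = (1 << n) - 1
    hits = sum(
        1 for x in range(1 << n) for y in range(1 << n)
        if (((x + y) ^ ((x ^ alpha) + (y ^ beta))) & mask) == gamma
    )
    return Fraction(hits, 1 << (2 * n))
```

For example, `xdp_add_weight(1, 0, 1, 4)` evaluates to 1, i.e., the differential holds with probability \(2^{-1}\).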

3.3 Finding the Best Characteristics and Differentials

We use the open-source tool CryptoSMT [45] for the automated search of differential characteristics and implemented several missing functionalities for block ciphers (i.e., support for S-boxes as described in Sect. 3.2, and binary diffusion matrices). CryptoSMT is based on the state-of-the-art SAT/SMT solvers, CryptoMiniSat [40] and STP [50].

The tool offers a simple API that allows cryptanalysts and designers to formulate various cryptanalytic problems and solve them with the underlying SAT/SMT solver. We added the models for the block ciphers Skinny, Midori, Rectangle, Present, Prince, Sparx, Twine and LBlock (Note that some of these are block cipher families and we focused on a subset of parameters) to CryptoSMT and use the following two functionalities provided by the tool:

  • Decide if a differential characteristic with probability p exists.

  • Enumerate all differential characteristics with a probability of p.

Based on this we can achieve our two goals, namely finding the best differential characteristic and estimating the probability of the differential.

Best Differential Characteristic. In order to find the characteristic Q with maximum probability \(p_\text {max}\) for r rounds of a block cipher we start by checking whether our model is satisfiable for a probability of p, starting at \(p = 1\). If our model is not satisfiable we continue by checking whether there is a valid assignment for \(p = 2^{-1}\). Note that for all our block ciphers the probabilities of the differential transitions are powers of two and therefore there does not exist any differential characteristic which has a probability \(p'\) such that \(2^{-(t+1)}< p' < 2^{-t}\) for any integer t. We continue this process until we reach a model which is satisfiable, which gives us an assignment of all variables of the state forming a valid differential characteristic with probability \(p_\text {max} = 2^{-t}\). Since we start with probability \(p=1\), increase the weight by one in each step, and stop as soon as we find a valid assignment, we can ensure that we found the best differential characteristic.
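The search loop can be sketched independently of the solver. In the example below the call to the SMT solver is replaced by a stand-in oracle for a hypothetical toy cipher (r applications of the 4-bit Present S-box with independent round keys). Only `best_characteristic_weight` reflects the procedure described above; the oracle and all names are assumptions made for illustration.

```python
# 4-bit S-box of Present, used as the round function of a toy cipher.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def transitions(sbox):
    """Map each input difference a to the list of (b, weight) of all possible
    one-round transitions a -> b; assumes power-of-two DDT entries."""
    size = len(sbox)
    n = size.bit_length() - 1
    counts = [[0] * size for _ in range(size)]
    for x in range(size):
        for a in range(size):
            counts[a][sbox[x] ^ sbox[x ^ a]] += 1
    return {a: [(b, n - (c.bit_length() - 1))
                for b, c in enumerate(counts[a]) if c]
            for a in range(size)}


def make_oracle(sbox, rounds):
    """Stand-in for the solver: decides whether a characteristic with non-zero
    input difference and weight exactly t exists over the given number of rounds."""
    table = transitions(sbox)

    def is_satisfiable(t):
        states = {(a, 0) for a in range(1, len(sbox))}
        for _ in range(rounds):
            states = {(b, w + cost) for a, w in states
                      for b, cost in table[a] if w + cost <= t}
        return any(w == t for _, w in states)

    return is_satisfiable


def best_characteristic_weight(is_satisfiable, max_weight):
    """Increase t from 0 until the model is satisfiable; the first hit is optimal."""
    for t in range(max_weight + 1):
        if is_satisfiable(t):
            return t
    return None
```

With `make_oracle(SBOX, 1)` the loop stops at \(t = 2\), matching the maximum DDT entry of 4 for this S-box.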

Estimate for the Probability of a Differential. In order to find a good differential we can use a tool assisted approach to compute an approximation for Eq. 6, as shown in [41]. We first obtain the best single characteristic Q with probability \(p = 2^{-t}\) which gives us the input difference \(\alpha _1\) and output difference \(\alpha _r\). Subsequently we modify our model and fix the input and output difference to \(\alpha _1\) respectively \(\alpha _r\). Note that this restricts the search space significantly and results in a much faster time for solving any subsequent SMT instances.

The next step is to find all differential characteristics Q, such that \(\text {DP}(Q) = 2^{-u}\), for \(u = t, t + 1, \ldots \), under these new constraints. This allows us to collect more and more terms of the sum in Eq. 6, improving our approximation for the differential. By doing this process we always search for those differential characteristics which contribute the most to the probability of the differential first.
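The accumulation of characteristics by increasing weight can be illustrated on the same kind of toy cipher. The sketch below assumes a hypothetical cipher consisting of r applications of the 4-bit Present S-box with independent round keys and counts the characteristics per weight by dynamic programming, which stands in for the solver-based enumeration. All names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# 4-bit S-box of Present, used as the round function of a toy cipher.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def transitions(sbox):
    """Map each input difference a to the list of (b, weight) of all possible
    one-round transitions a -> b; assumes power-of-two DDT entries."""
    size = len(sbox)
    n = size.bit_length() - 1
    counts = [[0] * size for _ in range(size)]
    for x in range(size):
        for a in range(size):
            counts[a][sbox[x] ^ sbox[x ^ a]] += 1
    return {a: [(b, n - (c.bit_length() - 1))
                for b, c in enumerate(counts[a]) if c]
            for a in range(size)}


def characteristics_by_weight(sbox, alpha, beta, rounds, max_weight):
    """Number of characteristics alpha -> beta for each weight u <= max_weight."""
    table = transitions(sbox)
    layer = {(alpha, 0): 1}
    for _ in range(rounds):
        following = defaultdict(int)
        for (a, w), number in layer.items():
            for b, cost in table[a]:
                if w + cost <= max_weight:
                    following[(b, w + cost)] += number
        layer = following
    return {w: number for (b, w), number in layer.items() if b == beta}


def cumulative_estimate(counts):
    """Add the clusters in order of increasing weight u and report, for each u,
    the number of characteristics and the running estimate of the differential."""
    total, steps = Fraction(0), []
    for u in sorted(counts):
        total += counts[u] * Fraction(1, 2 ** u)
        steps.append((u, counts[u], total))
    return steps
```

Each tuple in the returned list corresponds to one step of the enumeration: the weight u, the number of characteristics of that weight, and the estimate of the differential probability obtained so far.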

Here we assume that the input and output difference imposed by the best differential characteristic correspond to a good differential. Some of the differentials we found this way significantly improve the best differential distinguishers. However, this assumption might not always hold and there could still exist better starting points for our search, for example as shown in [32] against the block cipher Simeck.

4 Analysis of the Gap in Lightweight Ciphers

The construction of cryptographic primitives optimized for resource constrained devices has received a lot of attention over the last decade and various design strategies and optimisation targets have been explored. All these primitives exhibit the idea of using simpler operations in order to save costs and therefore often exhibit a simpler algebraic structure compared to other symmetric-key algorithms.

For some design strategies this leads to a significantly larger gap between single characteristics and differentials. This gap becomes especially relevant for aggressively optimised designs with minor security margins. Table 1 gives an overview of all the block ciphers we analysed with the methodology outlined in Sect. 3 and their security margins as well as the best known differential attacks.

Table 1. Best attacks and security margins (active S-boxes) for various design strategies for symmetric cryptographic primitives. D/MD/RK/ID/R/TD = differential, multiple differential, related-key, impossible differential, rectangle, truncated differential

4.1 Design Strategies

We categorise these lightweight ciphers according to their design strategies as this has the largest influence on the gap. In general one can distinguish between two main design families: Substitution-Permutation Networks (SPN) and Feistel Networks. Within these families we can gather ciphers according to other structural properties. These are for SPN: AES-like, Bit-sliced S-boxes, Bit-based Permutation Layers, Reflection Ciphers, ARX-based and for Feistel: ARX-based, Generalized Feistel Networks and Two-branched.

In our study, we then analyzed the differential gaps for Midori [6], Skinny [8], Rectangle [54], Present [14], Prince [15], Sparx [23], Simon [7], Speck [7], Twine [47], and LBlock [47] where Table 1 categorises the ciphers according to their aforementioned structural properties.

4.2 Skinny

Skinny [8] is an AES-like tweakable block cipher, based on the Tweakey framework [28]. The aim of Skinny is to achieve the hardware performance of the AND-RX-cipher Simon and have strong security bounds against differential/linear attacks (this includes the related-key scenario), while also having competitive software performance. The resistance against differential/linear attacks in Skinny is based on counting the minimal number of active S-boxes, in the single-key and related-tweakey models. As the design of Skinny is based on a few very simple but highly efficient cryptographic building blocks it seems intuitive that one can expect that a large number of differential characteristics will contribute to a differential. Recent attacks [3, 38] exploited the low branch number of the binary diffusion matrix, as well as properties of the tweakey schedule.

Using our tool-assisted approach we analysed this gap in Skinny-64 (see Fig. 1) and can provide some new insights to the security of Skinny-64. For example the best 8-round single differential characteristic \(Q^8_{\max }\) suggests a probability of \(2^{-72}\) while the differential \(D^8\) defined by the input/output difference of \(Q^8_{\max }\) consists of a large cluster of characteristics leading to the differential

$$\begin{aligned} \texttt {0x0104401000C01C00} \xrightarrow {8-\text {round}\ {\texttt {Skinny}-64}} \texttt {0x0606060000060666} \end{aligned}$$
(15)

with a probability larger than \(2^{-56.55}\) by taking all 821896 characteristics\(^{3}\) into account which have \(\text {DP}> 2^{-99}\). Note that the probabilities and the number of characteristics are obtained with a fixed input/output difference as noted in Eq. 15. This suggests that estimates from active S-boxes should be taken with care as the gap is fairly large. However, the number of rounds in Skinny-64 is chosen very conservatively and it provides a large security margin.

In particular the probability of the differential improves very quickly when adding more characteristics, as the distribution of the number of characteristics with a probability \(2^{-t}\) is very flat over the choice of t (see Fig. 1). For example there are 39699 characteristics with \(\text {DP}= 2^{-75}\) and 25413 characteristics with \(\text {DP}= 2^{-76}\) and the probability of the differential only improves marginally by considering more characteristics with a lower probability. On the contrary, for designs like Simon (see Fig. 5) this distribution grows exponentially as the probability of the single characteristics decreases as has also been noted in [31], and one has to take a much larger number of characteristics into account before getting a good approximation. For a detailed overview over how many characteristics contribute to each differential see Appendix A.

Fig. 1. Probability for the best single characteristics and differentials for Skinny-64 (left), and the distribution of the number of characteristics with a fixed probability contributing to the best 8-round differential for Skinny-64 (right). The green line indicates the probability of the differential when summing up the probability of all characteristics up to this probability, which highlights the small improvement when adding all lower probability characteristics. (Color figure online)

4.3 Midori

Midori is an AES-like lightweight block cipher optimized for low-energy usage, using a binary near-MDS matrix combined with a generic cell permutation for diffusion. Although Midori-64 has a large number of \(2^{32}\) weak keys, for which it can be practically broken with invariant subspace attacks [27], there have been no differential attacks even on reduced versions of Midori, apart from a related-key attack by Gérault and Lafourcade [26].

The gap between the differential probability of a single characteristic and a differential behaves similarly to Skinny-64, i.e., counting the active S-boxes gives an inaccurate bound against differential distinguishers. For example, we found new differentials for Midori-64 where the 8-round single differential characteristic suggests a probability of \(2^{-76}\) and the corresponding 8-round differential

$$\begin{aligned} \texttt {0x0A000000A0000005} \xrightarrow {8-\text {round}\ {\texttt {Midori}-64}} \texttt {0x000000000000A0AA} \end{aligned}$$
(16)

has a probability larger than \(2^{-60.86}\) by summing all 693730 characteristics up to a probability of \(2^{-114}\). As with Skinny, the distribution of the contributing characteristics is very flat, which means that we quickly approach a good estimate for the probability of the differential (see Fig. 2).

Fig. 2. Probability for the best single characteristics and differentials for various rounds of Midori-64 (left), and distribution of the characteristics contributing to the best 8-round differential for Midori-64 (right).

4.4 Sparx

Sparx [23] is based on the long-trail strategy, introduced together with Sparx, which can be seen as combining the ARX approach with an SPN. This makes it possible to bound the differential resistance of an ARX cipher by counting the active S-boxes. While it is also feasible to prove such a bound using the methodology from Sect. 3, it is often computationally infeasible or the bounds are not very tight [41]. The designers of Sparx used the YAARX toolkit [12] to find truncated characteristics, which they used to compute the differential bounds. One of the main design motivations of Sparx was that it should be very difficult to find differential characteristics for a large number of rounds for ARX-based ciphers with a state of more than 32 bits [22].

In general, ARX ciphers do not have a very strong differential effect compared to the previous lightweight SPN constructions; however, as Sparx lies in between the two, it is an interesting target. Our results suggest that Sparx-64 has a differential effect comparable to other ARX designs like Speck-64 (see Fig. 3). The major limitation for applying our approach to Sparx is that the search for optimal differential characteristics on Sparx is computationally very costly. While single-characteristics for up to 6 rounds can be found in less than 5 min, finding the 10-round single-characteristic already took 32 days on a single core (Footnote 4).

Fig. 3. Comparison of the best single characteristics and differentials for various rounds of Speck-64 (left), and Sparx-64 (right).

4.5 Results for Other Lightweight Ciphers

Table 2 summarizes the gaps between single-characteristics and differentials for all lightweight block ciphers we analyzed. We observed that for most ciphers a large gap between the probability for single-characteristics and differentials exists and that a higher number of rounds is required for the block ciphers to be differential resistant. The gaps also increase significantly with the number of rounds, which is not surprising as with more rounds there are more valid differential characteristics for a given input/output difference.

The biggest gap, in terms of the number of rounds, occurs for Simon-64 with a gap of five rounds. There is also a 2-round gap for ciphers like Present, Midori and Twine. However, it seems that the gap for Simon-64 grows faster, considering that the differentials and characteristics seem to follow an exponential growth, as also observed in [31]. In comparison, the gaps for Present, Midori and Twine seem to grow linearly. Relative to the number of rounds, the gap for Midori also has quite a significant impact and allows the distinguisher to be extended by two rounds. Further, we observed that there seem to be nearly no gaps for ciphers like Rectangle and Speck. We illustrate the gaps for the analyzed ciphers in Fig. 4, and Fig. 5 shows the distribution of valid differential characteristics that contribute to the probability of the best differential for each cipher.

Table 2. Gap between the number of rounds required for a cipher to be differential characteristic resistant (DCR) and differential resistant (DR). Note that DR is only a lower bound and there might still exist better differentials.

Fig. 4. Probability for the best single characteristics and differentials for various rounds of different block ciphers. 1st row: Simon-64 (left) and Present (right), 2nd row: Rectangle (left) and Prince (right), 3rd row: Speck-64 (left) and Twine (right), 4th row: LBlock (left)

Fig. 5. Distribution of the characteristics contributing to the best differential for various block ciphers. 1st row: Simon-64 (left) and Present (right), 2nd row: Rectangle (left) and Prince (right), 3rd row: Speck-64 (left) and Twine (right), 4th row: LBlock (left) and Sparx-64 (right)

4.6 Application of the Differential Gaps to the Best Published Differential Attacks

In the following, we analyze the best published attacks and discuss improvements of the attacks when possible:

Gérault and Lafourcade [26] proposed related-key differential attacks on full-round Midori-64, where they use sixteen 15-round and four 14-round related-key differential characteristics to recover the key. In their attacks they do not exploit differentials. In comparison, the best differential that we found reaches 8 rounds with a probability of \(2^{-60.86}\).

Liu et al. [38] propose related-tweakey rectangle attacks on 26 rounds of Skinny-64-192, using optimal single differential characteristics based on truncated differential characteristics. The authors exploit the differential gap of Skinny by using 5000 single differential characteristics to compute the differential for a 22-round distinguisher. In comparison, the best differential with no differences in the tweak/key that we found reaches 8 rounds with a probability of \(2^{-56.55}\).

Zhang et al. [54] studied the differential effect in Rectangle and showed an 18-round differential attack, where they used a 14-round differential with a probability of \(2^{-62.83}\). In our analysis we found a better differential for 14 rounds with a probability of \(2^{-60.63}\) by summing up 40627 single-characteristics, which would improve the complexity of these attacks. For more rounds the probabilities of the distinguishers are below \(2^{-64}\).

Liu and Jin [37] presented an 18-round attack based on slender-sets. Wang et al. [51] further presented normal differential attacks on 16-round Present where they used a differential with probability \(2^{-62.13}\) by summing up 91 differential characteristics which is comparable to our differentials.

Canteaut et al. [17] showed differential attacks on 10 rounds of Prince by considering multiple differential characteristics. In their attack they use 12 differentials for 6 rounds with a probability of \(2^{-56.42}\) by summing up 1536 single-characteristics. The differential we found for 6 rounds only has a probability of about \(2^{-62}\) and therefore does not lead to further improvements of the attack.

Ankele and List [4] studied truncated differential attacks on 16 rounds of Sparx-64/128 and used single differential characteristics for the first part of the 14-round distinguisher, while truncating the second part of the distinguisher. The designers of Sparx-64 claim that Sparx is secure against differential attacks for 15 rounds. However, considering the differential effect of Sparx-64, also in comparison with Speck-64, it seems likely that there exist differentials covering more than 15 rounds with a data complexity below that of the full codebook.

Abed et al. [2] presented differential attacks on Simon-64, where they used a 21-round distinguisher with a probability of \(2^{-61.01}\). Better distinguishers are reported by [39] for 23 rounds with a probability of \(2^{-63.91}\). The differentials we found are in line with previous results.

Song et al. [44] presented 20-round attacks on Speck-64 by constructing a distinguisher from two short characteristics where they concatenated the two characteristics to a 15-round characteristic with probability \(2^{-60.56}\). The distinguishers used in the attack are already based on differentials and the differentials we found do not lead to any improvement.

Biryukov et al. [10] showed a 25-round impossible differential attack and a truncated differential attack on 23 rounds of Twine by chaining several iterated 4-round characteristics together. In the paper the authors also considered differentials for 12 rounds with a probability of \(2^{-52.08}\) and 16 rounds with a probability of \(2^{-67.59}\). The best differential that we found reaches 15 rounds with a probability of \(2^{-62.89}\).

Wang et al. [52] published a 24-round impossible differential attack on LBlock. Due to the nature of impossible differential attacks, characteristics with probability 1 are used for constructing these. The best differential that we found reaches 15 rounds with a probability of \(2^{-61.43}\).

5 Experimental Verification and the Influence of Keys

In Sect. 2 we made several assumptions in order to compute \(\text {DP}(Q)\), and in this section we compare the theoretical estimates with experiments for reduced-round versions. This serves two purposes: first, we want to see how close our estimate for \(\text {DP}(\alpha , \beta )\) is, and second, we want to see the distribution over the choice of keys. Specifically, we are interested in the number of pairs

$$\begin{aligned} \delta _K(\alpha , \beta ) = \#\{x \in \mathbb {F}_2^n\ |\ \text {E}_K(x) \oplus \text {E}_K(x \oplus \alpha ) = \beta \}. \end{aligned}$$
(17)

This number of good pairs will vary over the choice of the key. For a random process we would expect that the number of valid pairs is about \(\text {DP}\cdot 2^n\) and follows a Poisson distribution.
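To make the quantity \(\delta _K(\alpha , \beta )\) concrete, the following sketch counts it exhaustively for a hypothetical 8-bit toy cipher. The toy cipher (key XOR, two parallel copies of the Present S-box, a 3-bit rotation), the function names and the chosen differences are assumptions made purely for illustration and are unrelated to the ciphers and tooling analysed in this paper; for a 64-bit block size only a sample of inputs can be tested.

```python
import random

# 4-bit S-box of Present, reused here only as a convenient bijective S-box.
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_encrypt(x, round_keys):
    """Hypothetical 8-bit toy cipher: key XOR, two S-boxes, rotate left by 3."""
    for k in round_keys:
        x ^= k
        x = (SBOX[x >> 4] << 4) | SBOX[x & 0xF]
        x = ((x << 3) | (x >> 5)) & 0xFF
    return x

def delta(round_keys, alpha, beta, n=8):
    """Count all x in F_2^n with E_K(x) xor E_K(x xor alpha) = beta."""
    return sum(
        1
        for x in range(1 << n)
        if toy_encrypt(x, round_keys) ^ toy_encrypt(x ^ alpha, round_keys) == beta
    )

# The count depends on the key: evaluate the same differential for a few keys.
rng = random.Random(1)
for _ in range(5):
    keys = [rng.randrange(256) for _ in range(3)]
    print([hex(k) for k in keys], delta(keys, alpha=0x01, beta=0x08))
```

Since x and \(x \oplus \alpha \) are always counted together, every count is even, and summing the counts over all \(\beta \) gives \(2^n\).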

Definition 5

Let X be a Poisson distributed random variable representing the number of pairs (a, b) with values in \(\mathbb {F}_2^n\) following a differential \(D = (\alpha \xrightarrow {f} \beta )\), that means \(f(a) \oplus f(a \oplus \alpha ) = \beta \), then

$$\begin{aligned} \Pr (X = l) = (2^{n}p)^l \frac{e^{-(2^{n}p)}}{l!} \end{aligned}$$
(18)

where p is the probability of the differential.
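As a small sketch, the standard Poisson probability mass function with mean \(\lambda = N \cdot p\) can be evaluated as follows, where N is the number of pairs tried (\(2^n\) for the full codebook). The function name `poisson_pmf` and its log-scale parameters are assumptions for illustration; the computation is done in log space so that large values of l do not overflow.

```python
import math

def poisson_pmf(l, log2_pairs, log2_p):
    """Pr(X = l) for a Poisson variable with mean 2**log2_pairs * 2**log2_p."""
    lam = 2.0 ** (log2_pairs + log2_p)
    # log(lam**l * exp(-lam) / l!) evaluated term by term to avoid overflow.
    return math.exp(l * math.log(lam) - lam - math.lgamma(l + 1))

# Example: 2**30 pairs and a differential with probability 2**-23.52.
for l in (60, 80, 90, 100, 120):
    print(l, poisson_pmf(l, 30, -23.52))
```

The probabilities concentrate around the mean, with a standard deviation of \(\sqrt{\lambda }\), which is the reference against which the experimental distributions over the keys can be compared.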

In the following, we experimentally verify differentials for Skinny, Speck and Midori for a large number of random pairs of plaintexts and a random choice of keys to see how good this approximation is.

5.1 Skinny

As a first example we look at Skinny-64. We use the 6-round differential

$$\begin{aligned} D = (\texttt {0x0000010010000041}, \texttt {0x4444004040044044}) \end{aligned}$$

for Skinny-64. The best characteristic which is part of D has a probability of \(2^{-32}\) and by collecting all characteristics (100319) contributing to this differential we estimate \(\text {DP}(D) \approx 2^{-23.52}\). We try out \(2^{30}\) randomly selected pairs for 10000 keys and count the number of pairs following D. From our estimate we would expect that on average we get about 89 pairs for a key.
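The expected number of good pairs is simply the number of sampled pairs multiplied by the estimated probability of the differential. The following sketch reproduces this arithmetic for the estimates reported in Sects. 5.1 to 5.3; the helper name `expected_pairs` is hypothetical.

```python
def expected_pairs(log2_samples, log2_dp):
    """Expected number of pairs following a differential among 2**log2_samples pairs."""
    return 2.0 ** (log2_samples + log2_dp)

# log2 of the estimated differential probabilities, as reported in the text.
experiments = {
    "Skinny-64, 6 rounds": -23.52,
    "Speck-64, 7 rounds": -20.95,
    "Midori-64, 4 rounds": -23.79,
}

for name, log2_dp in experiments.items():
    print(name, round(expected_pairs(30, log2_dp)))
```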

Fig. 6. Distribution of \(\delta _K(D)\) over a random choice of K for 6-round Skinny-64.

As one can see from Fig. 6 our estimate of \(\text {DP}(D)\) provides a good approximation for the distribution over the keys, although the distribution has a larger variance than we expected.

5.2 Speck

For Speck-64 we look at the differential

$$\begin{aligned} D = ((\texttt {0x40004092}, \texttt {0x10420040}), (\texttt {0x8080A080}, \texttt {0x8481A4A0})) \end{aligned}$$

over 7 rounds. The best characteristic in D has a probability of \(2^{-21}\) and this only slightly improves to about \(2^{-20.95}\) using 6 additional characteristics. We again run our experiments for \(2^{30}\) randomly selected pairs for 10000 keys and count the number of pairs following D. On average we would expect 530 pairs.

Fig. 7. Distribution of \(\delta _K(D)\) over a random choice of K for 7-round Speck-64.

In Fig. 7 it can be seen that for 7-round Speck-64 the distribution is bimodal and we either over- or underestimate the number of valid pairs for most keys.

5.3 Midori

For Midori-64 we look at the differential

$$\begin{aligned} D = (\texttt {0x0200200000020000}, \texttt {0x0202220020020020}) \end{aligned}$$

over 4 rounds. The best characteristic in D has a probability of \(2^{-32}\) and this improves to about \(2^{-23.79}\) using 896 additional characteristics. We again run our experiments for \(2^{30}\) randomly selected pairs for 3200 keys and count the number of pairs following D. On average we would expect about 74 pairs.

Fig. 8. Distribution of \(\delta _K(D)\) over a random choice of K for 4-round Midori-64. We omitted the 2545 keys with 0 good pairs in this plot.

In Fig. 8 it can be seen that for 4-round Midori-64 the distribution is very different from the previous cases. For some keys the probability is significantly higher, and for about \(80\%\) of the keys we get 0 good pairs. This means that for a large fraction of keys we actually found an impossible differential, and one should be careful when constructing differential distinguishers for Midori. In particular, it would be interesting to classify this set of impossible keys and we leave this as an open problem. Moreover, this also implies the existence of a large class of weak keys, which has also been observed in the invariant subspace attacks on Midori-64 [27, 34, 49].
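To see how far this is from the model, assume that the number of good pairs were Poisson distributed with the estimated mean of about 74 pairs. The probability of observing no good pair for a given key would then be roughly

$$\begin{aligned} \Pr (X = 0) = e^{-\lambda } \approx e^{-74} \approx 10^{-32}, \end{aligned}$$

whereas in the experiment 2545 out of 3200 keys, i.e., a fraction of about 0.8, have no good pair at all.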

6 Conclusions

In this work we showed for several lightweight block ciphers that the gap between single characteristics and differentials can be surprisingly large. This leads to significantly higher probability of differentials in several designs and allows us to have differential distinguishers covering more rounds.

We provided a simple framework to automate the process of collecting many differential characteristics that are contributing to the probability of a differential. We hope this will encourage future designs of cryptographic primitives to apply our methodology in order to provide better bounds on the security against differential cryptanalysis.

Further, we verified differentials for a reduced number of rounds experimentally and showed that our improved estimates of the probability of differentials of Skinny closely resemble what happens in experiments. However, we also observe that some commonly made assumptions on the distribution of good pairs following a differential over the choice of keys have to be made very carefully. For instance, the results for Speck and Midori indicate that one needs to be very careful in presuming that the estimates apply to all key values.