1 Introduction

In the past decade, the development of ubiquitous computing applications triggered the rapid expansion of the lightweight cryptography research field. All these applications operating in very constrained devices may require certain symmetric-key cryptography components to guarantee privacy and/or authentication for the users, such as block or stream ciphers, hash functions or MACs. Existing cryptography standards such as AES [18] or SHA-2 [33] are not always suitable for these strong constraints. Extensive research has been conducted in this direction, with countless new primitives being introduced [2, 4, 5, 12, 15, 22, 39], many of them getting broken rather rapidly (designing a cipher under strong constraints is not an easy task). In line with this general trend, the American National Institute of Standards and Technology (NIST) recently announced that it will consider standardizing some lightweight functions in a few years [34]. Some lightweight algorithms such as PRESENT [12], PHOTON [21] and SPONGENT [11] have already been included into ISO standards (ISO/IEC 29192-2:2012 and ISO/IEC 29192-5:2016).

Comparing different lightweight primitives is a very complex task. First, lightweight encryption encompasses a broad range of use cases, from passive RFID tags (that require a very low power consumption to operate) to battery powered devices (that require a very low energy consumption to maximise their life span) or low-latency applications (for disk encryption). While it is generally admitted that a major criterion for lightweight encryption is area minimisation, the throughput/area ratio is also very important because it shows the ability of the algorithm to provide good implementation trade-offs (this ratio is also correlated to the power or energy consumption of the algorithm). Moreover, the range of platforms to consider is very broad, from tiny RFID tags to rather powerful ARM processors. Even high-end servers have to be taken into account, as it is likely that these very small and constrained devices will be communicating with back-end servers [6].

While most ciphers take lightweight hardware implementations into account to some extent, PRESENT [12] is probably one of the first candidates that was exclusively designed for that purpose. Its design is inspired by SERPENT [7] and is very simple: the round function is simply composed of a layer of small 4-bit Sboxes, followed by a bit permutation layer (essentially free in hardware) and a subkey addition. PRESENT has been extensively analysed in the past decade, and while its security margin has eroded, it remains a secure cipher. One can note that the weak point of PRESENT is the tendency of linear trails to cluster and to create powerful linear hulls [10, 17].

Since the publication of PRESENT, many advances have been made, both in terms of security analysis and primitive design. In 2013, the NSA proposed two ciphers [4], SIMON and SPECK, that can reach much better efficiency in both hardware and software when compared to all other ciphers. However, this comes at a cost: proving simple linear/differential bounds for SIMON is much more complicated than for Substitution-Permutation-Network (SPN) ciphers like PRESENT (SIMON is based on a Feistel construction, with an internal function that uses only an AND, some XORs and some rotations). Besides, no preliminary analysis or rationale was provided by the SIMON authors. Last year, the tweakable block cipher SKINNY [5] was published to compete with SIMON's efficiency for round-based implementations, while providing strong linear/differential bounds.

As of today, SIMON and SKINNY seem to have a clear advantage in terms of efficiency when compared to other designs. Yet, PRESENT remains an elegant design that suffers from being one of the first lightweight encryption algorithms to have been published, and thus from not benefiting from the many advances obtained by the research community in recent years.

Our Contributions. In this article, we revisit the PRESENT construction, 10 years after its original publication. This led to the creation of GIFT, a new lightweight block cipher, improving over PRESENT in both security and efficiency. Interestingly, our cipher GIFT offers extremely good performance and even surpasses both SKINNY and SIMON for round-based implementations (see Table 1). This indicates that GIFT is probably the cipher most suited to the very important low-energy consumption use cases. Due to its simplicity and the natural bitslice organisation of the inner data flow, our cipher is very versatile and also performs very well in software, reaching performance similar to that of SIMON, the current fastest lightweight candidate in software.

Table 1. Hardware performances of round-based implementations of PRESENT, SKINNY, SIMON and our new cipher GIFT, synthesized with STM 90 nm Standard cell library.

In more detail, we have revisited the PRESENT design strategy and pushed it to its limits, while paying special attention to the known weak point of PRESENT: the linear hulls. Since the diffusion layer of PRESENT is composed of only a bit permutation, most of the security of PRESENT relies on its Sbox. This Sbox presents excellent cryptographic properties, but is quite costly. Indeed, it is trivial to see that the PRESENT Sbox needs to have a branching number of 3, as very good differential paths would otherwise exist (with only a single active Sbox per round). We managed to remove this constraint by carefully crafting the bit permutation in conjunction with the Difference Distribution Table (DDT)/Linear Approximation Table (LAT) of the Sbox. We remark that, to the best of the authors' knowledge, this is the first time that the linear layer and the Sbox are fully intertwined in an SPN cipher.

In terms of performance, removing this Sbox constraint allowed us to choose a much cheaper Sbox, the Sbox being what makes up most of the overall area cost in PRESENT. GIFT is not only much smaller, but also much faster than PRESENT. As can be seen in Table 2, GIFT is by far the cipher that uses the smallest total number of operations per bit to date. In terms of security, we are able to provide strong security bounds for simple differential and linear attacks. We can even show that GIFT is very resistant against linear hulls, and the clustering effect is greatly reduced when compared to PRESENT, thus correcting its main weak point. We have conducted a thorough security analysis of our candidate with state-of-the-art cryptanalysis techniques.

Table 2. Total number of operations and theoretical performance of GIFT and various lightweight block ciphers. N denotes a NOR gate, A denotes an AND gate, X denotes a XOR gate.

We end up with a very natural and clean cipher, with a simple round function and key schedule (composed of only a bit permutation, thus essentially free in hardware). The cipher can be seen in three different representations (classical 1D, bitslice 2D, and 3D), each offering a simple yet different perspective on the cipher’s security and opportunities for implementation improvements. GIFT comes in two versions, both with a 128-bit key: one 64-bit block version GIFT-64 and one 128-bit block version GIFT-128. The only difference between these two versions is the bit permutation, which accommodates twice as many state bits for GIFT-128.

In our hardware implementations of GIFT the storage composes about 75% of the total area, and the (very cheap) Sbox about 20%. Since any weaker choice of the Sbox would lead to a very insecure design, we argue that GIFT is probably very close to reaching the area limit of lightweight encryption.

Outline. We first specify GIFT in Sect. 2, and we provide the design rationale in Sect. 3. A thorough security analysis is performed in Sect. 4, while performances and implementation strategies are given in Sects. 5 and 6 for hardware and software respectively. All details are provided in the full version of the paper.

2 Specifications

In this work, we propose two versions of GIFT: GIFT-64-128 is a 28-round SPN cipher and GIFT-128-128 is a 40-round SPN cipher. Both versions have a key length of 128 bits. For short, we call them GIFT-64 and GIFT-128 respectively.

GIFT can be perceived in three different representations. In this paper, we adopt the classical 1D representation, describing the bits in a row like PRESENT. It can also be described in bitslice 2D, a rectangular array like RECTANGLE [44], and even in 3D cuboid like 3D [32]. These alternative representations are detailed in the full version.

Round Function. Each round of GIFT consists of 3 steps: SubCells, PermBits, and AddRoundKey, which is conceptually similar to wrapping a gift:

  1. Put the content into a box (SubCells);

  2. Wrap the ribbon around the box (PermBits);

  3. Tie a knot to secure the content (AddRoundKey).

Figure 1 illustrates 2 rounds of GIFT-64.

Fig. 1. 2 rounds of GIFT-64.

Table 3. Specifications of GIFT Sbox GS.
  • Initialization. The cipher receives an n-bit plaintext \(b_{n-1}b_{n-2}...b_{0}\) as the cipher state S, where \(n=64,128\) and \(b_{0}\) is the least significant bit. The cipher state can also be expressed as s 4-bit nibbles \(S=w_{s-1}||w_{s-2}||...||w_{0}\), where \(s=16,32\). The cipher also receives a 128-bit key \(K=k_{7}||k_{6}||...||k_{0}\) as the key state, where \(k_{i}\) is a 16-bit word.

  • SubCells. Both versions of GIFT use the same invertible 4-bit Sbox, GS. The Sbox is applied to every nibble of the cipher state. \( w_i \leftarrow GS(w_i),\ \forall i \in \{0,...,s-1\}. \) The action of this Sbox in hexadecimal notation is given in Table 3.

  • PermBits. The bit permutations used in GIFT-64 and GIFT-128 are given in Tables 4 and 5 respectively. Each maps the bit at position i of the cipher state to bit position P(i). \( b_{P(i)} \leftarrow b_{i},\ \forall i \in \{0,...,n-1\}. \)

  • AddRoundKey. This step consists of adding the round key and round constants. An n/2-bit round key RK is extracted from the key state and further partitioned into two s-bit words \(RK = U||V = u_{s-1}...u_{0}||v_{s-1}...v_{0}\), where \(s=16,32\) for GIFT-64 and GIFT-128 respectively.

    For GIFT-64, U and V are XORed to \(\{ b_{4i+1}\}\) and \(\{ b_{4i}\}\) of the cipher state respectively. \( b_{4i+1}\leftarrow b_{4i+1} \oplus u_{i},\ b_{4i} \leftarrow b_{4i} \oplus v_{i},\ \forall i \in \{0,...,15\}. \)

    For GIFT-128, U and V are XORed to \(\{ b_{4i+2}\}\) and \(\{ b_{4i+1}\}\) of the cipher state respectively. \( b_{4i+2}\leftarrow b_{4i+2} \oplus u_{i},\ b_{4i+1} \leftarrow b_{4i+1} \oplus v_{i},\ \forall i \in \{0,...,31\}. \)

    For both versions of GIFT, a single bit “1” and a 6-bit round constant \(C = c_5 c_4 c_3 c_2 c_1 c_0 \) are XORed into the cipher state at bit positions \(n-1\), 23, 19, 15, 11, 7 and 3 respectively. \(b_{n-1} \leftarrow b_{n-1} \oplus 1,\ b_{23} \leftarrow b_{23} \oplus c_5,\ b_{19} \leftarrow b_{19} \oplus c_4,\ b_{15} \leftarrow b_{15} \oplus c_3,\ b_{11} \leftarrow b_{11} \oplus c_2,\ b_{7} \leftarrow b_{7} \oplus c_1,\ b_{3} \leftarrow b_{3} \oplus c_0\).
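The three steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the state is held in a Python integer with \(b_0\) as the least significant bit, the function names are ours, and the Sbox and permutation tables (Tables 3 to 5) are passed in as parameters instead of being hard-coded.

```python
from typing import Sequence


def sub_cells(state: int, sbox: Sequence[int], n: int) -> int:
    """SubCells: apply a 4-bit Sbox to each of the n/4 nibbles w_i."""
    out = 0
    for i in range(n // 4):
        nibble = (state >> (4 * i)) & 0xF
        out |= sbox[nibble] << (4 * i)
    return out


def perm_bits(state: int, perm: Sequence[int], n: int) -> int:
    """PermBits: bit i of the state moves to bit position perm[i]."""
    out = 0
    for i in range(n):
        out |= ((state >> i) & 1) << perm[i]
    return out


def add_round_key(state: int, u: int, v: int, c: int, n: int) -> int:
    """AddRoundKey for n = 64 (16-bit U, V) or n = 128 (32-bit U, V)."""
    s = n // 4
    # GIFT-64:  U -> b_{4i+1}, V -> b_{4i}
    # GIFT-128: U -> b_{4i+2}, V -> b_{4i+1}
    u_off, v_off = (1, 0) if n == 64 else (2, 1)
    for i in range(s):
        state ^= ((u >> i) & 1) << (4 * i + u_off)
        state ^= ((v >> i) & 1) << (4 * i + v_off)
    state ^= 1 << (n - 1)              # the single bit "1" at position n-1
    for j in range(6):                 # c_j goes to bit 4j+3, i.e. 3, 7, ..., 23
        state ^= ((c >> j) & 1) << (4 * j + 3)
    return state
```

With an all-zero state, `add_round_key(0, 0xFFFF, 0, 0, 64)` sets bit 1 of every nibble plus bit 63, which makes the half-state key addition easy to see.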

Table 4. Specifications of GIFT-64 Bit Permutation.
Table 5. Specifications of GIFT-128 Bit Permutation.

Key Schedule and Round Constants. The key schedule and round constants are the same for both versions of GIFT; the only difference is the round key extraction. A round key is first extracted from the key state before the key state update.

For GIFT-64, two 16-bit words of the key state are extracted as the round key \(RK = U||V \). \( U \leftarrow k_{1},\ V \leftarrow k_{0}. \)

For GIFT-128, four 16-bit words of the key state are extracted as the round key \(RK = U||V \). \( U \leftarrow k_{5}||k_{4},\ V \leftarrow k_{1}||k_{0}. \)

The key state is then updated as follows, \( k_{7}||k_{6}||...||k_{1}||k_{0} \leftarrow k_{1} \ggg 2 ||k_{0} \ggg 12||...||k_{3}||k_{2}, \) where \(\ggg i\) is an i bits right rotation within a 16-bit word.
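As an illustration, the round key extraction and key state update can be sketched in Python as below. The representation is our own choice for this sketch: the key state is a list `k` with `k[i]` holding the 16-bit word \(k_i\), and the function names are hypothetical.

```python
def rotr16(x: int, r: int) -> int:
    """Right rotation by r bits within a 16-bit word."""
    return ((x >> r) | (x << (16 - r))) & 0xFFFF


def round_key(k: list, n: int) -> tuple:
    """Return (U, V) extracted from the key state before it is updated."""
    if n == 64:
        return k[1], k[0]                              # U = k1, V = k0
    return (k[5] << 16) | k[4], (k[1] << 16) | k[0]    # U = k5||k4, V = k1||k0


def update_key_state(k: list) -> list:
    """(k7, k6, k5, ..., k0) <- (k1 >>> 2, k0 >>> 12, k7, ..., k2)."""
    return k[2:] + [rotr16(k[0], 12), rotr16(k[1], 2)]
```

Since the state moves by two words per update, every word returns to its original index after 4 updates, rotated exactly once (by 12 for even indices and by 2 for odd indices).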

The round constants are generated using the same 6-bit affine LFSR as SKINNY, whose state is denoted as \((c_5, c_4, c_3, c_2, c_1, c_0)\). Its update function is defined as: \( (c_5, c_4, c_3, c_2, c_1, c_0) \leftarrow (c_4, c_3, c_2, c_1, c_0, c_5 \oplus c_4 \oplus 1). \) The six bits are initialized to zero, and updated before being used in a given round. The values of the constants for each round are given in the table below, encoded to byte values for each round, with \(c_0\) being the least significant bit.

Rounds     Constants
1 - 16     01,03,07,0F,1F,3E,3D,3B,37,2F,1E,3C,39,33,27,0E
17 - 32    1D,3A,35,2B,16,2C,18,30,21,02,05,0B,17,2E,1C,38
33 - 48    31,23,06,0D,1B,36,2D,1A,34,29,12,24,08,11,22,04
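The constants in the table can be regenerated from the LFSR update rule. A minimal Python sketch (the function name is ours), with the 6-bit state packed into an integer whose least significant bit is \(c_0\):

```python
def round_constants(rounds: int) -> list:
    """Constants of rounds 1..rounds; the state starts at zero and is
    updated before being used in each round."""
    c = 0
    out = []
    for _ in range(rounds):
        feedback = ((c >> 5) ^ (c >> 4) ^ 1) & 1      # c5 xor c4 xor 1
        c = ((c << 1) & 0x3F) | feedback              # shift left, insert feedback as c0
        out.append(c)
    return out


print(",".join(f"{c:02X}" for c in round_constants(16)))
```

The first call to the update turns the all-zero state into 01, which is why the table starts at 01 rather than 00.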

Remark: GIFT aims at single-key security, so we do not claim any related-key security (even though no attack is known in this model as of today). In case one wants to protect against related-key attacks as well, we advise doubling the number of rounds.

3 Design Rationale

First, let us propose a subclassification for SPN ciphers.

Definition 1

A Substitution-bitPermutation network (SbPN) is a subclassification of the Substitution-Permutation network in which the permutation layer (p-layer) consists only of a bit permutation. An m/n-SbPN cipher is an n-bit cipher in which the substitution layer (s-layer) consists of m-bit (Super-)Sboxes.

For SPN ciphers like AES and SKINNY, we can shift the XOR components from the p-layer to the s-layer to form Super-Sboxes, leaving the p-layer with only bit permutation. For example, PRESENT is a 4/64-SbPN cipher, SKINNY-64 is a 16/64-SbPN cipher, and SKINNY-128 and AES are 32/128-SbPN ciphers.

That said, GIFT-64 is a 4/64-SbPN cipher while GIFT-128 is (probably the first of its kind) a 4/128-SbPN cipher.

3.1 The Designing of GIFT

Before we discuss the design rationale of GIFT, we would like to share some background story about GIFT, its design approach, and its comparison with other PRESENT-like ciphers.

The Origin of GIFT. It all started with a casual remark: “What if the Sboxes in PRESENT are replaced with some smaller Sboxes, say the PICCOLO Sbox? It will be extremely lightweight since the core cipher only has some Sboxes and nothing else...”. We quickly tested it, only to realise that the differential bounds became very low because the Sbox does not have a differential branching number of 3. That is when we started analyzing the differential characteristics and studying the interaction between the linear layer and the Sbox. Surprisingly, we found that by carefully crafting the linear layer based on the properties of the Sbox, we were able to achieve the same differential bound as PRESENT without the constraint of a differential branching number of 3. In addition, this result can also be applied to improve the resistance against linear cryptanalysis, which was lacking in PRESENT. Eventually, a small present, GIFT, was created.

Design Approach. It is natural to ask how GIFT is different from the other lightweight primitives, especially the recent SKINNY family of block ciphers that was proposed at CRYPTO 2016. One of the main differences is the design approach. SKINNY was designed with a high-security-reduce-area approach, that is, start from a strong security property, then try to remove/reduce various components as much as possible. GIFT, in contrast, adopts a small-area-increase-security approach: starting from a small area goal, we try to improve its security as much as possible.

Other PRESENT-like Ciphers. Besides PRESENT, one may also compare GIFT-64 with RECTANGLE, since both are 4/64-SbPN ciphers and improvements on the design of PRESENT. RECTANGLE was designed to be software friendly and to achieve better resistance against linear cryptanalysis than PRESENT. However, although its bit permutation (ShiftRow) was designed to be software friendly, little analysis was done on how differential and linear characteristics propagate through the cipher. For GIFT, in contrast, we study the interplay of the Sbox and the bit permutation to achieve better differential and linear bounds. In addition, the ShiftRow of RECTANGLE achieves full diffusion in 4 rounds at best, whereas GIFT-64 achieves full diffusion in 3 rounds like PRESENT, which can be proven to be optimal for 4/64-SbPN ciphers.

3.2 Designing of GIFT Bit Permutation

To better understand the design rationale of the linear layer, we first look at the permutation layer of PRESENT to analyze the issue when the Sbox is replaced with another Sbox that does not have branching number of 3. Next, we show how we can solve this issue by carefully designing the bit permutation.

Linear Layer of PRESENT. The bit permutation of PRESENT is given in Table 6.

Table 6. Bit permutation of PRESENT.

It is known that the bit permutation can be partitioned into 4 independent bit permutations, mapping the output of 4 Sboxes to the input of 4 Sboxes in the next round.

For convenience, we number the Sboxes in i th round as \(Sb_0^{i}, Sb_1^{i},...,Sb_{s-1}^{i}\), where \(s=n/4\). These Sboxes can be grouped in 2 different ways - the Quotient and Remainder groups, Qx and Rx, defined as

  • \(Qx = \{ Sb_{4x}, Sb_{4x+1}, Sb_{4x+2}, Sb_{4x+3} \}\),

  • \(Rx = \{ Sb_{x}, Sb_{q+x}, Sb_{2q+x}, Sb_{3q+x} \}\), where \(q = \frac{s}{4}, 0 \le x \le q-1\).

In PRESENT, \(n = 64\) and output bits of \(Qx^{i} = \{ Sb_{4x}^{i}, Sb_{4x+1}^{i}, Sb_{4x+2}^{i}, Sb_{4x+3}^{i} \}\) map to input bits of \(Rx^{i+1} = \{ Sb_{x}^{i+1}, Sb_{4+x}^{i+1}, Sb_{8+x}^{i+1}, Sb_{12+x}^{i+1} \}\). This group mapping is defined in Table 7, where the entry (l, m) at row rw and column cl denotes that the l th output bit of the Sbox corresponding to row rw at the i th round maps to the m th input bit of the Sbox corresponding to column cl at the \((i+1)\) th round. For example, suppose \(x = 2\) and that rows and columns are numbered from 0. Then the entry (3, 2) at row 2 and column 3 means that the 3rd output bit of \(Sb^i_{10}\) maps to the 2nd input bit of \(Sb^{i+1}_{14}\), thus \(P(43) = 58\) (see Table 6).

Table 7. PRESENT group mapping from \(Qx^{i}\) to \(Rx^{i+1}\).

The PRESENT bit permutation can be realised in hardware with wires only (no logic gates required). Further, full diffusion is achieved in 3 rounds: from 1 bit to 4, then 4 to 16, and then 16 to 64. However, if the Sbox admits a Hamming weight 1 to Hamming weight 1 differential transition (a \(1-1\) bit differential transition), then consecutive single active bit transitions exist.
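The group structure and the 3-round diffusion can be checked with a short Python sketch. It assumes the usual closed form of the PRESENT permutation, \(P(i) = 16i \bmod 63\) for \(i < 63\) and \(P(63) = 63\) (an assumption of this sketch, to be compared with Table 6), and it models each Sbox as spreading any active input bit to all 4 of its output bits. All function names are ours.

```python
def present_perm(i: int) -> int:
    """Assumed closed form of the PRESENT bit permutation."""
    return 63 if i == 63 else (16 * i) % 63


def quotient_group(x: int) -> set:
    return {4 * x + j for j in range(4)}          # Qx = {Sb_4x, ..., Sb_4x+3}


def remainder_group(x: int, q: int = 4) -> set:
    return {x + q * j for j in range(4)}          # Rx = {Sb_x, Sb_q+x, ...}


def targets(x: int) -> set:
    """Sboxes of round i+1 that receive the output bits of Qx."""
    return {present_perm(4 * sb + l) // 4
            for sb in quotient_group(x) for l in range(4)}


def diffusion_rounds(start_bit: int, n: int = 64) -> int:
    """Rounds until one input bit can influence all n state bits."""
    active = {start_bit}
    rounds = 0
    while len(active) < n:
        # Sbox layer: every touched Sbox activates all 4 of its output bits.
        active = {4 * (b // 4) + l for b in active for l in range(4)}
        rounds += 1
        if len(active) == n:
            break
        active = {present_perm(b) for b in active}  # bit permutation layer
    return rounds


print(present_perm(43), targets(2) == remainder_group(2), diffusion_rounds(0))
```

Under these assumptions the check reproduces \(P(43) = 58\), confirms that Qx feeds exactly Rx, and gives 3 rounds for every starting bit.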

We define \(1-1\) bit DDT as a sub-table of the DDT containing Hamming weight 1 differences. Consider some Sbox with the following \(1-1\) bit DDT (see Table 8). \(\varDelta \mathbf {x}\) and \(\varDelta \mathbf {y}\) denote the differential in the input and output of Sbox respectively. It is evident that this Sbox has differential branch number 2.

Table 8. \(1-1\) bit DDT Example 1
Table 9. \(1-1\) bit DDT Example 2

It is trivial to see that there exists a single active bit path which results in a differential characteristic with single active Sboxes in each round. Let the input differences be at 3rd bit of \(Sb_{15}^{(i)}\). According to \(1-1\) bit DDT (Table 8), there exists a transition from 1000 to 1000. From the group mapping (Table 7), 3rd output bit of \(Sb_{15}^{(i)}\) maps to 3rd input bit of \(Sb_{15}^{(i+1)}\). And then the differential continues from 3rd output bit of \(Sb_{15}^{(i+1)}\) to 3rd input bit of \(Sb_{15}^{(i+2)}\) and so on. Not only that, if there exists any \(1-1\) bit transition (not necessarily \(1000 \rightarrow 1000\)), one can verify that there always exists some differential characteristic with single active Sbox per round for at least 4 consecutive rounds.
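The trail through \(Sb_{15}\) works because bit 63 is a fixed point of the bit permutation. A small Python sketch, again assuming the closed form \(P(i) = 16i \bmod 63\) for \(i < 63\) and \(P(63) = 63\) (to be compared with Table 6), lists all such fixed points; the names are ours.

```python
def present_perm(i: int) -> int:
    """Assumed closed form of the PRESENT bit permutation."""
    return 63 if i == 63 else (16 * i) % 63


def follow(bit: int, rounds: int) -> list:
    """Positions visited by a single active bit if every Sbox on the way
    maps that one-bit difference back to the same bit position."""
    path = [bit]
    for _ in range(rounds):
        path.append(present_perm(path[-1]))
    return path


# Bits that the permutation sends back to themselves.
FIXED = [i for i in range(64) if present_perm(i) == i]

# (Sbox index, bit position inside the Sbox) of every fixed point.
print([(i // 4, i % 4) for i in FIXED])
```

Each fixed point sits at bit position a of Sbox 5a, so under this assumption any \(1-1\) bit transition that keeps the bit position (such as \(1000 \rightarrow 1000\)) yields a single active bit trail of arbitrary length.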

To overcome this problem, we propose a new construction paradigm, “Bad Output must go to Good Input”, or BOGI in short. We explain this in the context of the differential of an Sbox, but the analysis is the same for the linear case.

Bad Output Must Go to Good Input (BOGI). The single active bit path exists because the bit permutation allows a \(1-1\) bit transition from some Sbox in the i th round to propagate to some Sbox in the \((i+1)\) th round that again produces a \(1-1\) bit transition. To overcome this problem, it must be ensured that no such path exists. In the \(1-1\) bit DDT, we call \(\varDelta \mathbf {x} = \mathtt {x}_3 \mathtt {x}_2 \mathtt {x}_1 \mathtt {x}_0\) a good input if the corresponding row has all zero entries, and a bad input otherwise. Similarly, we call \(\varDelta \mathbf {y} = \mathtt {y}_3 \mathtt {y}_2 \mathtt {y}_1 \mathtt {y}_0\) a good output if the corresponding column has all zero entries, and a bad output otherwise. In Table 8, 1000 is both a bad input and a bad output; the rest are good.

Consider another \(1-1\) bit DDT in Table 9. Let GI, GO, BI and BO denote the sets of good inputs, good outputs, bad inputs and bad outputs respectively. Then, in Table 9, \(GI = \{0100,0010\}\), \(GO = \{1000,0001\}\), \(BI = \{1000,0001\}\) and \(BO = \{0100,0010\}\). Or, if we represent these binary strings by integers considering the position of the “1” (rightmost position is 0) in these strings, we may rewrite \(GI = \{2,1\}\), \(GO = \{3,0\}\), \(BI = \{3,0\}\) and \(BO = \{2,1\}\).

An output belonging to BO (bad output) could potentially come from a single bit transition through some Sbox in this round. Thus we want to map this active output bit to some GI (good input) in the next round, which guarantees that it will not propagate to another \(1-1\) bit transition. As a result, it avoids a single active bit path in 2 consecutive rounds.

BOGI: Let \(|BO| \le |GI|\) (the cardinality of the set BO must be less than or equal to the cardinality of the set GI), so that an injective map \(\pi _1 : BO \rightarrow GI\) exists. Let \(\pi _2 : GO \rightarrow \pi _1(BO)^C\) (the complement of \(\pi _1(BO)\)) be another injective map. The map \(\pi _1\) ensures that “Bad Output must go to Good Input”. A combined map \(\pi : BO \cup GO \rightarrow BI \cup GI\) is defined as \(\pi (e) = \pi _1(e)\) if \(e \in BO\), and \(\pi (e) = \pi _2(e)\) otherwise. For example, consider Table 9. The injective maps \(\pi _1 : \{2,1\} \rightarrow \{2,1\}\) and \(\pi _2 : \{3,0\} \rightarrow \{3,0\}\) both have 2 choices, which altogether make 4 choices for the combined map \(\pi \). An example BOGI mapping would be \(\pi (0) = 0, \pi (1) = 1, \pi (2) = 2, \pi (3) = 3\), which happens to be the identity mapping.

Any choice of \(\pi \) may be used to define the bit permutation. We call these \(\pi \)s differential BOGI permutations as derived from \(1-1\) bit DDT.
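The construction can be illustrated by enumerating all BOGI permutations for a given \(1-1\) bit DDT. The sketch below is ours: the 4x4 table is hypothetical and was chosen only so that it yields the sets \(GI = \{2,1\}\), \(GO = \{3,0\}\), \(BI = \{3,0\}\), \(BO = \{2,1\}\) quoted for Table 9; it is not claimed to be Table 9 itself.

```python
from itertools import permutations

# Hypothetical 1-1 bit DDT: TABLE[i][j] is the entry for the transition
# from input bit position i to output bit position j.
TABLE = [
    [0, 2, 0, 0],   # input bit 0 -> output bit 1
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 2, 0],   # input bit 3 -> output bit 2
]


def classify(table):
    """Return (GI, GO, BI, BO) as sets of bit positions."""
    bad_in = {i for i in range(4) if any(table[i])}
    bad_out = {j for j in range(4) if any(table[i][j] for i in range(4))}
    everything = set(range(4))
    return everything - bad_in, everything - bad_out, bad_in, bad_out


def bogi_permutations(table):
    """All pi on {0,1,2,3} that send every bad output to a good input."""
    good_in, _, _, bad_out = classify(table)
    return [p for p in permutations(range(4))
            if all(p[b] in good_in for b in bad_out)]


print(classify(TABLE))
print(bogi_permutations(TABLE))
```

For these sets the enumeration returns 4 permutations, the identity among them, in line with the count given above.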

Remark: A similar analysis applies to the linear case. Analogous to the \(1-1\) bit DDT, the analysis is based on the \(1-1\) bit LAT, and BOGI permutations are found for the linear case too. We call them linear BOGI permutations. We can now choose any common permutation from the set of both differential and linear BOGI permutations.

BOGI Bit Permutation for GIFT. Let \(\pi : \{0,1,2,3\} \rightarrow \{0,1,2,3\}\) be a common permutation from the set of both differential and linear BOGI permutations. Table 10 shows the group mapping.

Table 10. BOGI Bit Permutation mapping from \(Qx^{i}\) to \(Rx^{i+1}\).

Note that we made some left rotations to the rows of the bit mapping, this is because we need the inputs to each Sbox in \((i+1)\) th round to be coming from 4 different bit positions.

In GIFT, we chose an Sbox that has a common BOGI permutation that is an identity mapping, that is \(\pi (i) = i\). Figure 2 illustrates the group mapping from Q0 to R0 in GIFT-64. The same BOGI permutation is applied to all the q group mappings to form the final n-bit permutation for both versions of GIFT.

Fig. 2. Group mapping from Q0 to R0 in GIFT-64.

Some Results About Our Bit Permutation. To be concise, we leave the proofs for our results in the full version. Let \(Q0,Q1,\cdots ,Q(q-1)\) be q different Quotient groups and \(R0,R1,\cdots ,R(q-1)\) be q different Remainder groups. Then, for \(0 \le x \le q-1\),

  1. The input bits of an Sbox in Rx come from 4 distinct Sboxes in Qx.

  2. The output bits of an Sbox in Qx go to 4 distinct Sboxes in Rx.

  3. The input bits of 4 Sboxes from the same Qx come from 16 different Sboxes.

  4. The output bits of 4 Sboxes from the same Rx go to 16 different Sboxes.
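These four properties can be checked exhaustively with a short Python sketch. It relies on an assumption that should be verified against Tables 4 and 5: that the bit permutation follows the closed form \(P(i) = 4\lfloor i/16 \rfloor + \frac{n}{4}\big((3\lfloor (i \bmod 16)/4 \rfloor + (i \bmod 4)) \bmod 4\big) + (i \bmod 4)\). The function names are ours.

```python
def gift_perm(i: int, n: int = 64) -> int:
    """Assumed closed form of the GIFT bit permutation (n = 64 or 128)."""
    block, t = divmod(i, 16)          # quotient group index, offset inside it
    m, l = divmod(t, 4)               # Sbox inside the group, bit inside the Sbox
    return 4 * block + (n // 4) * ((3 * m + l) % 4) + l


def targets_of(sboxes: set, n: int) -> set:
    """Sboxes that receive the output bits of the given Sboxes."""
    return {gift_perm(4 * sb + l, n) // 4 for sb in sboxes for l in range(4)}


def sources_of(sboxes: set, n: int) -> set:
    """Sboxes whose output bits feed the given Sboxes."""
    return {i // 4 for i in range(n) if gift_perm(i, n) // 4 in sboxes}


def check_properties(n: int) -> bool:
    q = n // 16                       # number of quotient / remainder groups
    for x in range(q):
        Q = {4 * x + j for j in range(4)}
        R = {x + q * j for j in range(4)}
        for sb in R:                  # property 1
            src = sources_of({sb}, n)
            if len(src) != 4 or not src <= Q:
                return False
        for sb in Q:                  # property 2
            dst = targets_of({sb}, n)
            if len(dst) != 4 or not dst <= R:
                return False
        if len(sources_of(Q, n)) != 16:   # property 3
            return False
        if len(targets_of(R, n)) != 16:   # property 4
            return False
    return True


print(check_properties(64), check_properties(128))
```

Note that \(P(i) \bmod 4 = i \bmod 4\) in this closed form, which is the identity BOGI mapping \(\pi (i) = i\) chosen for GIFT.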

Lemma 1

When the number of Sboxes in a round is 16 or 32, the proposed bit permutation achieves full diffusion in the optimal number of rounds achievable by a bit permutation.

Lemma 2

In the proposed bit permutation, there does not exist any single active bit transition for two consecutive rounds in both differential and linear characteristics.

Definition 2

The differential (resp. linear) score of an Sbox is \(|GI| + |GO|\) observed from \(1-1\) bit DDT (resp. LAT).

Lemma 3

There exists differential (resp. linear) BOGI permutation for an Sbox if and only if the differential (resp. linear) score of an Sbox is at least 4.
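One way to see why the threshold is 4: the single-bit outputs split into good and bad ones, so \(|BO| = 4 - |GO|\), and an injective map from BO to GI exists exactly when \(|BO| \le |GI|\), that is when \(|GI| + |GO| \ge 4\). The Python sketch below (our own check, not the proof from the full version) confirms this counting argument by brute force over all possible choices of GI and GO.

```python
from itertools import combinations, permutations


def subsets(universe):
    """All subsets of the given positions."""
    items = list(universe)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


def has_bogi(bad_out: set, good_in: set) -> bool:
    """True if some permutation of {0,1,2,3} maps every bad output to a good input."""
    return any(all(p[b] in good_in for b in bad_out)
               for p in permutations(range(4)))


def lemma3_holds() -> bool:
    positions = set(range(4))
    for good_in in subsets(positions):
        for good_out in subsets(positions):
            bad_out = positions - good_out
            score = len(good_in) + len(good_out)
            if has_bogi(bad_out, good_in) != (score >= 4):
                return False
    return True


print(lemma3_holds())
```

The check ranges over all subsets, including combinations that no real Sbox produces, so it is a superset of the cases the lemma needs.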

It is essential that our Sbox has at least score 4 for both differential and linear, and has some common BOGI permutation. These are 2 of the main criteria for the selection of GIFT Sbox.

Remark: The BOGI permutation is a group mapping that is independent of the number of groups. Thus, this permutation design is scalable to any bit permutation size that is a multiple of 16. This potentially allows us to design larger state sizes, such as 256 bits, which is useful for designing hash functions.

3.3 Selection of GIFT Sbox

We first recall some Sbox properties and introduce a metric to estimate the hardware implementation cost of Sboxes.

Properties of Sbox. For the differential property, let \(S: \mathbb {F}^4_2 \rightarrow \mathbb {F}^4_2\) denote a 4-bit Sbox. Let \(\varDelta _I,\varDelta _O \in \mathbb {F}^4_2\) be the input and output differences, \( D_S(\varDelta _I, \varDelta _O) = \sharp \{ x \in \mathbb {F}^4_2 | S(x) \oplus S(x \oplus \varDelta _I) = \varDelta _O \} \), and \( D_{max}(S) = \max _{\varDelta _I,\varDelta _O \ne 0}D_S(\varDelta _I, \varDelta _O). \)

For the linear property, let \(\alpha ,\beta \in \mathbb {F}^4_2\) be the input and output masking, \( L_S(\alpha , \beta ) = |\sharp \{ x \in \mathbb {F}^4_2 | x \bullet \alpha = S(x) \bullet \beta \} - 8| \), and \( L_{max}(S) = \max _{\alpha ,\beta \ne 0}L_S(\alpha , \beta ). \)
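The two definitions translate directly into code. The sketch below is a minimal illustration with names of our choosing; the table `GS` is an assumption of this sketch (the values 1a4c6f392db7508e) and should be checked against Table 3 before being relied upon.

```python
# Assumed GIFT Sbox table; verify against Table 3.
GS = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
      0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]


def ddt(sbox):
    """table[d_in][d_out] = D_S(d_in, d_out)."""
    table = [[0] * 16 for _ in range(16)]
    for d_in in range(16):
        for x in range(16):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table


def dot(a: int, b: int) -> int:
    """Inner product of two 4-bit vectors over F_2."""
    return bin(a & b).count("1") & 1


def lat(sbox):
    """table[alpha][beta] = L_S(alpha, beta) = |#{x : x.alpha = S(x).beta} - 8|."""
    return [[abs(sum(dot(x, a) == dot(sbox[x], b) for x in range(16)) - 8)
             for b in range(16)] for a in range(16)]


def d_max(sbox) -> int:
    table = ddt(sbox)
    return max(table[a][b] for a in range(1, 16) for b in range(1, 16))


def l_max(sbox) -> int:
    table = lat(sbox)
    return max(table[a][b] for a in range(1, 16) for b in range(1, 16))


print(d_max(GS), l_max(GS))
```

Both maxima are taken over non-zero input and output differences (resp. masks), as in the definitions of \(D_{max}\) and \(L_{max}\) above.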

Definition 3

([36]). Let \(M_i\) and \(M_o\) be two invertible matrices and \(c_i,c_o \in \mathbb {F}^4_2\). The Sbox \(S'\) defined by \(S'(x) = M_o S(M_i (x \oplus c_i))\oplus c_o\) belongs to the affine equivalence (AE) set of S.

It is known that both \(D_{max}\) and \(L_{max}\) are preserved under the AE class.

Definition 4

([36]). Let \(P_i\) and \(P_o\) be two bit permutation matrices and \(c_i,c_o \in \mathbb {F}^4_2\). The Sbox \(S'\) defined by \(S'(x) = P_o S(P_i (x \oplus c_i))\oplus c_o\) belongs to the permutation-xor equivalence (PE) set of S.

Note that the \(1-1\) bit differential and linear transitions are preserved only under the PE class. That is, the score of an Sbox is preserved under the PE class but not under the AE class.

Heuristic Sbox Implementation. We use a simplified metric to estimate the implementation cost of Sboxes. We denote \(\{ \mathtt{NOT}, \mathtt{NAND}, \mathtt{NOR}\}\) as N-operations (footnote 1) and \(\{\mathtt{XOR}, \mathtt{XNOR}\}\) as X-operations, and estimate the cost of an N-operation to be 1 unit and that of an X-operation to be 2 units. We consider the following 4 types of instruction for the construction of the Sboxes: \(a \leftarrow \mathtt{NOT}(a);\ a \leftarrow a\ \mathtt{X}\ b;\ a \leftarrow a\ \mathtt{X}\ (b\ \mathtt{N}\ c);\ a \leftarrow a\ \mathtt{X}\ ((b\ \mathtt{N}\ c)\ \mathtt{N}\ d)\), where a, b, c, d are distinct bits of an Sbox input. These so-called invertible instructions [23] allow us to implement the inverse Sbox by simply reversing the sequence of the instructions. In addition, the implementation cost of the inverse Sbox would be the same as the direct Sbox since the same set of instructions is used.

Under this metric, we found that the PRESENT Sbox requires \(4 \mathtt{N} + 9 \mathtt{X}\) operations, a cost of 22 units, while the RECTANGLE Sbox requires \(4 \mathtt{N} + 7 \mathtt{X}\) operations, a cost of 18 units. Hence, one of the criteria for our Sbox is to have an implementation cost lower than 18 units (footnote 2).
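The cost metric and the invertible-instruction idea can be illustrated with a toy example in Python. The instruction sequence `PROGRAM` below is hypothetical and is not the implementation of any of the Sboxes discussed here; it only shows that running the same instructions in reverse order undoes the Sbox, and how the unit cost is counted.

```python
def cost(n_ops: int, x_ops: int) -> int:
    """Unit cost: 1 per N-operation, 2 per X-operation."""
    return n_ops + 2 * x_ops


# (N-operations, X-operations) used by each instruction type of this sketch.
COUNTS = {"NOT": (1, 0), "XOR": (0, 1), "XOR_NAND": (1, 1), "XOR_NOR": (1, 1)}


def step(bits: list, ins: tuple) -> None:
    """Apply one instruction in place; the target bit is never an operand,
    so applying the same instruction twice restores the state."""
    kind, a, *ops = ins
    if kind == "NOT":
        bits[a] ^= 1
    elif kind == "XOR":
        bits[a] ^= bits[ops[0]]
    elif kind == "XOR_NAND":
        bits[a] ^= 1 ^ (bits[ops[0]] & bits[ops[1]])
    elif kind == "XOR_NOR":
        bits[a] ^= 1 ^ (bits[ops[0]] | bits[ops[1]])
    else:
        raise ValueError(kind)


def run(program: list, x: int) -> int:
    bits = [(x >> i) & 1 for i in range(4)]
    for ins in program:
        step(bits, ins)
    return sum(b << i for i, b in enumerate(bits))


def program_cost(program: list) -> int:
    n_ops = sum(COUNTS[ins[0]][0] for ins in program)
    x_ops = sum(COUNTS[ins[0]][1] for ins in program)
    return cost(n_ops, x_ops)


# Hypothetical toy sequence, for illustration only.
PROGRAM = [("XOR_NAND", 1, 0, 2), ("XOR_NOR", 0, 1, 3),
           ("XOR", 2, 0), ("NOT", 3), ("XOR_NAND", 3, 1, 2)]

SBOX = [run(PROGRAM, x) for x in range(16)]
INVERSE = [run(PROGRAM[::-1], y) for y in range(16)]   # same instructions, reversed

print(cost(4, 9), cost(4, 7), program_cost(PROGRAM))
```

Because the forward and reversed sequences contain the same instructions, their costs are identical, which is the point made about the inverse Sbox above.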

Search for GIFT Sbox. Our primary design criteria for the GIFT Sbox are:

  1. Implementation cost of at most 17 units.

  2. A score of at least 4 in both differential and linear, i.e. for both differential and linear, \(|GO| + |GI| \ge 4\).

  3. There exists a common BOGI permutation for both differential and linear.

From the list of 302 AE Sboxes presented in [14], we generate the PE Sboxes and check their implementation cost. Our heuristic search shows that no optimal Sbox [30] (\(D_{max}=4\) and \(L_{max}=4\)) satisfies all 3 criteria, hence we extended our search to non-optimal Sboxes. For Sboxes with \(D_{max}=6\) and \(L_{max}=4\), we found some Sboxes with an implementation cost of 16 units. For a cost of 15 units, the best possible Sboxes (in terms of \(D_{max}\) and \(L_{max}\)) that satisfy the criteria have \(D_{max}=12\) and \(L_{max}=6\). Sboxes with a cost of at most 14 units have either \(D_{max}=16\) or \(L_{max}=8\). To maximise the resistance against differential and linear attacks while satisfying the Sbox criteria, we consider Sboxes with \(D_{max}=6\), \(L_{max}=4\) and an implementation cost of 16 units.

In order to reduce the occurrence of sub-optimal differential transition, we impose two additional criteria:

  4. \(\sharp \{(\varDelta _I, \varDelta _O) \in \mathbb {F}^4_2 \times \mathbb {F}^4_2 | D_S(\varDelta _I, \varDelta _O) > 4 \} \le 2\).

  5. For \(D_S(\varDelta _I, \varDelta _O) > 4\), \(wt(\varDelta _I) + wt(\varDelta _O) \ge 4\), where \(wt(\cdot )\) is the Hamming weight.

Criterion (5) ensures that when a sub-optimal differential transition occurs, there is a total of at least 4 active Sboxes in the previous and next round.

Finally, we pick an Sbox with a common BOGI permutation for differential and linear that is an identity, i.e. \(\pi (i) = i\).

Properties of GIFT Sbox. Our GIFT Sbox GS can be implemented with \(4 \mathtt{N} + 6 \mathtt{X}\) operations (smaller than the Sboxes in PRESENT and RECTANGLE), has a maximum differential probability of \(2^{-1.415}\) and linear bias of \(2^{-2}\), algebraic degree 3 and no fixed point. For the sub-optimal differential transitions with probability \(2^{-1.415}\), there are only 2 such transitions and the sum of Hamming weight of input and output differentials is 4. The implementation, differential distribution table (DDT) and linear approximation table (LAT) of GS are provided in the full version.
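Several of these properties, together with criteria (2) to (5), can be re-derived mechanically from the Sbox table. The Python sketch below is ours; the table `GS` (values 1a4c6f392db7508e) is an assumption of this sketch and should be compared with Table 3. Criteria (4) and (5) are evaluated over non-zero differences only.

```python
import math

# Assumed GIFT Sbox table; verify against Table 3.
GS = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
      0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]


def ddt(sbox):
    table = [[0] * 16 for _ in range(16)]
    for d_in in range(16):
        for x in range(16):
            table[d_in][sbox[x] ^ sbox[x ^ d_in]] += 1
    return table


def lat(sbox):
    def dot(a, b):
        return bin(a & b).count("1") & 1
    return [[abs(sum(dot(x, a) == dot(sbox[x], b) for x in range(16)) - 8)
             for b in range(16)] for a in range(16)]


def one_bit_sets(table):
    """(bad input positions, bad output positions) of the 1-1 bit sub-table."""
    hits = [(i, j) for i in range(4) for j in range(4) if table[1 << i][1 << j]]
    return {i for i, _ in hits}, {j for _, j in hits}


def score(table) -> int:
    bad_in, bad_out = one_bit_sets(table)
    return (4 - len(bad_in)) + (4 - len(bad_out))      # |GI| + |GO|


def identity_is_bogi(table) -> bool:
    bad_in, bad_out = one_bit_sets(table)
    return not (bad_in & bad_out)    # pi(i) = i sends each bad output to a good input


def wt(v: int) -> int:
    return bin(v).count("1")


D, L = ddt(GS), lat(GS)
SUBOPTIMAL = [(a, b) for a in range(1, 16) for b in range(1, 16) if D[a][b] > 4]

print("fixed points:", [x for x in range(16) if GS[x] == x])
print("scores (differential, linear):", score(D), score(L))
print("identity is a common BOGI permutation:",
      identity_is_bogi(D) and identity_is_bogi(L))
print("sub-optimal transitions:", SUBOPTIMAL)
print("log2 of max differential probability:", round(math.log2(6 / 16), 3))
```

For this table the two transitions with \(D_S = 6\) are \(4 \rightarrow 7\) and \(6 \rightarrow 3\), each with a Hamming weight sum of 4, and \(\log _2(6/16) \approx -1.415\) matches the probability quoted above.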

3.4 Designing of GIFT Key Schedule

Key State Update. One of our main goals when designing the key schedule is to minimize the hardware area, and thus we chose a bit permutation, which is just a wire shuffle and costs no hardware area at all. For it to be software friendly as well, the entire key state is rotated in blocks of 16 bits, and bit rotations are applied within some 16-bit blocks. Since it is redundant to apply bit rotations within key state blocks that have not been introduced to the cipher state, we update a key state block only after it has been extracted as a round key.

To introduce the entire key material into the cipher state as fast as possible, the key state blocks that are extracted as the round key are chosen such that all the key material is introduced into the cipher state in the least possible number of rounds.

Adding Round Keys. To optimize the hardware performance of GIFT, we XOR the round key to only half of the cipher state. This saves a significant amount of hardware area in a round-based implementation. For it to be software friendly too, we XOR the round key at the same i-th bit positions of each nibble. This makes the bitslice implementation more efficient. In addition, since all nibbles contain some key material, the entire state will be dependent on the key after a SubCells operation.

The choice of the positions for adding the round key and 16-bit rotations were chosen to optimize the related-key differential bounds. However, we would like to reiterate that more rounds is advised to resist related-key attacks.

Round Constants. For the round constants, instead of using a typical decimal counter, we use a 6-bit affine LFSR (like in SKINNY [5]). It requires only a single XNOR gate per update, which is probably the smallest possible hardware area for a counter. Each of the 6 bits is XORed to a different nibble to break the symmetry. In addition, we add a “1” at the MSB to further increase the effect.
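The constant generation can be sketched as follows. The update rule \((c_5, \dots, c_0) \leftarrow (c_4, \dots, c_0, c_5 \oplus c_4 \oplus 1)\), the all-zero initial value updated before first use, and the GIFT-64 bit positions (\(c_j\) into \(b_{4j+3}\), the extra "1" into \(b_{63}\)) are taken from the specification in Sect. 2 and should be read as assumptions here; the function names are ours.

```python
def next_round_constant(c):
    """One step of the 6-bit affine LFSR: shift left and feed in
    c5 XOR c4 XOR 1, which is a single XNOR of c5 and c4."""
    c5 = (c >> 5) & 1
    c4 = (c >> 4) & 1
    return ((c << 1) & 0x3F) | (c5 ^ c4 ^ 1)


def round_constants(n):
    """First n round constants, starting from the all-zero state and
    updating before each use."""
    c = 0
    out = []
    for _ in range(n):
        c = next_round_constant(c)
        out.append(c)
    return out


def add_round_constant_64(state, c):
    """XOR bit c_j into the MSB of nibble j (j = 0..5) and a fixed 1 into
    the MSB of the 64-bit state (assumed GIFT-64 positions)."""
    state ^= 1 << 63
    for j in range(6):
        state ^= ((c >> j) & 1) << (4 * j + 3)
    return state
```

Because each constant bit lands in the MSB of a different nibble, the constant addition touches at most seven state bits per round.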

4 Security Analysis

In this section, we provide a short summary of the various cryptanalyses that we conducted on GIFT. All details are provided in the full version.

4.1 Differential and Linear Cryptanalysis

We use Mixed Integer Linear Programming (MILP) to compute the lower bounds for the number of active Sboxes in both differential cryptanalysis [9] (DC) and linear cryptanalysis [31] (LC). The results are summarized in Table 11. The MILP solutions provide us with the actual differential or linear characteristics, which allow us to compute the actual differential probability and correlation contribution.

Table 11. Lower bounds for number of active Sboxes.

Recall that one of our main goals is to match the differential bounds of PRESENT, that is, having an average of 2 active Sboxes per round, but with a lighter Sbox and without the constraint of differential branching number of 3. In addition, we aim for the same ratio for the linear bound, which was not accomplished by PRESENT. These targets were achieved at 9 rounds of GIFT. Hence, our DC and LC analysis and discussion focus on 9 rounds.

Regarding the security against DC, GIFT-64 has a 9-round differential probability of \(2^{-44.415}\). Taking the average per round and propagating forward, we expect that the differential probability will be lower than \(2^{-63}\) after 14 rounds. Therefore, we believe 28-round GIFT-64 is enough to resist DC. GIFT-128 has a 9-round differential probability of \(2^{-46.99}\), which suggests that 26 rounds are sufficient to achieve a differential probability lower than \(2^{-127}\). Therefore, we believe 40-round GIFT-128 is enough to resist DC.

Regarding LC, GIFT-64 has a 9-round linear hull effect of \(2^{-49.997}\), which is expected to require 13 rounds to achieve a correlation potential lower than \(2^{-64}\). Therefore, we believe 28-round GIFT-64 is enough to resist LC. GIFT-128 has a 9-round linear hull effect of \(2^{-45.99}\), which means that we would need around 27 rounds to achieve a correlation potential lower than \(2^{-128}\). Therefore, we believe 40-round GIFT-128 is enough to resist LC.
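The round estimates in the two paragraphs above come from averaging the 9-round figure per round and extending it forward. A minimal sketch of that arithmetic is given below; the function name `rounds_needed` is ours, and the plain linear extrapolation is an assumption about how the averaging is carried out. It returns the smallest round count whose extrapolated weight strictly exceeds the target. The counts quoted in the text are one round larger in each case, so they are slightly more conservative than this naive estimate.

```python
import math


def rounds_needed(weight_9r, target_weight, base_rounds=9):
    """Smallest r such that r * (weight_9r / base_rounds) > target_weight.

    Weights are -log2 of a differential probability or of a
    correlation potential.
    """
    per_round = weight_9r / base_rounds
    return math.floor(target_weight / per_round) + 1


# 9-round weights and targets as quoted in the text.
estimates = {
    "GIFT-64 DC": rounds_needed(44.415, 63),
    "GIFT-128 DC": rounds_needed(46.99, 127),
    "GIFT-64 LC": rounds_needed(49.997, 64),
    "GIFT-128 LC": rounds_needed(45.99, 128),
}
```

In all four cases the estimate stays well below the full 28 and 40 rounds.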

Related-Key Differential Cryptanalysis. For GIFT-64, since it takes 4 rounds for all the key material to be introduced into the cipher state, it is trivial to see that it is possible to have no active Sboxes from 1 round to 4 rounds. Thus we start our computation of the related-key differential bounds from 5 rounds onwards. From 5 rounds to 12 rounds, the probabilities of the best differential characteristics are \(2^{-1.415},2^{-5},2^{-6.415},2^{-10},2^{-16},2^{-22},2^{-27},2^{-33}\) respectively. Even if we suppose that the probability of the 12-round characteristic is lower bounded by \(2^{-33}\), it is doubtful that 28 rounds are secure against related-key differential cryptanalysis. Therefore, as we describe in Sect. 2, we strongly recommend increasing the number of rounds to achieve security against related-key attacks.

For GIFT-128, we start our computation from 3-round onwards. From 3-round to 9-round, the probabilities are \(2^{-1.415},2^{-5},2^{-7},2^{-11},2^{-20},2^{-25},2^{-31}\) respectively. Similar to GIFT-64, it is doubtful that 40 rounds are secure against related-key differential cryptanalysis.

4.2 Integral Attacks

We discuss the security against integral attacks [26]. Here the integral distinguisher is found by using the (bit-based) division property [40, 42] and the key recovery is executed by using the partial-sum technique [19]. As a result, the longest integral distinguishers that we can find for GIFT-64 cover 9 rounds, and the following is an example.

$$\begin{aligned} (A^{60},ACAA) \xrightarrow {9R} ((UUBB)^{16}) \end{aligned}$$

Here, only the 2nd bit of the plaintext is constant, and bits \(\{ b_{4i}\}\) and \(\{ b_{4i+1}\}\) in the 9-round ciphertexts are balanced. Note that there is no whitening key at the beginning. Therefore, we can trivially extend the integral distinguisher by one round, so GIFT-64 has 10-round integral distinguishers. We can append four rounds to the 10-round integral distinguisher for key recovery and attack 14-round GIFT-64. The attack complexity is about \(2^{97}\) with \(2^{63}\) chosen plaintexts.

We also evaluated the longest integral distinguisher for GIFT-128 by using the (bit-based) division property. As a result, we can find an 11-round integral distinguisher, which is two rounds longer than that for GIFT-64. However, the number of round-key bits XORed in every round increases from 32 bits to 64 bits. Therefore, we expect that GIFT-128 is also secure against integral attacks.

4.3 Impossible Differential Attacks

Impossible differential attacks [8, 25] exploit a pair of differences \(\varDelta _1\) and \(\varDelta _2\) in which \(\varDelta _1\) never reaches \(\varDelta _2\) after some rounds.

We searched for impossible differentials by using the MILP-based tool [38]. The results show that there are no impossible differentials with 1 active nibble over 7 rounds of GIFT-64. Thus the full number of rounds is sufficient to resist the impossible differential attack.

4.4 Meet-in-the-Middle Attacks

The meet-in-the-middle (MITM) attack discussed here is a rather classical one, which separates the encryption algorithm into two independent functions [13, 16].

GIFT-64-128 XORs only 32 bits out of 128 bits of the key to the state in every round. Given this property, along with the splice-and-cut [1] and initial-structure (IS) [37] techniques, we choose 8 bits of \((k_6,k_7)\) and 8 bits of \((k_2,k_3)\) as sources of independent computations, called neutral bits, and separate 15 rounds into chunks as shown in Fig. 3. Note that when the backward computation reaches the plaintext, the attacker makes a query to obtain the corresponding ciphertext. All details of the attack procedure will be explained in the full version.

Fig. 3. Chunk separation for 15-round MitM attack.

For each of the \(2^{112}\) values of the non-neutral bits, the attacker computes the forward and backward chunks for \(2^{8}\) choices of neutral bits. Therefore, the time complexity is \(2^{120}\) and the memory complexity is \(2^{8}\). The attack requires knowledge of the full codebook, thus the data complexity is \(2^{64}\).

4.5 Invariant Subspace Attacks

Since the round constant is XORed only in the MSB of several S-boxes, invariant subspace attacks [20, 28, 29] can be a potential threat.

We exhaustively searched for the subspace transition through the GIFT S-box and confirmed that XORing the constant to MSB breaks the invariant subspace, thus GIFT resists the attack. The details are provided in the full version.

4.6 Nonlinear Invariant Attacks

Nonlinear invariant attacks [41] are weak-key attacks that can be applied when the round constant is XORed only to some particular bits of nibbles. The core idea is to find a nonlinear approximation of the round transformation with probability one. For the SPN structure, the attacks can be mounted when (1) the S-box has a quadratic nonlinear invariant and (2) the linear layer is represented by the multiplication with an orthogonal binary matrix.

The diffusion of GIFT (bit permutation) is orthogonal. However, it is not represented by the multiplication with an orthogonal binary matrix. Moreover, we searched for the quadratic nonlinear invariant for GIFT S-box, but there is no such invariant. Therefore, GIFT is secure against the nonlinear invariant attacks.

4.7 Algebraic Attacks

Algebraic attacks do not threaten GIFT; the analysis is provided in the full version.

5 Hardware Implementation

GIFT is surprisingly efficient on ASIC platforms across various degrees of serialization. This is mainly due to the extremely lightweight round function that performs key addition on only half of the state and uses a bit permutation as the only diffusion mechanism. Due to page constraints, we leave the details to the full version of our paper and present the summary here.

5.1 Round Based Implementation

GIFT includes various design strategies in order to minimize gate count. GIFT employs key addition to only half of the state and so saves silicon area in the process. SKINNY uses the same mechanism, but it additionally uses an equal amount of XOR gates to add the tweak to the state, and so the number of XOR gates required to construct the roundkey addition layer is equal to that of any cipher employing full state addition.

In Table 12, we compare the hardware performance of GIFT with other lightweight ciphers. In Fig. 4 we list the individual area requirements of the respective components in GIFT.

Table 12. Comparison of performance metrics for round based implementations synthesized with STM 90 nm Standard cell library

Fig. 4. Componentwise area requirements for GIFT-64-128 and GIFT-128-128

Fig. 5. Serial Implementation for GIFT-64-128 (The boxes in green denote scan flip-flops/registers)

We see that GIFT has the smallest area compared to the other ciphers. From the pie chart, we see that the storage area (which is a fixed cost) takes up most of the area, while the cipher components (which are the variable part) make up only a small percentage of the overall area.

5.2 Serial Implementation

The serial implementation of GIFT-64-128 uses a mixed datapath of size 4 bits on the stateside and 16 bits on the keyside. The architecture is shown in Fig. 5.

GIFT-128-128 uses a similar architecture: a mixture of a 4-bit datapath on the stateside and a 32-bit datapath on the keyside is employed. We also implemented bit-serial versions of GIFT as per the techniques outlined in [24]. In Table 13, we list the performance comparisons of GIFT with other block ciphers. While the bit-serial implementation of Simon is probably the most compact due to the nature of the design, the performance of GIFT is comparable to or better than that of other ciphers with a similar level of serialization.

Table 13. Comparison of performance metrics for serial implementations synthesized with STM 90 nm Standard cell library

6 Software Implementation

In this section, we describe our software implementation of GIFT-64 and GIFT-128. Due to its inherent bitslice structure, it seems natural to consider that the most efficient software implementations of GIFT will be bitslice implementations.

We leave the details of the packing/unpacking of the data and round function implementation in the full version.
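To give a flavour of the bitslice approach, the sketch below applies GS to 16 nibbles in parallel using only word-wide logic operations. It is an illustration rather than our optimized AVX2 code: the instruction sequence and the GS table are assumed from the GIFT specification, Python integers stand in for registers, and the helpers `pack` and `unpack` are hypothetical names for one simple data layout (slice \(j\) holds bit \(j\) of every nibble).

```python
# GS lookup table from the GIFT specification, used here only as a reference.
GS = [0x1, 0xA, 0x4, 0xC, 0x6, 0xF, 0x3, 0x9,
      0x2, 0xD, 0xB, 0x7, 0x5, 0x0, 0x8, 0xE]


def sbox_bitslice(x0, x1, x2, x3, mask=0xFFFF):
    """Apply GS to all nibbles at once; x_j holds bit j of every nibble.
    Uses 3 ANDs, 1 OR, 6 XORs and one complement."""
    x1 ^= x0 & x2
    t = x0 ^ (x1 & x3)
    x2 ^= t | x1
    x0 = x3 ^ x2
    x1 ^= x0
    x0 = ~x0 & mask  # complement, restricted to the slice width
    x2 ^= t & x1
    x3 = t
    return x0, x1, x2, x3


def pack(nibbles):
    """Spread a list of 4-bit values over four slices (bit i = nibble i)."""
    slices = [0, 0, 0, 0]
    for i, n in enumerate(nibbles):
        for j in range(4):
            slices[j] |= ((n >> j) & 1) << i
    return tuple(slices)


def unpack(slices, count=16):
    """Inverse of pack: rebuild the list of 4-bit values."""
    return [sum(((slices[j] >> i) & 1) << j for j in range(4))
            for i in range(count)]
```

For example, `unpack(sbox_bitslice(*pack(list(range(16)))))` evaluates GS on all 16 possible inputs with a single pass through the instruction sequence. In this layout the round key addition also reduces to XORs on two of the four slices.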

Benchmarks. We have produced this bitslice implementation for AVX2 registers and we give in Table 14 the benchmarking results on a computer with an Intel Haswell processor (i5-4460U). We have benchmarked the bitslice implementations of SIMON and SKINNY (available online) on the same computer for fairness.

Table 14. Bitslice software implementations of GIFT and other lightweight block ciphers. Performances are given in cycles per byte, with messages composed of 2000 64-bit blocks to obtain the results.

Comments. Bitslice implementations can be used for any parallel mode (as is the case for most modern operating modes), but can also be used for serial modes when several users are communicating in parallel. In this setting, the implementation would be exactly the same, as our key preparation does not assume that the keys have to be the same for all blocks. In the scenario of a serial mode for a single user, a classical table-based or VPERM implementation will probably be the most efficient option [6].

It is very likely that GIFT will also perform very well on low-end micro-controllers. RECTANGLE is very good on micro-controllers and GIFT shares the same general strategy in this regard. The key schedule being even simpler, we believe that it will actually perform even better than RECTANGLE.