1 Introduction

Message authentication codes (or MACs) ensure the authenticity of messages in the secret-key setting. They are a core element of real-world security protocols such as TLS, SSH, or IPSEC. A MAC takes a message (and optionally a nonce) and a secret key to generate a tag that is sent with the message. Traditionally, they are classified into three types: deterministic, nonce-based, and probabilistic.

Deterministic MAC designs are the most popular, with widely used constructions based on block ciphers (CBC-MAC [4, 13], OMAC [18], PMAC [5], LightMAC [29], ...) and hash functions (HMAC [2], NMAC [2], NI-MAC [1], ...). However, there is a generic forgery attack against all deterministic iterated MACs, using collisions in the internal state, due to Preneel and van Oorschot [37]. Therefore, these MACs only achieve security up to the birthday bound, i.e. when the number of queries by the adversary is bounded by \(2^{n/2}\), with n the state size. This is equivalently called \(n{\text {/}}2\)-bit security.

One way to increase the security is to use a nonce, a unique value provided by the user (in practice, the nonce is usually a counter). This approach has been pioneered by Wegman and Carter [41] based on an earlier work by Gilbert et al. [15]. Later a few follow ups like EDM and EWCDM [7], and Dual EDM [30] have been proposed to achieve beyond birthday security.

Alternatively, a probabilistic MAC uses a random coin for the extra value, which is usually called a salt, and must be transmitted with the MAC. Probabilistic MACs have the advantage that they can stay secure when called with the same input twice, and don’t require a state to keep the nonce unique. Some popular probabilistic MAC constructions are XMACR [3], RMAC [22] and EHtM [31]. In particular, RMAC and EHtM have security beyond the birthday bound.

However, deterministic MACs are easier to use in practice, and there has been an important research effort to build deterministic MACs with security beyond the birthday bound, using an internal state larger than the primitive size. In particular, several constructions use a 2n-bit internal state so that collisions in the state are only expected after \(2^n\) queries. Yasuda first proposed SUM-ECBC [42], a beyond birthday bound (BBB) secure deterministic MAC that achieves \(2n{\text {/}}3\)-bit security. However, this construction has rate \(1{\text {/}}2\), and Yasuda himself later proposed PMAC+ [43], one of the most popular BBB secure MACs, which achieves rate 1. Later, several other constructions such as 3kf9 [44], LightMAC+ [33], GCM-SIV2 [20], and single-key PMAC+ [9] have been proposed. Interestingly, all the above designs share a common structure: a double-block universal hash function outputs a 2n-bit hash value (seen as two n-bit halves), and a finalization function generates the tag by XORing encrypted values of the two n-bit hash values. This structure has been called double-block-hash-then-sum, and it will be the focus of our paper.

More recently, variants of PMAC+ based on tweakable block-cipher have also been proposed, such as PMAC_TBC [32], PMACx [27], ZMAC [21], and ZMAC+ [28].

Our results. We focus on the security of deterministic block-cipher based MACs with security beyond the birthday bound and double-block hash construction. Several previous works have focused on security proofs, showing that they are secure up to \(2^{2n/3}\) queries [9, 20, 33, 42, 43, 44]. For most of these constructions, the advantage of an adversary making q short queries is bounded by \(\mathcal {O}(q^3 / 2^{2n})\). Recently, Naito [34] gave an improved security proof for LightMAC+, with advantage at most \(\mathcal {O}(q_t^2 q_v / 2^{2n})\), with \(q_t\) MAC queries and \(q_v\) verification queries. In particular, this would prove security up to \(2^n\) when the adversary can only do a single verification query.

In this work, we take the opposite approach and look for generic attacks against these modes. We use a cryptanalysis technique that can be seen as a generalisation of the collision attack of Preneel and van Oorschot [37]. Instead of looking for a pair of messages so that the full state collides, we look for a quadruple of messages, which can be seen either as two pairs colliding on the first half of the state, or two pairs colliding on the second half. Since the finalization function combines the halves with a sum, we can detect such a quadruple because the corresponding MACs sum to zero, and can usually amplify this filtering. Moreover, when the messages are well constructed, the relations defining the four collisions create a linear system of rank only three, so that we expect one good quadruple out of \(2^{3n}\). Therefore, we only need four lists of \(2^{3n/4}\) queries, and we expect one good quadruple out of the \(2^{3n}\) choices in the four lists.

Table 1. Summary of the security for studied modes and our main results. q is the number of queries, \(\ell \) is the maximum size of a query, \(\sigma \) is the total number of processed blocks. The expected lower bound and attack complexity are given in number of constant length queries (\(\ell = \mathcal {O}(1)\)). We use “U” for universal forgeries, and “E” for existential forgeries.

Table 1 shows a summary of our main results and how they compare with their respective provable security claims. In particular, we have forgery attacks with \(\mathcal {O}(2^{3n/4})\) MAC queries against SUM-ECBC, GCM-SIV2, PMAC+, LightMAC+, 1kPMAC+, and 3kf9. As far as we know, these are the first attacks with less than \(2^n\) queries against these constructions. Our attack against LightMAC+ contradicts the recent security bound for LightMAC+ [34], because we have an attack with \(\mathcal {O}(2^{3n/4})\) MAC queries, and a single verification query. The other attacks do not contradict the security proofs, but they make an important step towards understanding the actual security of these modes: we now have a lower bound of \(2^{2n/3}\) queries from the proofs, and an upper bound of \(2^{3n/4}\) from our attacks.

The attacks have a complexity of \(2^{3n/4}\) in the information theoretic model (the model used for most MAC security proofs), but we note that an attacker needs more than \(2^n\) operations to create a forgery. However, we have found a variant of our attack against SUM-ECBC and GCM-SIV2 with total complexity below \(2^n\), using \(\mathcal {O}(2^{6n/7})\) queries and \(\tilde{\mathcal {O}}(2^{6n/7})\) operations.

We have also found an attack with only \(\mathcal {O}(2^{n/2})\) queries and \(\tilde{\mathcal {O}}(2^{n/2})\) operations against 1kf9 [8], a single key variant of 3kf9 with claimed security up to \(2^{2n/3}\) queries. 1kf9 has been withdrawn due to issues with its security proof, but no attack was known previously.

Related works. There has been extensive work on security proofs for modes of operations, with a recent focus on security beyond the birthday bound. An interesting example is the encryption mode CENC by Iwata [17]: the initial proof was only up to \(2^{2n/3}\) queries, but a later proof showed that it actually remains secure close to \(2^n\) queries [19]. Our results show that in the case of double-block-hash-then-sum MACs, the security is lower than n-bit security.

Similarly, the initial proof of the randomized MAC EHtM only gave security up to \(2^{2n/3}\), but a later proof showed security up to \(2^{3n/4}\) [11]. This result also includes a matching attack, using a technique similar to ours based on looking for quadruples. However in the case of EHtM the attacker can observe part of the state, which allows him to find a right quadruple in \(\mathcal {O}(2^{3n/4})\) time and memory. In our case we can’t observe the internal state at all, thus we need to use different tricks tailored to each construction in order to amplify the filtering and avoid the many false-positives. In particular, this significantly increases the time and memory complexity.

There has also been intensive work on generic attacks to complement the security proof results. After the generic collision attack of Preneel and van Oorschot [37], more advanced attacks against MACs have been described, with stronger outcomes than existential forgeries, starting with a key-recovery attack against the envelope MAC by the same authors [38]. In particular, a series of attacks against hash-based MACs [10, 16, 26, 36] led to universal forgery attacks against long challenges, and key-recovery attacks when the hash function has an internal checksum (like the GOST family). Against PMAC, Lee et al. showed a universal forgery attack in 2006 [25]. Later, Fuhr, Leurent and Suder gave a key-recovery attack against the PMAC variant used in AEZv3 [14]. Issues with GCM authentication with truncated tags were also pointed out by Ferguson [12]. These attacks don’t contradict the security proofs of the schemes, but they are important results to understand the security degradation after the birthday bound.

Organization of the paper. We first explain our attack technique using quadruples of messages in Sect. 2, and give three concrete attacks using this technique: an attack against SUM-ECBC and GCM-SIV2 in Sect. 3, an attack against PMAC+ and related constructions in Sect. 4, and an attack against 3kf9 in Sect. 5. Finally, we show a variant of the technique using special properties of the single-key constructions of [8, 9] in Sect. 6.

Notations. We denote the concatenation of message blocks x and y as \(x\Vert y\). When x and y fit together in one block, we use x|y to denote their concatenation. We use L[i] to denote element i of list L, \(x_{[i]}\) to denote bit i of x, and \(x_{[i:j]}\) to denote bits i to \(j-1\). Finally, we use a curly brace for systems of equations.

2 Generic Attack Against Double-Block-Hash MACs

We first explain our attacks in a generic way, and leave the specific details to later sections focused on concrete MAC constructions.

We consider MACs where the 2n-bit internal state is divided in two n-bit parts, that we denote \(\varSigma \) and \(\varTheta \), and the final MAC is computed as:

$$ \text{MAC}(M) = E(\varSigma (M)) \oplus E'(\varTheta (M)), $$

where E and \(E'\) denote the block cipher with potentially different keys. The functions \(\varSigma \) and \(\varTheta \) can be seen as two n-bit universal hash functions computed on the message, hence the name double-block-hash-then-sum MAC.

Our attacks exploit the fact that the two halves are combined with a sum, where one side depends only on \(\varSigma \), and the other side depends only on \(\varTheta \). They do not seem applicable to constructions with more intricate finalization functions, such as LightMAC+2 [33], or the tweakable block-cipher based constructions PMAC_TBC [32], PMACx [27], ZMAC [21], or ZMAC+ [28].

2.1 Using Quadruples

Our strategy consists in looking for a quadruple of messages (X, Y, Z, T) such that pairs of values collide for one half of the state. More precisely, we look for quadruples satisfying a relation \(\mathcal {R}(X,Y,Z,T)\) defined as:

$$ \mathcal {R}(X,Y,Z,T) := {\left\{ \begin{array}{ll} \varSigma (X) = \varSigma (Y) \\ \varTheta (Y) = \varTheta (Z) \\ \varSigma (Z) = \varSigma (T) \\ \varTheta (T) = \varTheta (X) \end{array}\right. } $$

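In particular, since the MAC is computed as \(\text{MAC}(M) = E(\varSigma (M)) \oplus E'(\varTheta (M))\), it follows that:

$$ \mathcal {R}(X,Y,Z,T) \implies \text{MAC}(X) \oplus \text{MAC}(Y) \oplus \text{MAC}(Z) \oplus \text{MAC}(T) = 0. $$
(1)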

In addition, if the messages X, Y, Z, T are well constructed, the relation \(\mathcal {R}\) reduces to a linear system of rank only three, i.e.

$$\big [\varSigma (X) = \varSigma (Y) \text { and } \varTheta (Y) = \varTheta (Z) \text { and } \varSigma (Z) = \varSigma (T)\big ] \implies \varTheta (T) = \varTheta (X).$$

Therefore, we expect to find one quadruple satisfying the relation out of \(2^{3n}\), and we can construct \(2^{3n}\) quadruples with just \(4 \times 2^{3n/4}\) queries. This gives an attack with data complexity \(\mathcal {O}(2^{3n/4})\).

In practice, we consider lists of \(2^{3n/4}\) messages, generated with two message injection functions \(\phi \) and \(\psi \). These functions are different in every attack, but they mostly correspond to adding two distinct prefixes, as in the following example:

$$\begin{aligned} \phi (i)&= 0 \Vert i&\psi (i)&= 1 \Vert i \\ X = \phi (x)&= 0 \Vert x&Y = \psi (y)&= 1 \Vert y \\ Z = \phi (z)&= 0 \Vert z&T = \psi (t)&= 1 \Vert t, \end{aligned}$$

In particular, the pairs (X, Y), (Y, Z), (Z, T) and (T, X) that we consider always contain a message built with \(\phi \) and a message built with \(\psi \). Therefore, we will have the required collisions in \(\varSigma \) or \(\varTheta \) if the difference introduced in the half-state by the second block cancels the difference found after processing the first block.

This type of attack has some similarities with a higher order differential attack. Indeed, in the easiest case (e.g. our attack against SUM-ECBC), the relation \(\mathcal {R}\) can be written as \(\mathcal {R}(x,y,z,t) \iff \big [ x \oplus y = z \oplus t = \varDelta _1 \text { and } x \oplus t = y \oplus z = \varDelta _3 \big ]\) for some secret values \(\varDelta _1\) and \(\varDelta _3\). This idea of looking for quadruples is also very similar to the attack on EHtM [11], but the full attack will turn out quite different. Indeed, in the case of EHtM, the attacker can observe the salt R, which represents half of the 2n-bit internal state. Here this would be the equivalent of observing \(\varSigma (m)\) for all processed messages m. This is clearly not possible for the studied constructions, and we need something more to discriminate and find a good quadruple that satisfies \(\mathcal {R}\).

2.2 Detecting Quadruples: Generalized Birthday Algorithms

To finish the attack we usually need to locate one good quadruple. The observable relation (1) is too weak on its own, because one random quadruple out of \(2^n\) satisfies it by chance, but we can usually amplify the filtering using related quadruples that satisfy \(\mathcal {R}\) simultaneously (the exact details depend on the MAC construction).

In most of our attacks, we can express the search for a quadruple as an instance of the 4-sum problem, and solve it using variants of Wagner’s generalized birthday algorithm [40]. This reduces the time complexity of the attacks (compared to a naive search), and provides trade-offs between the query, memory and time complexities.

More precisely, our problem can be stated as follows:

Definition 1

(4-sum problem). Given four lists \(L_1, L_2, L_3, L_4\) of \(2^s\) elements, with on average \(2^p\) quadruples \((x,y,z,t) \in L_1 \times L_2 \times L_3 \times L_4\) such that \(x \oplus y \oplus z \oplus t = 0\), find one of them.

Note that if the lists contain random n-bit words, we expect to have \(2^p = 2^{4s-n}\) solutions, but in some of our instances there are more solutions because of the structure of the lists.

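We denote the join operator as \(\bowtie \); it computes the pairwise sum of two lists, and keeps the initial values attached to the sum. In addition, the join operator with filtering \(\bowtie _t^\alpha \) only keeps values such that the t least significant bits of the sum agree with the value \(\alpha \):

$$\begin{aligned} L_1 \bowtie L_2&= \big \{ (x \oplus y, x, y) : (x, y) \in L_1 \times L_2 \big \} \\ L_1 \bowtie _t^\alpha L_2&= \big \{ (x \oplus y, x, y) : (x, y) \in L_1 \times L_2,\ (x \oplus y)_{[0:t]} = \alpha \big \} \end{aligned}$$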

In particular, we have \(\mathord {\bowtie } = \mathord {\bowtie _0^0}\). We also denote as \(\bowtie _\infty \) the join operator with filtering over the full input values. The filtered join operator is the basis of Wagner’s algorithm, and it can be computed in almost linear time by sorting the two input lists, and stepping through them simultaneously.

Direct algorithm. While a naive algorithm for our 4-sum instances would take time \(2^{4s}\) to examine all quadruples, there is a simple improvement with time and memory \(\tilde{\mathcal {O}}(2^{2s})\). First, the attacker builds \(L_{12} = L_1 \bowtie L_2\) and \(L_{34} = L_3 \bowtie L_4\). Then, he looks for a collision between the first component of \(L_{12}\) and \(L_{34}\). A collision directly yields a solution. This always finds a solution if it exists in \(\tilde{\mathcal {O}}(2^{2s})\) operations but it also takes \(\mathcal {O}(2^{2s})\) memory.
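The direct algorithm can be sketched as follows on a toy instance. This is only an illustration under our own assumptions: the function names `join` and `four_sum_direct` are hypothetical, the lists hold plain integers rather than MAC outputs, a hash table replaces the sort-and-merge step, and \(L_{34}\) is streamed instead of stored.

```python
import random
from collections import defaultdict

def join(list_a, list_b):
    """Unfiltered join: map each sum a ^ b to the pairs (a, b) that produce it."""
    table = defaultdict(list)
    for a in list_a:
        for b in list_b:
            table[a ^ b].append((a, b))
    return table

def four_sum_direct(l1, l2, l3, l4):
    """Return one (x, y, z, t) with x ^ y ^ z ^ t == 0, or None if there is none."""
    l12 = join(l1, l2)  # about 2^(2s) entries: this is the memory cost
    for z in l3:
        for t in l4:
            pairs = l12.get(z ^ t)  # collision between L12 and L34
            if pairs:
                x, y = pairs[0]
                return x, y, z, t
    return None

rng = random.Random(1)
lists = [[rng.getrandbits(32) for _ in range(16)] for _ in range(4)]
# Plant one solution so that the toy instance is guaranteed to have one.
lists[3][5] = lists[0][2] ^ lists[1][7] ^ lists[2][11]
solution = four_sum_direct(*lists)
```

Both loops touch every pair once, which matches the \(2^{2s}\) time and memory of the direct algorithm up to the cost of the table lookups.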

Fig. 1. Generalized birthday algorithm to find good quadruples.

Memory efficient algorithm. We can reduce the memory complexity of the algorithm if we avoid constructing the full lists \(L_{12}\) and \(L_{34}\). An algorithm with low memory complexity was first described by Chose et al. [6], but we use the description given by Wagner in the full version of [40].

Instead of building the full lists \(L_{12}\) and \(L_{34}\), we only keep the sums whose s least significant bits are equal to some fixed value \(\alpha \), i.e. we build \(L^\alpha _{12} = L_1 \bowtie _s^\alpha L_2\) and \(L^\alpha _{34} = L_3 \bowtie _s^\alpha L_4\). This reduces the expected size of the lists to only \(2^s\): \(E[|L^\alpha _{34}|] = E[|L^\alpha _{12}|] = |L_1| \cdot |L_2| / 2^s = 2^s\). If this algorithm is repeated for every s-bit value \(\alpha \), it will eventually find all solutions.

Actually, one run of the algorithm detects the solutions whose least significant bits of \(x \oplus y\) are equal to \(\alpha \). If there are \(2^p\) solutions in total, there is one such solution with probability \(2^{p-s}\), and this algorithm will find the first solution after trying \(2^{s-p}\) values of \(\alpha \) on average. Therefore, the expected time complexity of the algorithm given by Fig. 1 is only \(\tilde{\mathcal {O}}(2^{2s-p})\).
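The memory-efficient variant can be sketched in the same toy setting. The names `filtered_join` and `four_sum_low_memory` are hypothetical, and bucketing by the low bits stands in for the sort-based implementation of the filtered join; only the two filtered lists of expected size \(2^s\) are kept in memory for each value of \(\alpha \).

```python
import random
from collections import defaultdict

def filtered_join(list_a, list_b, s, alpha):
    """Triples (a ^ b, a, b) such that the s low bits of a ^ b equal alpha."""
    mask = (1 << s) - 1
    buckets = defaultdict(list)
    for b in list_b:
        buckets[b & mask].append(b)
    out = []
    for a in list_a:
        # (a ^ b) & mask == alpha  <=>  b & mask == (a & mask) ^ alpha
        for b in buckets.get((a & mask) ^ alpha, ()):
            out.append((a ^ b, a, b))
    return out

def four_sum_low_memory(l1, l2, l3, l4, s):
    """Try every s-bit alpha; return one (x, y, z, t) summing to zero, or None."""
    for alpha in range(1 << s):
        index = {}
        for value, x, y in filtered_join(l1, l2, s, alpha):
            index.setdefault(value, (x, y))
        # A solution has x ^ y == z ^ t, so both joins use the same alpha.
        for value, z, t in filtered_join(l3, l4, s, alpha):
            if value in index:
                x, y = index[value]
                return x, y, z, t
    return None

rng = random.Random(5)
S = 4
lists = [[rng.getrandbits(24) for _ in range(1 << S)] for _ in range(4)]
lists[3][0] = lists[0][3] ^ lists[1][9] ^ lists[2][14]  # planted solution
solution = four_sum_low_memory(lists[0], lists[1], lists[2], lists[3], S)
```

The loop over \(\alpha \) stops at the first hit, which is where the \(2^{s-p}\) expected number of iterations comes from when \(2^p\) solutions exist.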

Related work. In a 2016 work, Nikolic and Sasaki [35] investigate the 4-sum where we need to find 4 different inputs x, y, z, t to a function f such that \(f(x) \oplus f(y) \oplus f(z) \oplus f(t) = 0\). They also mention that their algorithm is adaptable to pairwise identical functions, i.e. \(f(x) \oplus g(y) \oplus f(z) \oplus g(t) = 0\).

Most of our attacks can be written in this way; concretely, they are equivalent to instances of random functions with 3n-bit outputs. In this setting our algorithm takes time \(\tilde{\mathcal {O}}(2^{3n/2})\) and memory \(\mathcal {O}(2^{3n/4})\), while Nikolic and Sasaki’s work can reach \(\tilde{\mathcal {O}}(2^{9n/8})\) time and \(\mathcal {O}(2^{3n/4})\) memory. Unfortunately, their algorithm requires \(\tilde{\mathcal {O}}(2^{9n/8})\) queries to the functions; this would translate to \(\tilde{\mathcal {O}}(2^{9n/8})\) queries to the MAC, which is not interesting in our context.

3 Attacking SUM-ECBC-like constructions

We start with attacks against SUM-ECBC [42] and GCM-SIV2 [20]; while the constructions are quite different, they have a similar structure and the same attacks can be used in both cases. We give a universal forgery attack with \(\mathcal {O}(2^{3n/4})\) queries and \(\tilde{\mathcal {O}}(2^{3n/2})\) operations (using memory \(\mathcal {O}(2^{3n/4})\)), and a variant with total complexity below \(2^n\), with \(\mathcal {O}(2^{6n/7})\) queries and \(\tilde{\mathcal {O}}(2^{6n/7})\) operations.

Fig. 2. Diagram for SUM-ECBC with an \(\ell \)-block message.

3.1 Attacking SUM-ECBC

SUM-ECBC was designed by Yasuda in 2010 [42], inspired by MAC constructions summing two CBC-MACs in the ISO 9797-1 standard. The scheme uses a block cipher keyed with four independent keys, denoted as \(E_1\), \(E_2\), \(E_3\), \(E_4\). The message M is first padded with \(10^{*}\) padding, and divided into n-bit blocks. In the following we ignore the padding and consider the padded message as the input: this makes our description easier, and any padded message whose last block is non-zero can be “un-padded” to generate a valid input message. The construction is defined as follows (see also Fig. 2):

$$\begin{aligned} \varSigma (M)&= E_1(m_\ell \oplus E_1(\ldots \oplus E_1(m_1))) \\ \varTheta (M)&= E_3(m_\ell \oplus E_3(\ldots \oplus E_3(m_1))) \\ \text{MAC}(M)&= E_2(\varSigma (M)) \oplus E_4(\varTheta (M)) \end{aligned}$$

Attack. Following the framework of Sect. 2, we consider quadruples of messages, built with two message injection functions:

$$\begin{aligned} \phi (i)&= 0 \Vert i&\psi (i)&= 1 \Vert i \end{aligned}$$

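In particular, we have

$$\begin{aligned} \varSigma _0(i)&:= \varSigma (\phi (i)) = E_1(i \oplus E_1(0))&\varSigma _1(i)&:= \varSigma (\psi (i)) = E_1(i \oplus E_1(1)) \\ \varTheta _0(i)&:= \varTheta (\phi (i)) = E_3(i \oplus E_3(0))&\varTheta _1(i)&:= \varTheta (\psi (i)) = E_3(i \oplus E_3(1)) \end{aligned}$$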

Next, we build quadruples of messages X, Y, Z, T with

$$\begin{aligned} X&= \phi (x)&Y&= \psi (y)&Z&= \phi (z)&T&= \psi (t), \end{aligned}$$

and we look for a quadruple with partial state collisions for the underlying pairs, i.e. a quadruple following the relation:

$$\begin{aligned} \mathcal {R}(x,y,z,t)&:= {\left\{ \begin{array}{ll} \varSigma _{0}(x) = \varSigma _{1}(y) \\ \varSigma _{0}(z) = \varSigma _{1}(t) \\ \varTheta _{0}(z) = \varTheta _{1}(y) \\ \varTheta _{0}(x) = \varTheta _{1}(t). \end{array}\right. } \end{aligned}$$

We have

$$\begin{aligned} \mathcal {R}(x,y,z,t)&\Leftrightarrow {\left\{ \begin{array}{ll} x \oplus E_1(0) = y \oplus E_1(1) \\ z \oplus E_3(0) = y \oplus E_3(1) \\ z \oplus E_1(0) = t \oplus E_1(1) \\ x \oplus E_3(0) = t \oplus E_3(1) \end{array}\right. } \Leftrightarrow {\left\{ \begin{array}{ll} x \oplus y \oplus z \oplus t = 0 \\ x \oplus y = E_1(0) \oplus E_1(1) \\ x \oplus t = E_3(0) \oplus E_3(1) \end{array}\right. } \end{aligned}$$

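As promised in Sect. 2, \(\mathcal {R}\) defines a 3n-bit relation. We can easily observe when \(x \oplus y \oplus z \oplus t = 0\), and we can also detect the relation on the sum of the MACs following Eq. (1):

$$ \mathcal {R}(x,y,z,t) \implies \text{MAC}(\phi (x)) \oplus \text{MAC}(\psi (y)) \oplus \text{MAC}(\phi (z)) \oplus \text{MAC}(\psi (t)) = 0. $$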

Moreover, we observe that \(\mathcal {R}(x,y,z,t)\) is satisfied if and only if \(\mathcal {R}(x \oplus c,y \oplus c,z \oplus c,t \oplus c)\) is satisfied for any constant c. We use this relation to build several quadruples that satisfy \(\mathcal {R}\) simultaneously:

$$\begin{aligned} \mathcal {R}(x,y,z,t) \iff \forall c,\, \mathcal {R}(x \oplus c,y \oplus c,z \oplus c,t \oplus c) \end{aligned}$$
(2)

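This leads to an attack with \(\mathcal {O}(2^{3n/4})\) queries: we consider four sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) of \(2^{3n/4}\) values, and we look for a quadruple \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\) with:

$$ {\left\{ \begin{array}{ll} x \oplus y \oplus z \oplus t = 0 \\ \text{MAC}(\phi (x)) \oplus \text{MAC}(\psi (y)) \oplus \text{MAC}(\phi (z)) \oplus \text{MAC}(\psi (t)) = 0 \\ \text{MAC}(\phi (x \oplus 1)) \oplus \text{MAC}(\psi (y \oplus 1)) \oplus \text{MAC}(\phi (z \oplus 1)) \oplus \text{MAC}(\psi (t \oplus 1)) = 0 \end{array}\right. } $$
(3)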

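Because we need a fair distribution of values \(x \oplus y\) and \(x \oplus t\) to find the good quadruple, we build the sets by fixing a different \(n{\text {/}}4\)-bit chunk to zero in each of them:

$$\begin{aligned} \mathcal {X}&= \{ a | b | c | 0 \}&\mathcal {Y}&= \{ a | b | 0 | c \}&\mathcal {Z}&= \{ a | 0 | b | c \}&\mathcal {T}&= \{ 0 | a | b | c \}, \end{aligned}$$

where a, b, c range over all \(n{\text {/}}4\)-bit values.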

With this construction, there is exactly one quadruple \((x,y,z,t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\) that respects \(\mathcal {R}\), given by:

$$\begin{aligned} x&= v_1 | w_2 | u_3 | 0&y&= w_1 | v_2 | 0 | u_4&z&= u_1 | 0 | v_3 | w_4&t&= 0 | u_2 | w_3 | v_4, \end{aligned}$$

where:

$$\begin{aligned} E_1(0) \oplus E_1(1)&=: u_1 | u_2 | u_3 | u_4&\\ E_3(0) \oplus E_3(1)&=: v_1 | v_2 | v_3 | v_4&\\ E_1(0) \oplus E_1(1) \oplus E_3(0) \oplus E_3(1)&=: w_1 | w_2 | w_3 | w_4. \end{aligned}$$

We expect on average one random quadruple satisfying (3) (with \(2^{3n}\) potential quadruples, and a 3n-bit filtering), in addition to the quadruple satisfying \(\mathcal {R}\). The correct quadruple can easily be checked with a few extra queries.
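As a sanity check, the structured quadruple above can be verified on a toy instance. The sketch below assumes that SUM-ECBC is the sum of two encrypted CBC-MACs (\(E_1\), \(E_3\) in the chains, \(E_2\), \(E_4\) as final encryptions), uses \(n = 8\) with random permutation tables standing in for the keyed block cipher, and reads the secret differences directly from the tables; all function names are hypothetical.

```python
import random

N = 8        # toy block size in bits (real block ciphers use 64 or 128)
K = N // 4   # chunk size used to structure the four sets
rng = random.Random(2018)

def random_permutation():
    table = list(range(1 << N))
    rng.shuffle(table)
    return table

# Four independent random permutations stand in for E_1, ..., E_4.
E1, E2, E3, E4 = (random_permutation() for _ in range(4))

def sum_ecbc(blocks):
    """Toy SUM-ECBC on already padded blocks (assumed definition, see above)."""
    sigma = theta = 0
    for m in blocks:
        sigma = E1[sigma ^ m]
        theta = E3[theta ^ m]
    return E2[sigma] ^ E4[theta]

def split(value):
    """Four K-bit chunks of value, most significant chunk first."""
    return [(value >> (K * (3 - i))) & ((1 << K) - 1) for i in range(4)]

def pack(chunks):
    value = 0
    for chunk in chunks:
        value = (value << K) | chunk
    return value

def structured_quadruple():
    """The quadruple x, y, z, t built from the chunks u_i, v_i, w_i."""
    u = split(E1[0] ^ E1[1])
    v = split(E3[0] ^ E3[1])
    w = [a ^ b for a, b in zip(u, v)]
    x = pack([v[0], w[1], u[2], 0])
    y = pack([w[0], v[1], 0, u[3]])
    z = pack([u[0], 0, v[2], w[3]])
    t = pack([0, u[1], w[2], v[3]])
    return x, y, z, t

def mac_sum(x, y, z, t, c=0):
    """Sum of the four MACs for the quadruple shifted by the constant c."""
    return (sum_ecbc([0, x ^ c]) ^ sum_ecbc([1, y ^ c])
            ^ sum_ecbc([0, z ^ c]) ^ sum_ecbc([1, t ^ c]))

quad = structured_quadruple()
```

In this toy model `mac_sum(*quad, c=c)` vanishes for every constant c, which is the amplified filter of Eq. (2).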

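In practice, we use the generalized birthday algorithms of Sect. 2.2 in order to optimize the complexity of the attack. We consider four lists:

$$\begin{aligned} L_1&= \{ \text{MAC}(\phi (x)) \Vert \text{MAC}(\phi (x \oplus 1)) \Vert x : x \in \mathcal {X} \} \\ L_2&= \{ \text{MAC}(\psi (y)) \Vert \text{MAC}(\psi (y \oplus 1)) \Vert y : y \in \mathcal {Y} \} \\ L_3&= \{ \text{MAC}(\phi (z)) \Vert \text{MAC}(\phi (z \oplus 1)) \Vert z : z \in \mathcal {Z} \} \\ L_4&= \{ \text{MAC}(\psi (t)) \Vert \text{MAC}(\psi (t \oplus 1)) \Vert t : t \in \mathcal {T} \} \end{aligned}$$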

Notice that we can build those lists with \(5 \cdot 2^{3n/4}\) queries as, by construction, for any element i of \(\mathcal {Y}, \mathcal {Z}, \mathcal {T}\) the element \((i \oplus 1)\) also belongs to \(\mathcal {Y}, \mathcal {Z}, \mathcal {T}\), respectively. We use the algorithm of Sect. 2.2 to find \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\) such that \(L_1[x] \oplus L_2[y] \oplus L_3[z] \oplus L_4[t] = 0\) with \(\tilde{\mathcal {O}}(2^{3n/2})\) operations, using a memory of size \(\mathcal {O}(2^{3n/4})\). After finding a collision, we verify that it is not a false positive by testing the relation for another value c. As there are on average \(\mathcal {O}(1)\) random quadruples the attack is indeed using a total of \(5 \cdot 2^{3n/4} + \mathcal {O}(1) = \mathcal {O}(2^{3n/4})\) queries.
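The whole search can be run end to end on a toy instance with \(n = 8\), where each set has \(2^{3n/4} = 64\) elements. This is a sketch under stated assumptions rather than the attack as implemented by anyone: random permutation tables replace the keyed block cipher, the SUM-ECBC definition is the sum of two encrypted CBC-MACs, list entries are packed as \(\text{MAC} \Vert \text{MAC} \Vert \text{value}\) integers, the direct (memory-hungry) join is used, and surviving candidates are re-checked with the constants c = 2, ..., 7.

```python
import random
from collections import defaultdict

N, K = 8, 2  # toy block size and chunk size n/4
rng = random.Random(7)

def random_permutation():
    table = list(range(1 << N))
    rng.shuffle(table)
    return table

E1, E2, E3, E4 = (random_permutation() for _ in range(4))

def sum_ecbc(blocks):
    sigma = theta = 0
    for m in blocks:
        sigma = E1[sigma ^ m]
        theta = E3[theta ^ m]
    return E2[sigma] ^ E4[theta]

def zero_chunk_set(position):
    """All n-bit values whose chunk `position` (0 = most significant) is zero."""
    shift = K * (3 - position)
    return [v for v in range(1 << N) if (v >> shift) & ((1 << K) - 1) == 0]

def entry(prefix, value):
    """List entry MAC(prefix || value) || MAC(prefix || value ^ 1) || value."""
    return ((sum_ecbc([prefix, value]) << (2 * N))
            | (sum_ecbc([prefix, value ^ 1]) << N) | value)

def mac_sum(x, y, z, t, c):
    return (sum_ecbc([0, x ^ c]) ^ sum_ecbc([1, y ^ c])
            ^ sum_ecbc([0, z ^ c]) ^ sum_ecbc([1, t ^ c]))

def attack():
    """Return candidate pairs (x ^ y, x ^ t) for the two secret differences."""
    X, Y, Z, T = (zero_chunk_set(p) for p in (3, 2, 1, 0))
    l1 = {x: entry(0, x) for x in X}
    l2 = {y: entry(1, y) for y in Y}
    l3 = {z: entry(0, z) for z in Z}
    l4 = {t: entry(1, t) for t in T}
    l12 = defaultdict(list)
    for x, ex in l1.items():
        for y, ey in l2.items():
            l12[ex ^ ey].append((x, y))
    found = []
    for z, ez in l3.items():
        for t, et in l4.items():
            for x, y in l12.get(ez ^ et, ()):
                # Discard false positives using further constants c.
                if all(mac_sum(x, y, z, t, c) == 0 for c in range(2, 8)):
                    found.append((x ^ y, x ^ t))
    return found

candidates = attack()
```

The right quadruple always survives the extra checks, so the pair \((E_1(0) \oplus E_1(1), E_3(0) \oplus E_3(1))\) appears among `candidates`; with such a small block size a spurious survivor is unlikely but not strictly excluded.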

Universal Forgeries. This attack can be extended to a universal forgery. Indeed, the fixed prefixes 0 and 1 can be replaced by v and \(v \oplus 1\) for any block v, and when we identify a right quadruple (x, y, z, t) we deduce the values \(\varDelta _1 = E_1(v) \oplus E_1(v \oplus 1) = x \oplus y\) and \(\varDelta _3 = E_3(v) \oplus E_3(v \oplus 1) = x \oplus t\). There is also a length extension property: if (x, y, z, t) is a right quadruple, then for any suffix s:

$$ \text{MAC}(v \Vert x \Vert s) \oplus \text{MAC}(v \oplus 1 \Vert y \Vert s) \oplus \text{MAC}(v \Vert z \Vert s) \oplus \text{MAC}(v \oplus 1 \Vert t \Vert s) = 0. $$

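Therefore, if we want to forge a MAC for any message m of size \(\ell \ge 2\) blocks, we parse it as \(m = v \Vert w \Vert s\) (where s has zero, one, or several blocks) and perform the attack to recover \(\varDelta _1\) and \(\varDelta _3\). Then we can forge using the previous relation, and Eq. (2):

$$ \text{MAC}(v \Vert w \Vert s) = \text{MAC}(v \oplus 1 \Vert w \oplus \varDelta _1 \Vert s) \oplus \text{MAC}(v \Vert w \oplus \varDelta _1 \oplus \varDelta _3 \Vert s) \oplus \text{MAC}(v \oplus 1 \Vert w \oplus \varDelta _3 \Vert s). $$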
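The forgery step can be illustrated on the same kind of toy model. In this sketch the function `recovered_deltas` is a hypothetical stand-in for the quadruple search: it reads \(\varDelta _1\) and \(\varDelta _3\) directly from the permutation tables, whereas a real attacker would obtain them from a right quadruple. The SUM-ECBC definition (sum of two encrypted CBC-MACs) and the 8-bit block size are assumptions made for the example.

```python
import random

N = 8  # toy block size in bits
rng = random.Random(99)

def random_permutation():
    table = list(range(1 << N))
    rng.shuffle(table)
    return table

E1, E2, E3, E4 = (random_permutation() for _ in range(4))

def sum_ecbc(blocks):
    sigma = theta = 0
    for m in blocks:
        sigma = E1[sigma ^ m]
        theta = E3[theta ^ m]
    return E2[sigma] ^ E4[theta]

def recovered_deltas(v):
    """Stand-in for the attack: the two differences it recovers for prefix v."""
    return E1[v] ^ E1[v ^ 1], E3[v] ^ E3[v ^ 1]

def forge(oracle, message, delta1, delta3):
    """Predict the MAC of v || w || s from three MAC queries on other messages."""
    v, w, *suffix = message
    return (oracle([v ^ 1, w ^ delta1] + suffix)
            ^ oracle([v, w ^ delta1 ^ delta3] + suffix)
            ^ oracle([v ^ 1, w ^ delta3] + suffix))

target = [0x3A, 0xC5, 0x10, 0x77]
d1, d3 = recovered_deltas(target[0])
tag = forge(sum_ecbc, target, d1, d3)
```

The three queried messages differ from the target as long as \(\varDelta _1 \ne \varDelta _3\); in the degenerate case \(\varDelta _1 = \varDelta _3\) the second query coincides with the target and the forgery is not valid.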

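Optimizing the time complexity. Equation (2) can also be used to reduce the time complexity below \(2^n\), at the cost of more oracle queries. Indeed, if we consider a subset \(\mathcal {C}\) of \(\{0,1\}^n\), we have:

$$\begin{aligned} \mathcal {R}(x,y,z,t)&\implies \forall c \in \mathcal {C},\ \text{MAC}(\phi (x \oplus c)) \oplus \text{MAC}(\psi (y \oplus c)) \oplus \text{MAC}(\phi (z \oplus c)) \oplus \text{MAC}(\psi (t \oplus c)) = 0 \\&\implies \bigoplus _{c \in \mathcal {C}} \text{MAC}(\phi (x \oplus c)) \oplus \bigoplus _{c \in \mathcal {C}} \text{MAC}(\psi (y \oplus c)) \oplus \bigoplus _{c \in \mathcal {C}} \text{MAC}(\phi (z \oplus c)) \oplus \bigoplus _{c \in \mathcal {C}} \text{MAC}(\psi (t \oplus c)) = 0 \end{aligned}$$
(4)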

If we select \(\mathcal {C}\) as a linear subspace, then the last expression does not depend on the full (x, y, z, t), but only on their projection on the orthogonal of \(\mathcal {C}\). Concretely, we use \(\mathcal {C} = \{c \in \{0,1\}^n : c_{[3n/7:n]} = 0\}\), so that the value \(\bigoplus _{c \in \mathcal {C}} \text{MAC}(\phi (x \oplus c))\) is independent of bits 0 to \(3n/7-1\) of x.

Therefore, we consider the rewritten MAC function

$$ \text{MAC}'(p \Vert m) = \bigoplus _{c \in \mathcal {C}} \text{MAC}(p \Vert m \oplus c), $$

the following message injections, with a \(4n{\text {/}}7\)-bit input

$$\begin{aligned} \phi '(i)&= 0 \Vert i|0&\psi '(i)&= 1 \Vert i|0, \end{aligned}$$

and a reduced relation over \(4n{\text {/}}7\)-bit values:

$$\begin{aligned} \mathcal {R}'(x,y,z,t)&:= {\left\{ \begin{array}{ll} x \oplus y = (E_1(0) \oplus E_1(1))_{[3n/7:n]} \\ y \oplus z = (E_3(0) \oplus E_3(1))_{[3n/7:n]} \\ z \oplus t = (E_1(0) \oplus E_1(1))_{[3n/7:n]} \\ t \oplus x = (E_3(0) \oplus E_3(1))_{[3n/7:n]} \end{array}\right. }\\&\Leftrightarrow {\left\{ \begin{array}{ll} x \oplus y \oplus z \oplus t = 0 \\ x \oplus y = (E_1(0) \oplus E_1(1))_{[3n/7:n]} \\ x \oplus t = (E_3(0) \oplus E_3(1))_{[3n/7:n]} \end{array}\right. } \end{aligned}$$

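Thanks to Eq. (4), we still have:

$$ \mathcal {R}'(x,y,z,t) \implies \text{MAC}'(\phi '(x)) \oplus \text{MAC}'(\psi '(y)) \oplus \text{MAC}'(\phi '(z)) \oplus \text{MAC}'(\psi '(t)) = 0, $$

where \(\text{MAC}'\) denotes the rewritten MAC function, i.e. the sum of the MACs over all \(c \in \mathcal {C}\).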

Since the relation \(\mathcal {R}'\) is now only a \(12n{\text {/}}7\)-bit condition, we can use shorter lists than before, with just \(2^{3n/7}\) elements. We can also increase the filtering using the same trick as previously, considering the following lists:

Finally, using the algorithm of Sect. 2.2 with \(s = 3n/7\) and \(p = 0\), we can locate a right quadruple using \(\tilde{\mathcal {O}}(2^{6n/7})\) queries, \(\tilde{\mathcal {O}}(2^{6n/7})\) operations, and \(\mathcal {O}(2^{3n/7})\) memory. This recovers only \(4n{\text {/}}7\) bits of \(E_1(0) \oplus E_1(1)\) and \(E_3(0) \oplus E_3(1)\), but we can easily recover the remaining bits, either by brute force, or by repeating the attack with a different set \(\mathcal {C}\).

3.2 Attacking GCM-SIV2

GCM-SIV2 is an authenticated encryption mode designed by Iwata and Minematsu [20] as a double-block-hash version of GCM-SIV (in the following, we consider GCM-SIV2 with GHASH as the underlying universal hash function). For simplicity, we focus on the authentication part of GCM-SIV2, using inputs with a non-empty associated data, and an empty message. In this case, GCM-SIV2 becomes a nonce-based MAC. The message M (considered as associated data for the mode) is zero-padded, divided into n-bit blocks, and the length is appended in an extra block. Then the construction is defined as follows, with \(\odot \) a finite field multiplication (see also Fig. 3):

Fig. 3. Diagram for authentication in GCM-SIV2 using GHASH with an \(\ell \)-block message, a nonce N, hash keys \(H_1\) and \(H_2\).

Attack. The structure of the authentication part of GCM-SIV2 is essentially the same as the structure of SUM-ECBC, where the block cipher calls \(E_1\) and \(E_3\) are replaced by multiplication by \(H_1\) and \(H_2\). The finalization function has a 2n-bit output, but quadruples following \(\mathcal {R}\) will collide on both outputs. Thus, we can essentially repeat the SUM-ECBC attack, but there is an important difference: GCM-SIV2 is a nonce-based MAC, rather than a deterministic one. Therefore, all queries must include a nonce N, and we should not query two different messages with the same nonce. We adapt the previous attack using message injection functions that output both a nonce and a message, so that we use two fixed messages, 0 and 1, with variable nonces:

$$\begin{aligned} \phi (i)&= (i, 0)&\psi (i)&= (i, 1) \end{aligned}$$

We consider quadruples of nonce/messages X, Y, Z, T with

$$\begin{aligned} X&= \phi (x)&Y&= \psi (y)&Z&= \phi (z)&T&= \psi (t), \end{aligned}$$

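and we have the same kind of relations as in the previous attack:

$$ \mathcal {R}(x,y,z,t) \Leftrightarrow {\left\{ \begin{array}{ll} x \oplus y \oplus z \oplus t = 0 \\ x \oplus y = H_1^2 \\ x \oplus t = H_2^2 \end{array}\right. } $$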

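Since the MAC output is 2n-bit long, we can directly build an attack with \(\mathcal {O}(2^{3n/4})\) queries: we consider four distinct sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) of \(2^{3n/4}\) values, and we look for a quadruple \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\), such that

$$ {\left\{ \begin{array}{ll} x \oplus y \oplus z \oplus t = 0 \\ \text{MAC}(\phi (x)) \oplus \text{MAC}(\psi (y)) \oplus \text{MAC}(\phi (z)) \oplus \text{MAC}(\psi (t)) = 0 \end{array}\right. } $$
(5)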

We expect to find one good quadruple that respects \(\mathcal {R}\) along with \(\mathcal {O}(1)\) quadruples that randomly satisfy the observable filter (5). This leads to an attack with \(\mathcal {O}(2^{3n/4})\) queries and time \(\tilde{\mathcal {O}}(2^{3n/2})\). Since we recover \(H_1\) and \(H_2\) (from \(H_1^2 = x \oplus y\) and \(H_2^2 = x \oplus t\)), we can do universal forgeries. In addition, we can also easily adapt the attack with \(\mathcal {O}(2^{6n/7})\) queries and time \(\tilde{\mathcal {O}}(2^{6n/7})\).
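Recovering \(H_1\) from \(H_1^2\) only requires a square root in a binary field, where squaring is a bijection and \(\sqrt{a} = a^{2^{n-1}}\). The sketch below illustrates this in the toy field \(\text{GF}(2^8)\) with the reduction polynomial \(x^8 + x^4 + x^3 + x + 1\); the field size, the polynomial, the plain bit ordering, and the function names are assumptions of the example, and GHASH itself works in \(\text{GF}(2^{128})\) with its own conventions.

```python
N = 8
POLY = 0x11B  # x^8 + x^4 + x^3 + x + 1, an irreducible polynomial (toy choice)

def gf_mul(a, b):
    """Multiplication in GF(2^N) by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= POLY
    return result

def gf_sqrt(a):
    """Square root in GF(2^N): a^(2^(N-1)), computed with N - 1 squarings."""
    for _ in range(N - 1):
        a = gf_mul(a, a)
    return a

def recover_hash_keys(x, y, t):
    """Hash keys deduced from a right quadruple, using H1^2 = x^y and H2^2 = x^t."""
    return gf_sqrt(x ^ y), gf_sqrt(x ^ t)

# Toy check: pick secret keys, build the differences a right quadruple would
# reveal, and recover the keys from them.
H1, H2 = 0x57, 0xE3
x = 0x2A
y = x ^ gf_mul(H1, H1)
t = x ^ gf_mul(H2, H2)
keys = recover_hash_keys(x, y, t)
```

Since every field element has exactly one square root in characteristic two, the recovered keys are unambiguous.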

4 Attacking PMAC-like Constructions

We now describe attacks against PMAC+ [43] and related constructions: 1kPMAC+ [9], and LightMAC+ [33]. We have an existential forgery attack with \(\mathcal {O}(2^{3n/4})\) queries and \(\tilde{\mathcal {O}}(2^{3n/2})\) operations (using memory \(\mathcal {O}(2^{3n/4})\)), with a range of time-memory trade-offs with \(\mathcal {O}(2^t)\) queries, with \(3n/4< t < n\), and \(\tilde{\mathcal {O}}(2^{3n-2t})\) operations (using memory \(\mathcal {O}(2^{t})\)).

4.1 Attacking PMAC+

PMAC+ was designed by Yasuda in 2011 [43], as a variant of PMAC [5] with a larger internal state. The scheme internally uses a tweakable block cipher construction inspired by the XE construction [39], that we denote as \(\tilde{E}_i\). The message M is first padded with \(10^{*}\) padding, and divided into n-bit blocks, but for simplicity we ignore the padding in our description. The construction is shown in Fig. 4:

Fig. 4. Diagram for PMAC+ with an \(\ell \)-block message where \(\varDelta _0 = E_{1}(0)\) and \(\varDelta _1 = E_{1}(1)\).

Attack. As in the previous attack, we use message injection functions with two different prefixes, but we include an extra block u to define related quadruples:

$$\begin{aligned} \phi _u(i)&= u \Vert 0 \Vert i&\psi _u(i)&= u \Vert 1 \Vert i \end{aligned}$$

Next, we build quadruples of messages X, Y, Z, T with

$$\begin{aligned} X&= \phi _u(x)&Y&= \psi _u(y)&Z&= \phi _u(z)&T&= \psi _u(t), \end{aligned}$$

and we look for a quadruple with partial state collisions for the underlying pairs, i.e. a quadruple following the relation:

$$\begin{aligned} \mathcal {R}(x,y,z,t)&:= {\left\{ \begin{array}{ll} \varSigma _{u,0}(x) = \varSigma _{u,1}(y) \\ \varSigma _{u,0}(z) = \varSigma _{u,1}(t) \\ \varTheta _{u,0}(z) = \varTheta _{u,1}(y) \\ \varTheta _{u,0}(x) = \varTheta _{u,1}(t). \end{array}\right. } \end{aligned}$$

We have

$$\begin{aligned} \mathcal {R}(x,y,z,t)&\Leftrightarrow {\left\{ \begin{array}{ll} \tilde{E}_3(x) \oplus \tilde{E}_2(0) = \tilde{E}_3(y) \oplus \tilde{E}_2(1) \\ \tilde{E}_3(z) \oplus \tilde{E}_2(0) = \tilde{E}_3(t) \oplus \tilde{E}_2(1) \\ \tilde{E}_3(y) \oplus \mathtt {2} \tilde{E}_2(1) = \tilde{E}_3(z) \oplus \mathtt {2} \tilde{E}_2(0) \\ \tilde{E}_3(t) \oplus \mathtt {2} \tilde{E}_2(1) = \tilde{E}_3(x) \oplus \mathtt {2} \tilde{E}_2(0) \end{array}\right. } \\&\Leftrightarrow {\left\{ \begin{array}{ll} \tilde{E}_3(x) \oplus \tilde{E}_3(y) \oplus \tilde{E}_3(z) \oplus \tilde{E}_3(t) = 0 \\ \tilde{E}_3(x) \oplus \tilde{E}_3(y) = \tilde{E}_2(0) \oplus \tilde{E}_2(1) \\ \tilde{E}_3(t) \oplus \tilde{E}_3(x) = \mathtt {2}\tilde{E}_2(0) \oplus \mathtt {2}\tilde{E}_2(1) \end{array}\right. } \end{aligned}$$

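Again, \(\mathcal {R}\) defines a 3n-bit relation, and we can detect it through the sum of the MACs following Eq. (1):

$$ \mathcal {R}(x,y,z,t) \implies \text{MAC}(\phi _u(x)) \oplus \text{MAC}(\psi _u(y)) \oplus \text{MAC}(\phi _u(z)) \oplus \text{MAC}(\psi _u(t)) = 0. $$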

In addition, the relation \(\mathcal {R}\) is independent of the value u, so that we can easily build several quadruples that satisfy \(\mathcal {R}\) simultaneously. This leads to an attack with \(\mathcal {O}(2^{3n/4})\) queries: we consider four sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) of \(2^{3n/4}\) random values, and we look for a quadruple \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\), such that

$$ \forall u \in \{0, 1, 2\}: \quad \text{MAC}(\phi _u(x)) \oplus \text{MAC}(\psi _u(y)) \oplus \text{MAC}(\phi _u(z)) \oplus \text{MAC}(\psi _u(t)) = 0. $$

We expect on average one random quadruple (with \(2^{3n}\) potential quadruples, and a 3n-bit filtering), and one quadruple satisfying \(\mathcal {R}\) (also a 3n-bit condition). The correct quadruple can easily be checked with a few extra queries.

In practice, we use the generalized birthday algorithms of Sect. 2.2 in order to optimize the complexity of the attack. We consider four lists:

and we look for a quadruple \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\) such that \(L_1[x] \oplus L_2[y] \oplus L_3[z] \oplus L_4[t] = 0\). This can be done with \(\tilde{\mathcal {O}}(2^{3n/2})\) operations, using a memory of size \(\mathcal {O}(2^{3n/4})\). Finally, once a quadruple \((x, y, z, t)\) satisfying \(\mathcal {R}(x,y,z,t)\) has been detected, it can be used to generate forgeries. Indeed, we can predict the MAC of a new message by making three new queries using Eq. (1):

Time-Query Trade-offs. As opposed to the SUM-ECBC attack, we don’t have an analogue to Eq. (2) that can be used to reduce the time complexity. However, the time complexity of the algorithm can be slightly reduced when using more than \(\mathcal {O}(2^{3n/4})\) queries. If we consider sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) of size \(2^t\) with \(3n/4< t < n\), the resulting 4-sum is slightly easier, because there are \(2^{4t-3n}\) expected solutions. Using the algorithm of Sect. 2.2, this can be solved in time \(\tilde{\mathcal {O}}(2^{3n-2t})\), using a memory of size \(\mathcal {O}(2^t)\).
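The trade-off above is plain exponent arithmetic, which the following sketch tabulates. The function name is hypothetical, polylogarithmic factors are ignored, and the endpoint t = 3n/4 is accepted so that the baseline attack can be compared with the trade-off.

```python
from fractions import Fraction

def pmac_plus_tradeoff(n, t):
    """Return log2 of (expected solutions, time, memory) for sets of size 2^t."""
    n, t = Fraction(n), Fraction(t)
    if not (3 * n / 4 <= t < n):
        raise ValueError("expected 3n/4 <= t < n")
    solutions = 4 * t - 3 * n   # 2^(4t) quadruples against a 3n-bit condition
    time = 3 * n - 2 * t        # cost of the 4-sum search
    memory = t                  # one entry per queried value
    return solutions, time, memory
```

For n = 64, moving from t = 48 to t = 56 lowers the time exponent from 96 to 80 while raising the query and memory exponent from 48 to 56.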

4.2 Attacking LightMAC+

LightMAC+ was designed by Naito [33] using ideas from PMAC+ [43] and LightMAC [29]. If we consider it as based on a tweakable block cipher \(\tilde{E}\), it follows the same structure as PMAC+ (see Fig. 5), but \(\tilde{E}\) takes a message block smaller than n bits:

Fig. 5. Diagram for LightMAC+ with \((n-z)\)-bit blocks of an \(\ell \)-block message, where \((v)_z\) is the value v written over z bits.

Since the structure of LightMAC+ is the same as the structure of PMAC+, we can use the same attack. The only difference from our point of view is that the message blocks are shorter than the block-size. As long as one message block is big enough to fit \(2^{3n/4}\) different values, our attack will succeed.

This attack violates the improved security proof recently published at CT-RSA [34], with a security bound of \(\mathcal {O}(q_t^2 q_v / 2^{2n})\) (with \(q_t\) MAC queries and \(q_v\) verification queries). Indeed, our attack reaches a constant success probability with \(q_t = \mathcal {O}(2^{3n/4})\) and \(q_v = 1\). We have shared our attack with Naito and he agreed that his proof is flawed.

4.3 Attacking 1kPMAC+

1kPMAC+ is a single-key variant of PMAC+ [43] designed by Datta et al. [9], shown in Fig. 8.

Since the structure of 1kPMAC+ is the same as the structure of PMAC+, we can use the same attack. Alternatively, we can take advantage of the functions to mount a more straightforward attack, as shown in Sect. 6.

5 Attacking f9-like Constructions

Our third attack is applicable to 3kf9 [44] and similar constructions. We have a universal forgery attack with \(\mathcal {O}(2^{3n/4})\) queries and \(\tilde{\mathcal {O}}(2^{5n/4})\) operations using memory \(\mathcal {O}(2^{n})\), with possible time-memory trade-offs.

5.1 Attacking 3kf9

3kf9 [44], designed by Zhang, Wu, Sui and Wang, is a three-key variant of the f9 mode used in 3G telephony. While the original f9 does not have security beyond the birthday bound [24], 3kf9 is secure up to \(2^{2n/3}\) queries. We describe 3kf9 in Fig. 6:

Attack. Our attack follows the same structure as the previous attacks. We start with messages of the form:

$$\begin{aligned} \phi (i)&= 0 \Vert i&\psi (i)&= 1 \Vert i, \end{aligned}$$

and the corresponding MACs:

We use quadruples of messages \((X, Y, Z, T)\) with

$$\begin{aligned} X&= \phi (x)&Y&= \psi (y)&Z&= \phi (z)&T&= \psi (t), \end{aligned}$$

and we look for a quadruple with partial state collisions for the underlying pairs, i.e. a quadruple following the relation:

As in the previous attacks, \(\mathcal {R}\) defines a \(3n\)-bit relation. Moreover, we can easily observe when \(x \oplus y \oplus z \oplus t = 0\), and the relation \(x \oplus y = E_1(0) \oplus E_1(1)\) can be verified across several quadruples. We don’t have related quadruples satisfying \(\mathcal {R}\) simultaneously as in the previous attacks, but we can use those properties to detect right quadruples. This leads to an attack with \(\tilde{\mathcal {O}}(2^{3n/4})\) queries: we consider four sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) of \(\sqrt[4]{n} \cdot 2^{3n/4}\) random values, and we look for quadruples \((x, y, z, t) \in \mathcal {X} \times \mathcal {Y} \times \mathcal {Z} \times \mathcal {T}\), such that:

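$$\begin{aligned} {\left\{ \begin{array}{ll} \mathsf {MAC}(\phi (x)) \oplus \mathsf {MAC}(\psi (y)) \oplus \mathsf {MAC}(\phi (z)) \oplus \mathsf {MAC}(\psi (t)) = 0 \\ x \oplus y \oplus z \oplus t = 0 \end{array}\right. } \end{aligned} \qquad (6)$$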

Since this is a 2n-bit condition, we expect on average \(n \cdot 2^n\) quadruples \((x, y, z, t)\) satisfying (6). In order to filter out the right ones, we look at the value \(x \oplus y\) for all these quadruples. While the wrong quadruples should have a random \(x \oplus y\), the right ones have \(x \oplus y = E_1(0) \oplus E_1(1)\). Therefore, with high probability, the most frequent value for \(x \oplus y\) is equal to \(E_1(0) \oplus E_1(1)\), and quadruples satisfying this extra relation are right quadruples with probability \(1/2\). More precisely, we expect on average n wrong quadruples for each value of \(x \oplus y\), and n right quadruples with \(x \oplus y = E_1(0) \oplus E_1(1)\).
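The filtering step can be illustrated with a small simulation. This is only a statistical model, not an attack on 3kf9: wrong quadruples are assumed to contribute uniformly random values of x ⊕ y, right quadruples always contribute the target, and the function name and parameters are hypothetical. The parameter `per_value` plays the role of n in the text (the expected number of wrong and of right quadruples per value).

```python
import random
from collections import Counter

def most_frequent_difference(n_bits, target, per_value, seed=0):
    """Return the most frequent simulated value of x ^ y."""
    rng = random.Random(seed)
    size = 1 << n_bits
    counts = Counter()
    # Wrong quadruples: x ^ y is modelled as uniform, about per_value hits per value.
    for _ in range(per_value * size):
        counts[rng.randrange(size)] += 1
    # Right quadruples: x ^ y always equals the target difference.
    counts[target] += per_value
    value, _ = counts.most_common(1)[0]
    return value
```

With a large `per_value`, the target counter is roughly twice the average of the others and stands out clearly; for values as small as n the separation is weaker, which is why the analysis of Sect. 5.2 is needed.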

Fig. 6. Diagram for 3kf9 with an \(\ell \)-block message.

Optimizing the time complexity. While the algorithm of Sect. 2.2 would take time \(\tilde{\mathcal {O}}(2^{3n/2})\) with \(\tilde{\mathcal {O}}(2^{3n/4})\) queries, we can reduce the time complexity using sets \(\mathcal {X}, \mathcal {Y}, \mathcal {Z}, \mathcal {T}\) with some structure. More precisely, we use:

so that quadruples can be written as

$$\begin{aligned} x =: x_3|x_2|x_1|0&\in \mathcal {X}&y =: y_3|y_2|0|y_0&\in \mathcal {Y} \\ z =: z_3|z_2|z_1|0&\in \mathcal {Z}&t =: t_3|t_2|0|t_0&\in \mathcal {T}. \end{aligned}$$

In particular, right quadruples satisfy \(x \oplus y \oplus z \oplus t = 0\), therefore \(x_1 = z_1\), \(y_0 = t_0\), and \(x_3|x_2 \oplus z_3|z_2 = y_3|y_2 \oplus t_3|t_2\). We use these properties to adapt the algorithm of Sect. 2.2 and locate the quadruples efficiently. First we guess the \(n{\text {/}}2\)-bit value \(\alpha _3|\alpha _2 := x_3|x_2 \oplus z_3|z_2 = y_3|y_2 \oplus t_3|t_2\). Then, for each \(x = x_3|x_2|x_1|0\), there is a single candidate \(z = (x_3 \oplus \alpha _3) | (x_2 \oplus \alpha _2) | x_1 | 0\) that could be part of a right quadruple. Similarly, every \(y = y_3|y_2|0|y_0\) can be paired with a single \(t = (y_3 \oplus \alpha _3) | (y_2 \oplus \alpha _2) | 0 | y_0\). Therefore, we consider the two following lists:

After sorting the lists, we look for matches, and the corresponding quadruples \((x, y, z, t)\) are exactly the quadruples satisfying

(7)

More precisely, a match \(L_1[x] = L_2[y]\) suggests \(z = x \oplus \alpha _3|\alpha _2|0|0\) and \(t = y \oplus \alpha _3|\alpha _2|0|0\), but there are four corresponding quadruples: \((x, y, z, t)\), \((z, y, x, t)\), \((x, t, z, y)\), \((z, t, x, y)\), and two candidate values for \(E_1(0) \oplus E_1(1)\): \(x \oplus y\) and \(x \oplus y \oplus \alpha _3|\alpha _2|0|0\).
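The pairing induced by a guess of α can be sketched as follows. The function name is hypothetical, values are represented as n-bit integers split into four n/4-bit quarters written from most to least significant, and n is assumed to be a multiple of 4.

```python
def partner_candidates(x, y, alpha, n):
    """Given x = x3|x2|x1|0, y = y3|y2|0|y0 and an n/2-bit guess alpha,
    return the unique candidates (z, t) for a right quadruple."""
    quarter = n // 4
    shift = alpha << (2 * quarter)   # the n-bit value alpha_3|alpha_2|0|0
    z = x ^ shift                    # keeps x1 and the zero quarter
    t = y ^ shift                    # keeps y0 and the zero quarter
    return z, t
```

By construction the resulting quadruple satisfies x ⊕ y ⊕ z ⊕ t = 0, and z and t keep the zero quarters of x and y, so they stay in the structured sets.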

We need \(\tilde{\mathcal {O}}(2^{3n/4})\) operations to generate those quadruples. We repeat this \(2^{n/2}\) times to exhaust all \(n{\text {/}}2\)-bit values \(\alpha _3|\alpha _2\) and generate all quadruples satisfying (6). Finally, we use an array to count the number of occurrences of each possible value of \(x \oplus y\). Each counter receives on average two values, but the counter corresponding to \(E_1(0) \oplus E_1(1)\) will receive three values on average. After repeating all the operations \(\mathcal {O}(n)\) times, with some arbitrary constants in place of the zero bits, the highest counter corresponds to \(E_1(0) \oplus E_1(1)\) with high probability, as proved in Sect. 5.2. This gives an attack with \(\tilde{\mathcal {O}}(2^{3n/4})\) queries, \(\tilde{\mathcal {O}}(2^{5n/4})\) operations, and \(\mathcal {O}(2^n)\) memory.

Time-Memory Trade-offs. We can reduce the memory usage if we store only a subset of the counters, and repeat the whole algorithm until the whole set has been covered. Concretely, we store only the counters with a fixed value for bits \([0:n{\text {/}}8]\) and \([n{\text {/}}4:3n{\text {/}}8]\) of \(x \oplus y\). Because of the way the lists \(L_1\) and \(L_2\) are constructed, we have actually fixed \(n{\text {/}}8\) bits of \(y_0\) and \(x_1\), and we can reduce the lists to size \(2^{5n/8}\). Therefore we evaluate \(2^{3n/4}\) counters in time \(\tilde{\mathcal {O}}(2^{n/2} \cdot 2^{5n/8})\), using only \(\mathcal {O}(2^{3n/4})\) memory. We repeat iteratively over the full counter set, so we need time \(\tilde{\mathcal {O}}(2^{n/4} \cdot 2^{n/2} \cdot 2^{5n/8}) = \tilde{\mathcal {O}}(2^{11n/8})\). More generally, we have a time-memory trade-off with time \(\tilde{\mathcal {O}}(2^{5n/4+t/2})\) and memory \(\mathcal {O}(2^{n-t})\) for \(0<t<n/4\).
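The exponents of this trade-off can be tabulated with a short helper. The function name is hypothetical and polylogarithmic factors are ignored. The endpoints t = 0 and t = n/4 are accepted: t = 0 gives the basic attack, and t = n/4 gives the concrete example above.

```python
from fractions import Fraction

def f9_tradeoff(n, t):
    """Return log2 of (time, memory) when only 2^(n-t) counters are stored."""
    n, t = Fraction(n), Fraction(t)
    if not (0 <= t <= n / 4):
        raise ValueError("expected 0 <= t <= n/4")
    time = 5 * n / 4 + t / 2
    memory = n - t
    return time, memory
```

For n = 64 and t = 16 this yields time 2^88 = 2^(11n/8) and memory 2^48 = 2^(3n/4), matching the figures in the paragraph.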

Forgeries. Once we have found a quadruple \((x, y, z, t)\) that satisfies \(\mathcal {R}(x,y,z,t)\), we know that after processing messages \(\phi (x) = 0 \Vert x\) and \(\psi (t) = 1 \Vert t\), there is no difference in the \(\varTheta \) part of the state (\(\varTheta _0(x) = \varTheta _1(t)\)). Moreover we have \(\varTheta _0(x) = \varSigma _0(x) \oplus E_1(0)\) and \(\varTheta _1(t) = \varSigma _1(t) \oplus E_1(1)\); this implies that there is a difference \(E_1(0) \oplus E_1(1) = x \oplus y\) in the \(\varSigma \) part of the state. Therefore, we can build a full state collision with messages \(0 \Vert x \Vert 0\) and \(1 \Vert t \Vert x \oplus y\). In particular, the following relation can be used to create forgeries with an arbitrary message m (of any length):

Universal Forgeries. We can even forge the tag of an arbitrary message of length at least \((2n+2)\) blocks with complexity only \(n+1\) times the complexity of the simple forgery attack. The technique is more advanced and inspired by the multi-collision attack described by Joux [23]. For ease of notation we’ll show how to forge the tag for a message starting with \(2n+2\) blocks of zero, but this can be trivially adapted for any message.

First, we find a quadruple \((x_1,y_1,z_1,t_1)\) as before. Then we consider messages \(0 \Vert 0\) and \(1 \Vert x_1 \oplus y_1\). Since \(x_1 \oplus y_1 = E_1(0) \oplus E_1(1)\), we have \(\varSigma (0 \Vert 0) = \varSigma (1 \Vert x_1 \oplus y_1)\), i.e. the \(\varSigma \) part of the state collides. Moreover, we know the difference in the \(\varTheta \) part: \(\varTheta (0 \Vert 0) \oplus \varTheta (1 \Vert x_1 \oplus y_1) = x_1 \oplus y_1\).

More generally, at step i we use message injection functions

$$\begin{aligned} \phi _i(x)&= \underbrace{0 \Vert 0 \Vert \ldots \Vert 0}_{\times 2(i-1)} {}\Vert 0 \Vert x&\psi _i(x)&= \underbrace{0 \Vert 0 \Vert \ldots \Vert 0}_{\times 2(i-1)} {}\Vert 1 \Vert x, \end{aligned}$$

to look for a quadruple of messages

$$\begin{aligned} X_i&= \phi _i(x_i)&Y_i&= \psi _i(y_i)&Z_i&= \phi _i(z_i)&T_i&= \psi _i(t_i). \end{aligned}$$

When a right quadruple \((x_i, y_i, z_i, t_i)\) has been identified, we can deduce that the MACs for \(0 \Vert 0 \Vert \ldots \Vert 0 \Vert 0 \Vert 0\) and \(0 \Vert 0 \Vert \ldots \Vert 0 \Vert 1 \Vert x_i \oplus y_i\) will match on the \(\varSigma \) branch and differ by \(x_i \oplus y_i\) in their \(\varTheta \) branch.

After several iterations, we have actually built a multi-collision: all the messages \(h_1 \Vert h_2 \Vert \ldots \Vert h_n \Vert h_{n+1}\) with \(h_i \in \left\{ (1 \Vert x_i \oplus y_i), (0 \Vert 0) \right\} \) collide on the \(\varSigma \) branch. In addition, we also know the difference in the \(\varTheta \) branch between such a message and the all-zero message: it is equal to \(\bigoplus _{i \,:\, h_i \ne (0 \Vert 0)} (x_i \oplus y_i)\).

After at most \(n+1\) steps, we can find a non-empty subset \(\mathcal {I} \subseteq [1 : n+1]\) such that \( \bigoplus _{i \in \mathcal {I}} (x_i \oplus y_i) = 0 \) by simple linear algebra. This gives a collision on the full state, using messages \(m_0 = 0 \Vert 0 \Vert \ldots \Vert 0\) (with \(2(n+1)\) blocks) and \(h = h_1 \Vert h_2 \Vert \ldots \Vert h_n \Vert h_{n+1}\) with \(h_i = 1 \Vert x_i \oplus y_i\) if \(i \in \mathcal {I}\), \(h_i = 0 \Vert 0\) otherwise. Since the full state collides, we have for any message m (of any length):

$$\begin{aligned} \mathsf {MAC}(m_0 \Vert m) = \mathsf {MAC}(h \Vert m). \end{aligned}$$
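The linear algebra step that produces the subset \(\mathcal {I}\) can be sketched with Gaussian elimination over GF(2). In this sketch the n-bit differences \(x_i \oplus y_i\) are represented as Python integers; the function name is hypothetical, and the returned indices are 0-based.

```python
def zero_xor_subset(values):
    """Return a non-empty list of indices whose values XOR to zero, or None."""
    basis = {}  # leading bit -> (reduced value, bitmask of contributing inputs)
    for i, v in enumerate(values):
        mask = 1 << i
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = (v, mask)   # new independent vector
                break
            bv, bmask = basis[top]
            v ^= bv                      # cancel the leading bit
            mask ^= bmask                # track which inputs were combined
        else:
            # v was reduced to zero: the inputs recorded in mask XOR to zero.
            return [j for j in range(len(values)) if mask >> j & 1]
    return None
```

Any n+1 values of n bits are linearly dependent over GF(2), so the function is guaranteed to return a subset once n+1 differences have been collected.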

5.2 Detailed Complexity Analysis

We want to prove the claim that one will need to find \(\mathcal {O}(n \cdot 2^n)\) quadruples in order to finish the attack on 3kf9 described in Sect. 5.1. We say the attack finishes when we recover the target value \(T = E_1(0) \oplus E_1(1)\).

Assuming that each quadruple we find respects \(\mathcal {R}\) with probability \(1/2^n\), we fill a list of counters for every suspected value of T; a random quadruple gives two random values and a right one gives one value equal to T and one random value. Therefore we sum up the distribution of an observable value x as:

$$ x {\left\{ \begin{array}{ll} {\mathop {\longleftarrow }\limits ^{\$}} \{0,1\}^n &{} \text {with probability } 1-1/2^{n+1}\\ \longleftarrow T &{} \text {with probability } 1/2^{n+1} \\ \end{array}\right. } $$

Let N be the number of observed values, and let \(X^c_i\) be the indicator that the \(i^{th}\) value equals c (following a Bernoulli distribution), so that the counter corresponding to c is \(X^c = \sum _{i=1}^N X^c_i\). Now we have to discriminate between the distributions of \(X^c\) with \(c \ne T\), and the distribution of \(X^T\):

$$\begin{aligned} \Pr (X^{T}_i=1) = \Pr (x = T)&= (1-1/2^{n+1})/2^n + 1/2^{n+1} = (3/2-1/2^{n+1})/2^n&\\&\implies \mathbf {E}[X^{T}]=N(3/2-1/2^{n+1})/2^n \\ \Pr (X^c_i=1) = \Pr (x = c)&= (1-1/2^{n+1})/2^n&\\&\implies \mathbf {E}[X^c]=N(1-1/2^{n+1})/2^n \\&\implies \mathbf {E}[X^{T}] \ge 3/2 \cdot \mathbf {E}[X^c] \end{aligned}$$

We use the Chernoff bound to get an upper bound on the probability that a given counter is higher than the average value of \(X^T\):

$$\begin{aligned} \Pr (X^c \ge \mathbf {E}[X^T])&\le \Pr (X^c \ge 3/2 \cdot \mathbf {E}[X^c]) \le e^{-N(1-1/2^{n+1})/2^{n+1}} \end{aligned}$$

and assuming the counters are independent:

$$\begin{aligned} \Pr (X^c< \mathbf {E}[X^T])&\ge 1- e^{-N(1-1/2^{n+1})/2^{n+1}}\\ \Pr (\forall c \ne T : X^c < \mathbf {E}[X^T])&\ge (1- e^{-N(1-1/2^{n+1})/2^{n+1}})^{2^n} \end{aligned}$$

This expression will asymptotically converge to a strictly positive constant when \(e^{-N(1-1/2^{n+1})/2^{n+1}} \simeq 2^{-n}\). Therefore, we use

$$N \simeq n \ln (2) \cdot \frac{2^{n+1}}{(1-1/2^{n+1})} = \mathcal {O}(n\cdot 2^n).$$

Since we observe 2 values per quadruple, this makes \(\mathcal {O}(n\cdot 2^n)\) quadruples. Moreover, the event ‘\(X^T \ge \mathbf {E}[X^T]\)’ has a probability close to 0.5, therefore after \(\mathcal {O}(n\cdot 2^n)\) quadruples, we indeed have a \(\varOmega (1)\) probability that \(X^T\) is greater than all of the other counters, which allows us to recover the value T. Performing the attack until the end with probability \(\varOmega (1)\) also requires \(\mathcal {O}(n\cdot 2^n)\) quadruples.
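The choice of N can be checked numerically. The sketch below evaluates the expressions of this section in floating point for a small n; the function names are hypothetical, and it only verifies the arithmetic of the bounds as stated, not the underlying independence assumption.

```python
import math

def required_observations(n):
    """N for which the per-counter bound exp(-N(1-2^-(n+1))/2^(n+1)) equals 2^-n."""
    eps = 2.0 ** -(n + 1)
    return n * math.log(2) * 2.0 ** (n + 1) / (1 - eps)

def single_counter_bound(n, N):
    """Bound on Pr(X^c >= E[X^T]) for one wrong counter."""
    eps = 2.0 ** -(n + 1)
    return math.exp(-N * (1 - eps) / 2.0 ** (n + 1))

def all_counters_bound(n, N):
    """Lower bound on Pr(all wrong counters stay below E[X^T]),
    using the exponent 2^n as in the displayed formula."""
    return (1 - single_counter_bound(n, N)) ** (2 ** n)
```

With this N the per-counter bound is 2^-n, so the product over all counters is (1 - 2^-n)^(2^n), which approaches 1/e as n grows. This is the strictly positive constant mentioned above.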

To get to this result some assumptions have been made, like the independence of the counters, but they all tend to be either conservative or asymptotically true.

5.3 Attacking 1kf9

1kf9 is a single-key variant of 3kf9 suggested in [8], and later withdrawn. Since the structure of 1kf9 is the same as the structure of 3kf9, we can use the same attack. However, in the next section, we give an attack with birthday complexity using properties of the functions.

6 Attacks Using Collision in Functions

Finally, we show attacks against single-key variants of beyond-birthday-bound MACs based on functions, as defined by Datta et al. [8, 9]. The functions just fix the least significant bit of an n-bit value to zero or one, and are used for domain separation:

Datta et al. used those functions to build a single-key variant of PMAC+ called 1kPMAC+ [9], and a single-key variant of 3kf9 called 1kf9 [8], both with security up to \(2^{2n/3}\) queries. However, 1kf9 has been withdrawn because of issues in its security proof. In this section, we exploit trivial collisions in the functions to build colliding pairs or quadruples more easily:

This allows a more straightforward attack against 1kPMAC+ with the same complexity as the attacks in Sect. 4, and an attack against 1kf9 [8] with birthday complexity, violating its security claims.
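The trivial collisions exploited in this section follow directly from the description of the functions: forcing the least significant bit erases it, so two inputs that differ only in that bit have the same image. A minimal sketch, where the names `fix0` and `fix1` are illustrative labels chosen here and n-bit values are represented as non-negative integers:

```python
def fix0(x):
    """Force the least significant bit of x to zero."""
    return x & ~1

def fix1(x):
    """Force the least significant bit of x to one."""
    return x | 1

def collides(x):
    """Inputs x and x ^ 1 collide under both functions."""
    return fix0(x) == fix0(x ^ 1) and fix1(x) == fix1(x ^ 1)
```

A difference of 1 in the input of either function is therefore absorbed, which is the property the attacks below rely on.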

6.1 Attacking 1kf9

The 1kf9 mode uses the function for domain separation to build a single-key variant of 3kf9, as shown in Fig. 7:

Fig. 7. Diagram for 1kf9 with an \(\ell \)-block message.

Attack. Because of a mistake in the proof of 1kf9, we can use pairs of messages instead of quadruples. More precisely, instead of looking for a quadruple with pairwise collisions in \(\varSigma \) and \(\varTheta \), we look for a pair of messages \((X, Y)\) colliding on \(\varSigma '\), and with a difference in \(\varTheta '\) that will be absorbed by the function. Therefore, we define the relation \(\mathcal {R}\) as:

We build the messages with different postfixes, parametrized by u:

$$\begin{aligned} X = \phi _u(x)&= x \Vert u&Y = \psi _u(y)&= y \Vert u \oplus d, \end{aligned}$$

where d is the inverse of 2 in the finite field. With this construction, we have

$$\begin{aligned} \varSigma '(\phi _u(x))&= E\big (u \oplus E(x \oplus E(0))\big ) \\ \varTheta '(\phi _u(x))&= E\big (u \oplus E(x \oplus E(0))\big ) \oplus E\big (x\oplus E(0)\big ) \oplus E\big (0\big ) \\ \varSigma '(\psi _u(y))&= E\big (u \oplus d \oplus E(y \oplus E(0))\big ) \\ \varTheta '(\psi _u(y))&= E\big (u \oplus d \oplus E(y \oplus E(0))\big ) \oplus E\big (y\oplus E(0)\big ) \oplus E\big (0\big ) \end{aligned}$$

In particular, we observe

(8)

From this observation, we construct a birthday attack against 1kf9. We build two lists:

and we look for a match between the lists. We expect on average one pair to match randomly, and one pair to match because of (8). Moreover, when we have a collision candidate \(L_0[x], L_1[y]\), we can verify whether it is a right pair by comparing .

Therefore, we find a pair satisfying \(\mathcal {R}(X,Y)\) with complexity \(2^{n/2}\), and this leads to simple forgeries using (8). This contradicts the security proof of 1kf9 given in [8]. Note that this attack is still valid if we use different multiplications for the two branches in the finalization function.

6.2 Attacking 1kPMAC+

The 1kPMAC+ mode uses the function for domain separation to build a single-key variant of PMAC+, as shown in Fig. 8.

Fig. 8. Diagram for 1kPMAC+ with an \(\ell \)-block message, where \(\varDelta _1 = E(1)\) and \(\varDelta _2 = E(2)\).

Attack. Since the functions used in the finalization have collisions, we can build a variant of the attacks from Sect. 4 using differences in \(\varSigma '\) and/or \(\varTheta '\) that are absorbed by the functions. More precisely, we use the following relation \(\mathcal {R}\) on quadruples of messages:

We can find a quadruple of messages satisfying \(\mathcal {R}\) using a single message injection function:

$$\begin{aligned}\begin{gathered} \phi _u(i) = u \Vert i \\ \begin{aligned} X = \phi _u(x)&= u \Vert x&Y = \phi _u(y)&= u \Vert y&Z = \phi _u(z)&= u \Vert z&T = \phi _u(t)&= u \Vert t \end{aligned} \end{gathered}\end{aligned}$$

Indeed we have

We observe that:

$$\begin{aligned} \mathcal {R}(x,y,z,t)&\Leftrightarrow {\left\{ \begin{array}{ll} \tilde{E}_2(x) = \tilde{E}_2(y) \oplus 1 \\ \tilde{E}_2(z) = \tilde{E}_2(t) \oplus 1 \\ \mathtt {2}\tilde{E}_2(x) = \mathtt {2}\tilde{E}_2(z) \oplus 1 \\ \mathtt {2}\tilde{E}_2(y) = \mathtt {2}\tilde{E}_2(t) \oplus 1 \end{array}\right. }\\&\Leftrightarrow {\left\{ \begin{array}{ll} \tilde{E}_2(x) \oplus \tilde{E}_2(y) \oplus \tilde{E}_2(z) \oplus \tilde{E}_2(t) = 0 \\ \tilde{E}_2(x) = \tilde{E}_2(y) \oplus 1 \\ \tilde{E}_2(x) = \tilde{E}_2(z) \oplus d \end{array}\right. } \end{aligned}$$

Therefore, \(\mathcal {R}\) defines a \(3n\)-bit relation that is independent of the value u. This can be used for attacks in the same way as in the previous sections, using a single list

We can find a quadruple of four distinct values \((x, y, z, t)\) such that \(L[x] \oplus L[y] \oplus L[z] \oplus L[t] = 0\) with \(\tilde{\mathcal {O}}(2^{3n/2})\) operations, using a memory of size \(\mathcal {O}(2^{3n/4})\), and this easily leads to forgeries.
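To make the search problem concrete, here is a naive single-list 4-XOR search. It is not the generalized birthday algorithm of Sect. 2.2: it stores the XOR of every pair, so its memory is quadratic in the list size rather than linear, and it is only meant for toy inputs. The function name is hypothetical.

```python
from itertools import combinations

def four_xor_single_list(values):
    """Return four distinct indices whose values XOR to zero, or None."""
    seen = {}  # XOR of a pair -> list of index pairs producing it
    for i, j in combinations(range(len(values)), 2):
        key = values[i] ^ values[j]
        for k, l in seen.get(key, []):
            # Two pairs with the same XOR give a 4-sum if all indices differ.
            if len({i, j, k, l}) == 4:
                return (k, l, i, j)
        seen.setdefault(key, []).append((i, j))
    return None
```

In the attack, each entry of the list would be the concatenation of the tags obtained for a few values of u, so that a zero sum corresponds to a 3n-bit condition.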

7 Conclusion

In this paper we have introduced a cryptanalysis technique to attack double-block-hash MACs using quadruples of messages. We show three variants of the technique, with attacks with \(\mathcal {O}(2^{3n/4})\) queries against SUM-ECBC, GCM-SIV2, PMAC+, LightMAC+, 1kPMAC+ and 3kf9. All these modes have a security proof up to \(2^{2n/3}\) queries, but no attacks with fewer than \(2^n\) queries were known before our work.

Our main attacks are in the information theoretic model, and an attacker would need more than \(2^n\) operations to perform a forgery. On the other hand, we also have a variant of the attack against SUM-ECBC and GCM-SIV2 with time complexity \(\tilde{\mathcal {O}}(2^{6n/7})\). This opens the path for attacks with total complexity below \(2^n\) for other double-block-hash MACs.

We believe that studying generic attacks is important in order to understand the security of these MACs, and is needed in addition to security proofs. In particular our results show that they do not reach full security, and we invalidate a recent proof for LightMAC+. However, there is still a gap between the \(2^{2n/3}\) bound of the proofs, and our attacks with \(\mathcal {O}(2^{3n/4})\) queries. Further work is needed to determine whether the attacks can be improved, or whether better proofs are possible.