1 Introduction

With the rise of the Internet of Things, connected devices are being placed everywhere, resulting in a wide variety of efficiency, robustness, and feature requirements for communication. Securing the communication remains important, and as a result, many block ciphers have been created to work efficiently in constrained environments. These block ciphers offer a range of block and key sizes, from 128 down to 32 bits; see Table 1 for a sample.

The key size is often chosen carefully to ensure a sufficiently high security level, resulting in the block size becoming the dominant factor in determining security. As is well known, reducing block size can increase the chance of an inner state collision when block ciphers are used in so-called modes of operation: constructions which repeatedly apply a block cipher to achieve functionality beyond what a block cipher offers.

Consider MAC (Message Authentication Code) modes of operation, which aim to provide data authenticity for long messages. Common MAC modes, such as CBC-MAC [5], OMAC [24], and PMAC [10], have security bounds which degrade relative to both the number of messages tagged, q, and the length of the messages measured in blocks, \(\ell \); see Table 2 for a list of modes with their dependence on \(\ell \). For many modes, an adversary which is able to tag q messages of length \(\ell \) blocks will have a success probability of roughly

$$\begin{aligned} \frac{q^2\ell }{2^n}, \end{aligned} \tag{1}$$

where n is the block size of the underlying block cipher. With a 32-bit block size and a guarantee that adversaries forge with probability at most one in a million (roughly \(2^{-20}\)), one gets a restriction of the form

$$\begin{aligned} \frac{q^2\ell }{2^{32}} \le \frac{1}{2^{20}} \quad \text {or}\quad q^2\ell \le 2^{12}, \end{aligned} \tag{2}$$

meaning 64 one-block messages can be tagged under the same key. But what if the messages are longer than one block? With conventional MACs only 32 four-block messages can be tagged, corresponding to \(32\cdot 2^2\cdot 32 = 2^{12}\) bits, or 512 bytes of data per key. If the messages are sixteen blocks long, only 16 messages can be tagged, which is \(16\cdot 2^4\cdot 32 = 2^{13}\) bits, or 1 KiB of data per key. Figure 1 displays how much data the various modes from Table 2 can process per key, when the threshold success probability is set to \(1/2^{20}\).
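The arithmetic above can be reproduced with a short Python sketch. It simply solves the approximate bound (1) for the largest integer q, so it inherits that bound's simplifications (constant factors are ignored); the function name `conventional_mac_capacity` is our own.

```python
import math


def conventional_mac_capacity(n: int, ell: int, log2_eps: int = -20) -> tuple[int, int]:
    """Largest q with q^2 * ell / 2^n <= 2^log2_eps (bound (1)), and the
    resulting number of bytes that can be tagged per key."""
    budget = 2 ** (n + log2_eps)        # q^2 * ell must not exceed 2^(n + log2_eps)
    q = math.isqrt(budget // ell)       # largest integer q meeting the budget
    return q, q * ell * n // 8          # q messages of ell n-bit blocks, in bytes


for ell in (1, 4, 16):
    q, nbytes = conventional_mac_capacity(32, ell)
    print(f"ell = {ell:2d}: q = {q}, {nbytes} bytes per key")
```

For \(\ell = 4\) and \(\ell = 16\) this gives the 512 bytes and 1 KiB per key quoted above.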

Table 1. Supported block sizes are often small, and can be as low as 32 bits.
Table 2. The table below contains the coefficients of the powers of \(\ell \) contained in the security bounds for adversaries making q queries of length \(\ell \), with block size n bits. References are to papers proving the bounds. In the bound for EMAC, the function \(d'(\ell )\) has been replaced by \(\ell \).
Fig. 1. A plot of message block lengths per key versus the number of queries that can be made in order to achieve the threshold success probability of \(2^{-20}\). In other words, if (x, y) is a point on the graph, then \(x\cdot y\) represents the number of blocks that can be processed per key. The block size is set to 32 bits.

1.1 Contributions

We present a MAC mode, LightMAC, which enables one to tag much longer messages than typically possible. LightMAC is depicted in Fig. 2 and Algorithm 1.

The security upper bound for LightMAC is

$$\begin{aligned} (1+\epsilon )\cdot \frac{q^2}{2^n}, \quad \text {where}\, \epsilon \in O\left( \frac{1}{2^{n/2}-1}\right) , \end{aligned} \tag{3}$$

which is independent of the message length (see Sect. 4). In other words, with a 32-bit block size, and setting the message-length parameter s to 16, roughly 64 messages can be tagged, each of length up to \(2^{15}\) blocks. Note that keys are used most efficiently when the messages are as long as possible: up to \(64 \cdot 2^{15} \cdot 32 = 2^{26}\) bits, or 8 MiB of data can be tagged per key. LightMAC uses two independent keys, but even after normalizing by the number of keys, the amount of data processed per key is still 4 MiB, a significant improvement over 1 KiB.
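A similar sketch for LightMAC keeps only the leading term \(q^2/2^n\) of bound (3); the \(1+\epsilon \) factor is ignored, which is why the text says roughly 64 messages. The function name `lightmac_capacity` is hypothetical.

```python
import math


def lightmac_capacity(n: int, s: int, log2_eps: int = -20) -> tuple[int, int, int]:
    """Queries, maximum message length (n-bit blocks) and bytes per key pair,
    using only the leading term q^2 / 2^n of bound (3)."""
    q = math.isqrt(2 ** (n + log2_eps))   # q^2 / 2^n <= 2^log2_eps; (1 + eps) ignored
    max_bits = 2 ** s * (n - s)           # longest message LightMAC accepts
    return q, max_bits // n, q * max_bits // 8


q, max_blocks, nbytes = lightmac_capacity(32, 16)
print(q, max_blocks, nbytes / 2**20, "MiB per key pair;", nbytes / 2**21, "MiB per key")
```

Dividing by the two keys is what turns 8 MiB per key pair into the 4 MiB per key quoted above.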

Figure 1 compares LightMAC to the other published modes from Table 2. The figure shows that LightMAC starts with a factor \(2^4\) improvement over many of the modes, which grows to roughly \(2^{10}\) as the number of queries increases. Modes such as PMAC with Parity and PMACX were designed to handle long message lengths and offer competitive bounds, at the cost of increased design complexity. LightMAC’s advantage over these modes is its simplicity and low overhead.

Like PMAC [10], LightMAC allows block cipher calls to be made in parallel, but unlike PMAC, LightMAC is based on Bernstein’s protected counter sum [8], and hence should not suffer from patent issues.

A disadvantage of LightMAC is its low rate. In order to tag messages of length up to \(2^{n/2-1}\) blocks, n / 2 bits of the block must be sacrificed for a counter, hence two block cipher calls are needed per block of data. However, the rate can be improved: if the maximum message length that will be communicated is known to be less than \(2^s(n-s)\) bits, then the rate can be set to \((n-s)/n\) blocks per block cipher call. For example, using a 32-bit block cipher, if the message lengths are less than \(2^9\) blocks, then the rate can be set to roughly 2 / 3 blocks per call. Therefore, unlike other modes, LightMAC can be optimized according to the application: the shorter the messages, the more efficient LightMAC is, while allowing the same number of messages to be queried. Section 5 presents implementation results for LightMAC instantiated with the AES [15] and PRESENT [11], and discusses LightMAC's efficiency in more detail.
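The trade-off between the counter width s, the rate, and the maximum message length can be tabulated with a small sketch. The helper name `lightmac_rate` is hypothetical, and lengths are rounded down to whole n-bit blocks.

```python
def lightmac_rate(n: int, s: int) -> tuple[float, int]:
    """Rate (n - s)/n in blocks per block cipher call, and the maximum message
    length 2^s (n - s) bits expressed in whole n-bit blocks."""
    if not 0 < s <= n // 2:
        raise ValueError("s must satisfy 0 < s <= n/2")
    return (n - s) / n, (2 ** s * (n - s)) // n


for s in (16, 11, 10, 9, 8):
    rate, max_blocks = lightmac_rate(32, s)
    print(f"s = {s:2d}: rate = {rate:.4f}, up to {max_blocks} blocks")
```

With n = 32, s = 10 is the smallest counter width whose maximum length (704 blocks) covers \(2^9\) blocks; its rate is 22/32 = 0.6875, which is the "roughly 2 / 3" mentioned above.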

1.2 Related Work

In 1995, Bellare et al. [4] described the XOR MACs, which XOR together finite-input-length pseudorandom functions (PRFs) to create stateful and randomized MACs. In 1999, Bernstein [8] introduced the protected counter sum, which composes an XOR MAC with an independent PRF call to create a stateless, deterministic MAC. In 2012, Yasuda [46] explained the basic idea behind LightMAC in the introduction of his paper; this idea can be viewed as an adaptation of Bernstein's protected counter sum using block ciphers.

Another MAC algorithm designed for lightweight use is Chaskey [30]. The Chaskey paper includes a block cipher and a permutation mode, but both have bounds which deteriorate quadratically with respect to message length.

In certain cases the bounds in Table 2 can be improved. For example, for \(\ell \le 2^{n/8}\) and \(q\ge \ell ^2\), EMAC’s bound becomes \(\frac{16q^2}{2^n} + \frac{128q^2\ell ^8}{2^{2n}}\) as shown by Pietrzak [34]. For the sum of CBCs, Yasuda [44] also showed that if \(\ell \le 2^{2n/5}\), the advantage becomes \(\frac{40\ell ^3q^3}{2^{2n}}\).

2 Preliminaries

The set \(\left\{ 0,1\right\} ^n\) represents all bit-strings of length n; the set \(\left\{ 0,1\right\} ^{\le n}\) is all bit-strings of length less than or equal to n. For two bit-strings A and B, we write \(A\Vert B\) and AB interchangeably for the concatenation of A and B. Let r be an integer, then \(M[1]M[2]\cdots M[\ell ] \xleftarrow {r} M\) represents splitting M into r-bit blocks with the length of the last block, \(M[\ell ]\), being anywhere from zero to \(r-1\) bits.
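To make the splitting notation concrete, the following minimal sketch represents bit-strings as Python strings of '0' and '1' characters, a representation chosen for readability rather than efficiency; the function name `split_blocks` is our own.

```python
def split_blocks(m: str, r: int) -> list[str]:
    """M[1] M[2] ... M[l] <-r- M: full r-bit blocks followed by a final block
    of 0 to r-1 bits (empty when r divides len(m))."""
    full = len(m) // r
    blocks = [m[i * r:(i + 1) * r] for i in range(full)]
    blocks.append(m[full * r:])
    return blocks


print(split_blocks("1011001", 3))   # three blocks, the last one a single bit
```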

A block cipher is a function \(E:\left\{ 0,1\right\} ^k\times \left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) where \(E(K,\cdot )\) defines a permutation for all \(K\in \left\{ 0,1\right\} ^k\). The integer n is the block length of E and we write \(E_K(X)\) to mean \(E(K, X)\). Given a block length n, concatenation of \(10^*\) to a string means appending a one followed by the minimum number of zeros to make the total string length a multiple of n bits.
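The \(10^*\) padding can be sketched in the same bit-string representation (the function name `pad_10star` is hypothetical):

```python
def pad_10star(x: str, n: int) -> str:
    """x || 1 || 0...0 with the fewest zeros making the length a multiple of n."""
    y = x + "1"
    return y + "0" * (-len(y) % n)


print(pad_10star("101", 8))   # 1011 followed by four zeros
```

In LightMAC only the last block \(M[\ell ]\) is padded, and it has at most \(n-s-1\) bits, so the result is always exactly n bits long.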

The symbol \(0^n\) represents the n-bit string consisting of only zeros. Given a string A of length n, and an integer \(t\le n\), then \(\lfloor A\rfloor _t\) denotes the t least significant bits of A.

For an integer \(1\le i\le 2^s\), \(i_s\) represents some s-bit constant with the property that if \(1\le i < j\le 2^s\) then \(i_s\ne j_s\). For example, \(i_s\) could be an s-bit representation of the integer i, or the ith s-bit Gray code.
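Both example encodings of \(i_s\) can be sketched and checked for distinctness as follows. Two assumptions are ours: the binary encoding reduces i modulo \(2^s\), so \(i = 2^s\) maps to \(0^s\), which no other i uses; and "the ith s-bit Gray code" is read as the reflected binary Gray code of \(i-1\), so that \(i = 1,\ldots ,2^s\) covers all s-bit codes. The function names are hypothetical.

```python
def counter_binary(i: int, s: int) -> str:
    """s-bit representation of i; i = 2^s wraps to 0^s, which no other i uses."""
    return format(i % (1 << s), f"0{s}b")


def counter_gray(i: int, s: int) -> str:
    """Assumed reading of 'the ith s-bit Gray code': reflected Gray code of i - 1."""
    j = i - 1
    return format(j ^ (j >> 1), f"0{s}b")


s = 4
for encode in (counter_binary, counter_gray):
    codes = [encode(i, s) for i in range(1, 2 ** s + 1)]
    assert len(set(codes)) == 2 ** s and all(len(c) == s for c in codes)
```

A Gray code changes a single bit between consecutive counters, which may be convenient in hardware; the security argument only needs the values to be distinct.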

3 LightMAC

Let \(E:\left\{ 0,1\right\} ^k\times \left\{ 0,1\right\} ^n\rightarrow \left\{ 0,1\right\} ^n\) be a block cipher. Let s and t be integers not greater than n / 2 and n, respectively, and fix some representation for \(i_s\) (see Sect. 2). LightMAC accepts two independent and uniformly generated keys \(K_1\) and \(K_2\) from \(\left\{ 0,1\right\} ^k\), and a message M of length at most \(2^s(n-s)\) bits. LightMAC produces an output of length t bits. Figure 2 and Algorithm 1 depict how the output is produced.

LightMAC can be used as either a pseudorandom function (PRF) or a MAC (see Sects. 4.2 and 4.3 for definitions). When used as a PRF, LightMAC is fully described by Algorithm 1. When used as a MAC, tags are generated using Algorithm 1, and verification of a message-tag pair (M, T) is done by comparing LightMAC(M) with T: if the two are equal, verification succeeds; otherwise it fails.

The parameters of LightMAC are the integers s and t, the representation of \(i_s\), and the block cipher E, which implicitly fixes k and n. The parameters must be agreed upon before a session starts, and remain constant throughout the session.

Fig. 2. LightMAC evaluated on a message \(M[1]\,M[2]\,M[3]\,M[4] \xleftarrow {n-s} M\). The rounded squares represent block cipher calls and the trapezium is truncation to t bits.

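Algorithm 1. LightMAC\(_{K_1,K_2}(M)\)
Input: \(K_1, K_2\in \left\{ 0,1\right\} ^k\), \(M\in \left\{ 0,1\right\} ^{\le 2^s(n-s)}\)
Output: \(T\in \left\{ 0,1\right\} ^t\)
  1. \(V\leftarrow 0^n\)
  2. \(M[1]M[2]\cdots M[\ell ]\xleftarrow {n-s} M\)
  3. for \(i = 1\) to \(\ell -1\) do \(V\leftarrow V\oplus E_{K_1}(i_s\,M[i])\)
  4. \(V\leftarrow V\oplus M[\ell ]10^*\)
  5. \(T\leftarrow \lfloor E_{K_2}(V)\rfloor _t\)
  6. return T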
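The following Python sketch follows Algorithm 1 on bit-strings. Because Python's standard library offers no block cipher, E is replaced by a hypothetical toy permutation `toy_prp`: a four-round balanced Feistel network with HMAC-SHA256 round functions. Feistel networks are invertible, so it is a permutation, but it is neither PRESENT nor the AES and has not been analysed; it is only meant for experimenting with the mode. The counter encoding is the s-bit representation of i (with \(2^s\) wrapping to \(0^s\)), and `lightmac_verify` compares tags with `hmac.compare_digest`, a constant-time comparison that is our addition rather than a requirement of the text. All function names are our own.

```python
import hashlib
import hmac


def toy_prp(key: bytes, x: int, n: int, rounds: int = 4) -> int:
    """Hypothetical stand-in for E_K on n-bit integers (n even, n <= 512):
    a balanced Feistel network with HMAC-SHA256 round functions. It is a
    permutation, but it is NOT a vetted block cipher."""
    half = n // 2
    mask = (1 << half) - 1
    left, right = x >> half, x & mask
    for rnd in range(rounds):
        data = bytes([rnd]) + right.to_bytes((half + 7) // 8, "big")
        f = int.from_bytes(hmac.new(key, data, hashlib.sha256).digest(), "big") & mask
        left, right = right, left ^ f
    return (left << half) | right


def lightmac(k1: bytes, k2: bytes, m: str, n: int, s: int, t: int) -> str:
    """Algorithm 1 on a bit-string m (a str of '0'/'1'); returns a t-bit tag."""
    if not (0 < s <= n // 2 and 0 < t <= n):
        raise ValueError("need 0 < s <= n/2 and 0 < t <= n")
    r = n - s
    if len(m) > 2 ** s * r:
        raise ValueError("message longer than 2^s (n - s) bits")
    full = len(m) // r                                   # number of blocks M[1..l-1]
    v = 0                                                # V <- 0^n
    for i in range(1, full + 1):
        counter = format(i % (1 << s), f"0{s}b")         # i_s as an s-bit integer
        block = m[(i - 1) * r:i * r]                     # M[i], n - s bits
        v ^= toy_prp(k1, int(counter + block, 2), n)     # V <- V xor E_K1(i_s M[i])
    last = m[full * r:] + "1"                            # M[l] || 1
    v ^= int(last + "0" * (n - len(last)), 2)            # V <- V xor M[l]10*
    tag = toy_prp(k2, v, n)                              # E_K2(V)
    return format(tag, f"0{n}b")[n - t:]                 # t least significant bits


def lightmac_verify(k1: bytes, k2: bytes, m: str, tag: str, n: int, s: int, t: int) -> bool:
    """Recompute the tag and compare it without an early exit."""
    return hmac.compare_digest(lightmac(k1, k2, m, n, s, t), tag)


msg = "1011" * 20                                        # an 80-bit message
tag = lightmac(b"key one", b"key two", msg, n=32, s=8, t=32)
print(tag, lightmac_verify(b"key one", b"key two", msg, tag, 32, 8, 32))
```

The loop over \(i\) has independent iterations, which is the parallelism discussed in Sect. 5; this sketch simply runs them serially.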

4 Security

Although Bellare, Guérin, and Rogaway [4] describe how to instantiate an XOR MAC using the Data Encryption Standard, they only provide proofs for pseudorandom functions, not pseudorandom permutations. Hence, even though the XOR MACs were proven to have bounds with no message length dependence, subsequent application of the PRP-PRF switching lemma would establish quadratic message length dependence. A similar explanation applies to the protected counter sum’s security bound. Therefore a direct security proof is necessary for LightMAC.

The XOR MACs and protected counter sum did not exhibit any message length dependence because the XOR of independent, uniformly distributed random variables is still uniformly distributed. In this section we use the fact that roughly the same applies to the XOR of distinct block cipher outputs to achieve message length independence for LightMAC.

4.1 Block Cipher Security

The security of LightMAC is reduced to that of its underlying block cipher, that is, if an attack is found against LightMAC, then the attack can be reduced to an attack against the block cipher. The quality of the reduction is measured by the security bounds computed in Theorems 1 and 2.

The statements of the theorems include terms describing the quality of the underlying block cipher, which is measured as follows.

Definition 1

Let \(E:\mathsf {K}\times \mathsf {X}\rightarrow \mathsf {X}\) be a block cipher, and let \(\pi \) be a uniformly distributed random permutation over the set of permutations on \(\mathsf {X}\). Then the PRP-advantage against E of adversaries \(\mathcal {A}\) making q queries and running in time \(\tau \) is

$$\begin{aligned} \mathsf {PRP}(q,\tau ) :=\sup _{A\in \mathcal {A}}\left|\mathbf {P}_{}\left[ A^{E_K} = 1\right] - \mathbf {P}_{}\left[ A^{\pi } = 1\right] \right|, \end{aligned} \tag{4}$$

where \(A^O = 1\) is the event that A outputs 1 when given access to oracle O, and K is uniformly distributed over \(\mathsf {K}\).

4.2 LightMAC as a PRF

A PRF \(\varPhi :\mathsf {K}\times \mathsf {M}\rightarrow \mathsf {T}\) is a construction which should be computationally indistinguishable from a uniformly distributed random function (URF), that is, a uniformly distributed random variable over the set of all functions from \(\mathsf {M}\) to \(\mathsf {T}\). The quality of the PRF is measured via the PRF-advantage of adversaries.

Definition 2

The PRF-advantage of an adversary A in distinguishing the PRF \(\varPhi :\mathsf {K}\times \mathsf {M}\rightarrow \mathsf {T}\) from the URF \(\$:\mathsf {M}\rightarrow \mathsf {T}\) is

$$\begin{aligned} \left|\mathbf {P}_{}\left[ A^{\varPhi _K} = 1\right] - \mathbf {P}_{}\left[ A^{\$} = 1\right] \right|, \end{aligned} \tag{5}$$

where \(A^O = 1\) is the event that A outputs 1 when given access to oracle O, and K is uniformly distributed over \(\mathsf {K}\).

Theorem 1

The PRF-advantage against LightMAC of any adversary running in time \(\tau \) and making at most q queries of length at most \(2^s(n-s)\) bits is bounded above by

$$\begin{aligned} \left( 1 + \frac{1}{2^{n/2}-1} + \frac{1}{2(2^{n/2}-1)^2}\right) \cdot \frac{q^2}{2^n} +\mathsf {PRP}(q\cdot (2^s-1), \tau _1) + \mathsf {PRP}(q, \tau _2), \end{aligned} \tag{6}$$

where n is the block size in bits, \(\tau _1 \in \tau + O(q\cdot (2^s-1))\), and \(\tau _2 \in \tau + O(q)\).

Proof

Let A be a PRF-adversary against LightMAC running in time \(\tau \) and making at most q queries of length at most \(2^s(n-s)\) bits. Construct the PRP adversary \(B_1\) against \(E_{K_1}\) as follows: \(B_1\) chooses a key \(K_2\) uniformly at random to simulate \(E_{K_2}\), runs A, and answers A's queries using its own oracle in place of \(E_{K_1}\) together with the simulated \(E_{K_2}\); \(B_1\) then outputs whatever A outputs. Construct the PRP adversary \(B_2\) against \(E_{K_2}\) similarly. Then A's PRF-advantage against LightMAC is bounded above by

$$\begin{aligned} \alpha + \mathsf {PRP}(q\cdot (2^s-1), \tau _1) + \mathsf {PRP}(q, \tau _2), \end{aligned} \tag{7}$$

where \(\alpha \) is A’s PRF-advantage against LightMAC with its \(E_{K_1}\) and \(E_{K_2}\) calls replaced with \(\pi _1\) and \(\pi _2\) calls, respectively, where \(\pi _1\) and \(\pi _2\) are independent, uniformly distributed random permutations.

We replace \(\pi _2\) with a uniformly distributed random function \(\phi \) using the PRP-PRF switching lemma, at a cost of \(q^2/2^{n+1}\) in advantage. The PRF we are left with is

$$\begin{aligned} \varPhi (M) = \phi \left( M[\ell ]10^* \oplus \bigoplus _{i=1}^{\ell -1} \pi _1(i_sM[i]) \right) , \end{aligned} \tag{8}$$

which is LightMAC instantiated with \(\pi _1\) and \(\phi \), and

$$\begin{aligned} \alpha \le \alpha ' + \frac{q^2}{2^{n+1}}, \end{aligned} \tag{9}$$

where \(\alpha '\) is A’s PRF-advantage against \(\varPhi \).

Let F denote the argument of \(\phi \) in Eq. 8, viewed as a function of M. As long as F's outputs on the queried messages are distinct, every call to \(\phi \) receives a fresh input, so the outputs of \(\varPhi \) are independent and uniformly distributed, exactly as for \(\$\). In other words,

$$\begin{aligned} \alpha ' \le \sum _{i < j}\mathbf {P}_{}\left[ F(M_i) = F(M_j)\right] \le \frac{q^2}{2}\max _{M_i\ne M_j}\mathbf {P}_{}\left[ F(M_i) = F(M_j)\right] , \end{aligned} \tag{10}$$

where \(M_i\) for \(i = 1,\ldots , q\) are the messages queried by A. The maximum on the right hand side is computed in Sect. 4.4, resulting in the bound

$$\begin{aligned} \alpha ' \le \frac{q^2}{2}\cdot \frac{1}{2^n - 2^{s+1} + 1}. \end{aligned} \tag{11}$$

Therefore, since \(s\le n/2\) implies \(2^n-2^{s+1}+1\ge 2^n-2^{n/2+1}+1 = (2^{n/2}-1)^2\), we have

$$\begin{aligned} \alpha&\le \frac{q^2}{2^{n+1}} + \frac{q^2}{2}\cdot \frac{1}{2^n-2^{s+1}+1}\end{aligned} \tag{12}$$
$$\begin{aligned}&\le \frac{q^2}{2^n}\left( 1 + \frac{1}{2^{n/2}-1} + \frac{1}{2(2^{n/2}-1)^2}\right) , \end{aligned} \tag{13}$$

giving us our desired bound.    \(\square \)
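The step from (12) to (13) can be checked with exact rational arithmetic. The sketch below (function name hypothetical, n assumed even, PRP terms omitted) confirms that (12) never exceeds (13) for \(s\le n/2\), with equality at \(s = n/2\), and that for n = 32 and q = 64 the bound sits just above \(2^{-20}\).

```python
from fractions import Fraction


def theorem1_rhs(n: int, s: int, q: int) -> tuple[Fraction, Fraction]:
    """Exact values of the right-hand sides of (12) and (13), without PRP terms."""
    eq12 = Fraction(q * q, 2 ** (n + 1)) + Fraction(q * q, 2 * (2 ** n - 2 ** (s + 1) + 1))
    a = 2 ** (n // 2) - 1                       # 2^{n/2} - 1
    eq13 = Fraction(q * q, 2 ** n) * (1 + Fraction(1, a) + Fraction(1, 2 * a * a))
    return eq12, eq13


n, q = 32, 64
for s in range(1, n // 2 + 1):
    eq12, eq13 = theorem1_rhs(n, s, q)
    assert eq12 <= eq13                          # the step from (12) to (13)
print(float(theorem1_rhs(n, n // 2, q)[1]) * 2 ** 20)  # slightly above 1
```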

4.3 LightMAC as a MAC

A MAC consists of a tagging and a verification algorithm. The tagging algorithm accepts messages from some message set \(\mathsf {M}\) and produces tags from a tag set \(\mathsf {T}\). The verification algorithm receives message-tag pairs (M, T) as input, and outputs 1 if the pair (M, T) is valid, and 0 otherwise. The insecurity of a MAC is measured as follows.

Definition 3

Let A be an adversary with access to a MAC. The advantage of A in breaking the MAC is the probability that A is able to produce a message-tag pair (M, T) for which the verification algorithm outputs 1, where M has not been previously queried to the tagging algorithm.

Theorem 2

The MAC-advantage against LightMAC of any adversary running in time \(\tau \) and making at most q tagging queries and v verification queries, each of length at most \(2^s(n-s)\) bits, is bounded above by

$$\begin{aligned}&\left( 1 + \frac{2}{2^{n/2}-1} + \frac{1}{(2^{n/2}-1)^2}\right) \cdot \left( \frac{q^2}{2^n} + \frac{v}{2^t}\right) + \\&\qquad \qquad \qquad \quad \mathsf {PRP}(q\cdot (2^s-1), \tau _1) + \mathsf {PRP}(q, \tau _2) + \mathsf {PRP}(v2^s, \tau _3), \end{aligned} \tag{14}$$

where n is the block size in bits, \(\tau _1\in \tau + O(q\cdot (2^s-1))\), \(\tau _2\in \tau + O(q)\), and \(\tau _3\in \tau + O(v2^s)\).

Proof

We apply the same reduction as in the proof of Theorem 1 to replace LightMAC's \(E_{K_1}\) and \(E_{K_2}\) calls with \(\pi _1\) and \(\pi _2\) calls, respectively. As a MAC, LightMAC then follows the hash-then-encrypt paradigm as described by Dodis and Pietrzak [16], with the function F from Sect. 4.4 as the "hash" part. Hence, applying Proposition 1 of [16] (not to be confused with Proposition 1 in Sect. 4.4), we get an upper bound of

$$\begin{aligned} \left( 1 + \frac{2}{2^{n/2}-1} + \frac{1}{(2^{n/2}-1)^2}\right) \cdot \left( \frac{q^2}{2^n} + \frac{v}{2^t}\right) . \end{aligned} \tag{15}$$

   \(\square \)

4.4 Collision Probability of F

Proposition 1

Let \(m = 2^s(n-s)\). Let \(M[1]M[2]\cdots M[\ell ]\xleftarrow {n-s} M\) for \(M\in \left\{ 0,1\right\} ^{\le m}\), and define F to be

$$\begin{aligned} F(M) = M[\ell ]10^* \oplus \bigoplus _{i=1}^{\ell -1} \pi (i_s\,M[i])\,, \end{aligned} \tag{16}$$

where \(\pi \) is a uniformly distributed random permutation over \(\left\{ 0,1\right\} ^n\), then the probability that two distinct messages \(M_1,M_2\in \left\{ 0,1\right\} ^{\le m}\) collide is

$$\begin{aligned} \mathbf {P}_{}\left[ F(M_1) = F(M_2)\right] \le \frac{1}{2^n - \ell _1-\ell _2 + 1}\,, \end{aligned} \tag{17}$$

where \(\ell _i\) is the length of \(M_i\) in \((n-s)\)-bit blocks rounded up.

Proof

The equation \(F(M_1) = F(M_2)\) can be rewritten as

$$\begin{aligned} \bigoplus _{i=1}^{\ell _1-1}\pi (i_s M_1[i])\oplus \bigoplus _{i=1}^{\ell _2-1}\pi (i_s M_2[i]) = M_1[\ell _1]10^* \oplus M_2[\ell _2]10^*. \end{aligned} \tag{18}$$

Since \(M_1\ne M_2\) there are two cases:

  1. \(\ell _1 = \ell _2\), \(M_1[\ell _1]10^* \ne M_2[\ell _2]10^*\), and \(M_1[i] = M_2[i]\) for all \(i < \ell _1\), or

  2. either \(\ell _1\ne \ell _2\), or \(\ell _1 = \ell _2\) and there exists an \(i < \ell _1\) such that \(M_1[i] \ne M_2[i]\).

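In the first case all block cipher terms of Eq. 16 coincide, so \(F(M_1)\oplus F(M_2) = M_1[\ell _1]10^*\oplus M_2[\ell _2]10^*\ne 0^n\) and there is no collision; hence we focus on the second case. Terms \(\pi (i_s M_1[i])\) and \(\pi (i_s M_2[i])\) with \(M_1[i] = M_2[i]\) cancel, so without loss of generality we can assume that \(M_1[i]\ne M_2[i]\) for all remaining i, and we can simplify the problem to calculating the probability that

$$\begin{aligned} \bigoplus _{i=1}^\ell \pi (x_i) = c, \end{aligned} \tag{19}$$

where \(1\le \ell \le \ell _1+\ell _2\) is the number of remaining terms, \(c = M_1[\ell _1]10^*\oplus M_2[\ell _2]10^*\), and \(x_i\ne x_j\) for \(i\ne j\).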

Let \(N = 2^n\), then \(\mathbf {P}_{}\left[ \bigoplus _{i=1}^\ell \pi (x_i) = c\right] \) equals

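$$\begin{aligned} \frac{\left|\left\{ (y_1,\ldots ,y_N)\in (\left\{ 0,1\right\} ^n)^N : y_i\ne y_j \text { for } i\ne j,\ \bigoplus _{i=1}^\ell y_i = c\right\} \right|}{N!}, \end{aligned} \tag{20}$$

since a uniformly distributed permutation \(\pi \) maps \(x_1,\ldots ,x_\ell \), followed by the remaining \(N-\ell \) inputs in a fixed order, to a uniformly distributed sequence of N distinct values. By Lemma 1 the numerator is at most \(N!/(N-\ell +1)\), so the probability is bounded above by \(1/(N-\ell +1)\le 1/(2^n-\ell _1-\ell _2+1)\), giving us our desired result.   \(\square \)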

Lemma 1

Let \(c\in \left\{ 0,1\right\} ^n\) and let \(N = 2^n\). The number of sequences \((y_1,y_2,\ldots , y_N)\in (\left\{ 0,1\right\} ^n)^N\) with \(y_i\ne y_j\) for \(i\ne j\) such that

$$\begin{aligned} \bigoplus _{i=1}^\ell y_i = c, \end{aligned} \tag{21}$$

is not greater than \(N!/(N-\ell +1)\).

Proof

We start by fixing \(y_1\), for which there are N possibilities. Since \(y_2\) cannot equal \(y_1\), there are \(N-1\) possibilities for \(y_2\). Continuing this way, we have that there are \(N-i\) possibilities for \(y_{i+1}\), with \(i \le \ell -2\). For \(y_{\ell }\) there is at most one possibility, namely \(c\oplus y_1\oplus y_2\oplus \cdots \oplus y_{\ell -1}\). All \(y_j\) for \(j > \ell \) must be distinct from all preceding \(y_i\), hence in total there are at most

$$\begin{aligned} N\cdot (N-1)\cdot \cdots \cdot (N-\ell +2)\cdot (N-\ell )! = \frac{N!}{N-\ell +1} \end{aligned} \tag{22}$$

possible sequences.    \(\square \)
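For a tiny block size the bound of Lemma 1 can be confirmed by exhaustive enumeration. The sketch below (function name hypothetical) uses n = 3, so N = 8: it counts ordered prefixes \((y_1,\ldots ,y_\ell )\) of distinct values by their XOR, multiplies by the \((N-\ell )!\) orderings of the remaining values, and checks every \(\ell \) and c against \(N!/(N-\ell +1)\).

```python
from collections import Counter
from functools import reduce
from itertools import permutations
from math import factorial
from operator import xor


def lemma1_counts(n: int, ell: int) -> Counter:
    """For every c, count sequences (y_1, ..., y_N) of N = 2^n distinct n-bit
    values with y_1 xor ... xor y_ell == c, by exhaustive enumeration."""
    N = 2 ** n
    prefixes = Counter(reduce(xor, ys, 0) for ys in permutations(range(N), ell))
    tail = factorial(N - ell)             # orderings of the remaining N - ell values
    return Counter({c: k * tail for c, k in prefixes.items()})


n = 3
N = 2 ** n
for ell in range(1, N + 1):
    counts = lemma1_counts(n, ell)
    bound = factorial(N) // (N - ell + 1)
    assert all(counts[c] <= bound for c in range(N)), ell
print("Lemma 1 bound holds for n = 3 and every ell, c")
```

For \(\ell = 1\) and \(\ell = 2\) (with \(c\ne 0^n\)) the count meets the bound exactly, so the lemma cannot be tightened in general.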

5 Implementation

In this section, we discuss the implementation characteristics of LightMAC and compare it to the serial two-key CBC-MAC with last block encryption, EMAC [6], and to PMAC with Parity (PMAC/P) [46], which provides a parallelizable rate 2/3 construction and can be considered its main competitor.

5.1 Implementation Characteristics of LightMAC

LightMAC is a mode with very low overhead: besides the block cipher calls, it only requires an s-bit counter generator and one additional n-bit state for summing the block cipher outputs.

This means that the code size (for embedded software or microcontrollers) and area requirements (for hardware implementations) of LightMAC can be estimated as roughly equivalent to CBC-MAC with encryption of the last block by a second key. Compared to PMAC with Parity, LightMAC uses only two keys instead of four. In comparison to all PMAC variants, the absence of finite field doubling further improves its implementation characteristics on embedded platforms or hardware.

In terms of throughput, a compact serial implementation of LightMAC needs one block cipher call per \(n-s\) bits of message, i.e., about \(n/(n-s)\) block cipher call equivalents per n-bit block of message, which means that the serial performance of LightMAC on a given platform can readily be evaluated based on the performance of the best available implementation of the chosen underlying block cipher. Except for very short messages, the overhead imposed by the final block cipher call is negligible.
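The call count behind this estimate can be sketched as follows (function names hypothetical). Multiplying the result by the cost of one block cipher call on a given platform gives the rough serial estimate described above; counter generation and XORs are ignored.

```python
def lightmac_calls(msg_bits: int, n: int, s: int) -> int:
    """Block cipher calls for one tag: one E_K1 call per full (n - s)-bit block
    M[1..l-1], plus the final E_K2 call."""
    return msg_bits // (n - s) + 1


def calls_per_n_bits(msg_bits: int, n: int, s: int) -> float:
    """Calls per n bits of message; tends to n / (n - s) for long messages."""
    return lightmac_calls(msg_bits, n, s) * n / msg_bits


for msg_bytes in (16, 128, 1024):
    print(msg_bytes, calls_per_n_bits(8 * msg_bytes, 64, 32))
```

With n = 64 and s = 32 the ratio falls from 2.5 calls per block for 16-byte messages towards the asymptotic value 2, illustrating why the final call only matters for very short messages.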

Like PMAC and its derivatives, LightMAC has the advantage that the individual block cipher calls can be parallelized. While this is typically less important on lightweight platforms, where compactness and power/energy consumption are the prime concerns, this property enables high-performance implementations for the server side: since exactly the same lightweight algorithms used on small devices will also have to be used by the servers communicating with them, they should ideally also have good implementation characteristics in high-performance software environments. The importance of this was for instance pointed out in [29]. Many lightweight algorithms and modes of operation are inherently serial in nature and therefore inefficient in software. Our implementation study therefore focuses on this scenario.

5.2 The Setting

We explore the high-performance parallel software implementation possibilities for LightMAC, with the following choices regarding platform and instantiation parameters:

Underlying Block Ciphers. We use the block ciphers PRESENT [11] and AES [15] for our implementations. PRESENT is a lightweight 64-bit block cipher that was recently standardised by ISO, and AES serves as a baseline.

Choice of s and t. We always use full tag lengths \(t=n\), meaning 64-bit tags for PRESENT and 128-bit tags for AES. We furthermore instantiate LightMAC with the following values of s:

  1. \(s=n/2\) for the maximum supported message length (and correspondingly lowest rate 1 / 2);

  2. \(s=n/3\), rounded to the nearest multiple of 8, for a mode with rate approximately 2 / 3;

  3. \(s=8\), for a short maximum message length with the highest rate (\(1-8/n\)).

Altogether, these parameter choices illustrate a wide spectrum of use cases.

Platform. We implement LightMAC on Intel’s recent Skylake microarchitecture, using the 256-bit AVX2 instruction set. PRESENT was implemented in a bitsliced fashion processing 8 blocks in parallel. Other implementation strategies are known to yield a significantly lower performance, see [7] for a comprehensive study. For the AES, the AES-NI instruction set [20] was used. The key scheduling was precomputed for both ciphers. Since byte-aligned s-bit addition is inexpensive on this platform, the counters \(i_s\) are implemented as the s-bit representation of the integer i.

Message Lengths. We provide performance data for all message lengths of \(\ell =2^b\) bytes, with \(7 \le b \le 13\), wherever \(8\ell \le 2^s(n-s)\).
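The parameter grid implied by the choices of s and the message-length rule can be enumerated with a short sketch (function names hypothetical; "rounded to the nearest multiple of 8" is implemented with Python's `round`, and no ties occur for n = 64 or n = 128).

```python
def nearest_multiple_of_8(x: float) -> int:
    return 8 * round(x / 8)


def benchmark_grid(n: int) -> dict[int, list[int]]:
    """For s in {n/2, n/3 rounded to a multiple of 8, 8}: the message lengths
    2^b bytes (7 <= b <= 13) with 8 * length <= 2^s (n - s) bits."""
    grid = {}
    for s in (n // 2, nearest_multiple_of_8(n / 3), 8):
        max_bits = 2 ** s * (n - s)
        grid[s] = [2 ** b for b in range(7, 14) if 8 * 2 ** b <= max_bits]
    return grid


for cipher, n in (("PRESENT", 64), ("AES", 128)):
    for s, lengths in benchmark_grid(n).items():
        print(f"{cipher}: s = {s:2d}, rate = {(n - s) / n:.3f}, lengths = {lengths}")
```

Under this reading, s = 24 for PRESENT and s = 40 for the AES, giving rates 0.625 and 0.6875, and the s = 8 instances cover messages of up to 1024 and 2048 bytes, respectively.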

5.3 Performance Measurements

All measurements were taken on a single core of an Intel Core i7-6700 CPU at 3.4 GHz with Turbo Boost disabled, and averaged over 200000 repetitions. The performance of the block ciphers AES and PRESENT, both in serial and parallel implementations, is provided as a reference point in Table 3. Our findings on the performance of LightMAC and related MACs are summarised in Table 4. All performance numbers are given in cycles per byte (cpb).

Table 3. Baseline performance of ciphers PRESENT and AES on Skylake (AVX2, AES-NI).
Table 4. Software performance of LightMAC, EMAC and PMAC with Parity (PMAC/P), instantiated with PRESENT and AES on the Intel Skylake platform (AVX2, AES-NI). All numbers are given in cycles per byte (cpb). Data is provided for message lengths smaller than \(2^{s}(n-s)\) bits.

Discussion. One can observe that with both PRESENT and the AES as the underlying block ciphers, LightMAC provides a performance of about the inverse of its rate times the baseline block cipher speed. This confirms that LightMAC imposes very low overhead in addition to the block cipher invocations.

In contrast to the serial EMAC, LightMAC provides significantly greater performance despite featuring a smaller rate. This demonstrates the advantage of parallelisability over a sequential algorithm.

Comparing the LightMAC instantiations with rate 2/3 to PMAC with Parity (PMAC/P), we note that the use of the same key throughout the message processing (as opposed to three different keys in PMAC/P) significantly improves the performance for the PRESENT-based implementation: LightMAC is consistently around 50 % faster. This is largely due to the fact that the parts of each subkey of PMAC/P's three bitsliced keys have to be interleaved in an appropriate way. The effect is less pronounced for the AES, where no conversion to bitsliced format is needed and the AES-NI instructions freely accept both registers and memory locations for the subkeys. Still, LightMAC is about 20 % faster, while additionally providing a flexible range of trade-offs between rate and maximum message length.

6 Conclusions

We proposed LightMAC, a new MAC mode of operation specifically suited to lightweight applications. Its security bound was shown in Sect. 4 to not depend on the message length, allowing an order of magnitude more data to be processed per key.

Featuring a simple design with very low overhead over the block cipher, it not only offers compact authentication for resource-constrained platforms, but also allows high-performance parallel implementations, as demonstrated by the implementation study of LightMAC instantiated with PRESENT and the AES in Sect. 5. Furthermore, the implementation results show how the s-parameter translates directly to a trade-off between rate and maximum message length.

Unlike PMAC and its many derivatives, LightMAC is not covered by patents. Altogether, this makes it a promising authentication solution for a wide range of platforms and use cases.