1 Introduction

In order to protect embedded systems against side-channel attacks, countermeasures need to be implemented. Masking and shuffling are the most investigated solutions for this purpose [18]. Intuitively, masking aims at increasing the order of the statistical moments (in the leakage distributions) that reveal sensitive information [8, 15], while shuffling aims at increasing the noise in the adversary’s measurements [14]. As a result, an important challenge is to develop sound tools to understand the security of these countermeasures and their combination [31]. For this purpose, the usual strategy is to consider template attacks for which one can split the evaluation goals into two parts: offline profiling (building an accurate leakage model) and online attack (recovering the key using the leakage model). As far as profiling is concerned, standard methods range from non-parametric ones (e.g., based on histograms or kernels), whose cost suffers heavily from the curse of dimensionality (see e.g., [2] for an application of these methods in the context of non-profiled attacks), to parametric methods, typically exploiting the mixture nature of shuffled and masked leakage distributions [16, 17, 25, 27, 33], which is significantly easier if the masks (and permutations) are known during the profiling phase. Our premise in this paper is that an adversary is able to obtain such a mixture model via one of these means, and we therefore investigate how to exploit it efficiently during the online attack phase.

In this context, a starting observation is that the time complexity of template attacks exploiting mixture models increases exponentially with the number of masks (when masking) and with the permutation length (when shuffling [37]). Typically, the time complexity of an optimal template attack exploiting Q traces against an implementation where each n-bit sensitive value is split into \(\varOmega \) shares and shuffled over \(\varPi \) different positions is in \(\mathcal {O}\left( Q\cdot (2^n)^{\varOmega -1}\cdot \varPi !\right) \), which rapidly turns out to be intractable. In order to mitigate the impact of this high complexity, we propose a small, well-controlled and principled relaxation of the optimal distinguisher, based on its Taylor expansion of degree L (an idea already mentioned in the field of side-channel analysis [6, 11]). Such a simplification leads to several concrete advantages. First, when applied to masked implementations, it allows us to perform the (mixture) computations corresponding to the \((2^n)^{\varOmega }\) factor in the complexity formula only once (thanks to precomputation) rather than Q times. Second, when applied to shuffled implementations, it allows us to replace the \(\varPi !\) factor in this formula by \(\binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , L \right) } = \binom{\varPi }{L}\) (the equality holding since, in practice, \(L \le \left\lceil \varPi /2 \right\rceil \)), thanks to the bounded degree L.

Additionally, note that during the offline profiling an attacker only needs to build the leakage models actually used in the attack. By applying the Taylor expansion of the optimal distinguisher, the complexity of the offline profiling is thus significantly reduced: in general, it becomes equivalent to the complexity of the online attack.

The resulting “rounded template attacks” additionally carry simple intuitions regarding the minimum degree of the Taylor expansion needed for the attacks to succeed. Namely, this degree L needs to be at least equal to the security order O of the target implementation, defined as the smallest statistical moment of the leakage distributions that is key-dependent.

We then show that these attacks only marginally increase the data complexity (for a given success rate) when applied against a masked (only) implementation. More importantly, we finally show that rounded template attacks are especially interesting in the context of high-dimensional higher-order side-channel attacks, and put forward a significant improvement of the attacks against the masked implementations with shuffled table recomputations from CHES 2015 [7].

Introduction to Shuffled Table Recomputation. Masking the linear parts of a block cipher is straightforward, whereas protecting the non-linear parts is less obvious. To address this issue, different methods have been proposed: one can cite algebraic methods [3, 30], the Global Look-Up Table (GLUT) method [28], and table recomputation [1, 8, 10, 19]. Table recomputation methods are often used in practice as they represent a good tradeoff between memory consumption and execution time, since they precompute a masked substitution box (S-Box) that is stored in a table.

However, some attacks still manage to recover the mask during the table recomputation [6, 36]. As a further protection, the recomputation can be shuffled. This protection uses a random permutation drawn over \(S_{2^n}\), the set of all permutations of \(\mathbb F^n_2\). In addition, random masks are uniformly drawn over \(\mathbb F^n_2\) to ensure security against first-order attacks.

Contributions. We show that the expansion of the likelihood allows attacks with a very high computational efficiency, while remaining very effective from a key recovery standpoint. This means that the expanded distinguisher requires only slightly more traces to reach a given success rate, while being much faster to compute.

We also show how to grasp, in a multivariate setting, several leakages of different orders. In particular, we present an attack on shuffled table recomputation which succeeds with fewer traces than [7]. Notice that the likelihood attack cannot be evaluated in this setting because it is computationally impossible to average over both the mask and the shuffle (the sole number of shuffles is \(2^n! \approx 2^{1684}\) with \(n=8\)).

Finally, we show that our rounded version of the maximum likelihood allows better attacks than the state of the art. Namely, our attack is better than the classical \({\text {2O-CPA}}\) and the recent attack of CHES ’15 [7] in all noise variance settings.

Outline. The remainder of the paper is organized as follows. Section 2 provides the necessary notations and mathematical definitions. The theoretical foundation of our method is presented in Sect. 3. The case-study (shuffled table recomputation) is shown in Sect. 4. Section 5 evaluates the complexity of our method. The performance results are presented in Sect. 6. Conclusions and perspectives are presented in Sect. 7. Some technical results are deferred to the appendices.

2 Notations

2.1 Parameters

Randomization countermeasures consist in masking and shuffling protections. When evaluating randomized implementations, there are a number of important parameters to consider. First, the number of shares and the shuffle length in the scheme, next denoted as \(\varOmega \) and \(\varPi \), are algorithmic properties of the countermeasure. These numbers generally influence the tradeoff between the implementation overheads and the security of the countermeasures. Second, the order of the implementation protected by a randomization countermeasure, next denoted as O, is a statistical property of the implementation. It corresponds to the smallest key-dependent statistical moment in the leakage distributions. When only masking is applied and the masked implementation is “perfect” (meaning that the leakages of the shares are independent of each other), the order O equals \(\varOmega \) at best. Finally, the number of dimensions (or dimensionality) used in the traces, next denoted as D, is a property of the adversary. In this respect, adversaries may sometimes be interested in using the lowest possible D (since it makes the detection of POIs in the traces easier). But from the measurement complexity point of view, they have a natural incentive to use a D as large as possible: a larger dimension D allows the adversary to increase the signal-to-noise ratio [5].

In summary, our notations are:

  • \(\varOmega \): number of shares in the masking countermeasure,

  • \(\varPi \): length of the shuffling countermeasure,

  • O: order of the implementation,

  • D: dimensionality of the leakages.

Examples. Existing masking schemes combine these four values in a variety of manners. For example, in a perfect hardware masked implementation with three shares, we may have \(\varOmega =3\), \(O=3\) and \(D=1\) (since the three shares are manipulated in parallel). If this implementation is not perfect, we may observe lower-order leakages (e.g. \(\varOmega =3\), \(O=1\) and \(D=1\), that is, a first-order leakage). And in order to prevent such imperfections, one may use a Threshold Implementation [24], in which case one share will be used to prevent glitches (so \(\varOmega =3\), \(O=2\) and \(D=1\)). If we move to the software case, we may then have more informative dimensions, e.g. \(\varOmega =3\), \(O=3\), \(D=3\) if the adversary looks for a single triple of informative POIs. But we can also have a number of dimensions significantly higher than the order (which usually corresponds to stronger attacks). Let us also give an example of S-box masking with one mask, where the masking process of the S-box (often called recomputation) is shuffled. A permutation \(\varPhi \) of \(\varPi =2^n\) values is applied while computing the masked table. If the attacker ignores the recomputation step, he can carry out an attack on the already computed table; hence the parameters \(\varOmega =2\), \(O=2\), \(D=2\) (also known as “second-order bivariate CPA”). But the attacker can also exploit the shuffled recomputation of the S-box in addition to a table look-up, as presented in [7]; the setting is then highly multivariate: \(\varOmega =2\), \(\varPi =2^n\), \(O=2\), \(D=2\cdot 2^n+1\). Interestingly, the paper [7] shows an attack at degree \(L=3\) which succeeds in fewer traces than attacks at the minimal degree \(L=O=2\).

In general, a template attack based on mixture distributions (often used in parametric estimation) requires a summation over all random values of the countermeasure, that is, over \(\mathcal {R}\), the set of masks and permutations. One can represent \(\mathcal {R}\) as the Cartesian product of the set of masks and the set of permutations. Let us denote by \(\mathcal {M}\) the set of masks and by \(\mathcal {S}\) the set of permutations. Then \(\mathcal {R}= \mathcal {M} \times \mathcal {S}\), and the cardinality of \(\mathcal {R}\) is \(2^{n(\varOmega -1)} \varPi !\).

Eventually, the security of a masked implementation depends on its order and noise level. More precisely, the security increases exponentially with the order (with the noise as basis) [12]. So for the designer, there is always an incentive to increase the noise and the order. And for the adversary, there is generally an incentive to use the largest possible D (given the time constraints of his attack), so that he decreases the noise.

2.2 Model

We characterize the protection level in terms of the most powerful attacker, namely an attacker who knows everything about the design, except the masks and the noise. This means that we consider the case where the templates are known. How the attacker obtained the templates is a matter of security by obscurity: one way or another, he will eventually know the model. Of course, depending on the learning phase, these estimations can be more or less accurate. For the sake of simplicity we assume in this paper the best-case scenario where all the estimations are exact.

Besides, we assume that the noise is independently distributed over each dimension. This is the least favorable situation for the attacker (as there is in this case the most noise entropy). For the sake of simplicity, we assume that the noise variance is equal to \(\sigma ^2\) at each point \(d=1,2,\ldots ,D\). This allows for a simple theoretical analysis. Let us give an index \(q=1,2,\ldots , Q\) to each trace. For one trace q, the model is written as:

$$\begin{aligned} X = y(t, k^*, R) + N, \end{aligned}$$
(1)

where for notational convenience the dependency in q and d has been dropped. Here X is a leakage measurement; \(y=y(t,k^*,R)\) is the deterministic part of the model that depends on the correct key \(k^*\), some known text (plaintext or ciphertext) t, and the unknown random values (masks and permutations) R. Each sample (of index d) of N is a random noise, which follows a Gaussian distribution \(p_N(z) = \frac{1}{\sqrt{2\pi \sigma ^2}} \exp \left( - \frac{z^2}{2\sigma ^2} \right) \).

Uppercase letters are generally used for random variables and the corresponding lowercase letters for their realizations. Bold symbols are used to denote vectors that have length Q, the number of measurements. Namely, \(\mathbf {X}\) denotes a set of Q random variables i.i.d. with the same law as X. So, \(\mathbf {X}\) is a \(Q\times D\) matrix; \(\mathbf {R}\) denotes a set of random variables i.i.d. with the same law as R; \(\mathbf {t}\) denotes the set of input texts of the measurements \(\mathbf {X}\); \(y(\mathbf {t}, k, \mathbf {R})\) denotes the set of leakage models, where k is a key guess, \(k^*\) being the correct key value.

Notations \(\mathbf {X}_d\) and \(\mathbf {X}^{\left( q \right) }\) are used to denote the d-th column and the q-th line of the matrix \(\mathbf {X}\), respectively.

We are interested in attacks where each intermediate variable is an n-bit vector. In particular, we target S-boxes, denoted by S. Regarding the transduction from the intermediate variable to the real-valued leakage, we take the example of the Hamming weight \(w_H\) defined by \(w_H(z)=\sum _{i=1}^n z_i\), where \(z_i\) is the ith bit of z.
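For concreteness, the following minimal Python sketch (illustrative, and not taken from any reference implementation; `sbox` is assumed to be a NumPy array of \(2^n\) values) simulates the bivariate instance of this model that will serve as our running example, with \(X_0\) leaking the mask and \(X_1\) the masked S-box output:

```python
import numpy as np

n = 8
HW = np.array([bin(v).count("1") for v in range(2 ** n)])  # Hamming weight table

def simulate_traces(sbox, k_star, Q, sigma, rng=np.random.default_rng(0)):
    """Q bivariate leakages of a first-order masked S-box, following Eq. (1):
    X0 = w_H(M) + N0 and X1 = w_H(S[t ^ k*] ^ M) + N1, with uniform masks M."""
    t = rng.integers(0, 2 ** n, size=Q)   # known plaintext bytes
    m = rng.integers(0, 2 ** n, size=Q)   # secret uniform masks (the R here)
    x0 = HW[m] + rng.normal(0.0, sigma, Q)                      # mask leakage
    x1 = HW[sbox[t ^ k_star] ^ m] + rng.normal(0.0, sigma, Q)   # masked S-box
    return t, x0, x1
```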

3 A Generic Log-Likelihood for Masked Implementations

In this section we derive a rounded version of the template attack. Namely, we expand a particular instantiation of the template attack, the so-called optimal distinguisher, using its Taylor expansion. By truncating this expansion at the Lth degree we build a rounded version of the optimal distinguisher (later defined as \({\text {ROPT}}_{L}\)). This attack features two advantages: it combines different statistical moments, and its complexity remains manageable.

3.1 Maximum Likelihood (ML) Attack

The most powerful adversary knows the leakage model exactly (but the actual key, the masks, and the noise are unknown during the online step) and computes a likelihood. In the case of masking, the optimal distinguisher, which maximizes the success rate, is given by [6]:

Theorem 1 (Maximum Likelihood)

When the \(y\left( t,k,R \right) \) are known and the Gaussian noise N is i.i.d. across the queries (measurements) and independent across dimensions, then the optimal distinguisher is:

$$\begin{aligned} \mathcal {D}_{\mathrm {opt}}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \sum _{q=1}^{Q} \log \mathbb {E}_R \exp \frac{-\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2 }{2\sigma ^2} , \end{aligned}$$
(2)

where the expectation operator \(\mathbb {E}\) is applied with respect to the random variable \(R\in \mathcal {R}\), and the norm is the Euclidean norm \(\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2 = \sum _{d=1}^D (x^{(q)}_d-y_d(t^{(q)},k,R))^2\).

Proof

It is proven in [6] that the Maximum Likelihood distinguisher is:

$$\begin{aligned} \mathcal {D}_{\mathrm {opt}}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \; p(\mathbf {x} \mid \mathbf {t}, k) = \mathop {\mathrm {argmax}}\limits _k \prod _{q=1}^{Q} \mathbb {E}_R \, p_N\!\left( x^{(q)}-y(t^{(q)},k,R)\right) . \end{aligned}$$

Applying (1) for Gaussian noise and taking the logarithm yields (2).    \(\square \)

In the sequel, we denote by \(LL^{(q)} = \log \mathbb {E}_R \exp \frac{-\Vert x^{(q)}-y(t^{(q)},k,R)\Vert ^2 }{2\sigma ^2}\) the contribution of one trace q to the full Log-Likelihood distinguisher \(LL=\sum _{q=1}^Q LL^{(q)}\).

Remark 1

Notice that for each trace q, the Maximum Likelihood distinguisher involves a summation over \(\#\mathcal {R}\) values, which correspond to \(\#\mathcal {R}\) accesses to precharacterized templates.
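To make this cost concrete, here is a minimal sketch (illustrative names; it reuses `n`, `HW` and the NumPy import from the sketch of Sect. 2.2) of the distinguisher of Theorem 1 in the simple case \(D=2\), \(\varOmega =2\) and no shuffling, so that \(R=M\) and each trace indeed requires \(\#\mathcal {R}=2^n\) template accesses per key guess:

```python
def opt_bivariate(x0, x1, t, sbox, sigma):
    """Maximum-likelihood attack of Eq. (2) with D = 2 and R = M:
    x0 leaks w_H(M) and x1 leaks w_H(S[t ^ k] ^ M)."""
    gamma = 1.0 / (2 * sigma ** 2)
    masks = np.arange(2 ** n)
    ll = np.empty(2 ** n)
    for k in range(2 ** n):
        y1 = HW[sbox[t[:, None] ^ k] ^ masks[None, :]]   # Q x 2^n model matrix
        r2 = (x0[:, None] - HW[masks][None, :]) ** 2 + (x1[:, None] - y1) ** 2
        ll[k] = np.log(np.exp(-gamma * r2).mean(axis=1)).sum()  # sum_q LL^(q)
    return np.argmax(ll)
```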

If \(D=1\), then the signal-to-noise ratio (SNR) is defined in a natural way as the ratio between the variance of the model Y and the variance of the noise N. But when the setup is multivariate, it is more difficult to quantify a notion of SNR. For this reason, we use the following quantity

$$\begin{aligned} \gamma = \frac{1}{2\sigma ^2}, \end{aligned}$$
(3)

which is actually proportional to an SNR, in lieu of the SNR. In practice, we assume that \(\gamma \) is small. This is indeed a condition for masking schemes to be effective (see for instance [12]).

Proposition 1 (Taylor Expansion of Optimal Attacks in Gaussian Noise)

The attack consists in maximizing the sum over all traces \(q=1,\ldots ,Q\) of

$$\begin{aligned} \sum _{\ell =1}^{+\infty } \frac{\kappa _\ell }{\ell !} (-\gamma )^\ell , \end{aligned}$$
(4)

where \(\kappa _\ell \) is the \(\ell \)th-order cumulant of the random variable \(\Vert x-y(t,k,R)\Vert ^2\), which can be found inductively from \(\ell \)th-order moments:

$$\begin{aligned} \mu _\ell = \mathbb {E}_R \bigl ( \Vert x-y(t,k,R)\Vert ^{2\ell }\bigr ) , \end{aligned}$$
(5)

using the relation:

$$\begin{aligned} \kappa _\ell = \mu _\ell - \sum _{\ell '=1}^{\ell -1} \left( {\begin{array}{c}\ell -1\\ \ell '-1\end{array}}\right) \kappa _{\ell '} \mu _{\ell -\ell '} \qquad (\ell \ge 1) . \end{aligned}$$
(6)

Proof

The log-likelihood can be expanded according to the increasing powers of the SNR as:

$$\begin{aligned} \log \mathbb {E}\exp \bigl (-\gamma \Vert x-y(t,k,R)\Vert ^2 \bigr ) = \sum _{\ell =1}^{+\infty } \frac{\kappa _\ell }{\ell !} (-\gamma )^\ell , \end{aligned}$$
(7)

where we have recognized the cumulant generating function [34]. The above relation (6) between cumulants and moments is well known [39].    \(\square \)
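The recursion (6) is straightforward to implement. As an illustration, the following sketch recovers \(\kappa _1,\ldots ,\kappa _L\) from the raw moments \(\mu _1,\ldots ,\mu _L\):

```python
from math import comb

def cumulants_from_moments(mu):
    """Cumulants kappa_1..kappa_L from raw moments mu_1..mu_L via Eq. (6).
    `mu` is a list with mu[0] = mu_1; returns a list with kappa[0] = kappa_1."""
    kappa = []
    for ell in range(1, len(mu) + 1):
        correction = sum(comb(ell - 1, lp - 1) * kappa[lp - 1] * mu[ell - lp - 1]
                         for lp in range(1, ell))
        kappa.append(mu[ell - 1] - correction)
    return kappa
```

For instance, it returns \(\kappa _2 = \mu _2 - \mu _1^2\) and \(\kappa _3 = \mu _3 - 3\mu _1\mu _2 + 2\mu _1^3\), as expected.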

Definition 1

The Taylor expansion of the log-likelihood in the SNR, truncated to the Lth degree and denoted \(\mathrm {LL}_L\), is

$$\begin{aligned} \mathrm {LL}_L = \sum _{\ell =1}^L (-1)^\ell \kappa _\ell \frac{\gamma ^\ell }{\ell !} . \end{aligned}$$
(8)

Put differently, we have \(\mathrm {LL}=\mathrm {LL}_L + o(\gamma ^L)\) (using the Landau notation). The optimal attack can now be “rounded” in the following way:
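Given the per-trace moments of \(\Vert x-y(t,k,R)\Vert ^2\) over R (Eq. (5)), the truncated log-likelihood (8) then reads as follows (a sketch reusing `cumulants_from_moments` above; the degree L is implied by the number of moments supplied):

```python
from math import factorial

def ll_truncated(mu, gamma):
    """LL_L of Eq. (8) for one trace, from the moments mu_1..mu_L of
    ||x - y(t,k,R)||^2 taken over R; gamma = 1 / (2 sigma^2) as in Eq. (3)."""
    kappa = cumulants_from_moments(mu)
    return sum((-gamma) ** ell / factorial(ell) * kappa[ell - 1]
               for ell in range(1, len(mu) + 1))
```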

Definition 2

(Rounded OPTimal Attack of Degree L in \(\gamma \)). The rounded optimal Lth-degree attack consists in maximizing, over the key hypotheses, the sum over all traces of the Lth-degree Taylor expansion \(\mathrm {LL}_L\) in the SNR of the log-likelihood:

$$\begin{aligned} {\text {ROPT}}_{L}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \sum _{q=1}^{Q} \mathrm {LL}_L^{(q)} . \end{aligned}$$
(9)

Proposition 2

If the degree L is smaller than the order O of the countermeasure then the attack fails to distinguish the correct key.

Proof

One can notice that \(\mu _{\ell }\) combines (by a product) at most \(\ell \) terms, following the formula:

$$\begin{aligned} \mu _{\ell } = \sum _{k_1+\ldots +k_D=\ell } \binom{\ell }{k_1,\ldots ,k_D} \ \mathbb {E}\Biggl [ \prod _{i=1}^{D} (x_i-y_i)^{2 k_i} \Biggr ], \end{aligned}$$

where the sum is over all nonnegative integers with \(k_1+\ldots +k_D=\ell \). This implies that at most \(\ell \) of the \(k_i\) are nonzero, and as a consequence at most \(\ell \) distinct variables appear in the expectation. Therefore, by definition of a perfect masking scheme, \(\mu _\ell \) does not depend on the key for \(\ell < O\). As a consequence \(\mathrm {LL}_{L}\) with \(L < O\) does not depend on the key either.    \(\square \)

Theorem 2

Let an implementation be secure at order O. The lowest-degree successful attack is the one at degree \(L=O\) which maximizes \(\mathrm {LL}_{L}\). This is equivalent to summing

$$ \mu _{L} = \mathbb {E}_R \bigl ( \Vert x-y(t,k,R)\Vert ^{2L}\bigr ) , $$

over all traces and

  • maximizing the result over the key hypotheses, if L is even;

  • minimizing the result over the key hypotheses, if L is odd.

Proof

Since \(\kappa _\ell \) is independent of k for all \(\ell < L\), the first sensitive contribution to the log-likelihood is

$$ (-1)^{L} \kappa _{L} \frac{\gamma ^{L}}{L!} . $$

Now, \(\kappa _{L} = \mu _{L} +\) lower order terms (which do not depend on the key as the implementation is secure at order O), and removing constants independent of k the contribution to the log-likelihood reduces to \((-1)^{L} \mu _{L}\).   \(\square \)
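In the bivariate masking setting of the sketches above (\(D=2\), \(R=M\), reusing `n` and `HW`), Theorem 2 instantiates as follows; this is an illustrative sketch, not the implementation used for the experiments of Sect. 6:

```python
def ropt_lowest_degree(x0, x1, t, sbox, L):
    """Theorem 2: sum mu_L = E_M ||x - y||^(2L) over the traces, then
    maximize over k if L is even, minimize if L is odd."""
    masks = np.arange(2 ** n)
    stat = np.empty(2 ** n)
    for k in range(2 ** n):
        y1 = HW[sbox[t[:, None] ^ k] ^ masks[None, :]]
        r2 = (x0[:, None] - HW[masks][None, :]) ** 2 + (x1[:, None] - y1) ** 2
        stat[k] = (r2 ** L).mean(axis=1).sum()   # sum over q of mu_L^(q)
    return np.argmax(stat) if L % 2 == 0 else np.argmin(stat)
```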

Theorem 3

(Mixed Degree Attack). Assuming an implementation secure at order O, the next degree successful attack is the one at degree \(L+1=O+1\) which maximizes \(\mathrm {LL}_{L+1}\). This is equivalent to summing

$$ \mu _{L} (1+\gamma \mu _1) - \gamma \frac{\mu _{L+1}}{L+1} , $$

over all traces and

  • maximizing the result over the key hypotheses, if L is even;

  • minimizing the result over the key hypotheses, if L is odd.

Proof

The key-dependent contribution of the log-likelihood up to degree \(L+1\) becomes

$$ (-1)^{L} \kappa _{L} \frac{\gamma ^{L}}{L!} +(-1)^{L+1} \frac{\kappa _{L+1}}{(L+1)!} \gamma ^{L+1} . $$

Now from (6) we have, for \(L>0\)

$$ \kappa _{L+1} = \mu _{L+1} - (L+1) \mu _{L} \mu _1 + \text { lower-order terms} . $$

Removing terms that do not depend on k, we obtain:

$$ (-1)^{L}\gamma ^{L}\Bigl ( \mu _{L} - \gamma ( \frac{\mu _{L+1}}{L+1} - \mu _{L} \mu _1) \Bigr ) . $$

Compared to an Lth-degree attack, we see that \(\mu _{L}\) is replaced by a corrected version:

$$ \mu _{L} (1+\gamma \mu _1) - \gamma \frac{\mu _{L+1}}{L+1} , $$

where \(\mu _1\) is independent of k. However, \(\mu _1\) cannot be removed as it scales the relative contribution of \(\mu _{L}\) and \(\mu _{L+1}\) in the distinguisher.   \(\square \)
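In the same bivariate setting, the corrected statistic of Theorem 3 becomes (sketch; note that, consistently with Remark 2 below, it requires the SNR parameter through \(\sigma \)):

```python
def ropt_mixed_degree(x0, x1, t, sbox, L, sigma):
    """Theorem 3: sum mu_L (1 + gamma mu_1) - gamma mu_{L+1} / (L + 1) over
    the traces; maximize over k if L is even, minimize if L is odd."""
    gamma = 1.0 / (2 * sigma ** 2)
    masks = np.arange(2 ** n)
    stat = np.empty(2 ** n)
    for k in range(2 ** n):
        y1 = HW[sbox[t[:, None] ^ k] ^ masks[None, :]]
        r2 = (x0[:, None] - HW[masks][None, :]) ** 2 + (x1[:, None] - y1) ** 2
        mu1, muL, muL1 = r2.mean(1), (r2 ** L).mean(1), (r2 ** (L + 1)).mean(1)
        stat[k] = (muL * (1 + gamma * mu1) - gamma * muL1 / (L + 1)).sum()
    return np.argmax(stat) if L % 2 == 0 else np.argmin(stat)
```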

Remark 2

In contrast to \(\mathrm {LL}_{L}\), implementing \(\mathrm {LL}_{L+1}\) requires knowledge of the SNR parameter \(\gamma =1/2\sigma ^2\).

Remark 3

In general, when \(L \ge O\) the rounded optimal attack \({\text {ROPT}}_{L}\) exploits all key-dependent terms of degree \(\ell \), where \(O\le \ell \le L\), whereas an LO-CPA [8] or an MCP-DPA [22] only exploits the term of degree L.

4 Case Study: Shuffled Table Recomputation

In this section we apply the \({\text {ROPT}}_{L}\) formula of Eq. (9) of Definition 2 to the particular case of a block cipher with a shuffled table recomputation stage. We show that in this scenario our new method allows us to build a better attack than the state of the art. By combining the second and the third cumulants we construct an attack which is better than:

  • any second-order attack;

  • the attack presented at CHES 2015. Following the notations of [7] we denote this attack by \({\text {MVA}}_{TR}\) (which stands for Multi-Variate Attack on Table Recomputation) in the rest of this article. This is a third-order attack that achieves better results than \({\text {2O-CPA}}\) when the noise level \(\sigma \) is below a given threshold (namely \(\sigma ^2 \le 2^{n-2} - n/2\)).

4.1 Parameters of the Randomization Countermeasure

In order to validate our results we take as example a first-order masking scheme (implementation order \(O=2\)) where the sensitive variables are split into two shares (\(\varOmega =2\)). The non-linear part of this scheme is computed using a table recomputation stage. This step is shuffled (\(\varPi =2^n\)) for protection against some known attacks [26, 36]. The beginning of this combined countermeasure is given in Algorithm 1. The table is recomputed in a random order from line 3 to line 7.

[Algorithm 1 — shuffled table recomputation: the mask is manipulated at line 1, the table is recomputed in a random order in lines 3–7, and the masked S-box look-up occurs at line 10; listing not reproduced.]

We use lowercase letters (e.g., m, \(\varphi \)) for the realizations of random variables, which are written uppercase (e.g., M, \(\varPhi \)). For the sake of simplicity, in the rest of this case study we assume that \(m = m'\).

An overview of the leakages over time is given in Fig. 1.

Fig. 1. Leakages of the shuffled table recomputation scheme.

We detail below the mathematical expression of these leakages. The randomization consists in one mask M chosen uniformly at random in \(\{0,1\}^n\), and one shuffle (random permutation of \(\{0,1\}^n\)) denoted by \(\varPhi \). Thus, we denote \(R=(M,\varPhi )\), which is uniformly distributed over the Cartesian product \(\{0,1\}^n \times S_{2^n}\) (i.e. \(\mathcal {M} =\{0,1\}^n\) and \(\mathcal {S}=S_{2^n}\)), where \(S_{m}\) is the symmetric group on m elements. We have \(D=2^{n+1}+2\) leakage models, namely:

  • \(X_0=y_0\left( t,k,R\right) +N_0 \) with \(y_0\left( t,k,R\right) = w_H(M)\),

  • \(X_1=y_1\left( t,k,R\right) +N_1\) with \(y_1\left( t,k,R\right) = w_H(S[T\oplus k]\oplus M)\),

  • \(X_i=y_i\left( t,k,R\right) +N_i\), for \(i=2,\ldots ,2^n+1\) with \(y_i\left( t,k,R\right) = w_H(\varPhi (i-2)\oplus M)\),

  • \(X_j=y_j\left( t,k,R\right) +N_j\), for \(j=2^n+2,\ldots ,2^{n+1}+1\) with \(y_j\left( t,k,R\right) = w_H(\varPhi (j-2^n-2))\).

We recall that we assume the noises N are i.i.d. Clearly, there is a second-order leakage, as the pair \((X_0,X_1)\) does depend on the key. But there is also a large multiplicity of third-order leakages, such as \((X_1,X_i,X_{j=i+2^n})\), as will be analyzed in this case study.
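For reference, traces consistent with these four model equations can be generated as in the following sketch (illustrative; it reuses `n`, `HW` and the NumPy import from Sect. 2.2):

```python
def simulate_str_traces(sbox, k_star, Q, sigma, rng=np.random.default_rng(0)):
    """Simulate the D = 2^(n+1) + 2 leakages of the shuffled table
    recomputation, one row per trace, following the models y_0, y_1, y_i, y_j."""
    D = 2 ** (n + 1) + 2
    t = rng.integers(0, 2 ** n, size=Q)
    x = np.empty((Q, D))
    for q in range(Q):
        m = rng.integers(0, 2 ** n)            # mask M, uniform on {0,1}^n
        phi = rng.permutation(2 ** n)          # shuffle Phi, uniform on S_{2^n}
        x[q, 0] = HW[m]                        # y_0 = w_H(M)
        x[q, 1] = HW[sbox[t[q] ^ k_star] ^ m]  # y_1 = w_H(S[T xor k*] xor M)
        x[q, 2:2 ** n + 2] = HW[phi ^ m]       # y_i = w_H(Phi(i-2) xor M)
        x[q, 2 ** n + 2:] = HW[phi]            # y_j = w_H(Phi(j-2^n-2))
    return t, x + rng.normal(0.0, sigma, size=(Q, D))  # i.i.d. Gaussian noise
```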

The following side-channel attacks are applied on a set of Q realizations. Let us define I and J as \(I=\llbracket 2, 2^n + 1 \rrbracket \) and \(J = \llbracket 2^n+2, 2 \times 2^n + 1 \rrbracket \). Then the maximal dimensionality is \(D=2+2\times 2^n\), and we denote a sample d as \(d\in \{0,1\}\cup I\cup J\). The Q leakages (resp. models) at sample d are denoted by \(\mathbf {x}_d\) and \(\mathbf {y}_d = y_d(\mathbf {t},k,R)\).

In order to simplify the notations we introduce

$$\begin{aligned} f_d^{\left( q \right) } = \left( x_d^{\left( q \right) }- y_d\left( t^{\left( q \right) },k,R\right) \right) ^2 , \end{aligned}$$
(10)

with \(d \in \left\{ 0,1 \right\} \cup I \cup J\). The superscript \(^{\left( q \right) }\) is omitted when there is no ambiguity.

4.2 Second-Order Attacks

Like any other higher-order masking scheme, our example can be defeated by higher-order attacks [8, 20, 29, 38]. As our scheme is a first-order masking scheme with two shares, it can be defeated using a second-order attack [8, 20] which combines the leakages of the two shares using a combination function [8, 20, 25], such as the second-order CPA (\({\text {2O-CPA}}\)) with the centered product as combination function.

In our notation, this implies \(D=2\).

Definition 3

( \({\text {2O-CPA}}\) [29]). We denote by \({\text {2O-CPA}}\) the \(\mathsf {CPA}\) using the centered product as combination function. Namely:

$$\begin{aligned} {\text {2O-CPA}}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \; \widehat{\rho }\left( \mathbf {x}_0 \circ \mathbf {x}_1 , \, \mathbf {y} \right) , \end{aligned}$$
(11)

where \(\mathbf {y} =\mathbb {E}_M \left( y_0\left( \mathbf {t}, k, R \right) \circ y_1\left( \mathbf {t}, k, R \right) \right) \), \(\circ \) is the element-wise product and \(\widehat{\rho }\) is an estimator of the Pearson coefficient. It can be noticed that, as the terms \( y_0\left( \mathbf {t}, k, R \right) \) and \( y_1\left( \mathbf {t}, k, R \right) \) only depend on M, the expectation is computed over \(\mathcal {M}\) only.
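A sketch of this distinguisher follows (illustrative; the model \(\mathbf {y}\) is precomputed once for every pair (t, k), the leakages are centered empirically in line with Remark 4 below, and `n`, `HW` are reused from Sect. 2.2):

```python
def second_order_cpa(x0, x1, t, sbox):
    """2O-CPA of Definition 3: correlate the centered product of the two
    shares with y = E_M[y0 y1], tabulated once for every (t, k) pair."""
    masks = np.arange(2 ** n)
    c = (x0 - x0.mean()) * (x1 - x1.mean())   # centered product of the shares
    ym = np.array([[(HW[masks] * HW[sbox[tt ^ k] ^ masks]).mean()
                    for k in range(2 ** n)] for tt in range(2 ** n)])
    rho = [abs(np.corrcoef(c, ym[t, k])[0, 1]) for k in range(2 ** n)]
    return int(np.argmax(rho))
```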

Remark 4

Here we have assumed without loss of generality that the leakages and the model are centered.

An attacker can restrict himself to these two leakages and ignore the recomputation stage. Since such an attacker ignores the table recomputation, no random shuffle is involved. As a consequence, the optimal distinguisher restricted to these leakages becomes computable. Nevertheless, as we will see in Sect. 6, this approach is not the best: a lot of exploitable information is lost by not taking the table recomputation into account.

Definition 4

( \({\text {OPT}}_{{\text {2O}}}\) Distinguisher — Eq. (2) for \(D=2\) ). We define by \({\text {OPT}}_{{\text {2O}}}\) the optimal attack which targets the mask and the masked sensitive value.

$$\begin{aligned} {\text {OPT}}_{{\text {2O}}}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \sum _{q=1}^{Q} \log \mathbb {E}_M \exp \left( -\gamma \left( f_0^{\left( q \right) } + f_1^{\left( q \right) } \right) \right) , \end{aligned}$$
(12)

with \(f_d^{\left( q \right) }\) as defined in Eq. (10).

4.3 Exploiting the Shuffled Table Recomputation Stage

It is known that the table recomputation step can be exploited to build better attacks than second order attacks [6, 36]. Recently a new attack has been presented which remains better than the \({\text {2O-CPA}}\) even when the recomputation step is protected [7]. Let us recall the definition of this attack:

Definition 5

( \({\text {MVA}}_{TR}\) [7]). The MultiVariate Attack (MVA) exploiting the leakage of the table recomputation (TR) is given by the function:

(13)

where, as in Definition 3, \(\mathbf {y} =\mathbb {E}_M \left( y_0\left( \mathbf {t}, k, R \right) \circ y_1\left( \mathbf {t}, k, R \right) \right) \), \(\circ \) is the element-wise product and \(\widehat{\rho }\) is an estimator of the Pearson coefficient.

Let us now apply our new \({\text {ROPT}}_{L}\) to a block cipher protected with a shuffled table recomputation. In this case the lower moments are given by:

$$\begin{aligned} \mu _\ell = \mathbb {E}\!\left[ { \bigg (\sum _d f_d \bigg )^\ell } \right] = \mathbb {E}\!\left[ { \Big ( \underbrace{f_0}_{M} + \underbrace{f_1}_{S[t \oplus k] \oplus M} + \sum _{i \in I }\underbrace{f_i}_{\varPhi \left( \omega \right) \oplus M } + \sum _{j \in J }\underbrace{f_j}_{\varPhi \left( \omega \right) } \Big )^\ell } \right] . \end{aligned}$$

Here each underbrace recalls the intermediate value whose Hamming weight is the model \(y_d\) appearing in \(f_d\) (cf. Eq. (10)), and \(\omega \) denotes the recomputation index (\(\omega = i-2\), resp. \(\omega = j-2^n-2\)).

Proposition 3

The second degree rounded optimal attack on the table recomputation is:

$$\begin{aligned} {\text {ROPT}}_{2}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \sum _{q=1}^{Q} \mu _2^{\left( q \right) } . \end{aligned}$$
(14)

Proof

Combine Theorem 2 and Eq. (30) of Appendix A.2.   \(\square \)

Remark 5

\({\text {ROPT}}_{2}\), which targets the second-order moment, happens not to take the terms of the recomputation stage into account. Naturally, the only second-order leakages are also the ones used by the \({\text {2O-CPA}}\) and \({\text {OPT}}_{{\text {2O}}}\) distinguishers.

Proposition 4

The third degree rounded optimal attack on the table recomputation is:

$$\begin{aligned} {\text {ROPT}}_{3}(\mathbf {x}, \mathbf {t}) = \mathop {\mathrm {argmax}}\limits _k \sum _{q=1}^{Q} \left( \mu _2^{\left( q \right) } \left( 1 + \gamma \mu _1^{\left( q \right) } \right) - \gamma \frac{\mu _3^{\left( q \right) }}{3} \right) , \end{aligned}$$
(15)

where the values of \( \mu _1^{\left( q \right) }\), \(\mu _2^{\left( q \right) }\) and \( \mu _3^{\left( q \right) }\) are respectively provided in Eq. (22) of Appendix A.1, Eq. (30) of Appendix A.2 and Eq. (33) of Appendix A.3.

Proof

Combine Theorem 3 and Appendix A.   \(\square \)

Proposition 5

To compute \(\mu _1\), \(\mu _2\) and \(\mu _3\) an attacker does not need to compute the expectation over \(S_{2^n}\).

Proof

Proof given in Appendix A.   \(\square \)

5 Complexity

In this section we give the time complexity needed to compute \({\text {OPT}}\) and \({\text {ROPT}}_{L}\). We also show that when \(L \ll D\) the complexity of \({\text {ROPT}}_{L}\) remains manageable whereas the complexity of \({\text {OPT}}\) is prohibitive. In this section all the complexities are computed for one key guess.

5.1 Complexity in the General Case

Let us first introduce an intermediate lemma.

Lemma 1

The complexity of computing \(\mu _{\ell }\) (for one trace) is lower than:

$$\begin{aligned} \mathcal {O} \left( \binom{D + \ell -1}{\ell } \cdot 2^{\left( \varOmega -1\right) n} \cdot \binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , \ell \right) } \right) . \end{aligned}$$
(16)

Proof

See Appendix B.1.   \(\square \)

Proposition 6

The complexity of \({\text {OPT}}\) is:

$$\begin{aligned} \mathcal {O} \left( Q \cdot (2^n)^{\varOmega -1}\cdot \varPi ! \cdot D \right) . \end{aligned}$$
(17)

The complexity of \({\text {ROPT}}_{L}\) is lower than:

$$\begin{aligned} \mathcal {O} \left( Q \cdot L \cdot \binom{D + L-1}{L} \cdot 2^{\left( \varOmega -1\right) n} \cdot \binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , L \right) } \right) . \end{aligned}$$
(18)

Proof

The proof is given in Appendix B.2.   \(\square \)

Proposition 6 allows us to compare the complexities of the two attacks. One can notice that \({\text {ROPT}}_{L}\) still contains factorial-like terms in \(\varPi \) or D, namely \(\binom{D + L-1}{L}\) and \(\binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , L \right) }\). Nevertheless, these two terms can be seen as constants when \(L \ll D\). As a consequence we have the following remark.

Important Remark. When the degree L of the attack \({\text {ROPT}}_{L}\) is such that \(L \ll D\) the complexity of \({\text {OPT}}\) is much higher than the complexity of \({\text {ROPT}}_{L}\). Indeed the main term for \({\text {OPT}}\) is \(\varPi !\) whereas the one for \({\text {ROPT}}_{L}\) is \(2^{\left( \varOmega -1\right) n}\).

Proposition 7

The complexity of \({\text {ROPT}}_{L}\) can be reduced to \(\mathcal {O} \left( Q \cdot L \cdot \binom{D + L -1}{L} \right) \) with a precomputation in \(\mathcal {O} \left( L \cdot \binom{D + L-1}{L} \cdot 2^{\left( \varOmega -1\right) n} \cdot \binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , L \right) } \right) \).

Proof

See Appendix B.3.   \(\square \)

This means that for Q large enough, i.e., when \(\gamma \) is low enough (so that many traces are needed), this computational “trick” allows a speed-up factor of \(2^{\left( \varOmega -1\right) n} \binom{\varPi }{\min \left( \left\lceil \varPi /2 \right\rceil , L \right) }\). The idea is to pull the query-dependent values out of the computation of the expectations: these expectations only depend on the model, and can therefore be computed only once.
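To illustrate this trick on the simplest possible case (\(D=1\) and \(R=M\), so that the expectation is over the \(2^n\) masks only), one can expand \(\mu _\ell = \mathbb {E}_M \left[ (x-y)^{2\ell } \right] \) by the binomial theorem: the expectations \(\mathbb {E}_M\left[ y^j \right] \) depend only on (t, k), so they are tabulated once and each trace then costs \(\mathcal {O}(\ell )\) table look-ups. A sketch (illustrative, reusing `n` and `HW`):

```python
from math import comb

def precompute_model_powers(sbox, ell):
    """Tabulate E_M[y(t,k,M)^j] for j = 0..2*ell and every (t, k), where
    y(t,k,M) = w_H(S[t ^ k] ^ M); done once, before processing the traces."""
    masks = np.arange(2 ** n)
    table = np.empty((2 ** n, 2 ** n, 2 * ell + 1))
    for tt in range(2 ** n):
        for k in range(2 ** n):
            y = HW[sbox[tt ^ k] ^ masks].astype(float)
            table[tt, k] = [(y ** j).mean() for j in range(2 * ell + 1)]
    return table

def mu_ell_fast(x, tt, k, ell, table):
    """mu_ell for one trace via the binomial expansion of (x - y)^(2*ell):
    only 2*ell + 1 precomputed values are read, independently of 2^n."""
    return sum(comb(2 * ell, j) * (-1) ** j * x ** (2 * ell - j) * table[tt, k, j]
               for j in range(2 * ell + 1))
```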

5.2 Complexity of Our Case Study

Let us now compute the complexity of these two distinguishers applied to our case study. Of course, one approach could be to use the formulas of the previous Sect. 5.1. But one can notice that many terms are independent of the key, and are as a consequence not needed in an attack. Another approach is therefore to work directly from the formula of each distinguisher.

Proposition 8

The complexity of \({\text {OPT}}\) is:

$$\begin{aligned} \mathcal {O}\left( Q \cdot (2^n)\cdot 2^n! \cdot \left( 2^{n+1}+2 \right) \right) . \end{aligned}$$
(19)

The complexity of \({\text {ROPT}}_{2}\) is:

$$\begin{aligned} \mathcal {O}\left( Q \cdot 2^n \right) . \end{aligned}$$
(20)

The complexity of \({\text {ROPT}}_{3}\) is lower than:

$$\begin{aligned} \mathcal {O}\left( Q \cdot 2^{4n} \right) . \end{aligned}$$
(21)

Proof

See Appendix B.4.   \(\square \)

Remark 6

As already mentioned, an attacker can ignore the leakages of the table recomputation and only target the two shares. In that case the complexity of \({\text {OPT}}_{{\text {2O}}}\) (Definition 4) is \(\mathcal {O}\left( Q \cdot 2^n \right) \). With the result of Proposition 7, the complexity of \({\text {ROPT}}_{2}\) reduces to \(\mathcal {O}\left( Q \right) \).

Remark 7

Using the result of Proposition 7 the complexity of \({\text {ROPT}}_{3}\) can be reduced to \(\mathcal {O}\left( Q \cdot 2^{2n} \right) \) with a precomputation step of \(\mathcal {O}\left( 2^{2n} \right) \).

Remark 8

A summary of the complexities and computation times of the distinguishers is provided in Table 1 of Appendix B.5.

6 Simulation Results

In this section we validate in simulation the soundness of our approach for the case study described in Sect. 4.1. The results of these simulations are expressed as a success rate (defined in [32] and denoted by SR). All simulations use the Hamming weight model as leakage model. As we assume an attacker with perfect knowledge, the leakages are the model (denoted by y) plus some noise. The noise is Gaussian with a standard deviation of \(\sigma \).

In Subsect. 6.1 we assume that the attacker does not take the table recomputation stage into account. He only targets the leakages of the mask and the masked share (the leakage of the masked S-Box), namely the leakages which occur in lines 1 and 10 of Algorithm 1. This approach allows computing the restricted version of the maximum likelihood. We compare the results of the maximum likelihood, our rounded version, and the higher-order attacks.

In Subsect. 6.2 we present our main results. In this subsection the attacker can exploit the leakage of the mask, the masked share, and all the leakages of the table recomputation. In this scenario we show that our rounded version of the optimal distinguisher outperforms all the attacks of the state of the art.

6.1 Exploiting only Leakage of the Mask and the Masked Share

In this subsection all the attacks are computed using only the leakages of lines 1 and 10 of Algorithm 1.

In this case study we assume a perfect masking scheme with: \(Y_0=w_H(M)\) and \(Y_1=w_H(S[T\oplus k] \oplus M)\).

It can be seen in Fig. 2 that even for small noise (\(\sigma = 1\), Fig. 2a) the \({\text {2O-CPA}}\) and \({\text {ROPT}}_{2}\) are equivalent. Indeed, the two curves superimpose almost perfectly (in order to better highlight a difference, as many as 1000 attacks have been carried out for the estimation of the success rate). Moreover, these two attacks are nearly equivalent to the optimal distinguisher (we recover here the results of [6]). We can notice that for both \(\sigma = 1\) and \(\sigma = 2\), \({\text {ROPT}}_{4}\) is not as good as \({\text {ROPT}}_{2}\). This means that the noise standard deviation is not large enough for approximations of higher degrees to be accurate. Indeed, when the noise is not high enough, the weight of each term of the decomposition can be such that some useful terms vanish due to the alternation of positive and negative terms in the Taylor expansion.

Let us recall that the decomposition of Eq. (8) is valid only for low \(\gamma =1/(2\sigma ^2)\), i.e., high noise. The error term \(o(\gamma ^L)\) in the Taylor expansion gives the asymptotic evolution of this error when the noise increases, but does not provide information about the error for a fixed value of the noise variance. This means that the noise is too small for \({\text {ROPT}}_{4}\) to be a good approximation of \({\text {OPT}}\), although \({\text {ROPT}}_{2}\) is nearly equivalent to \({\text {OPT}}\).

For \(\sigma =2\) the noise is high enough to have a good approximation of \({\text {OPT}}\) by \({\text {ROPT}}_{4}\). For this noise all the attacks are close to \({\text {OPT}}\) (Fig. 2b).

In the context where only the mask and the masked share are used, computing the \({\text {2O-CPA}}\), \({\text {ROPT}}_{2}\) or \({\text {OPT}}\) is thus equivalent. As a consequence, in the rest of this article only the \({\text {2O-CPA}}\) will be displayed.

To conclude, our \({\text {ROPT}}_{L}\) is in this scenario at least as good as the HO-CPA of order L, which validates the optimality of state-of-the-art attacks against perfect masking schemes of order \(O=L\).

Fig. 2. Bivariate attacks.

Fig. 3. Attack on shuffled table recomputation.

6.2 Exploiting the Shuffled Table Recomputation

In this subsection the attacker can target the leakage of the mask, the masked share, and all the leakages occurring during the table recomputation. As a consequence, the attacks of Subsect. 6.1 remain possible. It has been shown in [6, 33] that the \({\text {2O-CPA}}\) with the centered product becomes close to the \({\text {OPT}}_{{\text {2O}}}\) (the Maximum Likelihood) when the noise becomes high. This is moreover confirmed by our simulation results, as can be seen in Fig. 2. We choose the \({\text {2O-CPA}}\), and not the \({\text {OPT}}_{{\text {2O}}}\), as the reference attack for Fig. 3, because it performs similarly (Fig. 2) and is much faster to compute (see Table 1), which is mandatory for attacks with high noise (e.g. \(\sigma =12\)) which involve many traces.

Following the formulas provided previously, empirical validations have been performed. For \(\sigma \le 8\) the attacks have been redone 1000 times to compute the SR; for \(\sigma > 8\) the attacks have been done 250 times. Results are plotted in Fig. 3. In these figures the results of the \({\text {2O-CPA}}\), the \({\text {MVA}}_{TR}\) and \({\text {ROPT}}_{3}\) are plotted. Notice that the likelihood is not represented, because we cannot average over R.

Recall that the cardinality of the support of R is \(2^n \times 2^n!\). It can first be noticed that, for all noise levels, \({\text {ROPT}}_{3}\) is the best attack.

Let us analyze how much better \({\text {ROPT}}_{3}\) is than \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\). The comparison with our new attack can be divided into three different categories. For low noise (\(\sigma = 3\), see Fig. 3b) the results of \({\text {ROPT}}_{3}\) are similar to the results of \({\text {MVA}}_{TR}\). This means that the leakage of the shuffled table recomputation is the most leaking term in this case. At the opposite end, when the noise is high (\(\sigma =12\), see Fig. 3g), \({\text {ROPT}}_{3}\) becomes close to \({\text {2O-CPA}}\), which means that, as expected, the most informative part is the second-order term. For medium noise (\(7 \le \sigma \le 9\), see Fig. 3d, e and f) the results of \({\text {ROPT}}_{3}\) are much better than the results of \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\). Moreover, the gain compared to the second-best attack is maximal when the results of \({\text {2O-CPA}}\) and \({\text {MVA}}_{TR}\) are the same. Indeed, for \(\sigma = 7\) (see Fig. 3d), \({\text {ROPT}}_{3}\) needs 35000 traces to reach 80 % of success whereas \({\text {MVA}}_{TR}\) (the second-best attack) needs 60000 traces: this represents a gain of 71 %. For \(\sigma = 8\) (see Fig. 3e), \({\text {ROPT}}_{3}\) needs 65000 traces to reach 80 % of success whereas the \({\text {MVA}}_{TR}\) and the \({\text {2O-CPA}}\) need 120000 traces: this represents a gain of 85 %. And when the noise increases to \(\sigma = 9\) (see Fig. 3f), \({\text {ROPT}}_{3}\) needs 120000 traces to reach 80 % of success whereas \({\text {2O-CPA}}\) (the second-best attack) needs 200000 traces, which is a gain of 66 %.

These results can be interpreted as follows. The \({\text {MVA}}_{TR}\) is a third-order attack which depends on the third-order moment. The \({\text {2O-CPA}}\) is a second-order attack which depends on the second-order moment. The new \({\text {ROPT}}_{3}\) attack combines these two moments. When the noise is low, the \({\text {MVA}}_{TR}\) and the \({\text {ROPT}}_{3}\) perform similarly; this shows that the dominant term in the Taylor expansion is the third-order one. Conversely, when the noise increases, the \({\text {ROPT}}_{3}\) becomes close to the \({\text {2O-CPA}}\), which indicates that the important term in the Taylor expansion is the second-order one. As \({\text {ROPT}}_{3}\) combines the second- and third-order moments weighted by the SNR, it is always better than any attack exploiting only one moment.

7 Conclusions and Perspectives

In this article, we derived new attacks based on the Lth-degree Taylor expansion, in the SNR, of the optimal Maximum Likelihood distinguisher. We have shown that this Lth-degree truncation allows targeting a moment of order L. The new attack outperforms the optimal distinguisher with respect to time complexity: as we have theoretically shown, the Taylor approximation can be effectively computed, whereas the fully optimal maximum likelihood distinguisher is not computationally tractable.

We have illustrated this property by applying our new method in a complex scenario of “shuffled table recomputation” and have compared the time complexity of the new attack and the optimal distinguisher. In addition, we have shown that in this context our attack has a higher success rate than all the attacks of the state of the art over all possible noise variances.

An open question is how to quantify the accuracy of the approximation \(\mathrm {LL} \longrightarrow \mathrm {LL}_\ell \) as a function of the noise. In other words, what is the optimal degree of the Taylor expansion of the likelihood for a given SNR? Another interesting extension of this framework would be to hardware devices, which are known to leak at various orders (see the real-world examples in [21–23]).