
1 Introduction

Side-channel analysis is a class of cryptanalytic attacks that exploit the physical environment of a cryptosystem to recover information about its secrets. To secure implementations against this threat, security developers usually apply techniques inspired by secret sharing [Bla79, Sha79] or multi-party computation [CCD88]. The idea is to randomly split a secret into several shares such that the adversary needs all of them to reconstruct the secret. For these schemes, the number of shares \(n\) into which the key-dependent data are split plays the role of a security parameter.

A common countermeasure against side-channel attacks consists in using the masking scheme originally introduced by Ishai, Sahai and Wagner (ISW) [ISW03]. The countermeasure achieves provable security in the so-called probing security model [ISW03], in which the adversary can observe a limited number of intermediate variables of the computation. This model has been argued to be practically relevant to the so-called higher-order side-channel attacks, and it has been the basis of several efficient schemes to protect block ciphers.

More recently, it has been shown in [DDF14] that the probing security of an implementation actually implies its security in the more realistic noisy leakage model introduced in [PR13]. More precisely, if an implementation obtained by applying the compiler in [ISW03] is secure at order \(n\) in the probing model, then [DFS15, Theorem 3] shows that the success probability of distinguishing the correct key among \(|\mathbb {K}|\) candidates is bounded above by \(|\mathbb {K}|\cdot 2^{-n/9}\) if the leakage \(L_i\) on each intermediate variable \(X_i\) satisfies:

$$\begin{aligned} \mathrm {I}(X_i;L_i) \leqslant 2\cdot (|\mathbb {K}|\cdot (28n+ 16))^{-2}, \end{aligned}$$

where \(\mathrm {I}(\cdot ; \cdot )\) denotes the mutual information and where the index i ranges from 1 to the total number of intermediate variables.

In this paper we investigate what happens when the above condition is not satisfied. Since the above mutual information \(\mathrm {I}(X_i;L_i)\) can be approximated by \(k/(8\sigma ^2)\) in the Hamming weight model in \({\mathbb F}_{2^k}\), where \(\sigma \) is the noise in the measurement (see the full version of this paper [BCPZ16]), this amounts to investigating the security of Ishai-Sahai-Wagner’s (ISW) implementations when the number of shares \(n\) satisfies:

$$ n> c\cdot \sigma $$

As already observed in previous works [VGS14, CFG+10], the fact that the same share (or more generally several data depending on the same sensitive value) is manipulated several times may open the door to new attacks which are not taken into account in the probing model. Those attacks, sometimes called horizontal [CFG+10] or (template) algebraic [ORSW12, VGS14], exploit the algebraic dependency between several intermediate results to discriminate key hypotheses.

In this paper, we exhibit two (horizontal) side channel attacks against the ISW multiplication algorithm. These attacks show that the use of this algorithm (and its extension proposed by Rivain and Prouff in [RP10]) may introduce a weakness with respect to horizontal side channel attacks if the sharing order \(n\) is such that \(n > c \cdot \sigma ^2\), where \(\sigma \) is the measurement noise. While the first attack is too costly (even for low noise contexts) to be applicable in practice, the second attack, which essentially iterates the first one until a satisfying likelihood is reached, performs very well. For instance, when the leakages are simulated by noisy Hamming weights computed over \({\mathbb F}_{2^8}\) with \(\sigma =1\), it recovers all the shares of a 21-sharing. We also confirm the practicality of our attack with a real life experiment on a development platform embedding the ATMega328 processor (see the full version of this paper [BCPZ16]). Actually, in this context where the leakages are multivariate and not univariate as in our theoretical analyses and simulations, the attack appears to be more efficient than expected and recovers all the shares of an \(n\)-sharing when \(n\geqslant 40\).

Finally, we describe a variant of Rivain-Prouff’s multiplication that is still provably secure in the original ISW model, and also heuristically secure against our new attacks. Our new countermeasure is similar to the countermeasure in [FRR+10], in that it can be divided into two steps: a “matrix” step in which, starting from the input shares \(x_i\) and \(y_j\), one obtains a matrix \(x_i \cdot y_j\) with \(n^2\) elements, and a “compression” step in which one uses some randomness to get back to an \(n\)-sharing \(c_i\). Assuming a leak-free component, the countermeasure in [FRR+10] is proven secure in the noisy leakage model, in which the leakage function reveals all the bits of the internal state of the circuit, perturbed by independent binomial noise. Our countermeasure does not use any leak-free component, but is only heuristically secure in the noisy leakage model (see Sect. 8.2 for our security analysis).

2 Preliminaries

For two positive integers \(n\) and d, an \((n,d)\)-sharing of a variable x defined over some finite field \({\mathbb F}_{2^k}\) is a random vector \((x_1, x_2, \ldots , x_{n})\) over \({\mathbb F}_{2^k}\) such that \(x=\sum _{i=1}^{n} x_i\) holds (completeness equality) and any tuple of \(d-1\) shares \(x_i\) is a uniform random vector over \(({\mathbb F}_{2^k})^{d-1}\). If \(n=d\), the terminology simplifies to \(n\)-sharing. An algorithm with domain \(({\mathbb F}_{2^k})^{n}\) is said to be \((n-1)\)th-order secure in the probing model if, on input an \(n\)-sharing \((x_1, x_2, \ldots , x_{n})\) of some variable x, it admits no tuple of \(n-1\) or fewer intermediate variables that jointly depends on x.
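The sharing and completeness notions can be illustrated by a short sketch (function names are ours; field addition over \({\mathbb F}_{2^k}\) is XOR):

```python
import secrets

def share(x, n, k=8):
    # n-sharing of x over F_{2^k}: n-1 uniform shares, last share fixed
    # so that the field sum (XOR in characteristic 2) equals x.
    shares = [secrets.randbelow(1 << k) for _ in range(n - 1)]
    last = x
    for s in shares:
        last ^= s
    return shares + [last]

def unshare(shares):
    # Completeness equality: x = x_1 + ... + x_n (XOR).
    acc = 0
    for s in shares:
        acc ^= s
    return acc
```

Any \(n-1\) of the returned shares are uniformly distributed, so individually they reveal nothing about x.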

We refer to the full version of this paper [BCPZ16] for the definitions of Signal to Noise Ratio (SNR), Gaussian distribution, entropy and differential entropy.

3 Secure Multiplication Schemes

In this section, we recall the secure multiplication scheme over \({\mathbb F}_2\) introduced in [ISW03] and its extension to any field \({\mathbb F}_{2^k}\) proposed in [RP10].

Ishai-Sahai-Wagner’s Scheme [ISW03]. Let \(x^{\star }\) and \(y^{\star }\) be binary values from \({\mathbb F}_2\) and let \({(x_i)}_{1\le i \le n}\) and \({(y_i)}_{1\le i \le n}\) be \(n\)-sharings of \(x^{\star }\) and \(y^{\star }\) respectively. To securely compute a sharing of \(c = x^{\star }\cdot y^{\star }\) from \({(x_i)}_{1\le i \le n}\) and \({(y_i)}_{1\le i \le n}\), the ISW method works as follows:

  1. For every \(1 \le i < j \le n\), pick a random bit \(r_{i,j}\).

  2. For every \(1 \le i < j \le n\), compute \(r_{j,i} = (r_{i,j} + x_i \cdot y_j) + x_j \cdot y_i\).

  3. For every \(1 \le i \le n\), compute \(c_{i} = x_i \cdot y_i + \sum _{j\ne i} r_{i,j}\).

The above multiplication scheme achieves security at order \(\lfloor n/2\rfloor \) in the probing security model [ISW03].
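The three steps above can be sketched as follows over \({\mathbb F}_2\) (a minimal rendering with our function names; addition is XOR, multiplication is AND):

```python
import secrets

def isw_mult(xs, ys):
    # ISW multiplication over F_2: xs, ys are n-sharings of x*, y*.
    n = len(xs)
    r = [[0] * n for _ in range(n)]
    # Step 1: pick a random bit r_{i,j} for every i < j.
    for i in range(n):
        for j in range(i + 1, n):
            r[i][j] = secrets.randbelow(2)
    # Step 2: r_{j,i} = (r_{i,j} + x_i*y_j) + x_j*y_i.
    for i in range(n):
        for j in range(i + 1, n):
            r[j][i] = (r[i][j] ^ (xs[i] & ys[j])) ^ (xs[j] & ys[i])
    # Step 3: c_i = x_i*y_i + sum over j != i of r_{i,j}.
    cs = []
    for i in range(n):
        c = xs[i] & ys[i]
        for j in range(n):
            if j != i:
                c ^= r[i][j]
        cs.append(c)
    return cs
```

Each random \(r_{i,j}\) cancels against \(r_{j,i}\) in the sum of the \(c_i\), so the output is a sharing of \(x^{\star }\cdot y^{\star }\).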

The Rivain-Prouff Scheme. The ISW countermeasure was extended to \({\mathbb F}_{2^k}\) by Rivain and Prouff in [RP10]. As shown in [BBD+15], the SecMult algorithm below is secure in the ISW probing model against t probes for \(n \ge t+1\) shares; the authors also show that with some additional mask refreshing, the Rivain-Prouff countermeasure for the full AES can be made secure with \(n \ge t+1\) shares.

[Algorithm 1 (SecMult), shown as a figure in the original.]

In Algorithm 1, one can check that each share \(x_i\) or \(y_j\) is manipulated \(n\) times, whereas each product \(x_iy_j\) is manipulated only once. This gives a total of \(3n^2\) manipulations that can be observed through side channels.
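For concreteness, the SecMult algorithm can be sketched as follows in Python over \({\mathbb F}_{2^8}\) with the AES polynomial (our rendering and our choice of field; the algorithm itself is field-generic):

```python
import secrets

def gf_mult(a, b, poly=0x11B, k=8):
    # Carry-less multiplication reduced modulo the field polynomial
    # (0x11B = x^8+x^4+x^3+x+1, the AES polynomial, chosen for concreteness).
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= poly
    return r

def sec_mult(xs, ys, k=8):
    # Rivain-Prouff SecMult sketch: c_i starts at x_i*y_i; every pair (i, j)
    # then exchanges a fresh random r so that the XOR of the c_i is preserved.
    # Note that each x_i and y_j enters n finite-field multiplications.
    n = len(xs)
    c = [gf_mult(xs[i], ys[i]) for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbelow(1 << k)
            c[i] ^= r
            c[j] ^= (r ^ gf_mult(xs[i], ys[j])) ^ gf_mult(xs[j], ys[i])
    return c
```

The \(n\)-fold reuse of every \(x_i\) and \(y_j\) visible here is exactly what the horizontal attacks of the next sections exploit.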

4 Horizontal DPA Attack

4.1 Problem Description

Let \((x_i)_{i\in [1..n]}\) and \((y_i)_{i\in [1..n]}\) be respectively the \(n\)-sharings of \(x^{\star }\) and \(y^{\star }\) (namely, we have \(x^{\star }= x_1 + \cdots + x_n \) and \(y^{\star }= y_1 + \cdots + y_n\)). We assume that an adversary gets, during the processing of Algorithm 1, a single observation of each of the following random variables for \(1 \le i,j \le n\):

$$\begin{aligned} L_{i}&= \varphi (x_i)+B_{i} \end{aligned}$$
(1)
$$\begin{aligned} L'_{j}&= \varphi (y_j)+B'_{j} \end{aligned}$$
(2)
$$\begin{aligned} L''_{ij}&= \varphi (x_i \cdot y_j) + B''_{ij} \end{aligned}$$
(3)

where \(\varphi \) is an unknown function which depends on the device architecture, where \(B_{i}\) and \(B'_{j}\) are Gaussian noises with standard deviation \(\sigma /\sqrt{n}\), and \(B''_{ij}\) is a Gaussian noise with standard deviation \(\sigma \). Namely, we assume that each \(x_i\) and \(y_j\) is processed n times, so by averaging the standard deviation is divided by a factor \(\sqrt{n}\), which gives \(\sigma /\sqrt{n}\) if the initial noise standard deviation is \(\sigma \). The random variables associated with the ith share \(x_i\) and the jth share \(y_j\) are respectively denoted by \(X_i\) and \(Y_j\). Our goal is to recover the secret variable \(x^{\star }\) (and/or \(y^{\star }\)).
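The leakage model (1)–(3) can be simulated as follows (a sketch with our function names; \({\mathbb F}_{2^4}\) with reduction polynomial \(x^4+x+1\) is an arbitrary concrete choice):

```python
import numpy as np

def gf_mult(a, b, poly=0x13, k=4):
    # Multiplication in F_{2^4} modulo x^4+x+1 (one possible representation).
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= poly
    return r

def hw(v):
    return bin(v).count("1")

def simulate_leakage(xs, ys, sigma, rng):
    # Leakages (1)-(3) with phi = HW: the averaged share leakages carry
    # noise sigma/sqrt(n), the product leakages carry noise sigma.
    n = len(xs)
    s = sigma / np.sqrt(n)
    L = np.array([hw(x) for x in xs], dtype=float) + rng.normal(0.0, s, n)
    Lp = np.array([hw(y) for y in ys], dtype=float) + rng.normal(0.0, s, n)
    Lpp = np.array([[hw(gf_mult(xs[i], ys[j])) for j in range(n)]
                    for i in range(n)], dtype=float) + rng.normal(0.0, sigma, (n, n))
    return L, Lp, Lpp
```

Setting \(\sigma = 0\) yields the perfect Hamming-weight observations considered in Sect. 4.3.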

4.2 Complexity Lower Bound: Entropy Analysis of Noisy Hamming Weight Leakage

For simplicity, we first restrict ourselves to a leakage function \(\varphi \) equal to the Hamming weight of the variable being manipulated. In that case, the mutual information \(\mathrm {I}(X;L)\) between the Hamming weight of a uniform random variable X defined over \({\mathbb F}_{2^k}\) and a noisy observation \(L\) of this Hamming weight can be approximated as:

$$\begin{aligned} \mathrm {I}(X;L) \simeq \frac{k}{8\sigma ^2}, \end{aligned}$$
(4)

where the noise is modeled by a Gaussian random variable with standard deviation \(\sigma \). This approximation, whose derivation is given in the full version of this paper [BCPZ16], holds only for large \(\sigma \).
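Assuming the mutual information in (4) is expressed in nats, consistent with the Gaussian approximation \(\frac{1}{2}\ln (1+k/(4\sigma ^2)) \approx k/(8\sigma ^2)\) for large \(\sigma \), the approximation can be checked numerically; the sketch below (our function names) integrates the Gaussian-mixture density of L on a grid:

```python
import numpy as np
from math import comb, log, pi, e

def mi_hw_gaussian(k, sigma, grid=200001):
    # I(X;L) in nats for L = HW(X) + N(0, sigma^2), X uniform over F_{2^k}:
    # I = h(L) - h(L|X).  Given X, L is Gaussian, so h(L|X) is the Gaussian
    # differential entropy; h(L) is integrated over the binomial-weighted
    # Gaussian-mixture density of L.
    p = np.array([comb(k, w) for w in range(k + 1)]) / 2.0 ** k
    t = np.linspace(-10 * sigma, k + 10 * sigma, grid)
    dt = t[1] - t[0]
    dens = np.zeros_like(t)
    for w, pw in enumerate(p):
        dens += pw * np.exp(-(t - w) ** 2 / (2 * sigma ** 2))
    dens /= sigma * np.sqrt(2 * pi)
    h_L = -np.sum(dens * np.log(dens)) * dt          # rectangle rule
    h_L_given_X = 0.5 * log(2 * pi * e * sigma ** 2)
    return h_L - h_L_given_X
```

For \(k=8\) and \(\sigma =8\) the numerical value is within a few percent of \(k/(8\sigma ^2)\), and the agreement improves as \(\sigma \) grows, as stated.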

To recover a total of 2n shares (\(n\) shares of each of \(x^{\star }\) and \(y^{\star }\)) from \(3n^2\) Hamming weight leakages (namely each manipulation leaks according to (1)–(3) with \(\varphi =\mathrm {HW}\)), the total amount of information to be recovered is \(2n \cdot k\) if we assume that the shares are i.i.d. with uniform distribution over \({\mathbb F}_{2^k}\). Therefore, since we have a total of \(3n^2\) observations during the execution of Algorithm 1, we obtain from (4) that the noise standard deviation \(\sigma \) and the sharing order \(n\) must satisfy the following inequality for a side channel attack to be feasible:

$$\begin{aligned} 3 \cdot n^2 \cdot \frac{k}{8\sigma ^2} > 2n \cdot k. \end{aligned}$$
(5)

We obtain an inequality of the form \(n > c \cdot \sigma ^2\) for some constant c, as in a classical (vertical) side channel attack trying to recover \(x^{\star }\) from \(n\) observations of intermediate variables depending on \(x^{\star }\) [CJRR99]. This analogy between horizontal and vertical attacks has already been noticed in previous papers such as [CFG+10] and [BJPW13]. Note that in principle the constant c is independent of the field degree k (which has also been observed in previous papers, see for instance [SVO+10]).

4.3 Attack with Perfect Hamming Weight Observations

In the full version of this paper [BCPZ16], we consider the particular case of perfect Hamming weight measurements (no noise), using a maximum likelihood approach. We show that even with perfect observations of the Hamming weight, depending on the finite-field representation, we are not always guaranteed to recover the secret variable \(x^{\star }\); however, for the finite-field representation used in AES, the attack recovers the secret \(x^{\star }\) for a large enough number of observations.

4.4 Maximum Likelihood Attack: Theoretical Attack with the Full ISW State

For most field representations and leakage functions, the maximum likelihood approach used in the previous section recovers the i-th share of \(x^{\star }\) from an observation of \(L_i\) and an observation of \((L_j',L_{ij}'')\) for every \(j\in [1..n]\). It extends straightforwardly to noisy scenarios and we shall detail this extension in Sect. 5.1. However, the disadvantage of this approach is that it recovers each share separately, before rebuilding \(x^{\star }\) and \(y^{\star }\) from them. From an information-theoretic point of view this is suboptimal since (1) the final purpose is not to recover all the shares perfectly but only the shared values, and (2) only 3n observations are used to recover each share, whereas the full tuple of \(3n^2\) observations brings more information. Actually, the most efficient attack in terms of leakage exploitation consists in using the joint distribution of \((L_i,L_j',L_{ij}'')_{i,j\in [1..n]}\) to distinguish the correct hypothesis about \(x^{\star }=x_1+x_2+\cdots +x_{n}\) and \(y^{\star }= y_1+y_2+\cdots +y_{n}\).

As already observed in Sect. 3, during the processing of Algorithm 1, the adversary may get a tuple \((\ell _{ij})_{j\in [1..n]}\) (resp. \((\ell _{ij}')_{i\in [1..n]}\)) of \(n\) observations for each \(L_i\) (resp. each \(L'_j\)) and one observation \(\ell ''_{ij}\) for each \(L''_{ij}\). The full tuple of observations \((\ell _{ij},\ell _{ij}',\ell ''_{ij})_{i,j}\) is denoted by \({\varvec{\ell }}\), and we denote by \({{\varvec{L}}}\) the corresponding random variable. Then, to recover \((x^{\star },y^{\star })\) from \({\varvec{\ell }}\), the maximum likelihood approach starts by estimating the pdfs \(f_{{{\varvec{L}}}\mid X^{\star }=x^{\star },Y^{\star }=y^{\star }}\) for every possible \((x^{\star },y^{\star })\), and then estimates the following vector of distinguisher values for every hypothesis \((x,y)\):

$$\begin{aligned} \left( \mathrm {p}_{x,y}\right) _{(x,y)\in ({\mathbb F}_{2^k})^2} = \left( {f}_{{{\varvec{L}}}\mid (X^{\star },Y^{\star })}({\varvec{\ell }},(x,y))\right) _{(x,y)\in ({\mathbb F}_{2^k})^2} \end{aligned}$$
(6)

The pair \((x,y)\) maximizing the above probability is finally chosen.

At first glance, the estimation of the pdfs \(f_{{{\varvec{L}}}\mid X^{\star }=x^{\star },Y^{\star }=y^{\star }}\) seems to be challenging. However, it can be deduced from the estimations of the pdfs associated with the manipulations of the shares. Indeed, after denoting by \(\mathrm {p}_{x,y}\) each probability value in the right-hand side of (6), and by using the law of total probability together with the fact that the noises are independent, we get:

$$\begin{aligned}&\,\, {2^{2kn}}\cdot \mathrm {p}_{x,y}=\\&\sum _{x_1,\cdots ,x_{n} \in {\mathbb F}_{2^k}\atop x=x_1+\cdots +x_{n}}\sum _{y_1,\cdots ,y_{n} \in {\mathbb F}_{2^k}\atop y=y_1+\cdots +y_{n}} \prod _{i,j=1}^{n} {f}_{L_i\mid X_i}(\ell _{ij},x_i)\cdot {f}_{L_j'\mid Y_j}(\ell _{ij}',y_j) \cdot {f}_{L_{ij}''\mid X_iY_j}(\ell _{ij}'',x_iy_j). \end{aligned}$$

Unfortunately, even if the equation above shows how to deduce the pdfs \({f}_{{{\varvec{L}}}\mid (X^{\star },Y^{\star })}(\cdot ,(x^{\star },y^{\star }))\) from characterizations of the shares’ manipulations, a direct processing of the probability has complexity \(O(2^{2nk})\). By representing the sum over the \(x_i\)’s as a sequence of convolution products, and by computing them with Walsh transforms, the complexity can be easily reduced to \(O(n2^{n(k+1)})\). The latter complexity however remains too high, even for small values of \(n\) and \(k\), which led us to look for alternatives to this attack.
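The Walsh-transform reduction rests on the fact that the distribution of an XOR of independent shares is the pointwise product of the individual distributions in the Walsh-Hadamard domain. A sketch (helper names are ours):

```python
import numpy as np

def wht(v):
    # Fast Walsh-Hadamard transform of a length-2^k vector, O(2^k * k).
    v = np.array(v, dtype=float)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v

def xor_convolve(dists):
    # Distribution of x_1 ^ ... ^ x_n for independent per-share
    # distributions: multiply in the Walsh domain, then invert (WHT / 2^k).
    acc = np.ones(len(dists[0]))
    for d in dists:
        acc *= wht(d)
    return wht(acc) / len(dists[0])
```

Convolving n share distributions this way costs \(O(n\,k\,2^k)\) transforms instead of the naive \(O(2^{nk})\) enumeration.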

5 First Attack: Maximum Likelihood Attack on a Single Matrix Row

5.1 Attack Description

In this section, we explain how to recover each share \(x_i\) of \(x^{\star }\) separately, by observing the processing of Algorithm 1. Applying this attack against all the shares leads to the full recovery of the sensitive value \(x^{\star }\) with some success probability, which is essentially the product of the success probabilities of the attack on each share separately.

Given a share \(x_i\), the attack consists in collecting the leakages on \((y_j,x_i \cdot y_j)\) for every \(j\in [1..n]\). Therefore the attack is essentially a horizontal version of the classical (vertical) second-order side-channel attack, where each share \(x_i\) is multiplicatively masked over \({\mathbb F}_{2^k}\) by a random \(y_j\) for \(j\in [1..n]\).

The most efficient attack to maximize the amount of information recovered on \(X_i\) from a tuple of observations consists in applying a maximum likelihood approach [CJRR99, GHR15], which amounts to computing the following vector of distinguisher values:

$$\begin{aligned} \left( \prod _{j=1}^{n} {f}_{(L'_j,L''_{ij})\mid X_i}\big ((\ell '_j,\ell ''_{ij}),\hat{x}_i\big )\right) _{\hat{x}_i\in {\mathbb F}_{2^k}} \end{aligned}$$
(7)

and in choosing the candidate \(\hat{x}_i\) which maximizes the probability. We refer to the full version of this paper [BCPZ16] for the derivation of each score in (7); we obtain:

$$\begin{aligned} {f}_{(L'_j,L''_{ij})\mid X_i}((\ell _j',\ell _{ij}''),\hat{x}_i) = \sum _{y \in {\mathbb F}_{2^k}} {f}_{(L'_j,L''_{ij})\mid (X_i,Y_j)}((\ell '_j,\ell ''_{ij}),(\hat{x}_i,y))\cdot \mathrm {p}_{Y_j}(y), \end{aligned}$$
(8)

and similarly:

$$\begin{aligned} {f}_{(L_i,L''_{ij})\mid Y_j}((\ell _i,\ell _{ij}''),\hat{y}_j) = \sum _{x \in {\mathbb F}_{2^k}} {f}_{(L_i,L''_{ij})\mid (X_i,Y_j)}((\ell _i,\ell ''_{ij}),(x,\hat{y}_j))\cdot \mathrm {p}_{X_i}(x)\!. \end{aligned}$$
(9)
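The per-share scoring of (8) can be sketched as follows (our function names; log domain to avoid underflow, and \({\mathbb F}_{2^4}\) with reduction polynomial \(x^4+x+1\) as an arbitrary concrete choice):

```python
import numpy as np

def gf_mult(a, b, poly=0x13, k=4):
    # Multiplication in F_{2^4} modulo x^4+x+1 (one possible representation).
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= poly
    return r

def hw(v):
    return bin(v).count("1")

def likelihoods_xi(lp, lpp, sigma_y, sigma, k=4):
    # Score (8) for each candidate x_i: for every j, marginalise the
    # Gaussian density of (L'_j, L''_ij) over a uniform y_j, then sum
    # the log-scores over j.  Returns unnormalised log-likelihoods.
    K = 1 << k
    logL = np.zeros(K)
    for xc in range(K):
        for lpj, lppj in zip(lp, lpp):
            s = 0.0
            for y in range(K):
                d1 = (lpj - hw(y)) ** 2 / (2 * sigma_y ** 2)
                d2 = (lppj - hw(gf_mult(xc, y))) ** 2 / (2 * sigma ** 2)
                s += np.exp(-(d1 + d2))
            logL[xc] += np.log(s / K + 1e-300)
    return logL
```

The candidate maximizing the returned vector is the attack's guess for \(x_i\); the symmetric score (9) is obtained by swapping the roles of the \(x\) and \(y\) leakages.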

5.2 Complexity Analysis

As mentioned previously, given a share \(x_i\), the attack consists in collecting the leakages on \((y_j,x_i \cdot y_j)\) for every \(j\in [1..n]\). Therefore the attack is essentially a horizontal version of the classical (vertical) second-order side-channel attack. In principle, the number \(n\) of leakage samples needed to recover \(x_i\) with good probability (i.e., the attack complexity) should consequently be \(n=\mathcal{O}(\sigma ^{4})\) [CJRR99, GHR15, SVO+10]. This holds when the two combined leakages both have noise of standard deviation \(\sigma \). Here, however, the leakage on \(y_j\) has noise with standard deviation \(\sigma /\sqrt{n}\) instead of \(\sigma \) (thanks to the averaging step). Therefore the noise of the product becomes \(\sigma ^2/\sqrt{n}\) (instead of \(\sigma ^2\)), which after averaging over the n measurements gives a standard deviation of \(\sigma ^2/n\), and therefore an attack complexity satisfying \(n=\mathcal{O}(\sigma ^2)\), as in a classical first-order side-channel attack.
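The \(\sqrt{n}\) averaging gain invoked above can be checked empirically (a minimal sketch with simulated Gaussian noise):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, trials = 2.0, 16, 100_000
# Each experiment averages n noisy observations of the same value;
# the empirical std of the mean should be close to sigma/sqrt(n) = 0.5.
noise = rng.normal(0.0, sigma, size=(trials, n))
avg_std = noise.mean(axis=1).std()
print(avg_std, sigma / np.sqrt(n))
```

With 100 000 trials the two printed values agree to a few per mille, confirming the \(\sigma /\sqrt{n}\) model used for the averaged share leakages.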

5.3 Numerical Experiments

The attack presented in Sect. 5.1 has been implemented against each share \(x_i\) of a value x, with the leakages being simulated according to (1)–(3) with \(\varphi =\mathrm {HW}\). Different values of the noise standard deviation \(\sigma \) and of the sharing order \(n\) have been tested to highlight the relation between these two parameters. We say that an attack succeeds if and only if all \(n\) shares \(x_i\) are recovered, which leads to the full recovery of \(x^{\star }\). We recall that, since the shares \(x_i\) are manipulated \(n\) times, measurements for the leakages \(L_i\) and \(L_j'\) have noise standard deviations \(\sigma /\sqrt{n}\) instead of \(\sigma \). For efficiency reasons, we have chosen to work in the finite field \(\mathbb {F}_{2^{4}}\) (namely \(k=4\) in the previous analyses).

For various noise standard deviations \(\sigma \) with \(\mathrm {SNR}= k(2\sigma )^{-2}\) (i.e. \(\mathrm {SNR}=\sigma ^{-2}\) for \(k=4\)), Table 1 gives the average minimum number \(n\) of shares required for the attack to succeed with probability strictly greater than 0.5 (the averaging being computed over 300 attack iterations). The attack complexity \(n=\mathcal{O}(\sigma ^2)\) argued in Sect. 5.2 is confirmed by the trend of these numerical experiments. Clearly, this efficiency is quickly too poor for practical applications, where \(n\) is small (typically lower than 10) and the noise is high (\(\mathrm {SNR}\) smaller than 1).

Table 1. First attack: number of shares n as a function of the noise \(\sigma \) to succeed with probability \(> 0.5\)

6 Second Attack: Iterative Attack

6.1 Attack Description

From the discussions in Sect. 4.4, and in view of the poor efficiency of the previous attack, we investigated another strategy which targets all the shares simultaneously. Essentially, the core idea of our second attack described below is to apply several attacks recursively on the \(x_i\)’s and \(y_j\)’s, and to refine step by step the likelihood of each candidate for the tuple of shares. Namely, we start by applying the attack described in Sect. 5.1 to compute, for every i, a likelihood probability for each hypothesis \(X_i=x\) (x ranging over \({\mathbb F}_{2^k}\)). We then apply the same attack to compute, for every j, a likelihood probability for each hypothesis \(Y_j=y\) (y ranging over \({\mathbb F}_{2^k}\)), with the single difference that the probability \(\mathrm {p}_{X_i}(x)\) in (9) is replaced by the likelihood probability just computed. Then, one reiterates the attack to refine the likelihood probabilities \((\mathrm {p}_{X_i}(x))_{x\in {\mathbb F}_{2^k}}\), by evaluating (8) with the uniform distribution \(\mathrm {p}_{Y_j}(y)\) replaced by the previously computed likelihood probability \(\mathrm {new}\text {-}\mathrm {p}_{Y_j}(y)\). The scheme is afterwards repeated until the maximum taken by the pdfs of each share \(X_i\) and \(Y_j\) is greater than some threshold \(\beta \). In order to obtain better results, we perform the whole attack a second time, starting with the computation of the likelihood probability for each hypothesis \(Y_j=y\) instead of starting with \(X_i=x\).

We give the formal description of the attack in Algorithm 2 (to obtain the complete attack, one should perform the while loop a second time, starting with the computation of \(\mathrm {new}\text {-}\mathrm {p}_{Y_j}(y)\) instead of \(\mathrm {new}\text {-}\mathrm {p}_{X_i}(x)\)).

[Algorithm 2 (the iterative attack), shown as a figure in the original.]
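The iterative refinement can be sketched as follows (our variable names, loosely following the description above; the uniform priors of (8)/(9) are replaced by the current beliefs at each pass, and the \({\mathbb F}_{2^4}\) representation is an arbitrary concrete choice):

```python
import numpy as np

def gf_mult(a, b, poly=0x13, k=4):
    # Multiplication in F_{2^4} modulo x^4+x+1 (matches the default k=4).
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= poly
    return r

def hw(v):
    return bin(v).count("1")

def iterative_attack(L, Lp, Lpp, sigma_s, sigma, k=4, beta=0.99, max_iter=20):
    # Beliefs on the X_i's and Y_j's are refined in turn, each pass reusing
    # the same leakages with the sharpened priors, until every per-share
    # distribution has a maximum above the threshold beta.
    n, K = len(L), 1 << k
    hwv = np.array([hw(v) for v in range(K)])
    hwp = np.array([[hw(gf_mult(x, y, k=k)) for y in range(K)]
                    for x in range(K)])
    gS = lambda l: np.exp(-(l - hwv) ** 2 / (2 * sigma_s ** 2)) + 1e-300
    pX = np.array([gS(L[i]) for i in range(n)]);  pX /= pX.sum(1, keepdims=True)
    pY = np.array([gS(Lp[j]) for j in range(n)]); pY /= pY.sum(1, keepdims=True)
    for _ in range(max_iter):
        for i in range(n):              # refine beliefs on X_i via (8)
            for j in range(n):
                w = np.exp(-(Lpp[i][j] - hwp) ** 2 / (2 * sigma ** 2))
                pX[i] *= (w @ pY[j]) + 1e-300
            pX[i] /= pX[i].sum()
        for j in range(n):              # refine beliefs on Y_j via (9)
            for i in range(n):
                w = np.exp(-(Lpp[i][j] - hwp) ** 2 / (2 * sigma ** 2))
                pY[j] *= (w.T @ pX[i]) + 1e-300
            pY[j] /= pY[j].sum()
        if min(pX.max(1).min(), pY.max(1).min()) > beta:
            break
    return pX.argmax(1), pY.argmax(1)
```

Deliberately reusing the same leakages with refined priors sharpens (and overcounts) the evidence; this is the point of the heuristic, and why it converges with far fewer shares than the basic attack.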

6.2 Numerical Experiments

The iterative attack described in Algorithm 2 has been tested against leakage simulations defined exactly as in Sect. 5.3. As before, we say that an attack succeeds if and only if all \(n\) shares \(x_i\) are recovered, which leads to the full recovery of \(x^{\star }\). For various noise standard deviations \(\sigma \) with \(\mathrm {SNR}= k(2\sigma )^{-2}\), Table 2 gives the average minimum number of shares \(n\) required for the attack to succeed with probability strictly greater than 0.5 (the averaging being computed over 300 attack iterations). The first row corresponds to \(k=4\) and the second row to \(k=8\) (the corresponding \(\mathrm {SNR}\)s are \(\mathrm {SNR}_4=\sigma ^{-2}\) and \(\mathrm {SNR}_8=(\sqrt{2}\sigma ^2)^{-1}\)). The numerical experiments yield greatly improved results compared to the basic attack. Namely, in \(\mathbb {F}_{2^{4}}\), for a noise \(\sigma = 0\), the number of shares required is 2, while 12 shares were needed for the basic attack; the improvement is even more pronounced as \(\sigma \) grows: for \(\sigma = 1\), the number of shares required is 25, while 284 shares were needed for the basic attack. It can also be observed that the results for shares in \(\mathbb {F}_{2^{4}}\) and \(\mathbb {F}_{2^{8}}\) are relatively close, the number of shares being typically slightly smaller in \(\mathbb {F}_{2^{4}}\) than in \(\mathbb {F}_{2^{8}}\). This observation is in line with the lower bound in (5), where the cardinality \(2^{k}\) of the finite field plays no role.

Table 2. Iterative attack: number of shares n as a function of the noise \(\sigma \) to succeed with probability \(> 0.5\) in \(\mathbb {F}_{2^{4}}\) (first row) and in \(\mathbb {F}_{2^{8}}\) (second row).

7 Practical Results

In the full version of this paper [BCPZ16], we describe the result of practical experiments of our attack against a development platform embedding the ATMega328 processor.

8 A Countermeasure Against the Previous Attacks

8.1 Description

In the following, we describe a countermeasure for the Rivain-Prouff algorithm against the previous attacks. More precisely, we describe a variant of Algorithm 1, called \(\mathsf{RefSecMult}\), to compute an \(n\)-sharing of \(c=x^{\star }\cdot y^{\star }\) from \((x_i)_{i\in [1..n]}\) and \((y_i)_{i\in [1..n]}\). Our new algorithm is still provably secure in the original ISW probing model, and heuristically secure against the horizontal side-channel attacks described in the previous sections.

[Algorithm 3 (RefSecMult), shown as a figure in the original.]

As observed in [FRR+10], the ISW and Rivain-Prouff countermeasures can be divided into two steps: a “matrix” step in which, starting from the input shares \(x_i\) and \(y_j\), one obtains a matrix \(x_i \cdot y_j\) with \(n^2\) elements, and a “compression” step in which one uses some randomness to get back to an \(n\)-sharing \(c_i\). Namely, the matrix elements \((x_i \cdot y_j)_{1 \le i,j \le n}\) form an \(n^2\)-sharing of \(x^{\star }\cdot y^{\star }\):

$$\begin{aligned} x^{\star }\cdot y^{\star }=\left( \sum \limits _{i=1}^n x_i \right) \cdot \left( \sum \limits _{j=1}^n y_j \right) =\sum \limits _{1 \le i,j \le n} x_i \cdot y_j \end{aligned}$$
(10)

and the goal of the compression step is to securely go from such an \(n^2\)-sharing of \(x^{\star }\cdot y^{\star }\) to an \(n\)-sharing of \(x^{\star }\cdot y^{\star }\).

Our new countermeasure (Algorithm 3) uses the same compression step as Rivain-Prouff, but with a different matrix step, called MatMult (Algorithm 4), so that the shares \(x_i\) and \(y_j\) are not used multiple times (as they are when computing the matrix elements \(x_i \cdot y_j\) in Rivain-Prouff). Finally, the MatMult algorithm outputs a matrix \((M_{ij})_{1 \le i,j \le n}\) which is still an \(n^2\)-sharing of \(x^{\star }\cdot y^{\star }\), as in (10); therefore, using the same compression step as Rivain-Prouff, Algorithm 3 outputs an \(n\)-sharing of \(x^{\star }\cdot y^{\star }\), as required.

[Algorithm 4 (MatMult), shown as a figure in the original.]

As illustrated in Fig. 1, the MatMult algorithm is recursive and computes the \(n \times n\) matrix in four sub-matrix blocks. This is done by splitting the input shares \(x_i\) and \(y_j\) in two parts, namely \({\varvec{X}}^{(1)}=(x_1,\ldots ,x_{n/2})\) and \({\varvec{X}}^{(2)}=(x_{n/2+1},\ldots ,x_{n})\), and similarly \({\varvec{Y}}^{(1)}=(y_1,\ldots ,y_{n/2})\) and \({\varvec{Y}}^{(2)}=(y_{n/2+1},\ldots ,y_{n})\), and recursively processing the four sub-matrix blocks corresponding to \({\varvec{X}}^{(u)} \times {\varvec{Y}}^{(v)}\) for \(1 \le u,v \le 2\). To prevent the same share \(x_i\) from being used twice, each input block \({\varvec{X}}^{(u)}\) and \({\varvec{Y}}^{(v)}\) is refreshed before being used a second time, using a mask refreshing algorithm. An example of such mask refreshing, hereafter called \(\mathsf{RefreshMasks}\), can for instance be found in [DDF14]; see Algorithm 5. Since the mask refreshing does not modify the xor of the input \(n/2\)-vectors \({\varvec{X}}^{(u)}\) and \({\varvec{Y}}^{(v)}\), each sub-matrix block \({\varvec{M}}^{(u,v)}\) is still an \(n^2/4\)-sharing of \((\oplus {\varvec{X}}^{(u)}) \cdot (\oplus {\varvec{Y}}^{(v)})\), and therefore the output matrix \({\varvec{M}}\) is still an \(n^2\)-sharing of \(x^{\star }\cdot y^{\star }\), as required. Note that without the \(\mathsf{RefreshMasks}\), we would have \(M_{ij}=x_i \cdot y_j\) as in Rivain-Prouff.

[Algorithm 5 (RefreshMasks), shown as a figure in the original.]
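The recursive structure can be sketched as follows (our function names and refresh scheduling; the exact algorithms are in the figures above). For simplicity, the refresh shown is a linear-time variant that preserves the XOR of its input, which suffices to check completeness: the output shares XOR to \(x^{\star }\cdot y^{\star }\). The paper's RefreshMasks is the quadratic t-SNI variant:

```python
import secrets

def gf_mult(a, b, poly=0x11B, k=8):
    # Multiplication in F_{2^8} modulo the AES polynomial (our choice).
    r = 0
    for _ in range(k):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & (1 << k):
            a ^= poly
    return r

def refresh(v, k=8):
    # XOR-preserving re-randomisation of a share vector (simplified).
    v = list(v)
    for i in range(1, len(v)):
        r = secrets.randbelow(1 << k)
        v[0] ^= r
        v[i] ^= r
    return v

def mat_mult(X, Y, k=8):
    # Recursive MatMult sketch: |X| x |Y| matrix whose XOR is (xor X)(xor Y).
    n = len(X)
    if n == 1:
        return [[gf_mult(X[0], Y[0], k=k)]]
    h = n // 2
    X1, X2, Y1, Y2 = X[:h], X[h:], Y[:h], Y[h:]
    M11 = mat_mult(X1, Y1, k=k)
    # Each half is refreshed before its second use, as in Fig. 1.
    M12 = mat_mult(refresh(X1, k), Y2, k=k)
    M21 = mat_mult(X2, refresh(Y1, k), k=k)
    M22 = mat_mult(refresh(X2, k), refresh(Y2, k), k=k)
    return ([r1 + r2 for r1, r2 in zip(M11, M12)] +
            [r1 + r2 for r1, r2 in zip(M21, M22)])

def ref_sec_mult(X, Y, k=8):
    # RefSecMult sketch: MatMult matrix step + Rivain-Prouff compression.
    n = len(X)
    M = mat_mult(X, Y, k=k)
    c = [M[i][i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbelow(1 << k)
            c[i] ^= r
            c[j] ^= (r ^ M[i][j]) ^ M[j][i]
    return c
```

Since every refresh preserves the XOR of its block, each sub-matrix remains a sharing of the corresponding partial product, which is exactly the inductive argument of the soundness proof below.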

Since the RefreshMasks algorithm has complexity \(\mathcal{O}(n^2)\), it is easy to see that the complexity of our RefSecMult algorithm is \(\mathcal{O}(n^2 \log n)\) (instead of \(\mathcal{O}(n^2)\) for the original Rivain-Prouff countermeasure in Algorithm 1). Therefore, for a circuit of size |C|, the complexity is \(\mathcal{O}(|C| \cdot n^2 \log n)\), instead of \(\mathcal{O}(|C| \cdot n^2)\) for Rivain-Prouff. The following lemma shows the soundness of our RefSecMult countermeasure.

Lemma 1

(Soundness of RefSecMult ). The RefSecMult algorithm, on input \(n\)-sharings \((x_i)_{i\in [1..n]}\) and \((y_j)_{j\in [1..n]}\) of \(x^{\star }\) and \(y^{\star }\) respectively, outputs an \(n\)-sharing \((c_i)_{i\in [1..n]}\) of \(x^{\star }\cdot y^{\star }\).

Proof

We prove recursively that the MatMult algorithm, taking as input \(n\)-sharings \((x_i)_{i\in [1..n]}\) and \((y_j)_{j\in [1..n]}\) of \(x^{\star }\) and \(y^{\star }\) respectively, outputs an \(n^2\)-sharing \(M_{ij}\) of \(x^{\star }\cdot y^{\star }\). The lemma for RefSecMult follows, since as in Rivain-Prouff, lines 2 to 12 of Algorithm 3 transform an \(n^2\)-sharing \(M_{ij}\) of \(x^{\star }\cdot y^{\star }\) into an \(n\)-sharing of \(x^{\star }\cdot y^{\star }\).

The property clearly holds for \(n=1\). Assuming that it holds for \(n/2\), since the RefreshMasks does not change the xor of the input \(n/2\)-vectors \({\varvec{X}}^{(u)}\) and \({\varvec{Y}}^{(v)}\), each sub-matrix block \({\varvec{M}}^{(u,v)}\) is still an \(n^2/4\)-sharing of \((\oplus {\varvec{X}}^{(u)}) \cdot (\oplus {\varvec{Y}}^{(v)})\), and therefore the output matrix \({\varvec{M}}\) is still an \(n^2\)-sharing of \(x^{\star }\cdot y^{\star }\), as required. This proves the lemma.    \(\square \)

Fig. 1. The recursive MatMult algorithm, where R represents the RefreshMasks algorithm, and \(\otimes \) represents a recursive call to the \(\mathsf{MatMult}\) algorithm.

Remark 1

The description of our countermeasure requires that n is a power of two, but it is easy to modify the countermeasure to handle any value of n. Namely, in Algorithm 4, for odd n it suffices to split the inputs \(x_i\) and \(y_j\) in two parts of size \((n-1)/2\) and \((n+1)/2\) respectively, instead of \(n/2\).

8.2 Security Analysis

Proven Security in the ISW Probing Model. We prove that our RefSecMult algorithm achieves at least the same level of security as Rivain-Prouff, namely it is secure in the ISW probing model against t probes for \(n \ge t+1\) shares. For this we use the refined security model against probing attacks recently introduced in [BBD+15], called t-SNI security. This stronger definition enables one to prove that a gadget can be used in a full construction with \(n \ge t+1\) shares, instead of \(n \ge 2t+1\) for the weaker definition of t-NI security (corresponding to the original ISW security proof). The authors of [BBD+15] show that the ISW (and Rivain-Prouff) multiplication gadget does satisfy this stronger t-SNI security definition. They also show that with some additional mask refreshing satisfying the t-SNI property (such as RefreshMasks), the Rivain-Prouff countermeasure for the full AES can be made secure with \(n \ge t+1\) shares.

The following lemma shows that our RefSecMult countermeasure achieves the t-SNI property; we provide the proof in Appendix A. The proof is essentially the same as in [BBD+15] for the Rivain-Prouff countermeasure; namely, the compression step is the same, and for the matrix step, in the simulation we can assume that all the random values in RefreshMasks are given to the adversary. The t-SNI security implies that our RefSecMult algorithm is also composable, with \(n \ge t+1\) shares.

Lemma 2

( t -SNI of RefSecMult ). Let \((x_i)_{1 \le i \le n}\) and \((y_i)_{1 \le i \le n}\) be the input shares of the \(\mathsf{RefSecMult}\) operation, and let \((c_i)_{1 \le i \le n}\) be the output shares. For any set of \(t_1\) intermediate variables and any subset \(|\mathcal {O}^{}| \le t_2\) of output shares such that \(t_1+t_2<n\), there exist two subsets \(I\) and \(J\) of indices with \(|I| \le t_1\) and \(|J| \le t_1\), such that those \(t_1\) intermediate variables as well as the output shares \(c_{|\mathcal {O}^{}}\) can be perfectly simulated from \(x_{|I}\) and \(y_{|J}\).

Heuristic Security Against Horizontal-DPA Attacks. We stress that the previous lemma only proves the security of our countermeasure against t probes for \(n \ge t+1\), so it does not prove that our countermeasure is secure against the horizontal-DPA attacks described in the previous sections, since such attacks use information about \(n^2\) intermediate variables instead of only \(n-1\).

As illustrated in Fig. 1, the main difference between the new RefSecMult algorithm and the original SecMult algorithm (Algorithm 1) is that we keep refreshing the \(x_i\) shares and the \(y_j\) shares blockwise between the processing of the finite field multiplications \(x_i \cdot y_j\). Therefore, as opposed to what happens in SecMult, we never have the same \(x_i\) being multiplied by all the \(y_j\)'s for \(1 \le j \le n\). Hence an attacker cannot accumulate information about a specific share \(x_i\), which heuristically prevents the attacks described in this paper.