1 Introduction

Adversaries aim at disclosing secret information contained in integrated systems, which are currently the main vector of data exchanges. One approach is Side Channel Analysis (SCA), which tries to reveal cryptographic keys by exploiting the information contained in one or several physical leakages of cryptographic devices, especially power consumption and electromagnetic emanations. In the seminal paper [1], the difference of means was used as a distinguisher to extract key-related information from the power consumption leakage. Since then, more efficient distinguishers have been considered, notably Pearson’s correlation coefficient [2], leading to the SCA referred to as CPA, and the Mutual Information (MI) index, which appears as a promising alternative because it is capable of capturing any type of association. Mutual Information Analysis (MIA) in SCA was introduced in [3, 4] and much work has been devoted to investigating its potential for attacking cryptographic implementations featuring various countermeasures and background noises in leakage traces [5, 6]. To summarize, MI was shown to be generally less efficient than Pearson’s coefficient when the leakage function is nearly linear, as is usually the case in unprotected devices [4, 7]. However, MIA appears promising when an adequate leakage profiling is a priori challenging [8, 9] or for attacking some protected devices [5, 6, 9, 10].

The main difficulty in implementing a MIA is that, in contrast to Pearson’s coefficient, which is easily estimated via sample moments, the estimation of the MI index requires the estimation of a number of probability distribution functions (PDF), and this task is, both theoretically and practically, a difficult statistical problem. Further, it has been stated [6, 11, 12] that the choice of a good PDF estimator is crucial for the efficiency of a MIA. Thus, a variety of parametric (cumulant [9] or copula [13]) and nonparametric estimators (histograms [4], splines [14] and kernels [3, 8]) have been explored. Among the nonparametric methods, Kernel Density Estimators (KDE) [15, 16] have emerged in the statistical literature as one of the most popular approaches, in view of their many appealing theoretical properties. However, KDE involves a major technical difficulty because it requires the choice of a crucial tuning parameter, referred to as the bandwidth (see Sect. 3). There exist formulas for choosing this bandwidth in some optimal fashion for the problem of estimating a PDF. Unfortunately, analogous formulas for the problem of estimating the MI index have not yet been developed. Thus, most MIA based on KDE (i.e. KDE-MIA) have taken the route of estimating the PDF using these formulas, the logic being that if all PDF are well estimated, plugging these estimates into the expression of the MI index should yield a good estimator. But these formulas, besides being based on an asymptotic argument (optimizing the trade-off between asymptotic bias and variance), are averages over the whole range of the PDF. Moreover, they involve unknown quantities that in turn must be estimated. In practical situations, there is no guarantee that such average estimated values will yield globally good PDF estimators, and it is often recommended that they be used only as starting points in the estimation process. Thus, applying them in an automatic fashion amounts to using an unsharpened tool. All this is further compounded by the fact that in computing the MI index, many different PDF need to be simultaneously estimated and integrated over their range. As stated by [12], this may help in explaining the often lower efficiency of a standard MIA, as compared to CPA.

In this paper, we develop a new approach that selects the bandwidth in KDE-MIA so as to optimize the quality of the attack with respect to two criteria, namely efficiency and genericity, instead of aiming at the quality of the PDF estimates. When applied to several data sets, the new MIA, referred to as ABS-MIA (ABS for Adaptive Bandwidth Selection), performs much better than the standard MIA and can even compete favorably with CPA.

The paper is organized as follows. Section 2 briefly recalls the modus operandi of SCA attacks and introduces the basics of MIA. Section 3 presents the KDE. Section 4 motivates and presents our proposal. This is then applied in Sect. 5 to some data. Section 6 concludes the paper and discusses some extensions.

2 Side Channel Analysis: An Overview

SCA is based on the fact that the physical leakage emanating from secure devices contains information on secret keys, and that an adversary can retrieve such keys by relating this information to the known course of the cryptographic device. In practice, this is done by relating the leakage to intermediate values computed by the target device which depend on parts (e.g. sub-keys) of the secret key. The set \(\mathcal {K}\) of all candidate sub-keys \(k\) is assumed known and not too large. The secret sub-key targeted is denoted \(\kappa \). The relation is typically achieved in three steps.

2.1 Device Observation

To implement a SCA, an adversary first observes the target device by feeding it with known messages \(m\) in a set \(\mathcal {M}\), while collecting the corresponding leakage traces \(\{o(m)=(o_{1}(m),\ldots ,o_{T}(m))\}\) as vectors representing the evolution of the physical leakage at \(T\) time points. Thus, the adversary first observes \(\mathcal {O}=\{o(m),m\in \mathcal {M}\}\).

2.2 Device Activity Modeling

Then the adversary models the electrical activity of the device through a proxy. A target intermediate value of \(w\) bits manipulated by the device is chosen, and its values are computed for each possible combination of candidate sub-key \(k\) and message \(m\).

Then, for each candidate sub-key \(k\in \mathcal {K}\), the adversary splits the intermediate values into several clusters with similar electrical activity, using a selection function \(L(m,k)=v\in \mathcal {V}\) (typically the Hamming Weight (HW) or Hamming Distance (HD)). For each \(v\in \mathcal {V}\), the groups \(\mathcal {G}_{k}(v)=\{(m,o(m))\in \mathcal {M}\times \mathcal {O}\mid L(m,k)=v\}\) are formed and collected to give a partition \(\mathcal {P}(k)=\{\mathcal {G}_{k}(v),v\in \mathcal {V}\}\) of \(\mathcal {M}\times \mathcal {O}\).
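To make the grouping concrete, here is a minimal sketch (not the authors’ implementation) of the partitioning step for an AES-like target, assuming the selection function \(L(m,k)=HW(Sbox[m\oplus k])\); the `sbox` table and the list of messages are taken as given.

```python
import numpy as np

def hamming_weight(x: int) -> int:
    """HW of an integer, e.g. HW(0b1011) = 3."""
    return bin(x).count("1")

def partition(messages, k, sbox):
    """Build P(k): group trace indices n by v = L(m_n, k) = HW(Sbox[m_n XOR k])."""
    groups = {}
    for n, m in enumerate(messages):
        v = hamming_weight(sbox[m ^ k])
        groups.setdefault(v, []).append(n)
    return groups  # {v: indices of the traces o(m) falling in G_k(v)}
```

Each group \(\mathcal {G}_{k}(v)\) then collects the traces \(o(m)\) whose indices are stored under the key \(v\).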

Note that there are several ways to manipulate the intermediate values. For example, one could work at the word level or at the bit level. For details, see Appendix A.

2.3 Estimation of \(\kappa \)

The final step of a SCA consists in processing the \(\mathcal {P}(k)\) to get an estimate \(\hat{\kappa }\) of \(\kappa \). This is done through a distinguisher. In CPA, the distinguisher is Pearson’s correlation coefficient: at each time point \(t\in \{1,\ldots ,T\}\) and for each candidate sub-key \(k\in \mathcal {K}\), its value \(r_{k}(t)\) for the data in \(\{(L(m,k),o_{t}(m)),m\in \mathcal {M}\}\) is computed. Setting \(R_{k}=\max _{t\in \{1,\ldots ,T\}}r_{k}(t)\), \(\kappa \) is estimated by \(\hat{\kappa }=\arg \max _{k\in \mathcal {K}}R_{k}\). The rationale is that when \(k=\kappa \), the grouping of the traces induced by \(L(\cdot ,k)\) could show a strong enough linear association to allow distinguishing the correct sub-key from incorrect candidates. CPA is most fruitful when the data points \(\left\{ (L(m,\kappa ),o_{t}(m)),m\in \mathcal {M}\right\} \) exhibit a linear trend.
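As an illustration, the following sketch (a simplified rendering, not a reference implementation) computes \(r_{k}(t)\) at all time points at once and returns \(\hat{\kappa }\); here `traces` is an \(N\times T\) array and `lvals[k]` holds the \(N\) values \(L(m,k)\) for candidate \(k\).

```python
import numpy as np

def cpa(traces: np.ndarray, lvals: dict) -> int:
    """Return kappa_hat = arg max_k R_k, with R_k = max_t r_k(t)."""
    best_k, best_R = None, -np.inf
    for k, L in lvals.items():
        Lc = np.asarray(L, float) - np.mean(L)
        Oc = traces - traces.mean(axis=0)
        # Pearson's r at every time point t, via vectorized sample moments.
        r = (Lc @ Oc) / (np.sqrt(Lc @ Lc) * np.sqrt((Oc ** 2).sum(axis=0)))
        if r.max() > best_R:
            best_k, best_R = k, r.max()
    return best_k
```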

In MIA, the MI index is used. In the context considered here, where the random vector \((X,Y)\) is hybrid, that is \(X\) is discrete while \(Y\) is continuous with support \(S_{Y}\), the theoretical version of this index is defined as

$$\begin{aligned} MI=\sum _{x}l(x)\int _{S_{Y}}f(y|x)\log \frac{f(y|x)}{g(y)}\,dy, \end{aligned}$$
(1)

where \(f(y|x)\) is the conditional (on \(X\)) PDF of \(Y\), while \(g(y)\) (resp. \(l(x)\)) is the marginal PDF of \(Y\) (resp. \(X\)), and the symbol \(\sum _{x}\) refers to a sum taken over the values \(x\) of \(X\) such that \(l(x)>0\). We have \(MI \ge 0\), with equality if and only if \(X\) and \(Y\) are statistically independent. There are other equivalent formulas defining the MI index, notably

$$\begin{aligned} MI&=H(Y)-\sum _{x}l(x)H(Y|x)\end{aligned}$$
(2)
$$\begin{aligned}&=H(Y)-H(Y|X), \end{aligned}$$
(3)

where \(H(Y)=-\int _{S_{Y}}g(y)\log g(y)dy\) is the (differential) entropy of random variable \(Y\) and similarly \(H(Y|x)=-\int _{S_{Y}}f(y|x)\log f(y|x)\,dy\).
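The equivalence of (1) and (2) follows by splitting the logarithm in (1) and using the mixture identity \(g(y)=\sum _{x}l(x)f(y|x)\):

$$\begin{aligned} MI&=\sum _{x}l(x)\int _{S_{Y}}f(y|x)\log f(y|x)\,dy-\int _{S_{Y}}\Big (\sum _{x}l(x)f(y|x)\Big )\log g(y)\,dy\\&=-\sum _{x}l(x)H(Y|x)+H(Y), \end{aligned}$$

which is (2); rewriting the sum as the conditional entropy \(H(Y|X)\) then gives (3).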

Specializing formula (3), MIA can be expressed as computing at each time point \(t\in \{1,\ldots ,T\}\) and for each sub-key \(k\in \mathcal {K}\), the quantity

$$\begin{aligned} MI_{k}(t)=H(o_{t}(m))-H(o_{t}(m)|L(m,k)). \end{aligned}$$
(4)

The correct sub-key \(\kappa \) should satisfy

$$\begin{aligned} \kappa =\arg \max _{k\in \mathcal {K}}\left\{ \max _{t\in \{1,\ldots , T\}}MI_{k}(t)\right\} , \end{aligned}$$
(5)

and if \(\widehat{MI_{k}(t)}\) is an estimate of \(MI_{k}(t),\) an estimate \(\hat{\kappa }\) of \(\kappa \) is obtained as

$$\begin{aligned} \hat{\kappa }=\arg \max _{k\in \mathcal {K}}\left\{ \max _{t\in \{1,\ldots , T\}}\widehat{MI_{k}(t)}\right\} . \end{aligned}$$
(6)

The main difficulty in implementing a MIA is in estimating the values \(MI_{k}(t)\).

3 Estimating a PDF

Suppose a sample of independent copies \(\left\{ (X_{n},Y_{n}),n=1,...,N\right\} \) of \((X,Y)\) is available. The problem of estimating the MI index (2) requires estimators of the entropies \(H(Y)\) and \(H(Y\mid x)\), which in turn require estimators of the PDF \(g(y)\) and \(f(y|x)\). As stated earlier, estimation of these underlying PDF is a difficult statistical problem.

In general, a PDF estimator must offer a good trade-off between accuracy (bias) and variability (variance). In this section, we present the KDE. For the interested reader, details about other nonparametric methods (histogram or B-spline) can be found in [4, 14]. Note that, for simplicity, we restrict attention to the case of univariate PDF.

The kernel method uses a function \(K(\cdot )\), referred to as the kernel, in conjunction with a bandwidth \(h>0\). The KDE of \(g(y)\) is then

$$\begin{aligned} \hat{g}_{KDE}(y)=\frac{1}{N}{\displaystyle \sum _{n=1}^{N}K_{h}\left( y-Y_{n}\right) ,} \end{aligned}$$
(7)

where \(K_{h}(y)=h^{-1}K(y/h)\). Regarding the kernel, classical choices are the Gaussian function \(K(y)=\frac{1}{\sqrt{2\pi }}e^{-y^{2}/2}\) or the Epanechnikov function \(K(y)=\frac{3}{4}(1-y^{2})\) for \(|y|\le 1\) (and 0 otherwise), but in general this choice has less impact on the estimator than the bandwidth, which is critical in controlling the trade-off between bias and variance. A huge literature, surveyed in [17], has been devoted to the choice of this tuning parameter, and the expression of an optimal (in an asymptotic and global mathematical sense) bandwidth has been obtained. A relatively good estimator of this optimal bandwidth is given by Silverman’s rule [18], which, for the Epanechnikov kernel, is

$$\begin{aligned} h_{S}=2.34\ {\hat{\sigma }}N^{-1/5}, \end{aligned}$$
(8)

where \(\hat{\sigma }\) is the sample standard deviation of the data \(\left\{ Y_{n},n=1,...,N\right\} \). From (2), \(H(Y)\) can be estimated by

$$\begin{aligned} H_{KDE}(Y)=-\int _{S_{Y}}{\hat{g}}_{KDE}\left( y\right) \log {\hat{g}}_{KDE}(y)\,dy, \end{aligned}$$
(9)

and similarly

$$\begin{aligned} H_{KDE}(Y|x)=-\int _{S_{Y}}{\hat{f}}_{KDE}(y|x)\log {\hat{f}}_{KDE}(y|x)\,dy, \end{aligned}$$
(10)

while \(l(x)\) can be estimated by \(N_{x}/N\), where \(N_{x}=\sum _{n=1}^{N}\mathbb {I}\{X_{n}=x\}\) and \(\mathbb {I}\{A\} = 1\) if event \(A\) is realized and 0 otherwise.
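As a concrete illustration, here is a minimal sketch (under the definitions above, not the authors’ code) of the KDE (7) with the Epanechnikov kernel and Silverman’s bandwidth (8).

```python
import numpy as np

def epanechnikov(u: np.ndarray) -> np.ndarray:
    """K(u) = 0.75 * (1 - u^2) for |u| <= 1, and 0 otherwise."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u ** 2), 0.0)

def silverman_h(sample: np.ndarray) -> float:
    """Formula (8): h_S = 2.34 * sigma_hat * N^(-1/5) (Epanechnikov kernel)."""
    return 2.34 * np.std(sample, ddof=1) * len(sample) ** (-0.2)

def kde(y_eval, sample, h: float) -> np.ndarray:
    """Formula (7): g_hat(y) = (1/N) * sum_n K_h(y - Y_n), K_h(u) = K(u/h)/h."""
    u = (np.asarray(y_eval)[:, None] - np.asarray(sample)[None, :]) / h
    return epanechnikov(u).mean(axis=1) / h
```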

At this stage, another hurdle is encountered because the above computations require integration. To reduce the computational cost, one can choose points \(\mathcal {Q}=\{q_{0}<\ldots <q_{B}\}\) (referred to as query points) and estimate \(H(Y)\) by

$$\begin{aligned} H_{KDE}^{*}(Y)=-\sum _{b=1}^{B}{\hat{g}}_{KDE}(q_{b})\log {\hat{g}}_{KDE}(q_{b})(q_{b}-q_{b-1}), \end{aligned}$$
(11)

and similarly with \(H_{KDE}^{*}(Y|x)\) in place of (10). If there are computational constraints, a small number of query points in \(\mathcal {Q}\) will be preferred, but then they must be properly chosen to preserve the mathematical accuracy of the integral, a problem for which various solutions exist, for example the rectangular method of (11) or more sophisticated quadrature formulas. Accuracy also depends on the number \(B\) of query points and can be made arbitrarily good by increasing \(B\), at the expense of computational cost. We have taken the strategy of choosing these query points systematically, along a grid covering all the sample points, whose coarseness is chosen according to the available computing power.
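Continuing the sketch above, the plug-in entropy estimate (11) and the resulting MI estimate based on (2) can be written as follows; the guard against \(\log 0\) on empty grid cells is our own implementation detail, not part of the formulas.

```python
import numpy as np

def entropy_kde(sample, h, n_query=128):
    """Formula (11): rectangular rule over n_query equispaced query points."""
    q = np.linspace(sample.min(), sample.max(), n_query + 1)
    g = np.clip(kde(q[1:], sample, h), 1e-300, None)  # avoid log(0)
    return -np.sum(g * np.log(g) * np.diff(q))

def mi_hat(x, y, h):
    """Formula (2): H(Y) - sum_x (N_x / N) * H(Y | x)."""
    x, y = np.asarray(x), np.asarray(y)
    mi = entropy_kde(y, h)
    for v in np.unique(x):
        ys = y[x == v]
        mi -= (len(ys) / len(y)) * entropy_kde(ys, h)
    return mi
```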

We stress that Silverman’s rule was developed with the view of getting a globally good estimate of a PDF. There is no guarantee, however, that the formula yields a good bandwidth for the estimation of complex functionals such as the MI index, a problem that requires further theoretical work in mathematical statistics. In the next section, we present our proposal to address this problem in the context of a SCA, where subject-matter information allows another solution.

4 Setting the Tuning Parameters of KDE-MIA: Bandwidth and Query Points

To show the effect of various choices of bandwidth and query points on KDE-MIA, a small simulation study with synthetic data was conducted.

Ten thousand pairs \((HW,L)\) were drawn from the following non-linear leakage model: with probability 0.5, \(L=-0.9017+0.009\cdot HW-0.0022\cdot HW^{2}+\epsilon \), and otherwise \(L=-0.9017+\epsilon \), where \(\epsilon \sim N(0,0.005)\). The values of \(HW\) were independently computed from intermediate values consisting of four independent binary (i.e. with range \(\{0,1\}\)) random variables. We used synthetic data here so that the exact value of the MI index (= 0.0312) could be computed. This leakage model is inspired by the actual EM data considered in Sect. 5.
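For reproducibility, the simulation can be sketched as follows (reading \(N(0,0.005)\) as a standard deviation of 0.005; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
# HW of an intermediate value made of four independent fair bits.
HW = rng.integers(0, 2, size=(N, 4)).sum(axis=1)
eps = rng.normal(0.0, 0.005, size=N)
quad = -0.9017 + 0.009 * HW - 0.0022 * HW ** 2 + eps   # leaky branch
flat = -0.9017 + eps                                    # non-leaky branch
L = np.where(rng.random(N) < 0.5, quad, flat)
# mi_hat(HW, L, h) can now be compared with the exact value 0.0312
# for various bandwidths h, reproducing the behavior of Fig. 1.
```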

Fig. 1. Behavior of the estimator of the MI index as a function of the number of query points, for several bandwidth values \(h\) (\(h_{S}=0.003\)).

Figure 1 shows the results of estimating the MI index as the bandwidth \(h\) and the number of equispaced query points are changed. As expected, Silverman’s rule yields a good estimate of the actual MI index when \(\mathcal {Q}\) contains a reasonable number of points (e.g. \(\ge \)16).

Note also that as the bandwidth increases, the bias of the MI estimator increases (while its variance decreases) and the estimated MI decays to zero. This is explained by the fact that, as \(h\) increases, all KDE get oversmoothed and converge to the same function, which resembles the initial kernel spread over \(S_{Y}\); the entropies then converge to the same value and the MI index vanishes.

All this dovetails nicely with intuition and the admonishments in almost all publications on MIA that, in order to have a good estimator of the MI index, one should use adequate PDF estimators.

However, this does not guarantee maximal efficiency of the MIA. Based on real data, Fig. 2 surprisingly shows that increasing the bandwidth results in better attacks in terms of partial Success Rate (pSR). This behavior was replicated with other data sets and suggests that good PDF estimation does not necessarily translate into efficiency of the attack: larger bandwidths, and thus smoother PDF estimators, seem to yield better results.

Fig. 2. Partial Success Rate on the 1\(^{st}\) Sbox at the last round of the AES, from the publicly available traces of the DPAContestV2 [19], using the HD model at the word level.

It is this counterintuitive behavior that led to the realization that the bandwidth could be seen not as a nuisance parameter to be dealt with in a statistical estimation procedure, but more profitably as a lever that can be used to fine-tune a SCA. Note that such a lever does not exist in standard CPA and arises only with more complex distinguishers.

Our Adaptive Bandwidth Selection (ABS) procedure explicitly exploits the fact that there exists exactly one correct sub-key \(\kappa \). For all other \(k\in \mathcal {K}\), there should be statistical independence between the intermediate value and the leakage, so that \(MI_{k}=0\) when \(k \ne \kappa \) while \(MI_{\kappa }>0\) (for simplicity, we suppress the time point \(t\in \{1,\ldots , T\}\) from the notation because we consider only one leakage point). To eliminate ghost-peak effects, we consider the average distance to the rivals instead of the distance to the second-best rival. Thus, an alternate expression to (5) is

$$\begin{aligned} \kappa =\arg \max _{k\in \mathcal {K}}\left\{ MI_{k}-\overline{MI_{-k}}\right\} , \end{aligned}$$
(12)

where \(\overline{MI_{-k}}\) denotes the mean of all the \(MI\) values except \(MI_{k}\).

Now, using KDE, let \(\widehat{MI_{k}(h)}\) be an estimator of \(MI_{k}\) using the bandwidth \(h\) in all PDF involved in (2). The empirical version of (12) leads to the first estimator

$$\begin{aligned} \hat{\kappa }=\arg \max _{k\in \mathcal {K}}\left\{ \widehat{MI_{k}(h)}-\overline{\widehat{MI_{-k}(h)}}\right\} , \end{aligned}$$
(13)

where \(\overline{\widehat{MI_{-k}(h)}}\) stands for the mean of all estimators except \(\widehat{MI_{k}(h)}\). At this stage, the value of \(h\) is still unspecified. The above suggests choosing this value so as to facilitate the identification of \(\kappa \). But, as noted earlier, when \(h\) increases, all PDF in (2) are oversmoothed, so that all \(\widehat{MI_{k}(h)}\) decay to zero (albeit at a different rate for \(\widehat{MI_{\kappa }(h)}\)). This suggests normalizing expression (13) and leads to the consideration of

$$\begin{aligned} \hat{\kappa }=\arg \max _{k\in \mathcal {K}}\left\{ \max _{h>0}\left[ \frac{\widehat{MI_{k}(h)}-\overline{\widehat{MI_{-k}(h)}}}{\overline{\widehat{MI_{-k}(h)}}}\right] \right\} \end{aligned}$$
(14)

as an estimator of \(\kappa \). The value of \(h\) where the inner max operator is attained will be noted \(h_{ABS}\).

Some computational and statistical comments are in order at this stage. First, even when \(MI_{k}=0\), \(\widehat{MI_{k}(h)}\ge 0\) (i.e. the estimator is upwardly biased), so that the denominator \(\overline{\widehat{MI_{-k}(h)}}\) is almost surely \({>}0\); this eliminates the risk of indeterminacy. Second, the estimator \(\widehat{MI_{\kappa }(h)}\) will tend to be greater than \(\widehat{MI_{k}(h)}\) when \(k \ne \kappa \), in the sense that \(Prob(\widehat{MI_{\kappa }(h)}>\widehat{MI_{k}(h)})\) will be high. Simple algebra shows that, when \(k\ne \kappa \), the term in brackets in (14) should then lie in the interval \([-1,0]\) with high probability, whereas when \(k=\kappa \), this term should tend to be positive, thus allowing a good probability of discrimination for \(\kappa \). The maximization over \(h\) aims at making this discrimination independent of the choice of \(h\); it is an automatic bandwidth selection procedure targeting the goal of getting a good estimate of \(\kappa \), in contrast to Silverman’s rule, which aims at getting good estimates of the PDF involved in \(MI_{k}\). The maximization also has the side effect of smoothing out the quirks that could occur in the individual estimated PDF, and thus in the resulting \(MI_{k}\), with a single value of \(h\). Finally, the smoothness of \(\widehat{MI_{k}(h)}\) as a function of \(h\) allows the max operator to be evaluated over a finite grid (to avoid trivial problems) of properly chosen \(h\) values, ranging from some point in the neighborhood of \(h_{S}\) to some large multiple of this value; this accelerates the computation of \(\hat{\kappa }\). In practice, (14) is implemented as

$$\begin{aligned} \hat{\kappa }=\arg \max _{k\in \mathcal {K}}\left\{ \max _{h\in \mathcal {I}}\left[ \frac{\widehat{MI_{k}(h)}-\overline{\widehat{MI_{-k}(h)}}}{\overline{\widehat{MI_{-k}(h)}}}\right] \right\} , \end{aligned}$$
(15)

where \(\mathcal {I} = \{h_i\}_{1\le i\le H}\) is a set of \(H \ge 2\) bandwidths.
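A minimal sketch of (15), reusing the `mi_hat` helper sketched above (the bandwidth grid is an assumption left to the user, e.g. multiples of \(h_{S}\)):

```python
import numpy as np

def normalized_margin(x_by_key, k, y, h):
    """Term in brackets of (14) for candidate k at bandwidth h."""
    mi = {kk: mi_hat(xx, y, h) for kk, xx in x_by_key.items()}
    rivals = np.mean([v for kk, v in mi.items() if kk != k])  # mean MI_{-k}
    return (mi[k] - rivals) / rivals

def abs_mia(x_by_key, y, h_grid):
    """kappa_hat of (15): maximize over the bandwidth grid I, then over k."""
    return max(x_by_key,
               key=lambda k: max(normalized_margin(x_by_key, k, y, h)
                                 for h in h_grid))
```

In practice one would compute \(\widehat{MI_{k}(h)}\) once per \((k,h)\) pair instead of rebuilding the full dictionary for every candidate, but the sketch keeps the correspondence with (15) explicit.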

From an engineering point of view, the action of (14) (via the value of \(h\)) can be seen as a focus adjustment for visualizing a set of \(K\) pixels (\(K=256\) in the case of AES, the MI index being associated with each of the 256 key hypotheses). The numerator highlights a single pixel (a single key guess), while the denominator makes the background of the picture uniform, i.e. standardizes the estimated MI values associated with the remaining guesses.

To get some feeling for the behavior of our approach, we illustrate its action on real data. We consider the 1\(^{st}\) Sbox at the last round of the AES from the publicly available traces of the DPAContestV2 [19], with the HD model at the word level. It turns out that \(h_{ABS}=1.8>h_{S} = 0.17\) (volts), so that our PDF estimators are smoother.

Fig. 3. Values of the term in brackets in (14) for \(h_{S}\) (top panel) and \(h_{ABS}\) (bottom panel) for the 256 key guesses, after processing all DPAContestV2 traces with the HD model at the word level.

Fig. 4. Evolution of the relative margins (%) for \(h_{S}\) and \(h_{ABS}\) with the number of processed DPAContestV2 traces (HD model at the word level).

Figure 3 shows the action of our ABS criterion. The top panel gives the term in brackets of (14) for the 256 key guesses using \(h_{S}\); the bottom panel shows the same with \(h_{ABS}\). In both cases, the correct sub-key value (\(\kappa =83\)) is disclosed by MIA after processing all traces. However, for \(h_{S}\) the margin over the second-best sub-key guess is relatively small, whereas it is much larger using \(h_{ABS}\). Thus, the maximizing step over \(h\) reduces the impact of ghost peaks and allows a better discrimination of the correct sub-key.

Figure 4 presents another view supporting this behavior. It reports the relative margin (%) of the best (correct) sub-key with respect to the second-best (wrong) sub-key guess, i.e. the difference between the estimated MI for the correct sub-key and the highest MI value among wrong sub-keys, at each step of the attack, for \(h_{ABS}\) and \(h_{S}\). Again, the approach based on \(h_{ABS}\) is more effective at reducing ghost peaks.

We close this section by noting that the principle embodied in (14) is consonant with the idea mentioned in [13], which suggests detecting the outlier behavior of the correct key to perform successful recoveries. Also, when analysing a set of traces over many time points \(t\in \{1,\ldots , T\}\), the \(\max _{t\in \{1,\ldots , T\}}\) operation in (15) should be computed after \(\max _{h\in \mathcal {I}}\) (to optimize the extraction of information at each leakage point), with the result being the operand of \(\arg \max _{k\in \mathcal {K}}\).
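Under the ordering just described, a hedged sketch of the multi-point attack, reusing `normalized_margin` above, reads:

```python
def abs_mia_multi_t(x_by_key, traces, h_grid):
    """Per time point: max over h; then max over t; finally arg max over k."""
    scores = {k: -float("inf") for k in x_by_key}
    for t in range(traces.shape[1]):
        y = traces[:, t]
        for k in x_by_key:
            s = max(normalized_margin(x_by_key, k, y, h) for h in h_grid)
            scores[k] = max(scores[k], s)
    return max(scores, key=scores.get)
```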

5 Experimental Results

In this section, we further compare the performance of our ABS-MIA to the MIA using \(h_{S}\), referred to as S-MIA. During this evaluation, CPA was also computed and used as a benchmark. Three main criteria were considered:

  1. Efficiency, as measured by the number of traces needed to reach a target success rate [20].

  2. Genericity, the ability of a SCA to remain more or less successful under an unknown leakage model.

  3. Computational burden.

Fig. 5. Partial Success Rates (pSR) evaluated on the 1\(^{st}\) Sbox at the last round of the AES with the DPAContestV2 traces over two scenarios: ‘mb’ (top) and ‘wd’ (bottom). The HD model was considered.

Comparisons were conducted according to two scenarios:

  1. Bit level (multi-bit).

  2. Word level.

To distinguish between these scenarios, ‘mb’ and ‘wd’ suffixes are used in the remainder of the paper (see Appendix A for details).

Fig. 6. Partial Success Rates (pSR) evaluated on the 4\(^{th}\) Sbox at the last round of the AES with our EM traces over two scenarios: ‘mb’ (left top) and ‘wd’ (left bottom). The HD model was considered.

5.1 ABS-MIA Efficiency

The attacks were conducted with the traces of the DPAContestV2 [19] at the same fixed time point chosen in Sect. 4. Again, we focused on the 1\(^{st}\) Sbox at the last round of the AES. We used both the Gaussian and the Epanechnikov kernels but report only on the latter, as both give very similar results. For the estimation of the MI index, a grid of 128 equidistant query points was taken to cover the peak-to-peak amplitude of the traces, fixed by the choice of caliber and sensitivity of the analog-to-digital converter of the oscilloscope (8-bit resolution) during the measurements. Efficiency was measured by the Success Rate (SR), following the framework of [20]. This metric was averaged over 50 independent attacks to obtain an average partial Success Rate (pSR). The attacks were conducted with the HD model. Figure 5 illustrates the promising features of our approach. In all scenarios, ABS-MIA requires a smaller number of measurements than S-MIA, demonstrating the improvement. More importantly, we observe that ABS-MIAmb compares favorably with the very efficient CPAwd. To sustain these results, we carried out CPA, ABS-MIA and S-MIA on a different data set of 10000 EM traces collected above a hardware AES block mapped into an FPGA operating at 50 MHz, with an RF2 probe and a 48 dB low-noise amplifier. We concentrated on the 4\(^{th}\) Sbox at the last round. The HD linear model was once again considered. Figure 6 shows results similar to those obtained with the DPAContestV2 data above, with ABS-MIA showing again a large improvement over S-MIA while staying competitive with CPA.

5.2 ABS-MIA Genericity

To investigate genericity, the evaluations were performed using the second set of traces of the previous section under the (unknown) HW leakage model. As the pSR of the attacks using the ‘wd’ scenario never reached 10 % after processing the 10000 traces, we excluded it from further consideration. Interestingly, ABS-MIAmb is the only successful HW-based attack, with a pSR of 80 % after processing 7400 traces (see Fig. 7). Notably, all the variants of CPA (i.e. CPAmb and CPAwd) fail in this case.

Fig. 7. Partial Success Rates (pSR) evaluated on the 4\(^{th}\) Sbox at the last round of the AES for the HW ‘mb’ model.

5.3 ABS-MIA Computational Burden

Regarding runtime, the computational cost of a MIA is related to the number of entropies to be computed (17 for ‘mb’ and 10 for ‘wd’) and to the parameters used to compute each entropy (number of query points, choice of bandwidth). Recall that ABS-MIA is a two-stage procedure: an additional preprocessing step to obtain \(h_{ABS}\) is required before launching the attack. To save time, we emphasize that this profiling step can be performed on a representative subset of the traces to compute an approximation of \(h_{ABS}\), because the term in brackets in (15) stabilizes quickly as the number of traces increases. This strategy was used with the ‘mb’ and ‘wd’ scenarios to perform the ABS-MIA of Sect. 5.4. The time spent on this preprocessing for each Sbox is approximately one twentieth of the time required for S-MIA. Moreover, this time is partly recovered by the reduction in the number of query points required for good behavior of ABS-MIA (16, compared to at least 96 for S-MIA): because the PDF are smoother in ABS-MIA, the integrals in (9) and (10) are more easily approximated. This significantly reduces the number of computations involved in getting \(\widehat{MI_{k}(h)}\).

5.4 ABS-MIA: Global Success Rate for the DPAContestV2

Finally, we applied S-MIA and ABS-MIA to the DPAContestV2 traces and considered the global Success Rate (gSR) using the HD model. We also launched CPAwd and CPAmb as benchmarks. As in Sect. 5.1, 50 trace orders were considered. The evolution of the gSR is shown in Fig. 8. We observe that ABS-MIAmb dominates, requiring in particular 15200 traces for the gSR to stabilize above \(80\,\%\). On the other hand, S-MIA fails to recover the key. Thirty minutes (resp. two hours) were necessary to complete both the preprocessing and the ABS-MIAwd (resp. ABS-MIAmb) on a personal computer.

6 Conclusions

MIA was motivated by its ability to capture all structures of dependence between leakage and intermediate values. But the cost of this attractive feature is the difficulty of adequately choosing some tuning parameters. By focusing on the goal of optimizing the KDE-based MIA instead of the auxiliary task of estimating PDF, we have obtained an efficient bandwidth selection procedure. The resulting bandwidths are usually larger than the commonly used \(h_S\) (obtained by Silverman’s rule) and give better results in terms of attack efficiency across various experiments. We have shown that MIA driven by this method is comparable to CPA [2]. Additionally, we have reported that our MIA can succeed when CPA fails (see Sect. 5.2). Our approach could be applied to select the tuning parameters in other SCA involving nonparametric estimators, namely histograms and splines. We feel the present work shows that there can be benefits in adapting the principles of statistical methods to the task at hand: SCA in the present case.

Fig. 8. Global Success Rates (gSR) evaluated at the last round of the AES on the available traces of the DPAContestV2. The HD model was considered.