A Critical Analysis of ISO 17825 (‘Testing Methods for the Mitigation of Non-invasive Attack Classes Against Cryptographic Modules’)

  • Conference paper
  • First Online:
Advances in Cryptology – ASIACRYPT 2019 (ASIACRYPT 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11923)

Abstract

The ISO standardisation of ‘Testing methods for the mitigation of non-invasive attack classes against cryptographic modules’ (ISO/IEC 17825:2016) specifies the use of the Test Vector Leakage Assessment (TVLA) framework as the sole measure to assess whether or not an implementation of (symmetric) cryptography is vulnerable to differential side-channel attacks. It is the only publicly available standard of this kind, and the first side-channel assessment regime to exclusively rely on a TVLA instantiation.

TVLA essentially specifies statistical leakage detection tests with the aim of removing the burden of having to test against an ever increasing number of attack vectors. It offers the tantalising prospect of ‘conformance testing’: if a device passes TVLA, then, one is led to hope, the device would be secure against all (first-order) differential side-channel attacks.

In this paper we provide a statistical assessment of the specific instantiation of TVLA in this standard. This task leads us to inquire whether (or not) it is possible to assess the side-channel security of a device via leakage detection (TVLA) only. We find a number of grave issues in the standard and its adaptation of the original TVLA guidelines. We propose some innovations on existing methodologies and finish by giving recommendations for best practice and the responsible reporting of outcomes.


Notes

  1. Other detection methodologies exist outside of the TVLA framework (including approaches based on mutual information [6, 7, 25], correlation [13] and the F-statistic [3] – all variants on statistical hypothesis tests, with differing degrees of formalism). These other tests and ‘higher order’ tests are not part of ISO 17825 and therefore outside the scope of this submission.

  2. https://csrc.nist.gov/Projects/cryptographic-module-validation-program/Standards.

  3. ‘Power,’ as we will explain later in the paper, is a statistical concept and should not be confused with the ‘P’ of DPA, which refers to power consumption.

  4. We consider these conditions to approximately hold in the case of most of the ISO standard tests, where the partitions are determined by uniformly distributed intermediates.

  5. Porter uses the terminology d-minimal; we use r instead of d to avoid confusion with Cohen’s d.

  6. In a non-specific fixed-versus-random experiment (even more so in a fixed-versus-fixed one) the differences depend on more than a single bit so, depending on the value of a given intermediate under the fixed input, can potentially be several times larger (see e.g. [32]) – or they can be smaller (e.g. if the leakage of the fixed intermediate coincides with the average case, such as the (decimal) value 15 in an approximately Hamming weight leakage scenario). It is typically assumed in the non-specific case that, as the input propagates through the algorithm, at least some of the intermediates will correspond to large (efficiently detected) class differences [13].

  7. We compute the per-test power under the repetition step as the square of the power to detect with half the sample, deriving from the assumption that the two iterations of the test are independent.

  8. The overloading of terminology between ‘specific alternatives’ and ‘specific’ TVLA tests is unfortunate but unavoidable.
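The independence assumption in Note 7 can be sanity-checked with a short simulation. The sketch below (our own illustration, using only the Python standard library; `detects`, the effect size and the sample sizes are arbitrary choices, not values from the standard) draws two independent halves of a sample and compares the empirical probability that both halves yield a detection with the square of the single-half detection probability:

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided critical value at alpha = 0.05

def detects(samples):
    """Two-sided z-test of 'mean = 0' with known unit variance: True if rejected."""
    n = len(samples)
    z = (sum(samples) / n) * n ** 0.5
    return abs(z) > Z_CRIT

random.seed(1)
effect, n_half, trials = 0.5, 40, 20000
single = both = 0
for _ in range(trials):
    half_a = [random.gauss(effect, 1) for _ in range(n_half)]
    half_b = [random.gauss(effect, 1) for _ in range(n_half)]
    single += detects(half_a)
    both += detects(half_a) and detects(half_b)

p_single = single / trials
p_both = both / trials
# With independent halves, p_both is close to p_single ** 2,
# matching the 'square of the half-sample power' heuristic of Note 7.
```

If the two iterations were instead run on overlapping data, `p_both` would exceed `p_single ** 2` and the heuristic would understate the per-test power.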

References

  1. Ammann, P., Offutt, J.: Introduction to Software Testing, 1st edn. Cambridge University Press, New York (2008)

  2. Asonov, D., Agrawal, R.: Keyboard acoustic emanations. In: IEEE Symposium on Security and Privacy, pp. 3–11. IEEE Computer Society (2004)

  3. Bhasin, S., Danger, J.L., Guilley, S., Najm, Z.: Side-channel leakage and trace compression using normalized inter-class variance. In: Lee, R.B., Shi, W. (eds.) HASP 2014, Hardware and Architectural Support for Security and Privacy, pp. 7:1–7:9. ACM (2014)

  4. Bi, R., Liu, P.: Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments. BMC Bioinform. 17(1), 146 (2016)

  5. Brouchier, J., Kean, T., Marsh, C., Naccache, D.: Temperature attacks. IEEE Secur. Priv. 7(2), 79–82 (2009)

  6. Chatzikokolakis, K., Chothia, T., Guha, A.: Statistical measurement of information leakage. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 390–404. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_33

  7. Chothia, T., Guha, A.: A statistical test for information leaks using continuous mutual information. In: CSF, pp. 177–190 (2011)

  8. Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Routledge (1988)

  9. Danger, J.-L., Duc, G., Guilley, S., Sauvage, L.: Education and open benchmarking on side-channel analysis with the DPA contests. In: NIST Non-Invasive Attack Testing Workshop (2011)

  10. De Cnudde, T., Ender, M., Moradi, A.: Hardware masking, revisited. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(2), 123–148 (2018)

  11. Ding, A.A., Zhang, L., Durvaux, F., Standaert, F.-X., Fei, Y.: Towards sound and optimal leakage detection procedure. In: Eisenbarth, T., Teglia, Y. (eds.) CARDIS 2017. LNCS, vol. 10728, pp. 105–122. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75208-2_7

  12. Dunn, O.J.: Multiple comparisons among means. J. Am. Stat. Assoc. 56(293), 52–64 (1961)

  13. Durvaux, F., Standaert, F.-X.: From improved leakage detection to the detection of points of interests in leakage traces. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 240–262. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_10

  14. Efron, B.: Size, power and false discovery rates. Ann. Stat. 35(4), 1351–1377 (2007)

  15. Ferrigno, J., Hlaváč, M.: When AES blinks: introducing optical side channel. IET Inf. Secur. 2(3), 94–98 (2008)

  16. Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic analysis: concrete results. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 251–261. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44709-1_21

  17. Goodwill, G., Jun, B., Jaffe, J., Rohatgi, P.: A testing methodology for side-channel resistance validation. In: NIST Non-Invasive Attack Testing Workshop (2011)

  18. Hoenig, J.M., Heisey, D.M.: The abuse of power. Am. Stat. 55(1), 19–24 (2001)

  19. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979)

  20. Information technology - Security techniques - Testing methods for the mitigation of non-invasive attack classes against cryptographic modules. Standard, International Organization for Standardization, Geneva, CH (2016)

  21. Information technology - Security techniques - Security requirements for cryptographic modules. Standard, International Organization for Standardization, Geneva, CH (2012)

  22. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9

  23. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_25

  24. Liu, P., Hwang, J.T.G.: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6), 739–746 (2007)

  25. Mather, L., Oswald, E., Bandenburg, J., Wójcik, M.: Does my device leak information? An a priori statistical power analysis of leakage detection tests. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 486–505. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42033-7_25

  26. Miller, J.C., Maloney, C.J.: Systematic mistake analysis of digital computer programs. Commun. ACM 6(2), 58–63 (1963)

  27. Porter, K.E.: Statistical power in evaluations that investigate effects on multiple outcomes: A guide for researchers. J. Res. Educ. Eff. 11, 1–29 (2017)

  28. Pounds, S., Cheng, C.: Sample size determination for the false discovery rate. Bioinformatics 21(23), 4263–4271 (2005)

  29. Quisquater, J.-J., Samyde, D.: ElectroMagnetic Analysis (EMA): Measures and counter-measures for smart cards. In: Attali, I., Jensen, T. (eds.) E-smart 2001. LNCS, vol. 2140, pp. 200–210. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45418-7_17

  30. Sawilowsky, S.S.: New effect size rules of thumb. J. Mod. Appl. Stat. Methods 8(2), 597–599 (2009)

  31. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). https://doi.org/10.1007/11545262_3

  32. Schneider, T., Moradi, A.: Leakage assessment methodology. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015. LNCS, vol. 9293, pp. 495–513. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48324-4_25

  33. Shamir, A., Tromer, E.: Acoustic cryptanalysis (website). http://theory.csail.mit.edu/~tromer/acoustic/. Accessed 9 Sept 2019

  34. Skorobogatov, S.: Using optical emission analysis for estimating contribution to power analysis. In: Breveglieri, L., Koren, I., Naccache, D., Oswald, E., Seifert, J.-P. (eds.) Fault Diagnosis and Tolerance in Cryptography - FDTC 2009, pp. 111–119. IEEE Computer Society (2009)

  35. Standaert, F.-X., Gierlichs, B., Verbauwhede, I.: Partition vs. comparison side-channel distinguishers: An empirical evaluation of statistical tests for univariate side-channel attacks against two unprotected CMOS devices. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 253–267. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00730-9_16

  36. Standaert, F.-X., Pereira, O., Yu, Y., Quisquater, J.-J., Yung, M., Oswald, E.: Leakage resilient cryptography in practice. In: Sadeghi, A.-R., Naccache, D. (eds.) Towards Hardware-Intrinsic Security: Foundations and Practice, pp. 99–134. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14452-3_5

  37. Tasiran, S., Keutzer, K.: Coverage metrics for functional validation of hardware designs. IEEE Des. Test 18(4), 36–45 (2001)

  38. Thillard, A., Prouff, E., Roche, T.: Success through confidence: Evaluating the effectiveness of a side-channel attack. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 21–36. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40349-1_2

  39. Tong, T., Zhao, H.: Practical guidelines for assessing power and false discovery rate for fixed sample size in microarray experiments. Stat. Med. 27, 1960–1972 (2008)

  40. Šidák, Z.: Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 62(318), 626–633 (1967)

  41. Welch, B.L.: The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34(1–2), 28–35 (1947)

  42. Whitnall, C., Oswald, E.: A cautionary note regarding the usage of leakage detection tests in security evaluation. IACR Cryptology ePrint Archive, Report 2019/703 (2019). https://eprint.iacr.org/2019/703

Download references

Acknowledgements

Our work has been funded by the European Commission through the H2020 project 731591 (acronym REASSURE). A fuller report on this aspect of the project can be found in A Cautionary Note Regarding the Usage of Leakage Detection Tests in Security Evaluation [42].

Author information

Correspondence to Elisabeth Oswald.


Appendices

A Sample Size for the t-Test

We begin with a simple visual example that illustrates the concepts of \(\alpha \) and \(\beta \) values and their relationship to the sample size.

Consider the following two-sided hypothesis test for the mean of a Gaussian-distributed variable \(A \sim \mathcal {N}(\mu ,\sigma )\), where \(\mu \) and \(\sigma \) are the (unknown) parameters:

$$\begin{aligned} H_0: \mu = \mu _0 \text { vs.}&\text { } H_{alt}: \mu \ne \mu _0. \end{aligned}$$
(3)

Note that, in the leakage detection setting, where one typically wishes to test for a non-zero difference in means between two Gaussian distributions \(Y_1\) and \(Y_2\), this can be achieved by defining \(A = Y_1 - Y_2\) and (via the properties of the Gaussian distribution) performing the above test with \(\mu _0 = 0\).

Suppose the alternative hypothesis is true and that \(\mu = \mu _{alt}\). This is called a ‘specific alternative’Footnote 8, in recognition of the fact that it is not usually possible to compute power for all the alternatives when \(H_{alt}\) defines a set or range. In the leakage detection setting one typically chooses \(\mu _{alt} > 0\) to be the smallest difference \(|\mu _1 - \mu _2|\) that is considered of practical relevance; this is called the effect size. Without loss of generality, we suppose that \(\mu _{alt} > \mu _0\).

Figure 5 illustrates the test procedure when the risk of a Type I error is set to \(\alpha \) and the sample size is presumed large enough (typically \(n>30\)) that the distributions of the test statistic under the null and alternative hypotheses can be approximated by Gaussian distributions. The red areas together sum to \(\alpha \); the blue area indicates the overlap of \(H_0\) and \(H_{alt}\) and corresponds to \(\beta \) (the risk of a Type II error). The power of the test – that is, the probability of correctly rejecting the null hypothesis when the alternative is true – is then \(1-\beta \), as depicted by the shaded area.

There are essentially three ways to raise the power of the test. One is to increase the effect size of interest which, as should be clear from Fig. 5, serves to push the distributions apart, thereby diminishing the overlap between them. Another is to increase \(\alpha \) – that is, to make a trade-off between Type II and Type I errors – or (if appropriate) to perform a one-sided test, either of which has the effect (in this case) of shifting the critical value to the left so that the shaded region becomes larger. (In the leakage detection case the one-sided test is unlikely to be suitable as differences in either direction are equally important and neither can be ruled out a priori). The third way to increase the power is to increase the sample size for the experiment. This reduces the standard error on the sample means, which again pushes the alternative distribution of the test statistic further away from the null (note from Fig. 5 that the standard error features in the denominator of the distance).
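The three levers above can be made concrete with a short calculation. The following sketch (our own illustration, using only the Python standard library; the function name and the example parameter values are assumptions, not values from the standard) computes the approximate power of the two-sided test under the large-sample Gaussian approximation:

```python
from statistics import NormalDist

_norm = NormalDist()

def two_sided_power(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided test of H0: mu = mu_0 when the true
    mean is mu_0 + effect, given n observations with standard deviation sigma
    (large-sample Gaussian approximation, as in Fig. 5)."""
    se = sigma / n ** 0.5                 # standard error of the sample mean
    delta = effect / se                   # shift of the test statistic under H_alt
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region under the alternative.
    return _norm.cdf(delta - z_crit) + _norm.cdf(-delta - z_crit)

# Baseline, then each lever in turn: bigger effect, looser alpha, larger n.
base = two_sided_power(0.2, 1.0, 100, 0.05)
```

As expected, `two_sided_power(0.4, 1.0, 100, 0.05)`, `two_sided_power(0.2, 1.0, 100, 0.1)` and `two_sided_power(0.2, 1.0, 400, 0.05)` all exceed the baseline.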

Suppose you have an effect size in mind – based either on observations made during similar previous experiments, or on a subjective value judgement about how large an effect needs to be before it is practically relevant (e.g. the level of leakage which is deemed intolerable) – and you want your test to have a given significance level \(\alpha \) and power \(1-\beta \). The relationship between significance level, power, effect size and sample size can then be used to derive the minimum sample size necessary to achieve this.

The details of the argumentation that now follows are specific to a two-tailed t-test, but the general procedure can be adapted to any test for which the distribution of the test statistic is known under the null and alternative hypotheses.

For the sake of simplicity (i.e. to avoid calculating effectively irrelevant degrees of freedom) we will assume that our test will in any case require the acquisition of more than 30 observations, so that the Gaussian approximations for the test statistics hold as in Fig. 5. Without loss of generality we also assume that the difference of means is positive (otherwise the sets can be easily swapped). Finally, we assume that we seek to populate both sets with equal numbers \(n=N/2\) of observed traces.

Theorem 1

Let \(Y_1\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _1,\sigma _1^2)\) and \(Y_2\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _2,\sigma _2^2)\). Then, in a two-tailed test for a difference between the sample means:

$$\begin{aligned} H_0 \text {: } \mu _1 = \mu _2 \text { vs. } H_{alt} \text {: } \mu _1 \ne \mu _2, \end{aligned}$$
(4)

in order to achieve significance level \(\alpha \) and power \(1-\beta \), the overall number of traces N needs to be chosen such that:

$$\begin{aligned} N&\ge 2\cdot \frac{(z_{\alpha /2}+z_\beta )^2\cdot ({\sigma _1}^2 + {\sigma _2}^2)}{(\mu _1-\mu _2)^2}. \end{aligned}$$
(5)
Fig. 5.
figure 5

Figure showing the Type I and II error probabilities, \(\alpha \) and \(\beta \), as well as the effect size \(\mu _{alt}-\mu _0\) for a specific alternative such that \(\mu _{alt} > \mu _0\).

Note that Eq. 5 can be straightforwardly rearranged to alternatively compute any of the significance level, effect size or power in terms of the other three quantities.
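Eq. 5 translates directly into code. The sketch below (our own; the function name is an assumption, and the z-quantiles come from the standard library's `NormalDist`) returns the smallest integer N satisfying the bound:

```python
from math import ceil
from statistics import NormalDist

_norm = NormalDist()

def min_total_traces(mu1: float, mu2: float, sigma1: float, sigma2: float,
                     alpha: float, beta: float) -> int:
    """Smallest overall trace count N (split equally over the two sets)
    satisfying Eq. 5, for significance level alpha and power 1 - beta."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_beta = _norm.inv_cdf(1 - beta)         # z_{beta}
    n = (2 * (z_alpha + z_beta) ** 2 * (sigma1 ** 2 + sigma2 ** 2)
         / (mu1 - mu2) ** 2)
    return ceil(n)

# A 'small' standardised effect of 0.2 with unit variances, alpha = 0.05 and
# power 0.8 requires on the order of 800 traces in total.
N = min_total_traces(0.0, 0.2, 1.0, 1.0, 0.05, 0.2)
```

Tightening \(\alpha \) (as the standard's multiplicity-adjusted thresholds do) increases the required N, since \(z_{\alpha /2}\) grows as \(\alpha \) shrinks.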

B Results for Original TVLA-Recommended Threshold

Table 6. LHS: Power to achieve Cohen’s and Sawilowsky’s standardised effects under the TVLA significance criteria (which approximates to \(\alpha = 0.00001\)) and the standard level 3 (\(N = 10,000\)) and level 4 (\(N = 100,000\)) sample size criteria; RHS: Minimum effect sizes detectable for increasing power thresholds.
Table 7. Average (‘per-test’) and 1-minimal (‘overall’) power to detect observed and ‘tiny’ effect sizes under the level 3 and 4 criteria, and the sample size required to achieve balanced errors for a significance criterion of \(\alpha = 0.00001\). (30 leak points in a trace set of length 1,400).
Fig. 6.
figure 6

Different types of power and error to detect 30 true effects of size 0.04 in a trace set of length 1,400, as sample size increases, for an overall significance level of \(\alpha = 0.00001\). (Based on 5,000 random draws from the multivariate test statistic distribution under the alternative hypothesis).

Table 8. Different types of power and error to detect 30 true effects of size 0.04 in a trace set of length 1,400, under the level 3 and level 4 sample size criteria and with an overall significance level of \(\alpha = 0.00001\). (Based on 5,000 random draws from the multivariate test statistic distribution under the alternative hypothesis).
Fig. 7.
figure 7

FWER and 1-minimal (‘overall’) power of the tests to detect effects of the ‘observed’ size 0.04 for various leakage scenarios as the trace length increases, under the level 3 and level 4 standard criteria with an overall significance level of \(\alpha = 0.00001\).


Copyright information

© 2019 International Association for Cryptologic Research

About this paper


Cite this paper

Whitnall, C., Oswald, E. (2019). A Critical Analysis of ISO 17825 (‘Testing Methods for the Mitigation of Non-invasive Attack Classes Against Cryptographic Modules’). In: Galbraith, S., Moriai, S. (eds.) Advances in Cryptology – ASIACRYPT 2019. ASIACRYPT 2019. Lecture Notes in Computer Science, vol 11923. Springer, Cham. https://doi.org/10.1007/978-3-030-34618-8_9


  • DOI: https://doi.org/10.1007/978-3-030-34618-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34617-1

  • Online ISBN: 978-3-030-34618-8

  • eBook Packages: Computer Science (R0)
