Abstract
The ISO standardisation of ‘Testing methods for the mitigation of non-invasive attack classes against cryptographic modules’ (ISO/IEC 17825:2016) specifies the use of the Test Vector Leakage Assessment (TVLA) framework as the sole measure to assess whether or not an implementation of (symmetric) cryptography is vulnerable to differential side-channel attacks. It is the only publicly available standard of this kind, and the first side-channel assessment regime to exclusively rely on a TVLA instantiation.
TVLA essentially specifies statistical leakage detection tests with the aim of removing the burden of having to test against an ever-increasing number of attack vectors. It offers the tantalising prospect of ‘conformance testing’: if a device passes TVLA, then, one is led to hope, the device would be secure against all (first-order) differential side-channel attacks.
In this paper we provide a statistical assessment of the specific instantiation of TVLA in this standard. This task leads us to inquire whether (or not) it is possible to assess the side-channel security of a device via leakage detection (TVLA) only. We find a number of grave issues in the standard and its adaptation of the original TVLA guidelines. We propose some innovations on existing methodologies and finish by giving recommendations for best practice and the responsible reporting of outcomes.
Notes
1. Other detection methodologies exist outside of the TVLA framework (including approaches based on mutual information [6, 7, 25], correlation [13] and the F-statistic [3] – all variants on statistical hypothesis tests, with differing degrees of formalism). These other tests and ‘higher order’ tests are not part of ISO 17825 and are therefore outside the scope of this submission.
2.
3. ‘Power’, as we will explain later in the paper, is a statistical concept and should not be confused with the ‘P’ of DPA, which refers to power consumption.
4. We consider these conditions to hold approximately in the case of most of the ISO standard tests, where the partitions are determined by uniformly distributed intermediates.
5. Porter uses the terminology d-minimal; we use r instead of d to avoid confusion with Cohen’s d.
6. In a non-specific fixed-versus-random experiment (even more so in a fixed-versus-fixed one) the differences depend on more than a single bit, so, depending on the value of a given intermediate under the fixed input, they can potentially be several times larger (see e.g. [32]) – or they can be smaller (e.g. if the leakage of the fixed intermediate coincides with the average case, such as the (decimal) value 15 in an approximately Hamming-weight leakage scenario). It is typically assumed in the non-specific case that, as the input propagates through the algorithm, at least some of the intermediates will correspond to large (efficiently detected) class differences [13].
7. We compute the per-test power under the repetition step as the square of the power to detect with half the sample, deriving from the assumption that the two iterations of the test are independent.
8. The overloading of terminology between ‘specific alternatives’ and ‘specific’ TVLA tests is unfortunate but unavoidable.
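The repetition rule in footnote 7 can be sketched numerically. The following is a sketch under the Gaussian approximation used in Appendix A; the function names, the noncentrality-based power formula, and the parameter values are ours (illustrative), not the standard's:

```python
from statistics import NormalDist

_Phi = NormalDist().cdf
_z = NormalDist().inv_cdf

def power(m, d, alpha):
    """Approximate power of a two-sided z-test to detect a standardised
    class difference d with m traces per class (Gaussian approximation)."""
    lam = d * (m / 2) ** 0.5          # noncentrality of the test statistic
    crit = _z(1 - alpha / 2)
    return 1 - _Phi(crit - lam) + _Phi(-crit - lam)

def repeated_power(m, d, alpha):
    """Power of the repeated test: both independent halves of the sample
    must individually reject, so the per-half power is squared."""
    return power(m / 2, d, alpha) ** 2

# Repetition always costs power relative to a single test on the full sample:
print(power(10000, 0.1, 1e-5), repeated_power(10000, 0.1, 1e-5))
```

The squaring step makes the cost of the repetition explicit: a per-half power well below 1 shrinks rapidly once squared.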
References
Ammann, P., Offutt, J.: Introduction to Software Testing, 1st edn. Cambridge University Press, New York (2008)
Asonov, D., Agrawal, R.: Keyboard acoustic emanations. In: IEEE Symposium on Security and Privacy, pp. 3–11. IEEE Computer Society (2004)
Bhasin, S., Danger, J.L., Guilley, S., Najm, Z.: Side-channel leakage and trace compression using normalized inter-class variance. In: Lee, R.B., Shi, W. (eds.) HASP 2014, Hardware and Architectural Support for Security and Privacy, pp. 7:1–7:9. ACM (2014)
Bi, R., Liu, P.: Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments. BMC Bioinform. 17(1), 146 (2016)
Brouchier, J., Kean, T., Marsh, C., Naccache, D.: Temperature attacks. IEEE Secur. Priv. 7(2), 79–82 (2009)
Chatzikokolakis, K., Chothia, T., Guha, A.: Statistical measurement of information leakage. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 390–404. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_33
Chothia, T., Guha, A.: A statistical test for information leaks using continuous mutual information. In: CSF, pp. 177–190 (2011)
Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Routledge (1988)
Danger, J.-L., Duc, G., Guilley, S., Sauvage, L.: Education and open benchmarking on side-channel analysis with the DPA contests. In: NIST Non-Invasive Attack Testing Workshop (2011)
De Cnudde, T., Ender, M., Moradi, A.: Hardware masking, revisited. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(2), 123–148 (2018)
Ding, A.A., Zhang, L., Durvaux, F., Standaert, F.-X., Fei, Y.: Towards sound and optimal leakage detection procedure. In: Eisenbarth, T., Teglia, Y. (eds.) CARDIS 2017. LNCS, vol. 10728, pp. 105–122. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75208-2_7
Dunn, O.J.: Multiple comparisons among means. J. Am. Stat. Assoc. 56(293), 52–64 (1961)
Durvaux, F., Standaert, F.-X.: From improved leakage detection to the detection of points of interests in leakage traces. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 240–262. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_10
Efron, B.: Size, power and false discovery rates. Ann. Stat. 35(4), 1351–1377 (2007)
Ferrigno, J., Hlaváč, M.: When AES blinks: introducing optical side channel. IET Inf. Secur. 2(3), 94–98 (2008)
Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic analysis: concrete results. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 251–261. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44709-1_21
Goodwill, G., Jun, B., Jaffe, J., Rohatgi, P.: A testing methodology for side-channel resistance validation. In: NIST Non-Invasive Attack Testing Workshop (2011)
Hoenig, J.M., Heisey, D.M.: The abuse of power. Am. Stat. 55(1), 19–24 (2001)
Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979)
ISO/IEC 17825:2016: Information technology – Security techniques – Testing methods for the mitigation of non-invasive attack classes against cryptographic modules. International Organization for Standardization, Geneva (2016)
ISO/IEC 19790:2012: Information technology – Security techniques – Security requirements for cryptographic modules. International Organization for Standardization, Geneva (2012)
Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9
Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_25
Liu, P., Hwang, J.T.G.: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6), 739–746 (2007)
Mather, L., Oswald, E., Bandenburg, J., Wójcik, M.: Does my device leak information? An a priori statistical power analysis of leakage detection tests. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 486–505. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42033-7_25
Miller, J.C., Maloney, C.J.: Systematic mistake analysis of digital computer programs. Commun. ACM 6(2), 58–63 (1963)
Porter, K.E.: Statistical power in evaluations that investigate effects on multiple outcomes: A guide for researchers. J. Res. Educ. Eff. 11, 1–29 (2017)
Pounds, S., Cheng, C.: Sample size determination for the false discovery rate. Bioinformatics 21(23), 4263–4271 (2005)
Quisquater, J.-J., Samyde, D.: ElectroMagnetic Analysis (EMA): Measures and counter-measures for smart cards. In: Attali, I., Jensen, T. (eds.) E-smart 2001. LNCS, vol. 2140, pp. 200–210. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45418-7_17
Sawilowsky, S.S.: New effect size rules of thumb. J. Mod. Appl. Stat. Methods 8(2), 597–599 (2009)
Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). https://doi.org/10.1007/11545262_3
Schneider, T., Moradi, A.: Leakage assessment methodology. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015. LNCS, vol. 9293, pp. 495–513. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48324-4_25
Shamir, A., Tromer, E.: Acoustic cryptanalysis (website). http://theory.csail.mit.edu/~tromer/acoustic/. Accessed 9 Sept 2019
Skorobogatov, S.: Using optical emission analysis for estimating contribution to power analysis. In: Breveglieri, L., Koren, I., Naccache, D., Oswald, E., Seifert, J.-P. (eds.) Fault Diagnosis and Tolerance in Cryptography - FDTC 2009, pp. 111–119. IEEE Computer Society (2009)
Standaert, F.-X., Gierlichs, B., Verbauwhede, I.: Partition vs. comparison side-channel distinguishers: An empirical evaluation of statistical tests for univariate side-channel attacks against two unprotected CMOS devices. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 253–267. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00730-9_16
Standaert, F.-X., Pereira, O., Yu, Y., Quisquater, J.-J., Yung, M., Oswald, E.: Leakage resilient cryptography in practice. In: Sadeghi, A.-R., Naccache, D. (eds.) Towards Hardware-Intrinsic Security: Foundations and Practice, pp. 99–134. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14452-3_5
Tasiran, S., Keutzer, K.: Coverage metrics for functional validation of hardware designs. IEEE Des. Test 18(4), 36–45 (2001)
Thillard, A., Prouff, E., Roche, T.: Success through confidence: Evaluating the effectiveness of a side-channel attack. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 21–36. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40349-1_2
Tong, T., Zhao, H.: Practical guidelines for assessing power and false discovery rate for fixed sample size in microarray experiments. Stat. Med. 27, 1960–1972 (2008)
Šidák, Z.: Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 62(318), 626–633 (1967)
Welch, B.L.: The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34(1–2), 28–35 (1947)
Whitnall, C., Oswald, E.: A cautionary note regarding the usage of leakage detection tests in security evaluation. IACR Cryptology ePrint Archive, Report 2019/703 (2019). https://eprint.iacr.org/2019/703
Acknowledgements
Our work has been funded by the European Commission through the H2020 project 731591 (acronym REASSURE). A fuller report on this aspect of the project can be found in A Cautionary Note Regarding the Usage of Leakage Detection Tests in Security Evaluation [42].
Appendices
A Sample Size for the t-Test
We begin with a simple visual example that illustrates the concepts of \(\alpha \) and \(\beta \) values and their relationship to the sample size.
Consider the following two-sided hypothesis test for the mean of a Gaussian-distributed variable \(A \sim \mathcal {N}(\mu ,\sigma )\), where \(\mu \) and \(\sigma \) are the (unknown) parameters:

$$\begin{aligned} H_0 :\mu = \mu _0 \quad \text {versus} \quad H_{alt} :\mu \ne \mu _0. \end{aligned}$$
Note that, in the leakage detection setting, where one typically wishes to test for a non-zero difference in means between two Gaussian distributions \(Y_1\) and \(Y_2\), this can be achieved by defining \(A = Y_1 - Y_2\) and (via the properties of the Gaussian distribution) performing the above test with \(\mu _0 = 0\).
Suppose the alternative hypothesis is true and that \(\mu = \mu _{alt}\). This is called a ‘specific alternative’ (see footnote 8), in recognition of the fact that it is not usually possible to compute power for all the alternatives when \(H_{alt}\) defines a set or range. In the leakage detection setting one typically chooses \(\mu _{alt} > 0\) to be the smallest difference \(|\mu _1 - \mu _2|\) that is considered of practical relevance; this is called the effect size. Without loss of generality, we suppose that \(\mu _{alt} > \mu _0\).
Figure 5 illustrates the test procedure when the risk of a Type I error is set to \(\alpha \) and the sample size is presumed large enough (typically \(n>30\)) that the distributions of the test statistic under the null and alternative hypotheses can be approximated by Gaussian distributions. The red areas together sum to \(\alpha \); the blue area indicates the overlap between the distributions under \(H_0\) and \(H_{alt}\) and corresponds to \(\beta \) (the risk of a Type II error). The power of the test – that is, the probability of correctly rejecting the null hypothesis when the alternative is true – is then \(1-\beta \), as depicted by the shaded area.
There are essentially three ways to raise the power of the test. One is to increase the effect size of interest which, as should be clear from Fig. 5, serves to push the distributions apart, thereby diminishing the overlap between them. Another is to increase \(\alpha \) – that is, to trade Type II errors off against Type I errors – or (if appropriate) to perform a one-sided test, either of which has the effect (in this case) of shifting the critical value to the left so that the shaded region becomes larger. (In the leakage detection case the one-sided test is unlikely to be suitable, as differences in either direction are equally important and neither can be ruled out a priori.) The third way to increase the power is to increase the sample size for the experiment. This reduces the standard error on the sample means, which again pushes the alternative distribution of the test statistic further away from the null (note from Fig. 5 that the standard error features in the denominator of the distance between the distributions).
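The effect of the three levers can be checked numerically. The sketch below uses the Gaussian approximation of Fig. 5; the helper function and the baseline parameter values are illustrative choices of ours, not values from the standard:

```python
from statistics import NormalDist

Phi = NormalDist().cdf
z = NormalDist().inv_cdf

def power(n, effect, alpha):
    # Gaussian-approximation power of a two-sided difference-of-means
    # test with n traces per class and standardised effect size `effect`.
    lam = effect * (n / 2) ** 0.5
    crit = z(1 - alpha / 2)
    return 1 - Phi(crit - lam) + Phi(-crit - lam)

base = power(500, 0.2, 0.01)
print(power(500, 0.4, 0.01) > base)    # larger effect size -> more power
print(power(500, 0.2, 0.05) > base)    # larger alpha -> more power
print(power(2000, 0.2, 0.01) > base)   # larger sample -> more power
```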
Suppose you have an effect size in mind – based either on observations made during similar previous experiments, or on a subjective value judgement about how large an effect needs to be before it is practically relevant (e.g. the level of leakage which is deemed intolerable) – and you want your test to have a given significance level \(\alpha \) and power \(1-\beta \). The relationship between significance level, power, effect size and sample size can then be used to derive the minimum sample size necessary to achieve this.
The details of the argumentation that now follows are specific to a two-tailed t-test, but the general procedure can be adapted to any test for which the distribution of the test statistic is known under the null and alternative hypotheses.
For the sake of simplicity (i.e. to avoid calculating effectively irrelevant degrees of freedom) we will assume that our test will in any case require the acquisition of more than 30 observations, so that the Gaussian approximations for the test statistics hold as in Fig. 5. Without loss of generality we also assume that the difference of means is positive (otherwise the sets can easily be swapped). Finally, we assume that we seek to populate both sets with equal numbers \(n=N/2\) of observed traces.
Theorem 1
Let \(Y_1\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _1,\sigma _1^2)\) and \(Y_2\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _2,\sigma _2^2)\). Then, in a two-tailed test for a difference between the sample means, i.e. \(H_0 :\mu _1 = \mu _2\) against \(H_{alt} :\mu _1 \ne \mu _2\), in order to achieve significance level \(\alpha \) and power \(1-\beta \), the overall number of traces N needs to be chosen such that:

$$\begin{aligned} N \ge 2 \cdot \frac{(z_{1-\alpha /2} + z_{1-\beta })^2 \, (\sigma _1^2 + \sigma _2^2)}{(\mu _1 - \mu _2)^2}, \end{aligned}$$
(5)

where \(z_q\) denotes the q-quantile of the standard normal distribution.
Note that Eq. 5 can be straightforwardly rearranged to alternatively compute any of the significance level, effect size or power in terms of the other three quantities.
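The sample-size calculation can be implemented directly. The following is a sketch of the normal-approximation formula under the assumptions of Theorem 1; the function name and the example parameter values are our own illustrative choices:

```python
from math import ceil
from statistics import NormalDist

def min_total_traces(alpha, beta, mu_diff, sigma1, sigma2):
    """Smallest overall number of traces N (split equally between the two
    sets) so that a two-tailed difference-of-means test achieves
    significance level alpha and power 1 - beta, under the Gaussian
    approximation of Appendix A."""
    z = NormalDist().inv_cdf
    quantiles = z(1 - alpha / 2) + z(1 - beta)
    n_per_set = quantiles ** 2 * (sigma1 ** 2 + sigma2 ** 2) / mu_diff ** 2
    return ceil(2 * n_per_set)

# Textbook sanity check: a standardised difference of 0.5 with unit
# variances, alpha = 0.05 and 80% power needs roughly 63 traces per set.
print(min_total_traces(0.05, 0.2, 0.5, 1.0, 1.0))
```

Tightening \(\alpha \) (e.g. to a TVLA-style threshold such as \(10^{-5}\)) or shrinking the effect size of interest drives the required N up sharply, since both enter the formula quadratically.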
B Results for Original TVLA-Recommended Threshold
Copyright information
© 2019 International Association for Cryptologic Research
Cite this paper
Whitnall, C., Oswald, E. (2019). A Critical Analysis of ISO 17825 (‘Testing Methods for the Mitigation of Non-invasive Attack Classes Against Cryptographic Modules’). In: Galbraith, S., Moriai, S. (eds) Advances in Cryptology – ASIACRYPT 2019. ASIACRYPT 2019. Lecture Notes in Computer Science(), vol 11923. Springer, Cham. https://doi.org/10.1007/978-3-030-34618-8_9
Print ISBN: 978-3-030-34617-1
Online ISBN: 978-3-030-34618-8