A Critical Analysis of ISO 17825 (‘Testing Methods for the Mitigation of Non-invasive Attack Classes Against Cryptographic Modules’)

  • Conference paper
  • First Online:
Advances in Cryptology – ASIACRYPT 2019 (ASIACRYPT 2019)

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11923)

Abstract

The ISO standardisation of ‘Testing methods for the mitigation of non-invasive attack classes against cryptographic modules’ (ISO/IEC 17825:2016) specifies the use of the Test Vector Leakage Assessment (TVLA) framework as the sole measure to assess whether or not an implementation of (symmetric) cryptography is vulnerable to differential side-channel attacks. It is the only publicly available standard of this kind, and the first side-channel assessment regime to exclusively rely on a TVLA instantiation.

TVLA essentially specifies statistical leakage detection tests with the aim of removing the burden of having to test against an ever increasing number of attack vectors. It offers the tantalising prospect of ‘conformance testing’: if a device passes TVLA, then, one is led to hope, the device would be secure against all (first-order) differential side-channel attacks.

In this paper we provide a statistical assessment of the specific instantiation of TVLA in this standard. This task leads us to inquire whether (or not) it is possible to assess the side-channel security of a device via leakage detection (TVLA) only. We find a number of grave issues in the standard and its adaptation of the original TVLA guidelines. We propose some innovations on existing methodologies and finish by giving recommendations for best practice and the responsible reporting of outcomes.


Notes

  1. Other detection methodologies exist outside of the TVLA framework (including approaches based on mutual information [6, 7, 25], correlation [13] and the F-statistic [3] – all variants on statistical hypothesis tests, with differing degrees of formalism). These other tests and ‘higher order’ tests are not part of ISO 17825 and therefore outside the scope of this submission.

  2. https://csrc.nist.gov/Projects/cryptographic-module-validation-program/Standards.

  3. ‘Power,’ as we will explain later in the paper, is a statistical concept and should not be confused with the ‘P’ of DPA, which refers to power consumption.

  4. We consider these conditions to approximately hold in the case of most of the ISO standard tests, where the partitions are determined by uniformly distributed intermediates.

  5. Porter uses the terminology d-minimal; we use r instead of d to avoid confusion with Cohen’s d.

  6. In a non-specific fixed-versus-random experiment (even more so in a fixed-versus-fixed one) the differences depend on more than a single bit so, depending on the value of a given intermediate under the fixed input, can potentially be several times larger (see e.g. [32]) – or they can be smaller (e.g. if the leakage of the fixed intermediate coincides with the average case, such as the (decimal) value 15 in an approximately Hamming weight leakage scenario). It is typically assumed in the non-specific case that, as the input propagates through the algorithm, at least some of the intermediates will correspond to large (efficiently detected) class differences [13].

  7. We compute the per-test power under the repetition step as the square of the power to detect with half the sample, deriving from the assumption that the two iterations of the test are independent.

  8. The overloading of terminology between ‘specific alternatives’ and ‘specific’ TVLA tests is unfortunate but unavoidable.
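The independence assumption in Note 7 can be sanity-checked with a short simulation. The sketch below (our own illustration, using only the Python standard library; `detects`, the effect size and the sample sizes are arbitrary choices, not values from the standard) draws two independent halves of a sample and compares the empirical probability that both halves yield a detection with the square of the single-half detection probability:

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided critical value at alpha = 0.05

def detects(samples):
    """Two-sided z-test of 'mean = 0' with known unit variance: True if rejected."""
    n = len(samples)
    z = (sum(samples) / n) * n ** 0.5
    return abs(z) > Z_CRIT

random.seed(1)
effect, n_half, trials = 0.5, 40, 20000
single = both = 0
for _ in range(trials):
    half_a = [random.gauss(effect, 1) for _ in range(n_half)]
    half_b = [random.gauss(effect, 1) for _ in range(n_half)]
    single += detects(half_a)
    both += detects(half_a) and detects(half_b)

p_single = single / trials
p_both = both / trials
# With independent halves, p_both is close to p_single ** 2,
# matching the 'square of the half-sample power' heuristic of Note 7.
```

If the two iterations were instead run on overlapping data, `p_both` would exceed `p_single ** 2` and the heuristic would understate the per-test power.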

References

  1. Ammann, P., Offutt, J.: Introduction to Software Testing, 1st edn. Cambridge University Press, New York (2008)

  2. Asonov, D., Agrawal, R.: Keyboard acoustic emanations. In: IEEE Symposium on Security and Privacy, pp. 3–11. IEEE Computer Society (2004)

  3. Bhasin, S., Danger, J.L., Guilley, S., Najm, Z.: Side-channel leakage and trace compression using normalized inter-class variance. In: Lee, R.B., Shi, W. (eds.) HASP 2014, Hardware and Architectural Support for Security and Privacy, pp. 7:1–7:9. ACM (2014)

  4. Bi, R., Liu, P.: Sample size calculation while controlling false discovery rate for differential expression analysis with RNA-sequencing experiments. BMC Bioinform. 17(1), 146 (2016)

  5. Brouchier, J., Kean, T., Marsh, C., Naccache, D.: Temperature attacks. IEEE Secur. Priv. 7(2), 79–82 (2009)

  6. Chatzikokolakis, K., Chothia, T., Guha, A.: Statistical measurement of information leakage. In: Esparza, J., Majumdar, R. (eds.) TACAS 2010. LNCS, vol. 6015, pp. 390–404. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12002-2_33

  7. Chothia, T., Guha, A.: A statistical test for information leaks using continuous mutual information. In: CSF, pp. 177–190 (2011)

  8. Cohen, J.: Statistical Power Analysis for the Behavioral Sciences. Routledge (1988)

  9. Danger, J.-L., Duc, G., Guilley, S., Sauvage, L.: Education and open benchmarking on side-channel analysis with the DPA contests. In: NIST Non-Invasive Attack Testing Workshop (2011)

  10. De Cnudde, T., Ender, M., Moradi, A.: Hardware masking, revisited. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018(2), 123–148 (2018)

  11. Ding, A.A., Zhang, L., Durvaux, F., Standaert, F.-X., Fei, Y.: Towards sound and optimal leakage detection procedure. In: Eisenbarth, T., Teglia, Y. (eds.) CARDIS 2017. LNCS, vol. 10728, pp. 105–122. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75208-2_7

  12. Dunn, O.J.: Multiple comparisons among means. J. Am. Stat. Assoc. 56(293), 52–64 (1961)

  13. Durvaux, F., Standaert, F.-X.: From improved leakage detection to the detection of points of interests in leakage traces. In: Fischlin, M., Coron, J.-S. (eds.) EUROCRYPT 2016. LNCS, vol. 9665, pp. 240–262. Springer, Heidelberg (2016). https://doi.org/10.1007/978-3-662-49890-3_10

  14. Efron, B.: Size, power and false discovery rates. Ann. Stat. 35(4), 1351–1377 (2007)

  15. Ferrigno, J., Hlaváč, M.: When AES blinks: introducing optical side channel. IET Inf. Secur. 2(3), 94–98 (2008)

  16. Gandolfi, K., Mourtel, C., Olivier, F.: Electromagnetic analysis: concrete results. In: Koç, Ç.K., Naccache, D., Paar, C. (eds.) CHES 2001. LNCS, vol. 2162, pp. 251–261. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-44709-1_21

  17. Goodwill, G., Jun, B., Jaffe, J., Rohatgi, P.: A testing methodology for side-channel resistance validation. In: NIST Non-Invasive Attack Testing Workshop (2011)

  18. Hoenig, J.M., Heisey, D.M.: The abuse of power. Am. Stat. 55(1), 19–24 (2001)

  19. Holm, S.: A simple sequentially rejective multiple test procedure. Scand. J. Stat. 6, 65–70 (1979)

  20. Information technology - Security techniques - Testing methods for the mitigation of non-invasive attack classes against cryptographic modules. Standard, International Organization for Standardization, Geneva, CH (2016)

  21. Information technology - Security techniques - Security requirements for cryptographic modules. Standard, International Organization for Standardization, Geneva, CH (2012)

  22. Kocher, P.C.: Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In: Koblitz, N. (ed.) CRYPTO 1996. LNCS, vol. 1109, pp. 104–113. Springer, Heidelberg (1996). https://doi.org/10.1007/3-540-68697-5_9

  23. Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_25

  24. Liu, P., Hwang, J.T.G.: Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics 23(6), 739–746 (2007)

  25. Mather, L., Oswald, E., Bandenburg, J., Wójcik, M.: Does my device leak information? An a priori statistical power analysis of leakage detection tests. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 486–505. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42033-7_25

  26. Miller, J.C., Maloney, C.J.: Systematic mistake analysis of digital computer programs. Commun. ACM 6(2), 58–63 (1963)

  27. Porter, K.E.: Statistical power in evaluations that investigate effects on multiple outcomes: A guide for researchers. J. Res. Educ. Eff. 11, 1–29 (2017)

  28. Pounds, S., Cheng, C.: Sample size determination for the false discovery rate. Bioinformatics 21(23), 4263–4271 (2005)

  29. Quisquater, J.-J., Samyde, D.: ElectroMagnetic Analysis (EMA): Measures and counter-measures for smart cards. In: Attali, I., Jensen, T. (eds.) E-smart 2001. LNCS, vol. 2140, pp. 200–210. Springer, Heidelberg (2001). https://doi.org/10.1007/3-540-45418-7_17

  30. Sawilowsky, S.S.: New effect size rules of thumb. J. Mod. Appl. Stat. Methods 8(2), 597–599 (2009)

  31. Schindler, W., Lemke, K., Paar, C.: A stochastic model for differential side channel cryptanalysis. In: Rao, J.R., Sunar, B. (eds.) CHES 2005. LNCS, vol. 3659, pp. 30–46. Springer, Heidelberg (2005). https://doi.org/10.1007/11545262_3

  32. Schneider, T., Moradi, A.: Leakage assessment methodology. In: Güneysu, T., Handschuh, H. (eds.) CHES 2015. LNCS, vol. 9293, pp. 495–513. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-48324-4_25

  33. Shamir, A., Tromer, E.: Acoustic cryptanalysis (website). http://theory.csail.mit.edu/~tromer/acoustic/. Accessed 9 Sept 2019

  34. Skorobogatov, S.: Using optical emission analysis for estimating contribution to power analysis. In: Breveglieri, L., Koren, I., Naccache, D., Oswald, E., Seifert, J.-P. (eds.) Fault Diagnosis and Tolerance in Cryptography - FDTC 2009, pp. 111–119. IEEE Computer Society (2009)

  35. Standaert, F.-X., Gierlichs, B., Verbauwhede, I.: Partition vs. comparison side-channel distinguishers: An empirical evaluation of statistical tests for univariate side-channel attacks against two unprotected CMOS devices. In: Lee, P.J., Cheon, J.H. (eds.) ICISC 2008. LNCS, vol. 5461, pp. 253–267. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-00730-9_16

  36. Standaert, F.-X., Pereira, O., Yu, Y., Quisquater, J.-J., Yung, M., Oswald, E.: Leakage resilient cryptography in practice. In: Sadeghi, A.-R., Naccache, D. (eds.) Towards Hardware-Intrinsic Security: Foundations and Practice, pp. 99–134. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14452-3_5

  37. Tasiran, S., Keutzer, K.: Coverage metrics for functional validation of hardware designs. IEEE Des. Test 18(4), 36–45 (2001)

  38. Thillard, A., Prouff, E., Roche, T.: Success through confidence: Evaluating the effectiveness of a side-channel attack. In: Bertoni, G., Coron, J.-S. (eds.) CHES 2013. LNCS, vol. 8086, pp. 21–36. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40349-1_2

  39. Tong, T., Zhao, H.: Practical guidelines for assessing power and false discovery rate for fixed sample size in microarray experiments. Stat. Med. 27, 1960–1972 (2008)

  40. Šidák, Z.: Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 62(318), 626–633 (1967)

  41. Welch, B.L.: The generalization of “Student’s” problem when several different population variances are involved. Biometrika 34(1–2), 28–35 (1947)

  42. Whitnall, C., Oswald, E.: A cautionary note regarding the usage of leakage detection tests in security evaluation. IACR Cryptology ePrint Archive, Report 2019/703 (2019). https://eprint.iacr.org/2019/703

Download references

Acknowledgements

Our work has been funded by the European Commission through the H2020 project 731591 (acronym REASSURE). A fuller report on this aspect of the project can be found in A Cautionary Note Regarding the Usage of Leakage Detection Tests in Security Evaluation [42].

Author information

Correspondence to Elisabeth Oswald.


Appendices

A Sample Size for the t-Test

We begin with a simple visual example that illustrates the concepts of \(\alpha \) and \(\beta \) values and their relationship to the sample size.

Consider the following two-sided hypothesis test for the mean of a Gaussian-distributed variable \(A \sim \mathcal {N}(\mu ,\sigma )\), where \(\mu \) and \(\sigma \) are the (unknown) parameters:

$$\begin{aligned} H_0: \mu = \mu _0 \text { vs.}&\text { } H_{alt}: \mu \ne \mu _0. \end{aligned}$$
(3)

Note that, in the leakage detection setting, where one typically wishes to test for a non-zero difference in means between two Gaussian distributions \(Y_1\) and \(Y_2\), this can be achieved by defining \(A = Y_1 - Y_2\) and (via the properties of the Gaussian distribution) performing the above test with \(\mu _0 = 0\).

Suppose the alternative hypothesis is true and that \(\mu = \mu _{alt}\). This is called a ‘specific alternative’Footnote 8, in recognition of the fact that it is not usually possible to compute power for all the alternatives when \(H_{alt}\) defines a set or range. In the leakage detection setting one typically chooses \(\mu _{alt} > 0\) to be the smallest difference \(|\mu _1 - \mu _2|\) that is considered of practical relevance; this is called the effect size. Without loss of generality, we suppose that \(\mu _{alt} > \mu _0\).

Figure 5 illustrates the test procedure when the risk of a Type I error is set to \(\alpha \) and the sample size is presumed large enough (typically \(n>30\)) that the distributions of the test statistic under the null and alternative hypotheses can be approximated by Gaussian distributions. The red areas together sum to \(\alpha \); the blue area indicates the overlap of \(H_0\) and \(H_{alt}\) and corresponds to \(\beta \) (the risk of a Type II error). The power of the test – that is, the probability of correctly rejecting the null hypothesis when the alternative is true – is then \(1-\beta \), as depicted by the shaded area.

There are essentially three ways to raise the power of the test. One is to increase the effect size of interest which, as should be clear from Fig. 5, serves to push the distributions apart, thereby diminishing the overlap between them. Another is to increase \(\alpha \) – that is, to make a trade-off between Type II and Type I errors – or (if appropriate) to perform a one-sided test, either of which has the effect (in this case) of shifting the critical value to the left so that the shaded region becomes larger. (In the leakage detection case the one-sided test is unlikely to be suitable as differences in either direction are equally important and neither can be ruled out a priori). The third way to increase the power is to increase the sample size for the experiment. This reduces the standard error on the sample means, which again pushes the alternative distribution of the test statistic further away from the null (note from Fig. 5 that the standard error features in the denominator of the distance).
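The three levers above can be made concrete with a short calculation. The following sketch (our own illustration, using only the Python standard library; the function name and the example parameter values are assumptions, not values from the standard) computes the approximate power of the two-sided test under the large-sample Gaussian approximation:

```python
from statistics import NormalDist

_norm = NormalDist()

def two_sided_power(effect: float, sigma: float, n: int, alpha: float) -> float:
    """Approximate power of a two-sided test of H0: mu = mu_0 when the true
    mean is mu_0 + effect, given n observations with standard deviation sigma
    (large-sample Gaussian approximation, as in Fig. 5)."""
    se = sigma / n ** 0.5                 # standard error of the sample mean
    delta = effect / se                   # shift of the test statistic under H_alt
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    # Probability of landing in either rejection region under the alternative.
    return _norm.cdf(delta - z_crit) + _norm.cdf(-delta - z_crit)

# Baseline, then each lever in turn: bigger effect, looser alpha, larger n.
base = two_sided_power(0.2, 1.0, 100, 0.05)
```

As expected, `two_sided_power(0.4, 1.0, 100, 0.05)`, `two_sided_power(0.2, 1.0, 100, 0.1)` and `two_sided_power(0.2, 1.0, 400, 0.05)` all exceed the baseline.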

Suppose you have an effect size in mind – based either on observations made during similar previous experiments, or on a subjective value judgement about how large an effect needs to be before it is practically relevant (e.g. the level of leakage which is deemed intolerable) – and you want your test to have a given significance level \(\alpha \) and power \(1-\beta \). The relationship between significance level, power, effect size and sample size can then be used to derive the minimum sample size necessary to achieve this.

The details of the argumentation that now follows are specific to a two-tailed t-test, but the general procedure can be adapted to any test for which the distribution of the test statistic is known under the null and alternative hypotheses.

For the sake of simplicity (i.e. to avoid calculating effectively irrelevant degrees of freedom) we will assume that our test will in any case require the acquisition of more than 30 observations, so that the Gaussian approximations for the test statistics hold as in Fig. 5. Without loss of generality we also assume that the difference of means is positive (otherwise the sets can be easily swapped). Finally, we assume that we seek to populate both sets with equal numbers \(n=N/2\) of observed traces.

Theorem 1

Let \(Y_1\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _1,\sigma _1^2)\) and \(Y_2\) be a set of traces of size N/2 drawn via repeat sampling from a normal distribution \(\mathcal {N}(\mu _2,\sigma _2^2)\). Then, in a two-tailed test for a difference between the sample means:

$$\begin{aligned} H_0 \text {: } \mu _1 = \mu _2 \text { vs. } H_{alt} \text {: } \mu _1 \ne \mu _2, \end{aligned}$$
(4)

in order to achieve significance level \(\alpha \) and power \(1-\beta \), the overall number of traces N needs to be chosen such that:

$$\begin{aligned} N&\ge 2\cdot \frac{(z_{\alpha /2}+z_\beta )^2\cdot ({\sigma _1}^2 + {\sigma _2}^2)}{(\mu _1-\mu _2)^2}. \end{aligned}$$
(5)
Fig. 5.
figure 5

Figure showing the Type I and II error probabilities, \(\alpha \) and \(\beta \), as well as the effect size \(\mu _{alt}-\mu _0\) for a specific alternative such that \(\mu _{alt} > \mu _0\).

Note that Eq. 5 can be straightforwardly rearranged to alternatively compute any of the significance level, effect size or power in terms of the other three quantities.
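Eq. 5 translates directly into code. The sketch below (our own; the function name is an assumption, and the z-quantiles come from the standard library's `NormalDist`) returns the smallest integer N satisfying the bound:

```python
from math import ceil
from statistics import NormalDist

_norm = NormalDist()

def min_total_traces(mu1: float, mu2: float, sigma1: float, sigma2: float,
                     alpha: float, beta: float) -> int:
    """Smallest overall trace count N (split equally over the two sets)
    satisfying Eq. 5, for significance level alpha and power 1 - beta."""
    z_alpha = _norm.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    z_beta = _norm.inv_cdf(1 - beta)         # z_{beta}
    n = (2 * (z_alpha + z_beta) ** 2 * (sigma1 ** 2 + sigma2 ** 2)
         / (mu1 - mu2) ** 2)
    return ceil(n)

# A 'small' standardised effect of 0.2 with unit variances, alpha = 0.05 and
# power 0.8 requires on the order of 800 traces in total.
N = min_total_traces(0.0, 0.2, 1.0, 1.0, 0.05, 0.2)
```

Tightening \(\alpha \) (as the standard's multiplicity-adjusted thresholds do) increases the required N, since \(z_{\alpha /2}\) grows as \(\alpha \) shrinks.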

B Results for Original TVLA-Recommended Threshold

Table 6. LHS: Power to achieve Cohen’s and Sawilowsky’s standardised effects under the TVLA significance criteria (which approximates to \(\alpha = 0.00001\)) and the standard level 3 (\(N = 10,000\)) and level 4 (\(N = 100,000\)) sample size criteria; RHS: Minimum effect sizes detectable for increasing power thresholds.
Table 7. Average (‘per-test’) and 1-minimal (‘overall’) power to detect observed and ‘tiny’ effect sizes under the level 3 and 4 criteria, and the sample size required to achieve balanced errors for a significance criterion of \(\alpha = 0.00001\). (30 leak points in a trace set of length 1,400).
Fig. 6.
figure 6

Different types of power and error to detect 30 true effects of size 0.04 in a trace set of length 1,400, as sample size increases, for an overall significance level of \(\alpha = 0.00001\). (Based on 5,000 random draws from the multivariate test statistic distribution under the alternative hypothesis).

Table 8. Different types of power and error to detect 30 true effects of size 0.04 in a trace set of length 1,400, under the level 3 and level 4 sample size criteria and with an overall significance level of \(\alpha = 0.00001\). (Based on 5,000 random draws from the multivariate test statistic distribution under the alternative hypothesis).
Fig. 7.
figure 7

FWER and 1-minimal (‘overall’) power of the tests to detect effects of the ‘observed’ size 0.04 for various leakage scenarios as the trace length increases, under the level 3 and level 4 standard criteria with an overall significance level of \(\alpha = 0.00001\).


Copyright information

© 2019 International Association for Cryptologic Research

About this paper


Cite this paper

Whitnall, C., Oswald, E. (2019). A Critical Analysis of ISO 17825 (‘Testing Methods for the Mitigation of Non-invasive Attack Classes Against Cryptographic Modules’). In: Galbraith, S., Moriai, S. (eds.) Advances in Cryptology – ASIACRYPT 2019. ASIACRYPT 2019. Lecture Notes in Computer Science, vol 11923. Springer, Cham. https://doi.org/10.1007/978-3-030-34618-8_9


  • DOI: https://doi.org/10.1007/978-3-030-34618-8_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34617-1

  • Online ISBN: 978-3-030-34618-8

  • eBook Packages: Computer Science (R0)
