Abstract
In Side-Channel Analysis (SCA), several papers have shown that neural networks could be trained to efficiently extract sensitive information from implementations running on embedded devices. This paper introduces a new tool called Gradient Visualization that aims to proceed a post-mortem information leakage characterization after the successful training of a neural network. It relies on the computation of the gradient of the loss function used during the training. The gradient is no longer computed with respect to the model parameters, but with respect to the input trace components. Thus, it can accurately highlight temporal moments where sensitive information leaks. We theoretically show that this method, based on Sensitivity Analysis, may be used to efficiently localize points of interest in the SCA context. The efficiency of the proposed method does not depend on the particular countermeasures that may be applied to the measured traces as long as the profiled neural network can still learn in presence of such difficulties. In addition, the characterization can be made for each trace individually. We verified the soundness of our proposed method on simulated data and on experimental traces from a public side-channel database. Eventually we empirically show that the Sensitivity Analysis is at least as good as state-of-the-art characterization methods, in presence (or not) of countermeasures.
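The core idea can be illustrated with a minimal NumPy sketch (a toy linear-softmax model standing in for a trained network; all dimensions and names are illustrative, not the paper's implementation). For such a model the gradient of the negative-log-likelihood loss with respect to the input trace has the closed form \(W^\top(p - \mathrm{onehot}(z))\), which is exactly what a deep-learning framework would return by back-propagating to the input layer; its large components flag candidate points of interest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: traces of D time samples, K classes (e.g. values of a key byte).
D, K = 100, 4
W = rng.normal(size=(K, D))          # "trained" weights of a linear-softmax model
b = np.zeros(K)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def loss(x, z):
    """Negative log-likelihood of the true label z for input trace x."""
    return -np.log(softmax(W @ x + b)[z])

def gradient_visualization(x, z):
    """Gradient of the loss w.r.t. the INPUT trace x (not the weights).

    For a linear-softmax model this is W^T (p - onehot(z)); a framework
    such as PyTorch or TensorFlow computes the same vector automatically."""
    p = softmax(W @ x + b)
    p[z] -= 1.0                      # p - onehot(z)
    return W.T @ p

x = rng.normal(size=D)               # one (simulated) side-channel trace
g = gradient_visualization(x, z=2)
# Samples where |g[t]| is large are those the loss is sensitive to: candidate PoIs.
poi = int(np.argmax(np.abs(g)))
```

The analytic gradient can be checked against a finite difference of `loss`, which is also how one would sanity-check the framework-computed input gradient in practice.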
Notes
- 1.
In practice, the latter methods usually emphasize the same PoIs as the SNR. This claim has been empirically verified on the data considered in this study. For this reason, we will only focus on the SNR when challenging the effectiveness of our method in the remainder of this paper.
- 2.
A general definition of Sensitivity Analysis is the study of how the uncertainty in the output of a mathematical model or system (numerical or otherwise) can be apportioned to different sources of uncertainty in its inputs [1].
- 3.
It corresponds to 26 clock cycles.
- 4.
Following the recent work in [29], the classical Machine Learning metrics (accuracy, recall) are ignored, as they have not been shown to fit the SCA context well.
- 5.
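The SNR characterization mentioned in note 1 can be sketched as follows (a simulated, illustrative setting: `N` labelled traces of `D` samples, with a deterministic leakage injected at one sample). The per-sample SNR is the variance over classes of the class-conditional means divided by the mean over classes of the class-conditional variances.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated traces: sample t_leak carries the sensitive value z, the rest is noise.
N, D, t_leak = 2000, 50, 20
z = rng.integers(0, 4, size=N)
traces = rng.normal(size=(N, D))
traces[:, t_leak] += z               # deterministic leakage at t_leak

def snr(traces, labels):
    """Per-sample Signal-to-Noise Ratio: Var_z(E[X|Z=z]) / E_z(Var[X|Z=z])."""
    classes = np.unique(labels)
    means = np.stack([traces[labels == c].mean(axis=0) for c in classes])
    varis = np.stack([traces[labels == c].var(axis=0) for c in classes])
    return means.var(axis=0) / varis.mean(axis=0)

s = snr(traces, z)
peak = int(np.argmax(s))             # the leaking sample dominates the SNR trace
```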
References
Sensitivity analysis - Wikipedia. https://en.wikipedia.org/wiki/Sensitivity_analysis
Abadi, M., et al.: TensorFlow: a system for large-scale machine learning. arXiv:1605.08695 [cs], 27 May 2016
Brier, E., Clavier, C., Olivier, F.: Correlation power analysis with a leakage model. In: Joye, M., Quisquater, J.-J. (eds.) CHES 2004. LNCS, vol. 3156, pp. 16–29. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-28632-5_2
Cagli, E., Dumas, C., Prouff, E.: Enhancing dimensionality reduction methods for side-channel attacks. In: Homma, N., Medwed, M. (eds.) CARDIS 2015. LNCS, vol. 9514, pp. 15–33. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31271-2_2
Cagli, E., Dumas, C., Prouff, E.: Convolutional neural networks with data augmentation against jitter-based countermeasures. In: Fischer, W., Homma, N. (eds.) CHES 2017. LNCS, vol. 10529, pp. 45–68. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-66787-4_3
Cagli, E., Dumas, C., Prouff, E.: Kernel discriminant analysis for information extraction in the presence of masking. In: Lemke-Rust, K., Tunstall, M. (eds.) CARDIS 2016. LNCS, vol. 10146, pp. 1–22. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-54669-8_1
Chari, S., Rao, J.R., Rohatgi, P.: Template attacks. In: Kaliski, B.S., Koç, K., Paar, C. (eds.) CHES 2002. LNCS, vol. 2523, pp. 13–28. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-36400-5_3
Choudary, M.O., Kuhn, M.G.: Efficient stochastic methods: profiled attacks beyond 8 bits. In: Joye, M., Moradi, A. (eds.) CARDIS 2014. LNCS, vol. 8968, pp. 85–103. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-16763-3_6
Choudary, O., Kuhn, M.G.: Efficient template attacks. In: Francillon, A., Rohatgi, P. (eds.) CARDIS 2013. LNCS, vol. 8419, pp. 253–270. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-08302-5_17
Durvaux, F., Renauld, M., Standaert, F.-X., van Oldeneel tot Oldenzeel, L., Veyrat-Charvillon, N.: Efficient removal of random delays from embedded software implementations using hidden Markov models. In: Mangard, S. (ed.) CARDIS 2012. LNCS, vol. 7771, pp. 123–140. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-37288-9_9
Eisenbarth, T., Paar, C., Weghenkel, B.: Building a side channel based disassembler. In: Gavrilova, M.L., Tan, C.J.K., Moreno, E.D. (eds.) Transactions on Computational Science X. LNCS, vol. 6340, pp. 78–99. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-17499-5_4
Gilmore, R., Hanley, N., O’Neill, M.: Neural network based attack on a masked implementation of AES. In: 2015 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), pp. 106–111, May 2015. https://doi.org/10.1109/HST.2015.7140247
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. Adaptive Computation and Machine Learning Series. MIT Press, Cambridge (2017)
Hardt, M.: Off the convex path. http://offconvex.github.io/
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv:1412.6980 [cs], 22 December 2014
Kocher, P., Jaffe, J., Jun, B.: Differential power analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48405-1_25
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. In: The Handbook of Brain Theory and Neural Networks, pp. 255–258. MIT Press, Cambridge (1998). http://dl.acm.org/citation.cfm?id=303568.303704
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015). https://doi.org/10.1038/nature14539. http://www.nature.com/articles/nature14539
Lerman, L., Bontempi, G., Markowitch, O.: A machine learning approach against a masked AES: reaching the limit of side-channel attacks with a learning model. J. Cryptographic Eng. 5(2), 123–139 (2015). https://doi.org/10.1007/s13389-014-0089-3
Maghrebi, H., Portigliatti, T., Prouff, E.: Breaking cryptographic implementations using deep learning techniques. In: Carlet, C., Hasan, M.A., Saraswat, V. (eds.) SPACE 2016. LNCS, vol. 10076, pp. 3–26. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49445-6_1
Mangard, S., Oswald, E., Popp, T.: Power Analysis Attacks: Revealing the Secrets of Smart Cards. Springer, Boston (2007). https://doi.org/10.1007/978-0-387-38162-6. OCLC: ocm71541637
Martinasek, Z., Dzurenda, P., Malina, L.: Profiling power analysis attack based on MLP in DPA contest v4.2. In: 2016 39th International Conference on Telecommunications and Signal Processing (TSP), pp. 223–226, June 2016. https://doi.org/10.1109/TSP.2016.7760865
Mather, L., Oswald, E., Bandenburg, J., Wójcik, M.: Does my device leak information? An a priori statistical power analysis of leakage detection tests. In: Sako, K., Sarkar, P. (eds.) ASIACRYPT 2013. LNCS, vol. 8269, pp. 486–505. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-42033-7_25
Montavon, G., Samek, W., Müller, K.R.: Methods for interpreting and understanding deep neural networks. Digit. Sig. Process. 73, 1–15 (2018). https://doi.org/10.1016/j.dsp.2017.10.011. http://linkinghub.elsevier.com/retrieve/pii/S1051200417302385
Moradi, A., Richter, B., Schneider, T., Standaert, F.X.: Leakage detection with the χ²-test. IACR Trans. Cryptographic Hardware Embed. Syst. 2018(1), 209–237 (2018)
Nagashima, S., Homma, N., Imai, Y., Aoki, T., Satoh, A.: DPA using phase-based waveform matching against random-delay countermeasure. In: 2007 IEEE International Symposium on Circuits and Systems, pp. 1807–1810, May 2007. https://doi.org/10.1109/ISCAS.2007.378024
Paszke, A., et al.: Automatic differentiation in Pytorch. In: NIPS-W (2017)
Picek, S., Heuser, A., Jovic, A., Bhasin, S., Regazzoni, F.: The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Trans. Cryptographic Hardware Embed. Syst. 2019(1), 209–237 (2018). https://doi.org/10.13154/tches.v2019.i1.209-237. https://tches.iacr.org/index.php/TCHES/article/view/7339
Picek, S., Samiotis, I.P., Heuser, A., Kim, J., Bhasin, S., Legay, A.: On the performance of deep learning for side-channel analysis. IACR Cryptology ePrint Archive 2018, 4 (2018). http://eprint.iacr.org/2018/004
Prouff, E., Rivain, M., Bevan, R.: Statistical analysis of second order differential power analysis. IEEE Trans. Comput. 58(6), 799–811 (2009). https://doi.org/10.1109/TC.2009.15. http://ieeexplore.ieee.org/document/4752810/
Prouff, E., Strullu, R., Benadjila, R., Cagli, E., Dumas, C.: Study of deep learning techniques for side-channel analysis and introduction to ASCAD database. IACR Cryptology ePrint Archive 2018, 53 (2018). http://eprint.iacr.org/2018/053
Rivain, M., Prouff, E., Doget, J.: Higher-order masking and shuffling for software implementations of block ciphers. In: Clavier, C., Gaj, K. (eds.) CHES 2009. LNCS, vol. 5747, pp. 171–188. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-04138-9_13
Shalev-Shwartz, S., Ben-David, S.: Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press (2014). https://doi.org/10.1017/CBO9781107298019. http://ebooks.cambridge.org/ref/id/CBO9781107298019
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. arXiv:1312.6034 [cs], 20 December 2013
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. arXiv:1412.6806 [cs], 21 December 2014
Standaert, F.-X., Archambeau, C.: Using subspace-based template attacks to compare and combine power and electromagnetic information leakages. In: Oswald, E., Rohatgi, P. (eds.) CHES 2008. LNCS, vol. 5154, pp. 411–425. Springer, Heidelberg (2008). https://doi.org/10.1007/978-3-540-85053-3_26
Standaert, F.-X., Malkin, T.G., Yung, M.: A unified framework for the analysis of side-channel key recovery attacks. In: Joux, A. (ed.) EUROCRYPT 2009. LNCS, vol. 5479, pp. 443–461. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-01001-9_26
van Woudenberg, J.G.J., Witteman, M.F., Bakker, B.: Improving differential power analysis by elastic alignment. In: Kiayias, A. (ed.) CT-RSA 2011. LNCS, vol. 6558, pp. 104–119. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-19074-2_8. http://dl.acm.org/citation.cfm?id=1964621.1964632
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. arXiv:1311.2901 [cs], 12 November 2013
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2929, June 2016. https://doi.org/10.1109/CVPR.2016.319
Appendices
A Profiling Attacks
As the model aims at approximating the conditional pdf, a Maximum Likelihood score can be used for the guessing:

$$d_{S_{a}}[k] = \sum _{i=1}^{N_a} \log \left( F(\mathbf x _i)[z_i(k)]\right) ,$$

where \(z_i(k)\) denotes the sensitive value associated to the \(i\)-th attack trace under the key hypothesis \(k\). Based on these scores, the key hypotheses are ranked in decreasing order. Finally, the attacker chooses the key that is ranked first (resp. the set of the first o ranked keys). More generally, the rank \(g_{S_{a}}(k^{\star })\) of the correct key hypothesis \(k^{\star }\) is defined as:

$$g_{S_{a}}(k^{\star }) = \#\left\{ k \in \mathcal {K} : d_{S_{a}}[k] \ge d_{S_{a}}[k^{\star }]\right\} .$$
Remark 2
In practice, to compute \(\mathrm {GE}(N_a)\), sampling many attack sets may be prohibitive in an evaluation context, especially if the estimations must be reproduced for many values of \(N_a\). One solution to circumvent this problem is, given a validation set \(S_{v}\) of \(N_v\) traces, to sample attack sets by permuting the order of the traces in the validation set. The Maximum Likelihood scores can then be computed with a cumulative sum to get a score for each \(N_a\in [|1, N_v|]\), and so can \(g_{S_{a}}(k^{\star })\). While this trick gives good estimations for \(N_a\ll N_v\), one has to keep in mind that the estimates become biased when \(N_a\rightarrow N_v\). This problem also arises in Machine Learning when one lacks data to validate a model. A technique called Cross-Validation [34] circumvents it by splitting the dataset into q parts called folds: the profiling is done on \(q-1\) folds and the model is evaluated on the remaining fold. By repeating this step q times and averaging the measured results, the bias is reduced.
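The permutation-plus-cumulative-sum trick above can be sketched in NumPy (all quantities simulated and illustrative: random log-probabilities for `K` key hypotheses over `N_v` validation traces, with a small bias in favor of the correct key `k_star`):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated per-trace log-likelihoods log F(x_i)[z_i(k)] for each hypothesis k;
# the correct key k_star gets a slight positive bias.
N_v, K, k_star = 200, 16, 13
logp = rng.normal(size=(N_v, K))
logp[:, k_star] += 0.3

def avg_rank(logp, k_star, n_perm=100):
    """Average rank of k_star as a function of N_a, estimated by permuting the
    validation traces and accumulating the scores with a cumulative sum."""
    ranks = np.zeros(logp.shape[0])
    for _ in range(n_perm):
        perm = rng.permutation(logp.shape[0])
        scores = np.cumsum(logp[perm], axis=0)      # ML scores for every N_a at once
        # rank = number of hypotheses scoring at least as high as k_star
        ranks += (scores >= scores[:, [k_star]]).sum(axis=1)
    return ranks / n_perm

ge = avg_rank(logp, k_star)   # ge[n] estimates the rank after n+1 attack traces
```

Consistent with the bias warning above, the estimate for the largest \(N_a\) is computed from the same full sum in every permutation, so it carries no sampling diversity.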
B Study of an Optimal Model
Informally, Assumption 1 states that the leaking information is non-uniformly distributed over the trace \(\mathbf X \), i.e. only a few coordinates contain clues about the attacked sensitive variable. Assumption 1 has been made in many studies such as [4]. Depending on the countermeasures implemented in the attacked device, the nature of \(\mathcal {I}_{Z}\) can be made more precise. Without any countermeasure, and supposing that the target sensitive variable only leaks once, Assumption 1 states that \(\mathcal {I}_{Z}\) is a single set of contiguous and constant coordinates, regardless of the input traces.
Adding masking splits \(\mathcal {I}_{Z}\) into several contiguous and fixed sets whose number equals the number of shares in the masking scheme (or is at least equal to the number of shares if we relax the hypothesis of one leakage per share). For example, if M (resp. \(Z \oplus M\)) denotes the mask (resp. masked data) variable leaking at coordinate \(t_1\) (resp. \(t_2\)), then M and X[t] with \(t \ne t_1\) are independent (resp. Z and X[t] with \(t \ne t_2\) are independent). The conditional probability \(\mathrm {Pr}[Z= z\vert \mathbf X = \mathbf x ]\) satisfies:

$$\mathrm {Pr}[Z= z\vert \mathbf X = \mathbf x ] = \sum _{m} \mathrm {Pr}[M = m \vert \mathbf X [t_1]]\cdot \mathrm {Pr}[Z\oplus M = z \oplus m \vert \mathbf X [t_2]].$$
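The effect of masking can be demonstrated with a small simulation (illustrative parameters and names, single-bit shares for brevity): the sensitive bit `Z` never leaks directly; the mask `M` leaks at `t1` and `Z ^ M` at `t2`, so every individual sample is statistically independent of `Z`, while a second-order combination of the two leaking samples is not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated Boolean masking: shares M and Z^M leak at t1 and t2 respectively.
N, D, t1, t2 = 5000, 30, 10, 20
Z = rng.integers(0, 2, size=N)
M = rng.integers(0, 2, size=N)
traces = rng.normal(scale=0.2, size=(N, D))
traces[:, t1] += M
traces[:, t2] += Z ^ M

# First order: each sample taken alone is uncorrelated with Z.
corr1 = np.array([abs(np.corrcoef(traces[:, t], Z)[0, 1]) for t in range(D)])

# Second order: the centred product of the two leaking samples depends on Z.
centred = traces - traces.mean(axis=0)
prod = centred[:, t1] * centred[:, t2]
corr2 = abs(np.corrcoef(prod, Z)[0, 1])
```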
Adding de-synchronization forces \(\mathcal {I}_{Z}\) to vary from one trace to another.
Likewise, Assumption 2 is realistic because it is a direct corollary of a Gaussian leakage model for the traces [7, 9]. Such a hypothesis is common in Side-Channel Analysis [7]. It implies that \(\mathbf x \mapsto \mathrm {Pr}[\mathbf X = \mathbf x | Z= z]\) is differentiable and:

$$\nabla _\mathbf{x }\,\mathrm {Pr}[\mathbf X = \mathbf x | Z= z] = -\varSigma _{z}^{-1}(\mathbf x - \mu _{z})\cdot \mathrm {Pr}[\mathbf X = \mathbf x | Z= z],$$

where \(\mu _{z}\) and \(\varSigma _{z}\) respectively denote the mean vector and the covariance matrix of the normal probability distribution related to the target sensitive value hypothesis \(z\). Then, from Bayes' theorem, (11) and the basic rules for derivative computation, one gets an analytic expression of \(\nabla _\mathbf{x }F^*(\mathbf x )\), thereby proving that \(F^*\) is differentiable with respect to the input trace.
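The Gaussian gradient identity can be verified numerically (an illustrative check, with arbitrary small dimensions; `mu` and `Sigma` stand for \(\mu_z\) and \(\varSigma_z\)): for a multivariate normal density \(f\), \(\nabla f(\mathbf x) = -\varSigma^{-1}(\mathbf x - \mu)\, f(\mathbf x)\).

```python
import numpy as np

rng = np.random.default_rng(4)

D = 5
mu = rng.normal(size=D)
A = rng.normal(size=(D, D))
Sigma = A @ A.T + D * np.eye(D)      # symmetric positive definite covariance
Sigma_inv = np.linalg.inv(Sigma)

def gauss_pdf(x):
    """Multivariate normal density with mean mu and covariance Sigma."""
    d = x - mu
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * d @ Sigma_inv @ d) / norm)

def gauss_grad(x):
    """Analytic gradient: -Sigma^{-1} (x - mu) * f(x)."""
    return -Sigma_inv @ (x - mu) * gauss_pdf(x)

x = rng.normal(size=D)
g = gauss_grad(x)                    # agrees with a finite difference of gauss_pdf
```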
C Neural Networks
Neural Networks (NN) are nowadays the privileged tool to address the classification problem in Machine Learning [19]. They aim at constructing a function \(F:\mathcal {X}\rightarrow \mathcal {P}(\mathcal {Z})\) that takes data \(\mathbf x \) and outputs vectors \(\mathbf y \) of scores. The classification of \(\mathbf x \) is done afterwards by choosing the label with the highest score, but the output can also be used directly in soft-decision contexts, which better corresponds to Side-Channel Analysis since the NN outputs on attack traces will be used to compute the score vector in (8). In general \(F\) is obtained by combining several simpler functions, called layers. An NN has an input layer (the identity over the input datum \(\mathbf x \)), an output layer (the last function, whose output is the score vector \(\mathbf y \)), and all other layers are called hidden layers. The nature (the number and the dimension) of the layers is called the architecture of the NN. All the parameters that define an architecture, together with some other parameters that govern the training phase, have to be carefully set by the attacker, and are called hyper-parameters. The so-called neurons, which give their name to NNs, are the computational units of the network; each essentially processes a scalar product between the coordinates of its input and a vector of trainable weights (or simply weights). We denote by \(\theta \) the vector containing all the trainable weights. Therefore, for a fixed architecture, an NN is completely parameterized by \(\theta \). Convolutional Neural Networks (CNN) implement other operations, but can be rewritten as regular NNs with specific constraints on the weights [18]. Each layer processes some neurons, and the outputs of the neuron evaluations form new input vectors for the subsequent layer.
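The structure described above can be sketched as a minimal NumPy forward pass (illustrative dimensions; a single hidden layer with ReLU and a softmax output producing the score vector \(\mathbf y\), with all trainable weights gathered in `theta`):

```python
import numpy as np

rng = np.random.default_rng(5)

# Illustrative architecture: input layer (identity), one hidden layer, softmax output.
D, H, K = 50, 32, 16                 # trace length, hidden width, number of labels
theta = {                            # all trainable weights, collectively "theta"
    "W1": rng.normal(scale=0.1, size=(H, D)), "b1": np.zeros(H),
    "W2": rng.normal(scale=0.1, size=(K, H)), "b2": np.zeros(K),
}

def forward(x, theta):
    h = np.maximum(0.0, theta["W1"] @ x + theta["b1"])   # hidden layer (ReLU neurons)
    s = theta["W2"] @ h + theta["b2"]                    # output layer scores
    e = np.exp(s - s.max())
    return e / e.sum()               # score vector y, a probability vector over labels

x = rng.normal(size=D)
y = forward(x, theta)
label = int(np.argmax(y))            # hard decision: the label with the highest score
```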
Whether a Neural Network can approximate a target probabilistic function \(F^*\) well by minimizing a loss function on sampled training data with Stochastic Gradient Descent is still an open question; this is what we call the mystery of Deep Learning. Theoretically, a huge quantity of training data is required for the solution obtained by loss minimization to generalize well, though it empirically works with much less data. Likewise, the convergence of Stochastic Gradient Descent to a minimum has not been theoretically established in this setting, but it has been empirically shown to be a good heuristic. For more information, see [14]. Indeed, though it raises several theoretical issues, the approach has been empirically shown to be efficient, especially in SCA with CNN-based attacks [5, 30].
D Experimental Results
1.1 D.1 The Jacobian Matrix
In this appendix, we present the Jacobian matrix visualization, equivalent to the GV. It additionally shows that some target values seem more sensitive than others, especially those whose Hamming weight is shared by only a few other values (which gives clues about how the traces leak sensitive information). Figure 8 (top) shows such a matrix in the application context (Exp. 1) described in Sect. 6, while Fig. 8 (bottom) shows the Jacobian matrix corresponding to the application context (Exp. 2). Figure 9 shows the SNR computed on de-synchronized traces.
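The Jacobian matrix can be sketched for a linear-softmax model (an illustrative stand-in for a trained network; for softmax outputs the Jacobian has the closed form \(\partial p_z/\partial x_j = p_z\,(W[z,j] - \sum_k p_k\, W[k,j])\)). Row \(z\) holds the gradient of output score \(p_z\) with respect to the input trace, so plotting \(|J|\) as an image shows, per target value, which time samples drive the prediction.

```python
import numpy as np

rng = np.random.default_rng(6)

K, D = 8, 40                         # number of target values, trace length
W = rng.normal(size=(K, D))          # "trained" weights

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def jacobian(x):
    """Jacobian of the softmax output w.r.t. the input trace x.

    d p_z / d x_j = p_z * (W[z, j] - sum_k p_k W[k, j])."""
    p = softmax(W @ x)
    return p[:, None] * (W - p @ W)

x = rng.normal(size=D)
J = jacobian(x)                      # shape (K, D): one row per target value z
```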
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Masure, L., Dumas, C., Prouff, E. (2019). Gradient Visualization for General Characterization in Profiling Attacks. In: Polian, I., Stöttinger, M. (eds) Constructive Side-Channel Analysis and Secure Design. COSADE 2019. Lecture Notes in Computer Science(), vol 11421. Springer, Cham. https://doi.org/10.1007/978-3-030-16350-1_9
DOI: https://doi.org/10.1007/978-3-030-16350-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16349-5
Online ISBN: 978-3-030-16350-1