Abstract
Layer-wise relevance propagation (LRP) has shown potential for explaining neural network classifier decisions. In this paper, we investigate how LRP should be applied to deep neural networks that use batch normalization (BatchNorm), and show that, despite the functional simplicity of BatchNorm, several intuitive choices of published LRP rules perform poorly for a number of frequently used state-of-the-art networks. We also show that by using the \(\varepsilon\)-rule for BatchNorm layers we are able to detect training artifacts for MobileNet and layer design artifacts for ResNet. We analyze the causes of these failures in detail. We observe that some assumptions of the LRP decomposition rules are broken for specific networks, and propose a novel LRP rule tailored to BatchNorm layers. Our quantitative evaluation shows the advantage of the novel LRP rule for BatchNorm layers and its wide applicability to common deep neural network architectures. As an aside, we demonstrate that one observation made by LRP analysis can be used to modify a ResNet for faster initial training convergence.
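For readers unfamiliar with the \(\varepsilon\)-rule mentioned in the abstract, the following is a minimal sketch of how it is commonly stated when a BatchNorm layer is treated as a per-channel affine map. It assumes inference mode with fixed running statistics, so that \(y = w x + b\) with \(w = \gamma / \sqrt{\sigma^2 + \epsilon_{\text{BN}}}\) and \(b = \beta - \mu w\). The function names and the example values are hypothetical and chosen only for illustration. This is not the novel rule proposed in the paper, which is not specified in the abstract.

```python
import math


def batchnorm_affine(gamma, beta, running_mean, running_var, bn_eps=1e-5):
    """Fold inference-mode BatchNorm into per-channel scale w and shift b (y = w * x + b)."""
    w = [g / math.sqrt(v + bn_eps) for g, v in zip(gamma, running_var)]
    b = [be - m * wi for be, m, wi in zip(beta, running_mean, w)]
    return w, b


def lrp_epsilon_batchnorm(x, relevance_out, w, b, lrp_eps=1e-6):
    """Epsilon-rule for an elementwise affine layer (illustrative sketch).

    Each output has a single input, so the input contribution is z = w * x
    and the relevance passed down is z / (y + eps * sign(y)) * R_out.
    The share belonging to the shift b is not passed to the input.
    """
    relevance_in = []
    for xi, ri, wi, bi in zip(x, relevance_out, w, b):
        y = wi * xi + bi
        stabilizer = lrp_eps if y >= 0 else -lrp_eps  # keeps the denominator away from zero
        relevance_in.append(wi * xi / (y + stabilizer) * ri)
    return relevance_in


# Hypothetical two-channel example (bn_eps set to 0 to keep the arithmetic simple).
w, b = batchnorm_affine(gamma=[1.0, 2.0], beta=[0.0, 1.0],
                        running_mean=[0.0, 0.0], running_var=[1.0, 4.0],
                        bn_eps=0.0)
r_in = lrp_epsilon_batchnorm(x=[2.0, 1.0], relevance_out=[1.0, 1.0], w=w, b=b)
print(w, b, r_in)
```

In this toy example the first channel has no shift and passes on essentially all of its relevance (about 1.0), while the second channel has \(b = 1\) and passes on only about 0.5. The sketch illustrates the general property that a non-zero shift absorbs relevance under this rule. Whether this is the mechanism behind the failures analyzed in the paper is not something the abstract states.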
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hui, L.Y.W., Binder, A. (2019). BatchNorm Decomposition for Deep Neural Network Interpretation. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_24
DOI: https://doi.org/10.1007/978-3-030-20518-8_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8
eBook Packages: Computer Science (R0)