Gradient-Based Attribution Methods

  • Marco Ancona
  • Enea Ceolini
  • Cengiz Öztireli
  • Markus Gross
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11700)


The problem of explaining complex machine learning models, including Deep Neural Networks, has gained increasing attention over the last few years. While several methods have been proposed to explain network predictions, the very definition of explanation is still debated. Moreover, only a few attempts have been made to compare explanation methods from a theoretical perspective. In this chapter, we discuss the theoretical properties of several attribution methods and show how they share the same idea of using gradient information as a descriptive factor for the functioning of a model. Finally, we discuss the strengths and limitations of these methods and compare them with available alternatives.


Keywords: Attribution methods, Deep Neural Networks, Explainable artificial intelligence


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Marco Ancona (1)
  • Enea Ceolini (1, 2)
  • Cengiz Öztireli (1)
  • Markus Gross (1)
  1. ETH Zürich, Zürich, Switzerland
  2. University of Zürich, Zürich, Switzerland
