Skip to main content
Log in

Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks

  • Original Research
  • Published:
SN Computer Science Aims and scope Submit manuscript

Abstract

The adoption of deep convolutional neural networks (CNN) is growing exponentially in wide varieties of applications due to exceptional performance that equals to or is better than classical machine learning as well as a human. However, such models are difficult to interpret, susceptible to overfit, and hard to decode failure. An increasing body of literature, such as class activation map (CAM), focused on understanding what representations or features a model learned from the data. This paper presents novel Eigen-CAM to enhance explanations of CNN predictions by visualizing principal components of learned representations from convolutional layers. The Eigen-CAM is intuitive, easy to use, computationally efficient, and does not require correct classification by the model. Eigen-CAM can work with all CNN models without the need to modify layers or retrain models. For the task of generating a visual explanation of CNN predictions, compared to state-of-the-art methods, Eigen-CAM is more consistent, class discriminative, and robust against classification errors made by dense layers. Empirical analyses and comparison with the best state-of-the-art methods show up to 12% improvement in weakly-supervised object localization, an average of 13% improvement in weakly-supervised segmentation, and at least 15% improvement in generic object proposal.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Similar content being viewed by others

References

  1. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: IEEE conference on computer vision and pattern recognition. 2016. p. 770–8.

  2. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017. https://doi.org/10.1145/3065386.

    Article  Google Scholar 

  3. Wang Q, Li Q, Li X. Hyperspectral image super-resolution using spectrum and feature context. IEEE Trans Industr Electron. 2020. https://doi.org/10.1109/TIE.2020.3038096.

    Article  Google Scholar 

  4. Girshick R. Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision. 2015. p. 1440–8.

  5. Ren S, He K, Girshick R, Sun J. Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems. 2015. p. 91–9.

  6. Liu W, Anguelov D, Erhan D, et al. SSD: single shot multibox detector. arXiv. 2016. https://doi.org/10.1007/978-3-319-46448-0_2.

    Article  Google Scholar 

  7. Wang Q, Gao J, Lin W, Li X. NWPU-crowd: a large-scale benchmark for crowd counting and localization. IEEE Trans Pattern Anal Mach Intell. 2020. https://doi.org/10.1109/TPAMI.2020.3013269.

    Article  Google Scholar 

  8. Aneja J, Deshpande A, Schwing AG. Convolutional image captioning. In: IEEE/CVF conference on computer vision and pattern recognition. 2018, pp 5561–5570

  9. Fang H, Gupta S, Iandola F, et al. From captions to visual concepts and back. IEEE conference on computer vision and pattern recognition (CVPR). 2015, pp 1473–1482

  10. Johnson J, Karpathy A, Fei-Fei L. DenseCap: fully convolutional localization networks for dense captioning. In: IEEE conference on computer vision and pattern recognition. 2016, pp 4565–4574

  11. Vinyals O, Toshev A, Bengio S, Erhan D. Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans Pattern Anal Mach Intell. 2017. https://doi.org/10.1109/TPAMI.2016.2587640.

    Article  Google Scholar 

  12. Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF, editors. Medical image computing and computer-assisted intervention – MICCAI 2015. Springer: Cham; 2015. p. 234–41.

    Chapter  Google Scholar 

  13. Han C, Duan Y, Tao X, Lu J. Dense convolutional networks for semantic segmentation. IEEE Access. 2019. https://doi.org/10.1109/ACCESS.2019.2908685.

    Article  Google Scholar 

  14. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015, pp 3431–3440

  15. Abdulla W. Title of subordinate document. In: Mask_RCNN: mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. Matterport. 2017. https://github.com/matterport/Mask_RCNN. Accessed 18 Dec 2020

  16. Wang J, Liu Z, Chorowski J, et al. Robust 3D action recognition with random occupancy patterns. In: Fitzgibbon A, Lazebnik S, Perona P, et al., editors. Computer vision—ECCV 2012. Berlin: Springer; 2012. p. 872–85.

    Chapter  Google Scholar 

  17. Xia L, Chen C-C, Aggarwal JK. View invariant human action recognition using histograms of 3D joints. In: IEEE computer society conference on computer vision and pattern recognition workshops. 2012, pp 20–27

  18. Antol S, Agrawal A, Lu J, et al. VQA: visual question answering. In: IEEE international conference on computer vision (ICCV). 2015, pp 2425–2433

  19. Anderson P, He X, Buehler C, et al. Bottom-up and top-down attention for image captioning and visual question answering. In: IEEE/CVF conference on computer vision and pattern recognition. 2018, pp 6077–6086

  20. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv:2015.14091556 [cs].

  21. Muhammad MB, Yeasin M. Eigen-CAM: class activation map using principal components. In: International joint conference on neural networks (IJCNN). 2020, pp 1–7

  22. Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: visualising image classification models and saliency maps. In: International conference on learning representations. 2014. p. 1–8.

  23. Zeiler MD, Taylor GW, Fergus R. Adaptive deconvolutional networks for mid and high level feature learning. In: IEEE international conference on computer vision. 2011, pp 2018–2025

  24. Springenberg JT, Dosovitskiy A, Brox T, Riedmiller M. Striving for simplicity: the all convolutional net. 2014. arXiv preprint arXiv:1412.6806.

  25. Zhou B, Khosla A, Lapedriza A, et al. Learning deep features for discriminative localization. In: IEEE conference on computer vision and pattern recognition (CVPR). 2016, pp 2921–2929

  26. Selvaraju RR, Cogswell M, Das A, et al. Grad-cam: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision. 2017. p. 618–626.

  27. Chattopadhyay A, Sarkar A, Howlader P, Balasubramanian VN. Grad-CAM++: improved visual explanations for deep convolutional networks. In: IEEE winter conference on applications of computer vision (WACV). 2018. https://doi.org/10.1109/WACV.2018.00097

  28. Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. In: Proceedings of the 34th international conference on machine learning. 2017. p. 3145–53.

  29. Bach S, Binder A, Montavon G, et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE. 2015;10:e0130140. https://doi.org/10.1371/journal.pone.0130140.

    Article  Google Scholar 

  30. Mopuri KR, Garg U, Venkatesh BR. CNN fixations: an unraveling approach to visualize the discriminative image regions. IEEE Trans Image Process. 2019. https://doi.org/10.1109/TIP.2018.2881920.

    Article  Google Scholar 

  31. Voulodimos A, Doulamis N, Doulamis A, Protopapadakis E. Deep learning for computer vision: a brief review. Comput Intell Neurosci. 2018. https://doi.org/10.1155/2018/7068349.

    Article  Google Scholar 

  32. Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis. 2015. https://doi.org/10.1007/s11263-015-0816-y.

    Article  MathSciNet  Google Scholar 

  33. Szegedy C, Wei Liu, Yangqing Jia, et al. Going deeper with convolutions. In: 2015 IEEE conference on computer vision and pattern recognition (CVPR). 2015. p. 1–9.

  34. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). 2017. p. 2261–9.

  35. Cholakkal H, Johnson J, Rajan D. Backtracking ScSPM image classifier for weakly supervised top-down saliency. In: IEEE conference on computer vision and pattern recognition (CVPR). 2016, pp 5278–5287

  36. Marszałek M, Schmid C. Accurate object recognition with shape masks. Int J Comput Vis. 2012. https://doi.org/10.1007/s11263-011-0479-2.

    Article  MathSciNet  Google Scholar 

  37. Bazzani L, Bergamo A, Anguelov D, Torresani L. Self-taught object localization with deep networks. In: 2016 IEEE winter conference on applications of computer vision (WACV). 2016. p. 1–9.

  38. Pinheiro PO, Collobert R, Dollár P. Learning to segment object candidates. In: Advances in neural information processing systems. 2015. p. 1990–8.

  39. Zhang J, Bargal SA, Lin Z, et al. Top-down neural attention by excitation backprop. Int J Comput Vis. 2018;126:1084–102.

    Article  Google Scholar 

  40. Everingham M, Gool L, Williams CK, et al. The Pascal visual object classes (VOC) challenge. Int J Comput Vis. 2010. https://doi.org/10.1007/s11263-009-0275-4.

    Article  Google Scholar 

  41. Moosavi-Dezfooli S-M, Fawzi A, Frossard P. Deepfool: a simple and accurate method to fool deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 2574–82.

Download references

Acknowledgements

The authors thank Felix Havugimana for having helpful discussions while conducting the performance evaluation for this research.

Funding

The authors acknowledge the funding and research support provided by the Dept. of EECE at the Herff College of Engineering, University of Memphis.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mohammed Bany Muhammad.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Bany Muhammad, M., Yeasin, M. Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks. SN COMPUT. SCI. 2, 47 (2021). https://doi.org/10.1007/s42979-021-00449-3

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s42979-021-00449-3

Keywords

Navigation