Abstract
The adoption of deep convolutional neural networks (CNNs) is growing rapidly across a wide variety of applications because their performance equals or exceeds that of classical machine learning methods, and in some tasks that of humans. However, such models are difficult to interpret, prone to overfitting, and their failures are hard to diagnose. An increasing body of literature, such as the class activation map (CAM), focuses on understanding what representations or features a model has learned from the data. This paper presents Eigen-CAM, a novel method that enhances explanations of CNN predictions by visualizing the principal components of the representations learned by the convolutional layers. Eigen-CAM is intuitive, easy to use, computationally efficient, and does not require the model to classify the input correctly. It works with any CNN model without modifying layers or retraining. Compared with state-of-the-art methods for generating visual explanations of CNN predictions, Eigen-CAM is more consistent, more class discriminative, and more robust against classification errors made by the dense layers. Empirical analyses and comparisons with the best state-of-the-art methods show up to 12% improvement in weakly supervised object localization, an average of 13% improvement in weakly supervised segmentation, and at least 15% improvement in generic object proposal.
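The core computation the abstract describes, visualizing the principal components of the learned representations, can be sketched in a few lines. The snippet below is an illustrative NumPy sketch, not the authors' released code: it flattens a conv-layer activation tensor, factorizes it with SVD, and projects the activations onto the first right singular vector to obtain a saliency map. The synthetic `demo` tensor stands in for real feature maps, which in practice would be captured from the last convolutional layer of a trained CNN.

```python
import numpy as np

def eigen_cam(activations):
    """Eigen-CAM sketch: project conv-layer feature maps onto their
    first principal component.

    activations: (C, H, W) channels-first feature maps.
    Returns an (H, W) saliency map, min-max normalized to [0, 1].
    """
    c, h, w = activations.shape
    # Treat each spatial position as one C-dimensional observation.
    A = activations.reshape(c, h * w).T          # shape (H*W, C)
    # SVD of the (non-centered) activation matrix; the first right
    # singular vector is the dominant principal direction.
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    cam = (A @ vt[0]).reshape(h, w)              # projection onto v1
    cam = cam - cam.min()                        # min-max normalize
    return cam / cam.max() if cam.max() > 0 else cam

# Demo on a synthetic activation tensor standing in for real CNN features.
demo = np.random.default_rng(0).standard_normal((8, 4, 4))
heatmap = eigen_cam(demo)
print(heatmap.shape)  # (4, 4)
```

Because the projection uses only the forward-pass activations, no gradients and no class scores are needed, which is why the method is gradient-free and independent of whether the classification was correct.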
Acknowledgements
The authors thank Felix Havugimana for helpful discussions during the performance evaluation for this research.
Funding
The authors acknowledge the funding and research support provided by the Dept. of EECE at the Herff College of Engineering, University of Memphis.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Cite this article
Bany Muhammad, M., Yeasin, M. Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks. SN COMPUT. SCI. 2, 47 (2021). https://doi.org/10.1007/s42979-021-00449-3