Do Humans Look Where Deep Convolutional Neural Networks “Attend”?

  • Mohammad K. Ebrahimpour
  • J. Ben Falandays
  • Samuel Spevack
  • David C. Noelle
Conference paper. Part of the Lecture Notes in Computer Science book series (LNCS, volume 11845).


Deep Convolutional Neural Networks (CNNs) have recently begun to exhibit human-level performance on some visual perception tasks. Performance remains relatively poor, however, on other vision tasks, such as object detection: specifying the location and object class for every object in a still image. We hypothesized that this performance gap is largely due to the fact that humans exhibit selective attention, while most object detection CNNs have no corresponding mechanism. In examining this question, we investigated some well-known attention mechanisms in the deep learning literature, identifying weaknesses that led us to propose a novel attention algorithm called the Densely Connected Attention Model. We then measured human spatial attention, in the form of eye tracking data, during the performance of an analogous object detection task. By comparing the attention maps produced by various CNN architectures with the attentional patterns exhibited by human viewers, we identified relative strengths and weaknesses of the examined computational attention mechanisms. Some CNNs produced attentional patterns somewhat similar to those of humans. Others focused processing on objects in the foreground. Still other CNN attentional mechanisms produced usefully interpretable internal representations. The resulting comparisons provide insights into the relationship between CNN attention algorithms and the human visual system.
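One of the computational attention mechanisms the abstract refers to is the Class Activation Map (CAM; Zhou et al., CVPR 2016), which produces a spatial attention map by weighting a CNN's final-layer feature maps with the classifier weights for a target class. The sketch below illustrates this idea and a simple way such a map might be compared with a human fixation heatmap; the array names, sizes, and the choice of Pearson correlation as a similarity measure are illustrative assumptions, not the paper's exact methodology.

```python
import numpy as np

def class_activation_map(features, class_weights):
    """Weighted sum of final-layer feature maps -> spatial attention map.

    features:      (H, W, C) activations from the last convolutional layer
    class_weights: (C,) classifier weights for the target class
    """
    cam = np.tensordot(features, class_weights, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0.0)        # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()         # normalize to [0, 1]
    return cam

def attention_similarity(cam, fixation_map):
    """Pearson correlation between a CNN attention map and a human
    fixation heatmap of the same spatial size (illustrative measure)."""
    return float(np.corrcoef(cam.ravel(), fixation_map.ravel())[0, 1])

# Toy usage: a 7x7 spatial grid with 4 feature channels.
rng = np.random.default_rng(0)
feats = rng.random((7, 7, 4))
w = rng.random(4)
cam = class_activation_map(feats, w)
sim = attention_similarity(cam, rng.random((7, 7)))
```

Comparing the resulting map against an eye-tracking heatmap, as sketched in `attention_similarity`, is one simple way to quantify how "human-like" a given CNN attention mechanism is.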


Keywords: Visual spatial attention · Computer vision · Convolutional Neural Networks · Densely connected attention maps · Class Activation Maps · Sensitivity analysis



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Mohammad K. Ebrahimpour (1)
  • J. Ben Falandays (2)
  • Samuel Spevack (2)
  • David C. Noelle (1, 2)

  1. EECS, University of California, Merced, USA
  2. Cognitive and Information Sciences, University of California, Merced, USA
