Encyclopedia of Computational Neuroscience

Living Edition
Editors: Dieter Jaeger, Ranu Jung

Hierarchical Models of the Visual System

Living reference work entry


DOI: https://doi.org/10.1007/978-1-4614-7320-6_345-2



Hierarchical models of the visual system are neural networks with a layered topology. The receptive field of a unit (i.e., the region of visual space to which the unit responds) at one level of the hierarchy is constructed by combining inputs from units at a lower level. After a few processing stages, small receptive fields tuned to simple stimuli are combined to form larger receptive fields tuned to more complex stimuli. Such an anatomical and functional hierarchical architecture is a hallmark of the organization of the visual system. In feedforward networks, information flows in a bottom-up fashion, from lower to higher processing stages. In feedback networks, information can dynamically reenter processing stages via recurrent connections. Feedback connections can be broadly divided between horizontal or lateral...
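The growth of receptive fields across processing stages can be illustrated with a short sketch. It is not taken from the entry: the function name `receptive_field_sizes`, the one-dimensional simplification, and the four-stage configuration (pooling windows of width 3, subsampling by 2) are assumptions chosen only for illustration. The recurrence is the standard one for stacked local pooling or convolution stages: each stage widens the receptive field by (window width - 1) times the current spacing between neighboring units, measured in input pixels.

```python
def receptive_field_sizes(layers):
    """Return the receptive field width (in input pixels) after each stage.

    `layers` is a list of (kernel_size, stride) pairs, one per stage.
    This is a 1-D idealization; real cortical receptive fields are 2-D
    and do not have sharp edges.
    """
    size = 1  # a unit at the input level covers a single pixel
    jump = 1  # spacing between neighboring units, in input pixels
    sizes = []
    for kernel, stride in layers:
        # Each unit combines `kernel` neighboring units from the level below.
        size += (kernel - 1) * jump
        # Subsampling spreads neighboring units further apart in input space.
        jump *= stride
        sizes.append(size)
    return sizes


# Hypothetical four-stage hierarchy (values are illustrative assumptions).
stages = [(3, 2), (3, 2), (3, 2), (3, 2)]
print(receptive_field_sizes(stages))  # prints [3, 7, 15, 31]
```

Under these assumptions, a unit at the fourth stage pools, indirectly, over 31 input pixels even though it connects directly to only three units at the stage below, which is the sense in which small receptive fields are combined into larger ones.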




Further Reading

  1. Kreiman G (2008) Biological object recognition. Scholarpedia 3(6):2667
  2. Poggio T, Serre T (2013) Models of visual cortex. Scholarpedia 8(4):3516

Authors and Affiliations

  1. Department of Cognitive, Linguistic, and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, USA

Section editors and affiliations

  • Thomas Serre (1)
  1. Institute for Brain Sciences, Brown University, Providence, USA