Advances in Learning Visual Saliency: From Image Primitives to Semantic Contents

  • Chapter
  • First Online:
Neural Computation, Neural Devices, and Neural Prosthesis

Abstract

Humans and other primates shift their gaze to allocate processing resources to a subset of the visual input. Understanding and emulating the way that human observers free-view a natural scene has both scientific and economic impact. While previous research focused on low-level image features as determinants of saliency, the "semantic gap" problem has recently attracted attention from vision researchers, and higher-level features have been proposed to fill the gap. Building on these various features, machine learning has become a popular computational tool for mining human fixation data to explore how people direct their gaze when inspecting a visual scene. While learning consistently boosts the performance of a saliency model, insight into what is learned inside the black box is also of great interest to both the human vision and computer vision communities. This chapter introduces recent advances in features that determine saliency, reviews related learning methods and the insights drawn from learning outcomes, and discusses resources and metrics for saliency prediction.
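
To make the two ideas in the abstract concrete — learning to fuse feature maps into a saliency map from fixation data, and scoring the prediction against human gaze — here is a minimal, self-contained sketch. It is not the chapter's implementation: the feature maps and fixations below are synthetic placeholders, logistic regression stands in for the SVM and AdaBoost learners the chapter reviews, and ROC AUC is one of the standard evaluation metrics it discusses.

```python
# Minimal sketch (assumptions noted above): learn per-feature weights that
# combine feature maps into a saliency map, then score it with ROC AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
H, W, n_features = 48, 64, 3

# Placeholder feature maps (e.g., intensity, color, orientation contrast).
features = rng.random((n_features, H, W))

# Synthetic "fixations": locations biased toward high values of feature 0.
prob = features[0] / features[0].sum()
fix_idx = rng.choice(H * W, size=200, p=prob.ravel())
fixated = np.zeros(H * W, dtype=int)
fixated[fix_idx] = 1

# Each pixel is a training sample; its label says whether it was fixated.
X = features.reshape(n_features, -1).T          # shape (H*W, n_features)
clf = LogisticRegression().fit(X, fixated)

# The learned saliency map is the predicted fixation probability per pixel.
saliency = clf.predict_proba(X)[:, 1].reshape(H, W)

# ROC AUC: fixated pixels should be scored above non-fixated ones.
print("AUC:", roc_auc_score(fixated, saliency.ravel()))
```

In practice a learned model would be trained on fixations from many observers and images and evaluated on held-out images; the linear combination here is the simplest case of the nonlinear feature-integration strategies the chapter surveys.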



Author information

Correspondence to Christof Koch.


Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Zhao, Q., Koch, C. (2014). Advances in Learning Visual Saliency: From Image Primitives to Semantic Contents. In: Yang, Z. (ed.) Neural Computation, Neural Devices, and Neural Prosthesis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8151-5_14
