Advances in Learning Visual Saliency: From Image Primitives to Semantic Contents

  • Qi Zhao
  • Christof Koch


Humans and other primates shift their gaze to allocate processing resources to a subset of the visual input. Understanding and emulating the way human observers free-view a natural scene has both scientific and economic impact. While previous research on saliency focused on low-level image features, the problem of the “semantic gap” has recently attracted the attention of vision researchers, and higher-level features have been proposed to fill this gap. Building on these features, machine learning has become a popular computational tool for mining human gaze data to explore how people direct their gaze when inspecting a visual scene. While learning saliency consistently boosts the performance of saliency models, insight into what is learned inside the black box is also of great interest to both the human vision and computer vision communities. This chapter introduces recent advances in the features that determine saliency, reviews related learning methods and the insights drawn from their outcomes, and discusses resources and metrics for saliency prediction.
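One widely used metric for saliency prediction is the area under the Receiver Operating Characteristic curve (AUC), borrowed from signal detection theory: the saliency map is treated as a classifier that separates fixated pixels from non-fixated ones. The sketch below is a hedged illustration, not the chapter's own code. It assumes the map is a list of rows of floats, fixations are hypothetical (row, col) pixel coordinates, and every non-fixated pixel serves as a negative (some benchmarks instead sample negatives from other images' fixations to discount center bias). Repeated fixations on the same pixel are counted once. AUC is computed via the equivalent Mann-Whitney form: the probability that a random fixated pixel outscores a random non-fixated one, with ties counting one half.

```python
from typing import List, Sequence, Tuple


def saliency_auc(saliency: Sequence[Sequence[float]],
                 fixations: Sequence[Tuple[int, int]]) -> float:
    """ROC AUC of a saliency map for fixated vs. non-fixated pixels (sketch)."""
    fixated = set(fixations)  # duplicate fixations on one pixel count once
    # Positives: saliency values at fixated (row, col) locations.
    pos: List[float] = [saliency[r][c] for (r, c) in fixated]
    # Negatives (assumption): every remaining pixel in the map.
    neg: List[float] = [v for r, row in enumerate(saliency)
                        for c, v in enumerate(row) if (r, c) not in fixated]
    if not pos or not neg:
        raise ValueError("need at least one fixated and one non-fixated pixel")
    # Mann-Whitney statistic: fraction of (positive, negative) pairs ranked
    # correctly, with ties scored as 0.5. O(P * N), fine for small maps.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    smap = [[0.1, 0.2, 0.9],
            [0.0, 0.3, 0.8],
            [0.1, 0.1, 0.2]]
    print(saliency_auc(smap, [(0, 2), (1, 2)]))  # both fixations on top values: 1.0
```

A uniform map scores 0.5 (chance), and a map that ranks every fixated pixel above every other pixel scores 1.0. For full-resolution maps a rank-based computation would replace the quadratic pair loop.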





Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. National University of Singapore, Singapore, Singapore
  2. California Institute of Technology, Pasadena, USA
  3. Allen Institute for Brain Science, Seattle, USA
