Advances in Learning Visual Saliency: From Image Primitives to Semantic Contents

Zhao, Qi; Koch, Christof

doi:10.1007/978-1-4614-8151-5_14

Abstract

Humans and other primates shift their gaze to allocate processing resources to a subset of the visual input. Understanding and emulating the way that human observers free-view a natural scene has both scientific and economic impact. While previous research focused on low-level image features in saliency, the problem of the “semantic gap” has recently attracted attention from vision researchers, and higher-level features have been proposed to fill the gap. Building on these features, machine learning has become a popular computational tool for mining human data to explore how people direct their gaze when inspecting a visual scene. While learning saliency consistently boosts the performance of a saliency model, insight into what is learned inside the black box is also of great interest to both the human vision and computer vision communities. This chapter introduces recent advances in features that determine saliency, reviews related learning methods and insights drawn from learning outcomes, and discusses resources and metrics in saliency prediction.

Author information

Authors and Affiliations

National University of Singapore, Singapore, Singapore
Qi Zhao
California Institute of Technology, Pasadena, CA, USA
Christof Koch
Allen Institute for Brain Science, Seattle, WA, USA
Christof Koch

Corresponding author

Correspondence to Christof Koch.

Editor information

Editors and Affiliations

National University of Singapore, Singapore, Singapore
Zhi Yang

About this chapter

Cite this chapter

Zhao, Q., Koch, C. (2014). Advances in Learning Visual Saliency: From Image Primitives to Semantic Contents. In: Yang, Z. (ed.) Neural Computation, Neural Devices, and Neural Prosthesis. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8151-5_14

DOI: https://doi.org/10.1007/978-1-4614-8151-5_14
Published: 04 March 2014
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8150-8
Online ISBN: 978-1-4614-8151-5
eBook Packages: Biomedical and Life Sciences (R0)
