
Bottom-Up Audio-Visual Attention for Scene Exploration

Multimodal Computational Attention for Scene Understanding and Robotics

Part of the book series: Cognitive Systems Monographs (COSMOS, volume 30)


Abstract

We can differentiate between two attentional mechanisms: First, overt attention directs the sense organs toward salient stimuli to optimize perception quality.

This is a preview of subscription content, log in via an institution to check access.


Notes

  1. Oppenheim refers to \({\mathscr {F}}[f_p(x)] = \frac{1}{|F(\omega )|}{\mathscr {F}}[f(x)]\) with \(F(\omega ) = {\mathscr {F}}[f](\omega )\).

  2. From a visual saliency perspective, it is not essential that the definition of \(\alpha \) handles the case \(p=0\) separately. Doing so makes the DCT-II matrix orthogonal, but it breaks the direct correspondence with a real-even DFT of half-shifted input. It is even possible to operate entirely without normalization, i.e., to remove the \(\alpha \) terms, which results only in a scale change that is irrelevant for the saliency calculation.

  3. Please note that all operations in Eqs. 3.74 and 3.76 are element-wise. We chose this simplified notation for its compactness and readability.

  4. Please note that the traveling salesman problem (TSP)'s additional requirement to return to the starting city does not change the computational complexity.
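
The spectral-whitening relation quoted in footnote 1 can be sketched in a few lines of NumPy (a minimal 1-D illustration on a toy signal of my own choosing, not the book's implementation): dividing the spectrum by its own magnitude keeps only the phase, and the reconstruction concentrates energy at discontinuities, i.e., at "salient" locations.

```python
import numpy as np

def phase_only(f):
    """Whitened reconstruction F^-1[ F[f] / |F[f]| ]:
    only the phase of the spectrum is retained."""
    F = np.fft.fft(f)
    eps = 1e-8  # guard against division by (numerically) zero magnitudes
    return np.real(np.fft.ifft(F / (np.abs(F) + eps)))

# Toy signal: constant except for one step at n = 32.
f = np.ones(64)
f[32:] = 2.0

fp = phase_only(f)
saliency = np.abs(fp - fp.mean())
# The step location scores far higher than the flat interior,
# which is exactly what makes phase-only spectra useful for saliency.
```

The same whitening idea underlies the spectral residual and image signature saliency models discussed in the chapter; in 2-D one simply applies it to the image's 2-D FFT.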
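
Footnote 2 can be checked numerically. The NumPy sketch below (with N = 8 chosen arbitrarily) builds the DCT-II basis with and without the \(\alpha \) normalization: with \(\alpha \) the matrix is orthogonal, and dropping \(\alpha \) only rescales each coefficient by a positive factor, so sign-based measures such as the image signature are unaffected.

```python
import numpy as np

N = 8  # transform length, chosen arbitrarily for the demo
x = np.arange(N)
p = np.arange(N).reshape(-1, 1)

# DCT-II basis with the alpha normalization discussed in the note
alpha = np.where(p == 0, np.sqrt(1.0 / N), np.sqrt(2.0 / N))
C_norm = alpha * np.cos(np.pi * (2 * x + 1) * p / (2 * N))

# The same basis with the alpha terms removed
C_raw = np.cos(np.pi * (2 * x + 1) * p / (2 * N))

# With alpha, the DCT-II matrix is orthogonal ...
assert np.allclose(C_norm @ C_norm.T, np.eye(N))

# ... and dropping alpha only rescales each coefficient by a
# positive factor, so the coefficient signs (all that a sign-based
# saliency measure uses) agree for any input signal.
f = np.random.default_rng(0).normal(size=N)
assert np.array_equal(np.sign(C_norm @ f), np.sign(C_raw @ f))
```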
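
Footnote 4 can be illustrated with a brute-force sketch (hypothetical 4-city distances): the open-path and closed-tour variants minimize over exactly the same set of n! permutations, so requiring a return to the start changes only the cost of each candidate, not the complexity of the search.

```python
import itertools
import math

def path_len(order, d, close=False):
    """Length of visiting the cities in the given order; with
    close=True the tour returns to the starting city (classic TSP)."""
    length = sum(d[a][b] for a, b in zip(order, order[1:]))
    return length + d[order[-1]][order[0]] if close else length

# Hypothetical symmetric 4-city distance matrix.
d = [[0, 2, 9, 10],
     [2, 0, 6, 4],
     [9, 6, 0, 8],
     [10, 4, 8, 0]]

n = len(d)
perms = list(itertools.permutations(range(n)))
assert len(perms) == math.factorial(n)  # brute force is O(n!) either way

# Both variants search the very same candidate set; closing the
# tour only adds one edge to each candidate's cost.
best_path = min(perms, key=lambda o: path_len(o, d))
best_tour = min(perms, key=lambda o: path_len(o, d, close=True))
```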

References

  1. Achanta, R., Süsstrunk, S.: Saliency detection using maximum symmetric surround. In: Proceedings of the International Conference on Image Processing (2010)

  2. Achanta, R., Hemami, S., Estrada, F., Süsstrunk, S.: Frequency-tuned salient region detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2009)

  3. Alley, R.E.: Algorithm Theoretical Basis Document for Decorrelation Stretch. NASA, JPL (1996)

  4. Alsam, A., Sharma, P.: A robust metric for the evaluation of visual saliency algorithms. J. Opt. Soc. Am. (2013)

  5. Asfour, T., Regenstein, K., Azad, P., Schröder, J., Bierbaum, A., Vahrenkamp, N., Dillmann, R.: ARMAR-III: an integrated humanoid platform for sensory-motor control. In: Humanoids (2006)

  6. Asfour, T., Welke, K., Azad, P., Ude, A., Dillmann, R.: The Karlsruhe Humanoid Head. In: Humanoids (2008)

  7. Andreopoulos, A., Hasler, S., Wersing, H., Janssen, H., Tsotsos, J., Körner, E.: Active 3D object localization using a humanoid robot. IEEE Trans. Robot. 47–64 (2010)

  8. Barlow, H.: Possible principles underlying the transformation of sensory messages. Sens. Commun. 217–234 (1961)

  9. Bell, A.J., Sejnowski, T.J.: The "independent components" of natural scenes are edge filters. Vis. Res. 37(23), 3327–3338 (1997)

  10. Begum, M., Karray, F., Mann, G.K.I., Gosine, R.G.: A probabilistic model of overt visual attention for cognitive robots. IEEE Trans. Syst. Man Cybern. B 40, 1305–1318 (2010)

  11. Bernardo, J.M.: Algorithm AS 103: Psi (digamma) function computation. Appl. Stat. 25, 315–317 (1976)

  12. Bian, P., Zhang, L.: Biological plausibility of spectral domain approach for spatiotemporal visual saliency. In: Proceedings of the Annual Conference on Neural Information Processing Systems (2009)

  13. Bruce, N., Tsotsos, J.: Saliency, attention, and visual search: an information theoretic approach. J. Vis. 9(3), 1–24 (2009)

  14. Brown, M., Süsstrunk, S., Fua, P.: Spatio-chromatic decorrelation by shift-invariant filtering. In: CVPR Workshops (2011)

  15. Borji, A., Sihite, D., Itti, L.: What/where to look next? Modeling top-down visual attention in complex interactive environments. IEEE Trans. Syst. Man Cybern. A 99 (2013)

  16. Borji, A., Sihite, D.N., Itti, L.: Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study. IEEE Trans. Image Process. 22(1), 55–69 (2013)

  17. Buchsbaum, G., Gottschalk, A.: Trichromacy, opponent colours coding and optimum colour information transmission in the retina. Proc. R. Soc. Lond. B 220, 89–113 (1983)

  18. Butko, N., Zhang, L., Cottrell, G., Movellan, J.R.: Visual saliency model for robot cameras. In: Proceedings of the International Conference on Robotics and Automation (2008)

  19. Cashon, C., Cohen, L.: The construction, deconstruction, and reconstruction of infant face perception. In: The Development of Face Processing in Infancy and Early Childhood: Current Perspectives, pp. 55–68. NOVA Science Publishers (2003)

  20. Cerf, M., Harel, J., Einhäuser, W., Koch, C.: Predicting human gaze using low-level saliency combined with face detection. In: Proceedings of the Annual Conference on Neural Information Processing Systems (2007)

  21. Cerf, M., Frady, P., Koch, C.: Subjects' inability to avoid looking at faces suggests bottom-up attention allocation mechanism for faces. In: Proceedings of the Society for Neuroscience (2008)

  22. Cerf, M., Frady, E.P., Koch, C.: Faces and text attract gaze independent of the task: experimental data and computer model. J. Vis. 9 (2009)

  23. CLEAR2007: Classification of events, activities and relationships evaluation and workshop. http://www.clear-evaluation.org

  24. Comaniciu, D., Meer, P.: Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 603–619 (2002)

  25. Cormen, T.H., Leiserson, C.E., Rivest, R.L.: Introduction to Algorithms. MIT Press and McGraw-Hill (1990)

  26. Cox, R.T.: Probability, frequency, and reasonable expectation. Am. J. Phys. 14, 1–13 (1946)

  27. Dankers, A., Barnes, N., Zelinsky, A.: A reactive vision system: active-dynamic saliency. In: Proceedings of the International Conference on Computer Vision Systems (2007)

  28. DiBiase, J.H., Silverman, H.F., Brandstein, M.S.: Robust localization in reverberant rooms, ch. 8, pp. 157–180. Springer (2001)

  29. Dragoi, V., Sharma, J., Miller, E.K., Sur, M.: Dynamics of neuronal sensitivity in visual cortex and local feature discrimination. Nat. Neurosci. 883–891 (2002)

  30. Duan, L., Wu, C., Miao, J., Qing, L., Fu, Y.: Visual saliency detection by spatially weighted dissimilarity. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2011)

  31. Duncan, J.: Selective attention and the organization of visual information. J. Exp. Psychol.: General 113(4), 501–517 (1984)

  32. Ell, T.: Quaternion-Fourier transforms for analysis of two-dimensional linear time-invariant partial differential systems. In: Proceedings of the International Conference on Decision and Control (1993)

  33. Ell, T., Sangwine, S.: Hypercomplex Fourier transforms of color images. IEEE Trans. Image Process. 16(1), 22–35 (2007)

  34. Egly, R., Driver, J., Rafal, R.D.: Shifting visual attention between objects and locations: evidence from normal and parietal lesion subjects. J. Exp. Psychol.: General 123(2) (1994)

  35. Ehrgott, M.: Multicriteria Optimization. Springer (2005)

  36. Eriksen, C.W., St. James, J.D.: Visual attention within and around the field of focal attention: a zoom lens model. Percept. Psychophys. 40(4), 225–240 (1986)

  37. Essa, I.: Ubiquitous sensing for smart and aware environments. IEEE Pers. Commun. 7(5), 47–49 (2000)

  38. Fleming, K.A., Peters II, R.A., Bodenheimer, R.E.: Image mapping and visual attention on a sensory ego-sphere. In: Proceedings of the International Conference on Intelligent Robots and Systems (2006)

  39. Feng, W., Hu, B.: Quaternion discrete cosine transform and its application in color template matching. In: International Conference on Image and Signal Processing, pp. 252–256 (2008)

  40. Frintrop, S., Rome, E., Christensen, H.I.: Computational visual attention systems and their cognitive foundation: a survey. ACM Trans. Appl. Percept. 7(1), 6:1–6:39 (2010)

  41. Fröba, B., Ernst, A.: Face detection with the modified census transform. In: Proceedings of the International Conference on Automatic Face and Gesture Recognition (2004)

  42. Gao, D., Mahadevan, V., Vasconcelos, N.: On the plausibility of the discriminant center-surround hypothesis for visual saliency. J. Vis. 8(7), 1–18 (2008)

  43. Geusebroek, J.M., van den Boomgaard, R., Smeulders, A.W.M., Geerts, H.: Color invariance. IEEE Trans. Pattern Anal. Mach. Intell. 23(12), 1338–1350 (2001)

  44. Geusebroek, J.-M., Smeulders, A., van de Weijer, J.: Fast anisotropic Gauss filtering. IEEE Trans. Image Process. 12(8), 938–943 (2003)

  45. Gillespie, A.R., Kahle, A.B., Walker, R.E.: Color enhancement of highly correlated images. II. Channel ratio and chromaticity transformation techniques. Remote Sens. Environ. 22(3), 343–365 (1987)

  46. Gillies, D.: The subjective theory. In: Philosophical Theories of Probability, ch. 4. Routledge (2000)

  47. Goferman, S., Zelnik-Manor, L., Tal, A.: Context-aware saliency detection. IEEE Trans. Pattern Anal. Mach. Intell. (2012)

  48. Guo, C., Zhang, L.: A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression. IEEE Trans. Image Process. 19, 185–198 (2010)

  49. Guo, C., Ma, Q., Zhang, L.: Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2008)

  50. Hall, D., Llinas, J.: Handbook of Multisensor Data Fusion: Theory and Practice. CRC Press (2008)

  51. Hamilton, W.R.: Elements of Quaternions. University of Dublin Press (1866)

  52. Harel, J., Koch, C., Perona, P.: Graph-based visual saliency. In: Proceedings of the Annual Conference on Neural Information Processing Systems (2007)

  53. Heeger, D.J., Bergen, J.R.: Pyramid-based texture analysis/synthesis. In: Proceedings of the Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 229–238 (1995)

  54. Henderson, J.M.: Human gaze control during real-world scene perception. Trends Cogn. Sci. 498–504 (2003)

  55. Heracles, M., Körner, U., Michalke, T., Sagerer, G., Fritsch, J., Goerick, C.: A dynamic attention system that reorients to unexpected motion in real-world traffic environments. In: Proceedings of the International Conference on Intelligent Robots and Systems (2009)

  56. Hering, E.: Outlines of a Theory of the Light Sense. Harvard University Press (1964)

  57. Hershey, J., Olsen, P.: Approximating the Kullback-Leibler divergence between Gaussian mixture models. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (2007)

  58. Hodge, V.J., Austin, J.: A survey of outlier detection methodologies. Artif. Intell. Rev. 22, 85–126 (2004)

  59. Holsopple, J., Yang, S.: Designing a data fusion system using a top-down approach. In: Proceedings of the International Conference for Military Communications (2009)

  60. Hou, X., Zhang, L.: Saliency detection: a spectral residual approach. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2007)

  61. Hou, X., Harel, J., Koch, C.: Image signature: highlighting sparse salient regions. IEEE Trans. Pattern Anal. Mach. Intell. 34(1), 194–201 (2012)

  62. Huang, T., Burnett, J., Deczky, A.: The importance of phase in image processing filters. IEEE Trans. Acoust. Speech Signal Process. 23(6), 529–542 (1975)

  63. Itti, L., Baldi, P.: Bayesian surprise attracts human attention. Vis. Res. 49(10), 1295–1306 (2009)

  64. Itti, L., Baldi, P.F.: A principled approach to detecting surprising events in video. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2005)

  65. Itti, L., Baldi, P.F.: Bayesian surprise attracts human attention. In: Proceedings of the Annual Conference on Neural Information Processing Systems (2006)

  66. Itti, L., Koch, C.: A saliency-based search mechanism for overt and covert shifts of visual attention. Vis. Res. 40(10–12), 1489–1506 (2000)

  67. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Anal. Mach. Intell. 20(11), 1254–1259 (1998)

  68. Jaynes, E.T.: Probability Theory: The Logic of Science. Cambridge University Press (2003)

  69. Judd, T., Ehinger, K., Durand, F., Torralba, A.: Learning to predict where humans look. In: Proceedings of the International Conference on Computer Vision (2009)

  70. Judd, T., Durand, F., Torralba, A.: Fixations on low-resolution images. J. Vis. 11(4) (2011)

  71. Judd, T., Durand, F., Torralba, A.: A benchmark of computational models of saliency to predict human fixations. Technical Report, MIT (2012)

  72. Johnson, D., McGeoch, L.: The traveling salesman problem: a case study in local optimization. In: Local Search in Combinatorial Optimization, pp. 215–310 (1997)

  73. Jost, T., Ouerhani, N., von Wartburg, R., Müri, R., Hügli, H.: Assessing the contribution of color in visual attention. Comput. Vis. Image Underst. 100, 107–123 (2005)

  74. Kalinli, O.: Biologically inspired auditory attention models with applications in speech and audio processing. Ph.D. dissertation, University of Southern California, Los Angeles, CA, USA (2009)

  75. Kalinli, O., Narayanan, S.: Prominence detection using auditory attention cues and task-dependent high level information. IEEE Trans. Audio Speech Lang. Process. 17(5), 1009–1024 (2009)

  76. Kahneman, D., Treisman, A.: Changing views of attention and automaticity. In: Varieties of Attention, pp. 26–61. Academic Press (2000)

  77. Kahneman, D., Treisman, A., Gibbs, B.J.: The reviewing of object files: object-specific integration of information. Cogn. Psychol. 24(2), 175–219 (1992)

  78. Kayser, C., Petkov, C.I., Lippert, M., Logothetis, N.K.: Mechanisms for allocating auditory attention: an auditory saliency map. Curr. Biol. 15(21), 1943–1947 (2005)

  79. Klin, A., Jones, W., Schultz, R., Volkmar, F., Cohen, D.: Visual fixation patterns during viewing of naturalistic social situations as predictors of social competence in individuals with autism. Arch. Gen. Psychiatry 59(9), 809–816 (2002)

  80. Kootstra, G., Nederveen, A., de Boer, B.: Paying attention to symmetry. In: Proceedings of the British Machine Vision Conference (2008)

  81. Kühn, B., Belkin, A., Swerdlow, A., Machmer, T., Beyerer, J., Kroschel, K.: Knowledge-driven opto-acoustic scene analysis based on an object-oriented world modelling approach for humanoid robots. In: Proceedings of the 41st International Symposium on Robotics and 6th German Conference on Robotics (2010)

  82. Li, J., Levine, M.D., An, X., He, H.: Saliency detection based on frequency and spatial domain analysis. In: Proceedings of the British Machine Vision Conference (2011)

  83. Liang, Y., Simoncelli, E., Lei, Z.: Color channels decorrelation by ICA transformation in the wavelet domain for color texture analysis and synthesis. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 606–611 (2000)

  84. Lichtenauer, J., Hendriks, E., Reinders, M.: Isophote properties as features for object detection. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (2005)

  85. Lin, K.-H., Zhuang, X., Goudeseune, C., King, S., Hasegawa-Johnson, M., Huang, T.S.: Improving faster-than-real-time human acoustic event detection by saliency-maximized audio visualization. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (2012)

  86. Lu, S., Lim, J.-H.: Saliency modeling from image histograms. In: Proceedings of the European Conference on Computer Vision (2012)

  87. Luo, W., Li, H., Liu, G., Ngan, K.N.: Global salient information maximization for saliency detection. Signal Process.: Image Commun. 27, 238–248 (2012)

  88. Machmer, T., Moragues, J., Swerdlow, A., Vergara, L., Gosalbez-Castillo, J., Kroschel, K.: Robust impulsive sound source localization by means of an energy detector for temporal alignment and pre-classification. In: Proceedings of the European Signal Processing Conference (2009)

  89. Machmer, T., Swerdlow, A., Kühn, B., Kroschel, K.: Hierarchical, knowledge-oriented opto-acoustic scene analysis for humanoid robots and man-machine interaction. In: Proceedings of the International Conference on Robotics and Automation (2010)

  90. Meger, D., Forssén, P.-E., Lai, K., Helmer, S., McCann, S., Southey, T., Baumann, M., Little, J.J., Lowe, D.G.: Curious George: an attentive semantic robot. In: IROS Workshop: From Sensors to Human Spatial Concepts (2007)

  91. Le Meur, O., Le Callet, P., Barba, D.: Predicting visual fixations on video based on low-level visual features. Vis. Res. 47(19), 2483–2498 (2006)

  92. Muller, J.R., Metha, A.B., Krauskopf, J., Lennie, P.: Rapid adaptation in visual cortex to the structure of images. Science 285, 1405–1408 (1999)

  93. Nakajima, J., Sugimoto, A., Kawamoto, K.: Incorporating audio signals into constructing a visual saliency map. In: Klette, R., Rivera, M., Satoh, S. (eds.) Image and Video Technology, Lecture Notes in Computer Science, vol. 8333. Springer, Berlin, Heidelberg (2014)

  94. Olmos, A., Kingdom, F.A.A.: A biologically inspired algorithm for the recovery of shading and reflectance images. Perception 33, 1463–1473 (2004)

  95. Olshausen, B.A., Field, D.J.: Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996)

  96. Onat, S., Libertus, K., König, P.: Integrating audiovisual information for the control of overt attention. J. Vis. 7(10) (2007)

  97. Oppenheim, A., Lim, J.: The importance of phase in signals. Proc. IEEE 69(5), 529–541 (1981)

  98. Orabona, F., Metta, G., Sandini, G.: A proto-object based visual attention model. In: Paletta, L., Rome, E. (eds.) Attention in Cognitive Systems. Theories and Systems from an Interdisciplinary Viewpoint, pp. 198–215 (2008)

  99. Parkhurst, D., Law, K., Niebur, E.: Modeling the role of salience in the allocation of overt visual attention. Vis. Res. 42(1), 107–123 (2002)

  100. Pascale, D.: A review of RGB color spaces...from xyY to R'G'B' (2008)

  101. Peters, R.J., Itti, L.: Applying computational tools to predict gaze direction in interactive visual environments. ACM Trans. Appl. Percept. 5(2) (2008)

  102. Peters, R., Itti, L.: The role of Fourier phase information in predicting saliency. J. Vis. 8(6), 879 (2008)

  103. Peters, R., Iyer, A., Itti, L., Koch, C.: Components of bottom-up gaze allocation in natural images. Vis. Res. 45(18), 2397–2416 (2005)

  104. Posner, M.I.: Orienting of attention. Q. J. Exp. Psychol. 32(1), 3–25 (1980)

  105. Rajashekar, U., Bovik, A.C., Cormack, L.K.: Visual search in noise: revealing the influence of structural cues by gaze-contingent classification image analysis. J. Vis. 6(4), 379–386 (2006)

  106. Ramenahalli, S., Mendat, D.R., Dura-Bernal, S., Culurciello, E., Niebur, E., Andreou, A.: Audio-visual saliency map: overview, basic models and hardware implementation. In: Annual Conference on Information Sciences and Systems (2013)

  107. Rao, R.P., Ballard, D.H.: Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat. Neurosci. 79–87 (1999)

  108. Ratliff, F.: Mach Bands: Quantitative Studies on Neural Networks in the Retina. Holden-Day, San Francisco (1965)

  109. Reinhard, E., Pouli, T.: Colour spaces for colour transfer. In: Computational Color Imaging, Lecture Notes in Computer Science, vol. 6626, pp. 1–15 (2011)

  110. Reinhard, E., Ashikhmin, M., Gooch, B., Shirley, P.: Color transfer between images. IEEE Comput. Graph. Appl. 21(5), 34–41 (2001)

  111. Rensink, R.A.: The dynamic representation of scenes. Vis. Cogn. 7, 17–42 (2000)

  112. Rensink, R.A.: Seeing, sensing, and scrutinizing. Vis. Res. 40, 1469–1487 (2000)

  113. Riche, N., Duvinage, M., Mancas, M., Gosselin, B., Dutoit, T.: Saliency and human fixations: state-of-the-art and study of comparison metrics. In: Proceedings of the International Conference on Computer Vision (2013)

  114. RobotCub Consortium: iCub—an open source cognitive humanoid robotic platform. http://www.icub.org

  115. Ruderman, D., Cronin, T., Chiao, C.: Statistics of cone responses to natural images: implications for visual coding. J. Opt. Soc. Am. 15(8), 2036–2045 (1998)

  116. Ruesch, J., Lopes, M., Bernardino, A., Hornstein, J., Santos-Victor, J., Pfeifer, R.: Multimodal saliency-based bottom-up attention: a framework for the humanoid robot iCub. In: Proceedings of the International Conference on Robotics and Automation (2008)

  117. Roelfsema, P.R., Lamme, V.A.F., Spekreijse, H.: Object-based attention in the primary visual cortex of the macaque monkey. Nature 395, 376–381 (1998)

  118. Sangwine, S.J.: Fourier transforms of colour images using quaternion or hypercomplex numbers. Electron. Lett. 32(21), 1979–1980 (1996)

  119. Sangwine, S., Ell, T.: Colour image filters based on hypercomplex convolution. IEEE Proc. Vis. Image Signal Process. 147(2), 89–93 (2000)

  120. Saidi, F., Stasse, O., Yokoi, K., Kanehiro, F.: Online object search with a humanoid robot. In: Proceedings of the International Conference on Intelligent Robots and Systems (2007)

  121. Schauerte, B., Richarz, J., Plötz, T., Thurau, C., Fink, G.A.: Multi-modal and multi-camera attention in smart environments. In: Proceedings of the 11th International Conference on Multimodal Interfaces (ICMI). ACM, Cambridge, MA, USA, Nov. 2009

  122. Schauerte, B., Richarz, J., Fink, G.A.: Saliency-based identification and recognition of pointed-at objects. In: Proceedings of the 23rd International Conference on Intelligent Robots and Systems (IROS). IEEE/RSJ, Taipei, Taiwan, Oct. 2010

  123. Schauerte, B., Fink, G.A.: Focusing computational visual attention in multi-modal human-robot interaction. In: Proceedings of the 12th International Conference on Multimodal Interfaces and 7th Workshop on Machine Learning for Multimodal Interaction (ICMI-MLMI). ACM, Beijing, China, Nov. 2010

  124. Schnupp, J., Nelken, I., King, A.: Auditory Neuroscience. MIT Press (2011)

  125. Serences, J.T., Yantis, S.: Selective visual attention and perceptual coherence. Trends Cogn. Sci. 10(1), 38–45 (2006)

  126. Shic, F., Scassellati, B.: A behavioral analysis of computational models of visual attention. Int. J. Comput. Vis. 73, 159–177 (2007)

  127. Shulman, G.L., Wilson, J.: Spatial frequency and selective attention to spatial location. Perception 16(1), 103–111 (1987)

  128. Simion, C., Shimojo, S.: Early interactions between orienting, visual sampling and decision making in facial preference. Vis. Res. 46(20), 3331–3335 (2006)

  129. Smith, T., Guild, J.: The C.I.E. colorimetric standards and their use. Trans. Opt. Soc. 33(3), 73 (1931)

  130. Song, G., Pellerin, D., Granjon, L.: How different kinds of sound in videos can influence gaze. In: International Workshop on Image Analysis for Multimedia Interactive Services (2012)

  131. Tatler, B., Baddeley, R., Gilchrist, I.: Visual correlates of fixation selection: effects of scale and time. Vis. Res. 45(5), 643–659 (2005)

  132. Temko, A., Malkin, R., Zieger, C., Macho, D., Nadeu, C., Omologo, M.: CLEAR evaluation of acoustic event detection and classification systems. In: Stiefelhagen, R., Garofolo, J. (eds.) Lecture Notes in Computer Science, vol. 4122, pp. 311–322. Springer, Berlin, Heidelberg (2007)

  133. Tipper, S.P., Driver, J., Weaver, B.: Object-centred inhibition of return of visual attention. Q. J. Exp. Psychol. 43, 289–298 (1991)

  134. Torralba, A., Oliva, A., Castelhano, M.S., Henderson, J.M.: Contextual guidance of eye movements and attention in real-world scenes: the role of global features in object search. Psychol. Rev. 113(4) (2006)

  135. Treisman, A.M., Gelade, G.: A feature-integration theory of attention. Cogn. Psychol. 12(1), 97–136 (1980)

  136. Tsotsos, J.K.: The complexity of perceptual search tasks. In: Proceedings of the International Joint Conference on Artificial Intelligence (1989)

  137. Tsotsos, J.K.: Behaviorist intelligence and the scaling problem. Artif. Intell. 75, 135–160 (1995)

  138. Tsotsos, J.K.: A Computational Perspective on Visual Attention. The MIT Press (2011)

  139. van Rijsbergen, C.J.: Information Retrieval, 2nd edn. Butterworth (1979)

  140. Vijayakumar, S., Conradt, J., Shibata, T., Schaal, S.: Overt visual attention for a humanoid robot. In: Proceedings of the International Conference on Intelligent Robots and Systems (2001)

  141. Walther, D., Koch, C.: Modeling attention to salient proto-objects. Neural Netw. 19(9), 1395–1407 (2006)

  142. Wang, C.-A., Boehnke, S., Munoz, D.: Pupil dilation evoked by a salient auditory stimulus facilitates saccade reaction times to a visual stimulus. J. Vis. 12(9), 1254 (2012)

  143. Welke, K.: Memory-based active visual search for humanoid robots. Ph.D. dissertation, Karlsruhe Institute of Technology (2011)

  144. Welke, K., Asfour, T., Dillmann, R.: Active multi-view object search on a humanoid head. In: Proceedings of the International Conference on Robotics and Automation (2009)

  145. Welke, K., Asfour, T., Dillmann, R.: Inhibition of return in the Bayesian strategy to active visual search. In: Proceedings of the International Conference on Machine Vision Applications (2011)

  146. Wegener, I.: Theoretische Informatik—eine algorithmenorientierte Einführung. Teubner (2005)

  147. Wikimedia Commons (Googolplexbyte): Diagram of the opponent process. http://commons.wikimedia.org/wiki/File:Diagram_of_the_opponent_process.png, retrieved 3 April 2014, License CC BY-SA 3.0

  148. Winkler, S., Subramanian, R.: Overview of eye tracking datasets. In: International Workshop on Quality of Multimedia Experience (2013)

  149. Wu, P.-H., Chen, C.-C., Ding, J.-J., Hsu, C.-Y., Huang, Y.-W.: Salient region detection improved by principle component analysis and boundary information. IEEE Trans. Image Process. 22(9), 3614–3624 (2013)

  150. Xu, T., Chenkov, N., Kühnlenz, K., Buss, M.: Autonomous switching of top-down and bottom-up attention selection for vision guided mobile robots. In: Proceedings of the International Conference on Intelligent Robots and Systems (2009)

  151. Xu, T., Pototschnig, T., Kühnlenz, K., Buss, M.: A high-speed multi-GPU implementation of bottom-up attention using CUDA. In: Proceedings of the International Conference on Robotics and Automation (2009)

  152. Yu, Y., Gu, J., Mann, G., Gosine, R.: Development and evaluation of object-based visual attention for automatic perception of robots. IEEE Trans. Autom. Sci. Eng. 10(2), 365–379 (2013)

  153. Zadeh, L.: Fuzzy sets. Inform. Control 8(3), 338–353 (1965)

  154. Zhao, Q., Koch, C.: Learning a saliency map using fixated locations in natural scenes. J. Vis. 11(3), 1–15 (2011)

  155. Zhang, L., Tong, M.H., Marks, T.K., Shan, H., Cottrell, G.W.: SUN: a Bayesian framework for saliency using natural statistics. J. Vis. 8(7) (2008)

  156. Zhou, J., Jin, Z., Yang, J.: Multiscale saliency detection using principle component analysis. In: International Joint Conference on Neural Networks, pp. 1–6 (2012)


Author information

Correspondence to Boris Schauerte.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Schauerte, B. (2016). Bottom-Up Audio-Visual Attention for Scene Exploration. In: Multimodal Computational Attention for Scene Understanding and Robotics. Cognitive Systems Monographs, vol 30. Springer, Cham. https://doi.org/10.1007/978-3-319-33796-8_3

  • DOI: https://doi.org/10.1007/978-3-319-33796-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33794-4

  • Online ISBN: 978-3-319-33796-8
