Visual Attention Driven by Auditory Cues

Selecting Visual Features in Synchronization with Attracting Auditory Events

  • Conference paper
MultiMedia Modeling (MMM 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8936)

Abstract

Human visual attention can be modulated not only by visual stimuli but also by stimuli from other modalities such as audition. Incorporating auditory information into a model of human visual attention is therefore a key step toward more sophisticated models. How to integrate information from the auditory and visual domains, however, remains a challenging problem. This paper proposes a novel computational model of human visual attention driven by auditory cues. Built on the Bayesian surprise model, which is considered promising in the literature, our model uses surprising auditory events as a cue for selecting synchronized visual features and then emphasizes the selected features to form the final surprise map. Our approach to audio-visual integration thus uses only the visual features that are effective, rather than all available features, to simulate visual attention with the help of auditory information. Experiments on several video clips show that the proposed model simulates the eye movements of human subjects better than existing models, even though it uses fewer visual features.
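
The abstract describes the audio-cued selection step only at a high level. The Python sketch below illustrates one way such a step could work; it is a hypothetical illustration, not the authors' implementation. The Gaussian form of Bayesian surprise (in place of the Gamma/Poisson formulation of Itti and Baldi), the correlation-based synchronization test, and all function names and parameters are assumptions introduced here.

import numpy as np

def gaussian_surprise(prior_mean, prior_var, x, obs_var=1.0):
    # Bayesian surprise = KL(posterior || prior) for a Gaussian belief over a
    # channel's mean with known observation variance (a deliberate simplification
    # of the Gamma/Poisson formulation used by Itti and Baldi).
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + x / obs_var)
    kl = 0.5 * (np.log(prior_var / post_var)
                + (post_var + (post_mean - prior_mean) ** 2) / prior_var - 1.0)
    return kl, post_mean, post_var

def surprise_trace(signal, prior_var=1.0, obs_var=1.0):
    # Per-frame surprise of a 1-D time series, e.g. the mean activation of one
    # visual feature channel or an auditory loudness envelope.
    mean, var = signal[0], prior_var
    trace = np.zeros(signal.size)
    for i, x in enumerate(signal):
        trace[i], mean, var = gaussian_surprise(mean, var, x, obs_var)
    return trace

def select_synchronized_channels(visual_channels, audio_signal, top_k=2):
    # Rank visual channels by how strongly their surprise time course co-varies
    # with the auditory surprise, and keep only the top-k channels.
    audio_surprise = surprise_trace(audio_signal)
    scores = [(np.corrcoef(surprise_trace(c), audio_surprise)[0, 1], name)
              for name, c in visual_channels.items()]
    return [name for _, name in sorted(scores, reverse=True)[:top_k]]

# Toy usage: three synthetic channels; only "motion" shares an abrupt onset with
# the audio track, so it is the one the selection step should keep.
rng = np.random.default_rng(0)
frames = np.arange(200)
audio = rng.normal(0.0, 0.1, frames.size)
audio[100:] += 3.0                                   # a sudden, surprising sound
channels = {
    "motion":    rng.normal(0.0, 0.1, frames.size) + 3.0 * (frames >= 100),
    "intensity": rng.normal(0.0, 0.1, frames.size),
    "color":     rng.normal(0.0, 0.1, frames.size) + 2.0 * (frames >= 50),
}
print(select_synchronized_channels(channels, audio, top_k=1))  # expected: ['motion']

In the full model described in the paper, the selected channels would then be emphasized when the per-channel surprise maps are combined into the final surprise map; this sketch stops at the selection step.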

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Nakajima, J., Kimura, A., Sugimoto, A., Kashino, K. (2015). Visual Attention Driven by Auditory Cues. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8936. Springer, Cham. https://doi.org/10.1007/978-3-319-14442-9_7

  • DOI: https://doi.org/10.1007/978-3-319-14442-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14441-2

  • Online ISBN: 978-3-319-14442-9

  • eBook Packages: Computer Science, Computer Science (R0)
