Abstract
Automatic video annotation is a critical step for content-based video retrieval and browsing. Automatically detecting the focus of interest in video frames can ease the tedious manual labeling process. However, determining an appropriate extent for visually salient regions in video sequences is a challenging task. In this work, we therefore propose a novel approach to modeling dynamic visual attention based on spatiotemporal analysis. Our model first detects salient points in three-dimensional video volumes, and then uses these points as seeds from which to search for the extent of salient regions in a novel motion attention map. To determine the extent of the attended regions, we apply a maximum-entropy criterion in the spatial domain to the dynamics derived from the spatiotemporal analysis. Experimental results show that the proposed dynamic visual attention model achieves a precision of 70% and remains robust across successive video volumes.
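The pipeline sketched in the abstract — build a motion attention map over a short video volume, then binarize it with a maximum-entropy criterion to recover the extent of the attended region — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: frame differencing stands in for the spatiotemporal analysis, and Kapur-style maximum-entropy thresholding stands in for the spatial-entropy analysis; all function names below are invented for illustration.

```python
import numpy as np

def motion_attention_map(frames):
    """Crude motion attention map: mean absolute difference between
    consecutive frames of a short video volume (a stand-in for the
    paper's spatiotemporal analysis)."""
    frames = np.asarray(frames, dtype=float)
    return np.abs(np.diff(frames, axis=0)).mean(axis=0)

def _entropy(p):
    """Shannon entropy of a (sub-)distribution, ignoring zero bins."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def max_entropy_threshold(att_map, bins=64):
    """Pick the threshold maximizing the sum of foreground and
    background entropies (Kapur-style maximum-entropy thresholding)."""
    hist, edges = np.histogram(att_map, bins=bins)
    p = hist / hist.sum()
    best_t, best_h = edges[1], -np.inf
    for k in range(1, bins):
        pb, pf = p[:k].sum(), p[k:].sum()
        if pb == 0 or pf == 0:
            continue
        h = _entropy(p[:k] / pb) + _entropy(p[k:] / pf)
        if h > best_h:
            best_h, best_t = h, edges[k]
    return best_t

# Toy volume: a static background with one bright block moving right.
frames = np.zeros((5, 32, 32))
for t in range(5):
    frames[t, 10:18, 5 + 3 * t:13 + 3 * t] = 1.0

att = motion_attention_map(frames)
mask = att >= max_entropy_threshold(att)  # extent of the attended region
```

On this toy input the mask covers exactly the rows swept by the moving block, i.e. the thresholding recovers the spatial extent of the motion rather than a single salient point.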
Chen, DY. Modelling salient visual dynamics in videos. Multimed Tools Appl 53, 271–284 (2011). https://doi.org/10.1007/s11042-010-0511-5