Abstract
This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency, respectively; the two saliency maps are then fused after a normalization operation. In the bottom-up attention model, we apply the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to compute motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we exploit high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested with three popular evaluation metrics on two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method for saliency detection in news video compared with several state-of-the-art methods.
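The fusion pipeline described above (combine static and motion maps into a bottom-up map, then fuse with the top-down map after normalization) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the min-max normalization and the fixed fusion weight `w_bu` are assumptions for demonstration.

```python
import numpy as np

def normalize(sal_map):
    """Min-max normalize a saliency map to [0, 1]."""
    lo, hi = sal_map.min(), sal_map.max()
    if hi <= lo:
        return np.zeros_like(sal_map, dtype=float)
    return (sal_map - lo) / (hi - lo)

def fuse_saliency(static_map, motion_map, topdown_map, w_bu=0.5):
    """Fuse bottom-up and top-down saliency maps.

    static_map  : static saliency (e.g., from quaternion DCT)
    motion_map  : motion saliency (local + global motion conspicuity)
    topdown_map : top-down saliency (face, person, car, speaker, flash cues)
    w_bu        : hypothetical weight given to the bottom-up map
    """
    # Bottom-up saliency: combine normalized static and motion maps.
    bottom_up = normalize(static_map) + normalize(motion_map)
    # Fuse normalized bottom-up and top-down maps (weighted sum assumed).
    fused = w_bu * normalize(bottom_up) + (1.0 - w_bu) * normalize(topdown_map)
    return normalize(fused)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h, w = 36, 64
    fused = fuse_saliency(rng.random((h, w)),
                          rng.random((h, w)),
                          rng.random((h, w)))
    print(fused.shape, fused.min() >= 0.0, fused.max() <= 1.0)
```

In practice the fusion weight can be tuned per cue reliability; the paper's actual combination scheme should be consulted for the exact weighting.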
Wu, B., Xu, L. Integrating bottom-up and top-down visual stimulus for saliency detection in news video. Multimed Tools Appl 73, 1053–1075 (2014). https://doi.org/10.1007/s11042-013-1530-9