Integrating bottom-up and top-down visual stimulus for saliency detection in news video

Wu, Bo; Xu, Linfeng

doi:10.1007/s11042-013-1530-9

Published: 05 June 2013

Volume 73, pages 1053–1075, (2014)

Multimedia Tools and Applications

Bo Wu^1,2 & Linfeng Xu^1

Abstract

This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up saliency and top-down saliency, respectively. The two saliency maps are then normalized and fused. In the bottom-up attention model, we apply the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local motion and global motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we use high-level stimuli in news video, such as face, person, car, speaker, and flash, to generate the top-down saliency map. The proposed method is evaluated with three popular evaluation metrics on two widely used eye-tracking datasets. Experimental results demonstrate the effectiveness of our method for saliency detection in news videos compared with several state-of-the-art methods.
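
The normalize-then-fuse step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the normalization or the fusion rule, so min-max rescaling and a weighted average are assumptions made here for illustration only; the names `normalize`, `fuse`, and the `weight` parameter are hypothetical and are not taken from the paper.

```python
from typing import List

Map = List[List[float]]  # a saliency map as rows of pixel values


def normalize(m: Map) -> Map:
    """Min-max rescale a 2-D map to [0, 1] (assumed normalization).

    A constant map carries no contrast, so it is mapped to all zeros.
    """
    flat = [v for row in m for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in m]
    return [[(v - lo) / (hi - lo) for v in row] for row in m]


def fuse(bottom_up: Map, top_down: Map, weight: float = 0.5) -> Map:
    """Fuse two maps of equal size by a weighted average (assumed rule).

    `weight` is the hypothetical share given to the top-down map.
    """
    bu, td = normalize(bottom_up), normalize(top_down)
    return [
        [(1.0 - weight) * b + weight * t for b, t in zip(row_b, row_t)]
        for row_b, row_t in zip(bu, td)
    ]


if __name__ == "__main__":
    bottom_up = [[0.0, 2.0], [4.0, 8.0]]   # toy bottom-up saliency values
    top_down = [[1.0, 1.0], [1.0, 3.0]]    # toy top-down saliency values
    print(fuse(bottom_up, top_down))
```

Normalizing before fusing keeps either map from dominating merely because its raw values span a larger range; the actual operators used by the authors may differ.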

Author information

Authors and Affiliations

School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, 610073, China
Bo Wu & Linfeng Xu
College of Physics and Information Engineering, Henan Normal University, Xinxiang, 453007, China
Bo Wu

Corresponding author

Correspondence to Bo Wu.

About this article

Cite this article

Wu, B., Xu, L. Integrating bottom-up and top-down visual stimulus for saliency detection in news video. Multimed Tools Appl 73, 1053–1075 (2014). https://doi.org/10.1007/s11042-013-1530-9

Issue Date: December 2014
DOI: https://doi.org/10.1007/s11042-013-1530-9
