
Integrating bottom-up and top-down visual stimulus for saliency detection in news video

Multimedia Tools and Applications

Abstract

This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency, respectively; the two saliency maps are then fused after a normalization operation. In the bottom-up attention model, we use the quaternion discrete cosine transform (QDCT) at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of average optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we utilize high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested with three popular evaluation metrics on two widely used eye-tracking datasets. Experimental results demonstrate that our method outperforms several state-of-the-art methods in saliency detection for news video.
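To make the pipeline concrete, the following is a minimal Python sketch of the three stages the abstract describes, not the authors' implementation: the QDCT static saliency is approximated by a per-channel DCT image signature, motion contrast is derived from the rarity of optical-flow magnitudes in a global histogram, and the maps are fused by a weighted sum after min-max normalization. The function names, the choice of color spaces, the Farneback flow estimator, and all parameter values (e.g., the fusion weight w) are illustrative assumptions.

```python
# Hedged sketch of the pipeline described in the abstract; NOT the authors' code.
# - Static saliency: per-channel DCT image signature (a simplification of QDCT).
# - Motion contrast: rarity of optical-flow magnitudes in a global histogram.
# - Fusion: min-max normalization followed by a weighted sum (weight assumed).
import cv2
import numpy as np
from scipy.fft import dctn, idctn

def _norm(m):
    """Min-max normalize a map to [0, 1]."""
    return (m - m.min()) / (m.max() - m.min() + 1e-8)

def signature_saliency(channel, sigma=3):
    """Image-signature saliency of one channel: smooth(IDCT(sign(DCT(x)))^2)."""
    recon = idctn(np.sign(dctn(channel, norm='ortho')), norm='ortho')
    return cv2.GaussianBlur(recon ** 2, (0, 0), sigmaX=sigma)

def static_saliency(frame_bgr):
    """Average signature saliency over the channels of two color spaces
    (BGR and Lab here; the paper's exact spaces and scales are not reproduced)."""
    spaces = (frame_bgr, cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2Lab))
    maps = [signature_saliency(c)
            for s in spaces for c in cv2.split(s.astype(np.float32))]
    return _norm(np.mean(maps, axis=0))

def motion_contrast(prev_gray, gray, bins=32):
    """Histogram-based motion contrast: pixels whose flow magnitude falls in a
    rare histogram bin score high, which suppresses dominant background motion."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)
    hist, edges = np.histogram(mag, bins=bins, density=True)
    idx = np.clip(np.digitize(mag, edges[1:-1]), 0, bins - 1)
    return _norm(1.0 - hist[idx] / (hist.max() + 1e-8))

def fuse(bottom_up, top_down, w=0.6):
    """Weighted additive fusion of the normalized maps; w is an assumption."""
    return _norm(w * bottom_up + (1.0 - w) * top_down)
```

With per-frame maps in hand, a bottom-up map could be formed as, e.g., _norm(static_saliency(frame) + motion_contrast(prev, cur)) and then fused with a top-down map accumulated from detector responses (faces, persons, cars, speakers, flashes); those detectors and the paper's exact multi-scale scheme are outside this sketch.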



Author information

Correspondence to Bo Wu.

Cite this article

Wu, B., Xu, L. Integrating bottom-up and top-down visual stimulus for saliency detection in news video. Multimed Tools Appl 73, 1053–1075 (2014). https://doi.org/10.1007/s11042-013-1530-9
