Multimedia Tools and Applications, Volume 73, Issue 3, pp 1053–1075

Integrating bottom-up and top-down visual stimulus for saliency detection in news video

  • Bo Wu
  • Linfeng Xu


This paper presents a new attention model for detecting visual saliency in news video. In the proposed model, bottom-up (low-level) features and top-down (high-level) factors are used to compute bottom-up and top-down saliency, respectively; the two saliency maps are then fused after a normalization step. In the bottom-up attention model, we use the quaternion discrete cosine transform at multiple scales and in multiple color spaces to detect static saliency. Meanwhile, multi-scale local-motion and global-motion conspicuity maps are computed and integrated into a motion saliency map. To effectively suppress background motion noise, a simple histogram of averaged optical flow is adopted to calculate motion contrast. The bottom-up saliency map is then obtained by combining the static and motion saliency maps. In the top-down attention model, we exploit high-level stimuli in news video, such as faces, persons, cars, speakers, and flashes, to generate the top-down saliency map. The proposed method has been extensively tested with three popular evaluation metrics on two widely used eye-tracking datasets. Experimental results demonstrate that our method outperforms several state-of-the-art methods in saliency detection on news video.


Keywords: Visual saliency · Bottom-up attention · Top-down attention · News video
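The fusion-after-normalization step described in the abstract can be sketched as follows. This is a minimal single-channel illustration, not the authors' method: it substitutes the scalar DCT image-signature saliency for the quaternion DCT formulation, and the fusion weight `w` is a hypothetical parameter introduced here for the sketch.

```python
import numpy as np
from scipy.fft import dctn, idctn

def signature_saliency(channel):
    """Static saliency of one image channel via the DCT image signature:
    keep only the sign of the DCT coefficients, invert, and square."""
    sig = np.sign(dctn(channel, norm='ortho'))
    recon = idctn(sig, norm='ortho')
    return recon ** 2

def normalize(m):
    """Scale a saliency map to [0, 1] before fusion."""
    rng = m.max() - m.min()
    return (m - m.min()) / rng if rng > 0 else np.zeros_like(m)

def fuse(bottom_up, top_down, w=0.5):
    """Fuse normalized bottom-up and top-down maps.
    `w` is a hypothetical mixing weight, not taken from the paper."""
    return w * normalize(bottom_up) + (1 - w) * normalize(top_down)
```

In a full pipeline, `bottom_up` would itself combine the static and motion maps, and `top_down` would be built from detector responses (face, person, car, speaker, flash) rendered as a map.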



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. School of Electronic Engineering, University of Electronic Science and Technology of China, Chengdu, China
  2. College of Physics and Information Engineering, Henan Normal University, Xinxiang, China
