An Integrated Approach to Visual Attention Modeling for Saliency Detection in Videos

Part of the Advances in Pattern Recognition book series (ACVPR)


In this chapter, we present a framework to learn and predict regions of interest in videos, based on human eye movements. In our approach, the eye gaze information of several users is recorded as they watch videos that are similar and belong to a particular application domain. This information is used to train a classifier to learn low-level video features of regions that attracted the users’ visual attention. The classifier is combined with vision-based approaches to form an integrated framework for detecting salient regions in videos. To date, saliency prediction has been viewed from two perspectives, namely visual attention modeling and spatiotemporal interest point detection. These approaches have largely been vision-based: they detect regions having a predefined set of characteristics, such as complex motion or high contrast, for all kinds of videos. However, what is ‘interesting’ varies from one application to another. By learning the features of regions that capture viewers’ attention while watching a video, we aim to distinguish regions that are actually salient in the given context from the rest. The integrated approach ensures that both regions with anticipated content (top–down attention) and regions with unanticipated content (bottom–up attention) are predicted as salient by the proposed framework. In our experiments with news videos of popular channels, the results show a significant improvement in the identification of relevant salient regions, when compared with existing approaches.
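The combination described above can be sketched in code. The following is a minimal illustration, not the chapter’s actual method: `bottom_up_saliency` stands in for any vision-based saliency map (here a trivial center-surround contrast proxy), `top_down_saliency` stands in for a classifier trained on gaze-attended regions (here an untrained linear-logistic score), and the two maps are fused by a convex combination. All function names, features, and weights are hypothetical placeholders.

```python
import numpy as np

def bottom_up_saliency(frame):
    # Crude bottom-up proxy: deviation of each pixel from the global mean.
    # A real system would use contrast, motion, or spectral methods instead.
    return np.abs(frame - frame.mean())

def top_down_saliency(features, weights):
    # Stand-in for a trained classifier: logistic score of a linear model
    # applied to per-pixel low-level feature vectors (last axis = features).
    return 1.0 / (1.0 + np.exp(-(features @ weights)))

def integrated_saliency(bu, td, alpha=0.5):
    # Normalize the bottom-up map to [0, 1], then take a convex
    # combination with the top-down (classifier) map.
    bu_norm = (bu - bu.min()) / (bu.max() - bu.min() + 1e-8)
    return alpha * td + (1.0 - alpha) * bu_norm

# Toy example: one 4x4 grayscale frame with 3 features per pixel.
rng = np.random.default_rng(0)
frame = rng.random((4, 4))
features = rng.random((4, 4, 3))
weights = np.array([0.5, -0.2, 0.1])  # hypothetical learned weights

saliency_map = integrated_saliency(
    bottom_up_saliency(frame),
    top_down_saliency(features, weights),
)
```

Because both component maps lie in [0, 1] after normalization, the convex combination keeps the fused map in the same range; the mixing weight `alpha` would be tuned per application domain.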


Keywords: Visual Attention, Interest Point, Salient Region, Saliency Detection, News Video



Copyright information

© Springer-Verlag London Limited 2011

