Understanding Interactions and Guiding Visual Surveillance by Tracking Attention

  • Ian Reid
  • Ben Benfold
  • Alonso Patron
  • Eric Sommerlade
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6468)


The central tenet of this paper is that by determining where people are looking, other tasks involved with understanding and interrogating a scene are simplified. To this end we describe a fully automatic method to determine a person’s attention based on real-time visual tracking of their head and a coarse classification of their head pose. We estimate the head pose, or coarse gaze, using randomised ferns with decision branches based on both histograms of gradient orientations and colour-based features. We use the coarse gaze in three applications to demonstrate its value: (i) we show how, by building static and temporally varying maps of the areas where people look, we are able to identify interesting regions; (ii) we show how, by determining the gaze of people in the scene, we can more effectively control a multi-camera surveillance system to acquire faces for identification; (iii) we show how, by identifying where people are looking, we can more effectively classify human interactions.
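
To make the idea of a randomised fern classifier concrete, the following is a minimal sketch in Python and not the authors' implementation. It assumes a generic numeric feature vector and uses pairwise comparisons of feature dimensions as the binary tests; the abstract states only that the decision branches are based on histograms of gradient orientations and colour-based features, so the test design, the class name `RandomFerns`, and all parameter values here are hypothetical.

```python
import math
import random


class RandomFerns:
    """Semi-naive Bayes classifier over a set of random ferns (illustrative)."""

    def __init__(self, n_features, n_classes, n_ferns=20, depth=4, seed=0):
        rng = random.Random(seed)
        self.n_classes = n_classes
        # Each fern is a list of `depth` binary tests. A test compares two
        # randomly chosen feature dimensions (an assumption for this sketch).
        self.tests = [
            [tuple(rng.sample(range(n_features), 2)) for _ in range(depth)]
            for _ in range(n_ferns)
        ]
        # counts[fern][class][leaf], initialised to 1 (Laplace smoothing).
        self.counts = [
            [[1] * (2 ** depth) for _ in range(n_classes)]
            for _ in range(n_ferns)
        ]

    @staticmethod
    def _leaf(tests, x):
        # Concatenate the binary test outcomes into a leaf index.
        idx = 0
        for i, j in tests:
            idx = (idx << 1) | (1 if x[i] > x[j] else 0)
        return idx

    def fit(self, X, y):
        # Accumulate, per fern and per class, how often each leaf is reached.
        for x, label in zip(X, y):
            for f, tests in enumerate(self.tests):
                self.counts[f][label][self._leaf(tests, x)] += 1
        return self

    def predict(self, x):
        # Sum log-likelihoods across ferns, treating ferns as independent.
        scores = [0.0] * self.n_classes
        for f, tests in enumerate(self.tests):
            leaf = self._leaf(tests, x)
            for c in range(self.n_classes):
                row = self.counts[f][c]
                scores[c] += math.log(row[leaf] / sum(row))
        return max(range(self.n_classes), key=scores.__getitem__)
```

In a head-pose setting, each class label would stand for one coarse gaze direction and `x` would hold the features extracted from a tracked head region; both choices are left open here because the abstract does not specify them.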


Keywords: Head Orientation, Visual Surveillance, Active Camera, Body Detection, Expect Information Gain
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ian Reid (1)
  • Ben Benfold (1)
  • Alonso Patron (1)
  • Eric Sommerlade (1)
  1. Department of Engineering Science, University of Oxford, Oxford, UK