Estimating the Lecturer’s Head Pose in Seminar Scenarios – A Multi-view Approach

  • Michael Voit
  • Kai Nickel
  • Rainer Stiefelhagen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3869)


In this paper, we present a system that tracks the horizontal head orientation of a lecturer in a smart seminar room equipped with several cameras. We automatically detect and track the lecturer’s face and use neural networks to classify his or her face orientation in each camera view. By combining the single-view estimates of the speaker’s head orientation from multiple cameras into one joint hypothesis, we improve overall head pose estimation accuracy. We conducted experiments on annotated recordings from real seminars. Using the proposed fully automatic system, we correctly determine the lecturer’s head pose 59% of the time for 8 orientation classes. In 92% of the time, either the correct pose class or a neighbouring pose class (i.e. a 45 degree error) was estimated.
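The following is a minimal sketch of the two ideas in the abstract: merging per-camera class scores into one joint hypothesis, and scoring results as "exact class" versus "exact or neighbouring class". It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say how the single-view estimates are combined, so the sum-of-scores fusion, the `camera_offset` convention, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

NUM_CLASSES = 8  # 8 horizontal orientation classes, i.e. 45 degrees apart


def to_room_frame(camera_scores: Sequence[float], camera_offset: int) -> List[float]:
    """Rotate camera-relative class scores into a shared room frame.

    camera_offset is a hypothetical calibration value: the number of
    45-degree steps between the camera's reference direction and the room's.
    Camera class c is mapped to room class (c + camera_offset) mod 8.
    """
    return [camera_scores[(k - camera_offset) % NUM_CLASSES]
            for k in range(NUM_CLASSES)]


def fuse(views: Sequence[Tuple[Sequence[float], int]]) -> int:
    """Assumed fusion rule: sum the rotated scores and take the best class."""
    joint = [0.0] * NUM_CLASSES
    for scores, offset in views:
        rotated = to_room_frame(scores, offset)
        for k in range(NUM_CLASSES):
            joint[k] += rotated[k]
    return max(range(NUM_CLASSES), key=joint.__getitem__)


def class_distance(a: int, b: int) -> int:
    """Circular distance between two classes (classes 0 and 7 are neighbours)."""
    d = abs(a - b) % NUM_CLASSES
    return min(d, NUM_CLASSES - d)


def score(predicted: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (exact-class rate, exact-or-neighbouring-class rate)."""
    n = len(truth)
    exact = sum(p == t for p, t in zip(predicted, truth)) / n
    near = sum(class_distance(p, t) <= 1 for p, t in zip(predicted, truth)) / n
    return exact, near


if __name__ == "__main__":
    # Two hypothetical camera views of the same frame (made-up scores).
    view_a = ([0.0, 0.1, 0.6, 0.3, 0.0, 0.0, 0.0, 0.0], 0)
    view_b = ([0.1, 0.5, 0.4, 0.0, 0.0, 0.0, 0.0, 0.0], 2)
    print("joint class:", fuse([view_a, view_b]))
    print("rates:", score([0, 1, 4, 7], [0, 2, 0, 0]))
```

The circular distance matters for the second measure: with 8 classes covering the full horizontal range, the first and last classes are adjacent, so a plain absolute difference would wrongly count that confusion as a large error.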


Keywords (added by machine, not by the authors): Facial Image; Frontal View; Camera View; Head Orientation; Multiple Camera





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Michael Voit (1)
  • Kai Nickel (1)
  • Rainer Stiefelhagen (1)
  1. Interactive Systems Lab, Universität Karlsruhe (TH), Germany
