Neural Network-Based Head Pose Estimation and Multi-view Fusion

  • Michael Voit
  • Kai Nickel
  • Rainer Stiefelhagen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4122)


In this paper, we present two systems that were used for head pose estimation during the CLEAR06 Evaluation. We participated in two tasks: (1) estimating both pan and tilt orientation on synthetic, high-resolution head captures, and (2) estimating horizontal head orientation only on real seminar recordings captured with multiple cameras from different viewing angles. In both systems, we used a neural network to estimate the person's head orientation. For the seminar recordings, a Bayes filter framework additionally provides a statistical fusion scheme that integrates every camera view into one joint hypothesis. In the monocular, high-resolution task, we achieved a mean error of 12.3° for horizontal head orientation and 12.77° for vertical orientation. On the multi-view seminar recordings, our system correctly identified the head orientation (one of eight classes) in 34.9% of the frames. When neighbouring classes were also counted as correct, 72.9% of the frames were correctly classified.
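To make the fusion idea concrete, the following is a minimal sketch of a discrete Bayes filter over eight horizontal orientation classes. It is not the authors' implementation. It assumes that each camera's network output has already been converted into a per-class likelihood in a common room coordinate frame and that the cameras are conditionally independent. The function names, the transition probability `stay`, and the example likelihoods are hypothetical. The helper `is_correct` illustrates the two scoring modes mentioned above (exact class, or exact class plus its two circular neighbours).

```python
from typing import List, Sequence

NUM_CLASSES = 8  # eight horizontal orientation classes, as stated in the abstract


def predict(prior: Sequence[float], stay: float = 0.6) -> List[float]:
    """Temporal prediction step. Assumption: the head keeps its class with
    probability `stay` and otherwise moves to one of the two circular neighbours."""
    n = len(prior)
    move = (1.0 - stay) / 2.0
    return [
        stay * prior[i] + move * prior[(i - 1) % n] + move * prior[(i + 1) % n]
        for i in range(n)
    ]


def fuse(prior: Sequence[float],
         camera_likelihoods: Sequence[Sequence[float]]) -> List[float]:
    """Measurement update. Multiplies the predicted belief by every camera's
    per-class likelihood (independence assumption) and renormalises."""
    belief = list(prior)
    for likelihood in camera_likelihoods:
        belief = [b * l for b, l in zip(belief, likelihood)]
    total = sum(belief)
    if total == 0.0:
        return list(prior)  # no usable evidence: fall back to the prediction
    return [b / total for b in belief]


def is_correct(predicted: int, truth: int,
               allow_neighbours: bool = False, n: int = NUM_CLASSES) -> bool:
    """Score one frame. Classes are arranged on a circle, so 0 and n - 1 are neighbours."""
    diff = (predicted - truth) % n
    distance = min(diff, n - diff)
    return distance <= (1 if allow_neighbours else 0)


# Hypothetical example: two camera views, uniform initial belief.
belief = [1.0 / NUM_CLASSES] * NUM_CLASSES
cameras = [
    [0.05, 0.10, 0.50, 0.15, 0.05, 0.05, 0.05, 0.05],
    [0.05, 0.30, 0.35, 0.10, 0.05, 0.05, 0.05, 0.05],
]
belief = fuse(predict(belief), cameras)
best = max(range(NUM_CLASSES), key=belief.__getitem__)
```

Multiplying the per-camera likelihoods is the simplest way to form one joint hypothesis. A real system would also have to decide how to weight views in which the face is not visible, which this sketch leaves out.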


Keywords: Camera View, Head Orientation, Joint Hypothesis, Tilt Orientation, Head Pose Estimation
These keywords were added by machine and not by the authors.





Copyright information

© Springer Berlin Heidelberg 2007

Authors and Affiliations

  • Michael Voit
  • Kai Nickel
  • Rainer Stiefelhagen

Interactive Systems Lab, Universität Karlsruhe (TH), Germany
