Robust Stereoscopic Head Pose Estimation in Human-Computer Interaction and a Unified Evaluation Framework

  • Georg Layher
  • Hendrik Liebau
  • Robert Niese
  • Ayoub Al-Hamadi
  • Bernd Michaelis
  • Heiko Neumann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6978)


The automatic processing and estimation of view direction and head pose in interactive scenarios is an actively investigated research topic in the development of advanced human-computer and human-robot interfaces. Still, current state-of-the-art approaches often make rigid assumptions about scene illumination and viewing distance in order to achieve stable results. In addition, there is a lack of rigorous evaluation criteria for comparing different computational vision approaches and judging their flexibility. In this work, we take a step towards employing robust computational vision mechanisms to estimate the actor’s head pose and thus the direction of the actor’s focus of attention. We propose a domain-specific, learning-based mechanism for estimating stereo correspondences in image pairs. Furthermore, to facilitate the evaluation of computational vision results, we present a data generation framework that synthesizes images under controlled pose conditions for an arbitrary camera setup with any number of cameras. We present computational results of the proposed mechanism as well as an evaluation based on the available reference data.


Keywords: head pose estimation, stereo image processing, human-computer interaction



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Georg Layher (1)
  • Hendrik Liebau (1)
  • Robert Niese (2)
  • Ayoub Al-Hamadi (2)
  • Bernd Michaelis (2)
  • Heiko Neumann (1)
  1. Institute of Neural Information Processing, University of Ulm, Germany
  2. IESK, Otto-von-Guericke University Magdeburg, Germany
