Head Pose Estimation for Sign Language Video

  • Marcos Luzardo
  • Matti Karppa
  • Jorma Laaksonen
  • Tommi Jantunen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7944)


We address the problem of estimating three head pose angles in sign language video, using the Pointing04 data set as training data. The proposed model employs facial landmark points and Support Vector Regression, learned from the training set, to estimate the yaw and pitch angles independently. A simple geometric approach is used for the roll angle. As a novel development, we propose using the detected skin tone areas within the face bounding box as additional features for head pose estimation. The accuracy of the resulting estimators compares favorably with published results on the same data, although the smaller number of pose angles in our setup may explain some of the observed advantage.
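The abstract does not say which geometric construction is used for the roll angle. As a minimal sketch of one common possibility, roll can be read from the tilt of the line joining two eye landmarks. The function name `roll_angle_degrees`, the landmark format, and the sign convention below are illustrative assumptions, not the authors' method.

```python
import math


def roll_angle_degrees(left_eye, right_eye):
    """Estimate in-plane head rotation (roll) from two eye landmarks.

    Assumptions (not taken from the paper): landmarks are (x, y) pixel
    coordinates with y growing downwards, `left_eye` is the eye that
    appears further left in the image, and a positive angle means the
    right-hand eye sits lower in the image than the left-hand one.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    # atan2 handles every quadrant and the vertical case dx == 0.
    return math.degrees(math.atan2(dy, dx))


if __name__ == "__main__":
    # Eyes on a horizontal line: no roll.
    print(roll_angle_degrees((100, 120), (160, 120)))
    # Right-hand eye lower than left-hand eye: positive roll.
    print(roll_angle_degrees((100, 120), (160, 140)))
```

Using `atan2` rather than a plain slope avoids a division by zero when the two landmarks are vertically aligned.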

We also evaluated the pose angle estimators against ground truth values from a motion capture recording of a sign language video. The correlations for the yaw and roll angles exceeded 0.9, whereas the pitch correlation was slightly lower. Overall, the results are very promising from both the computer vision and linguistic points of view.
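The abstract reports correlations without naming the coefficient. Assuming the Pearson correlation is meant, the comparison between a per-frame angle estimate and the motion capture ground truth could be sketched as follows. The function name and the sample values are hypothetical and are not data from the paper.

```python
import math


def pearson_correlation(estimated, ground_truth):
    """Pearson correlation between two equally long angle sequences."""
    if len(estimated) != len(ground_truth) or len(estimated) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_g = sum(ground_truth) / n
    # Sums of centred products; the 1/n factors cancel in the ratio.
    cov = sum((e - mean_e) * (g - mean_g)
              for e, g in zip(estimated, ground_truth))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_g = sum((g - mean_g) ** 2 for g in ground_truth)
    if var_e == 0 or var_g == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_e * var_g)


if __name__ == "__main__":
    # Hypothetical per-frame yaw angles in degrees, for illustration only.
    estimated_yaw = [-10.0, -4.0, 1.0, 6.0, 12.0]
    mocap_yaw = [-12.0, -5.0, 0.0, 7.0, 11.0]
    print(pearson_correlation(estimated_yaw, mocap_yaw))
```

A high correlation shows that the estimate follows the movement of the ground truth over time; it does not by itself rule out a constant offset or a scale error between the two signals.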


Pitch Angle, Support Vector Regression, Motion Capture, Roll Angle, Sign Language Video



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Marcos Luzardo (1)
  • Matti Karppa (1)
  • Jorma Laaksonen (1)
  • Tommi Jantunen (2)
  1. Department of Information and Computer Science, Aalto University School of Science, Espoo, Finland
  2. Sign Language Centre, Department of Languages, University of Jyväskylä, Finland
