Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network

  • Conference paper
Computer Vision -- ACCV 2014 (ACCV 2014)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9005)

Abstract

We propose an efficient and accurate algorithm for estimating head orientation from a monocular camera. Our approach builds on a deep neural network, which we use as a regressor to learn the mapping from visual appearance to three-dimensional head orientation angles. In contrast to classification-based approaches, our system therefore outputs continuous head orientation. The algorithm uses convolutional filters trained on a large number of augmented head appearances, so it is user-independent and covers large pose variations. Our key observation is that an input image of \(32 \times 32\) resolution is enough to achieve about 3 degrees of mean square error, which can be used for efficient head orientation applications. As a result, our architecture takes only 1 ms on roughly localized head positions with the aid of a GPU. We also propose particle-filter-based post-processing to further improve the stability of the estimates in video sequences. We compare the performance with a state-of-the-art algorithm that uses a depth sensor, and we validate our head orientation estimator on Internet photos and video.
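
The abstract does not describe how the particle filter is designed, so the following is only a minimal sketch of the general idea: a generic bootstrap particle filter that smooths noisy per-frame (yaw, pitch, roll) estimates in degrees. The function name, the random-walk motion model, the Gaussian likelihood, and all parameter values are assumptions made for illustration and are not taken from the paper. Angle wrap-around is ignored for brevity.

```python
import numpy as np


def particle_filter_smooth(measurements, n_particles=500,
                           process_std=2.0, meas_std=5.0, seed=0):
    """Smooth a (T, 3) sequence of head angles with a bootstrap particle filter.

    Hypothetical illustration: random-walk motion model, Gaussian likelihood.
    """
    rng = np.random.default_rng(seed)
    measurements = np.asarray(measurements, dtype=float)

    # Initialise particles around the first per-frame estimate.
    particles = measurements[0] + rng.normal(
        0.0, meas_std, size=(n_particles, 3))
    smoothed = np.empty_like(measurements)

    for t, z in enumerate(measurements):
        # Predict: diffuse particles with random-walk process noise.
        particles = particles + rng.normal(
            0.0, process_std, size=particles.shape)

        # Update: weight each particle by a Gaussian likelihood of z.
        sq_err = np.sum((particles - z) ** 2, axis=1)
        log_w = -0.5 * sq_err / meas_std ** 2
        weights = np.exp(log_w - log_w.max())  # subtract max for stability
        weights /= weights.sum()

        # Estimate: weighted mean of the particle set.
        smoothed[t] = weights @ particles

        # Resample: multinomial resampling proportional to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]

    return smoothed
```

In such a setup, the network's raw per-frame output would be passed in as `measurements`; a larger `meas_std` relative to `process_std` gives smoother but more sluggish angle tracks.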

Acknowledgement

We appreciate constructive comments from anonymous reviewers. This work was supported by the National Research Foundation of Korea(NRF) grant funded by the Korea government(MSIP) (No. 2010-0028680).

Author information

Corresponding author

Correspondence to In So Kweon.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ahn, B., Park, J., Kweon, I.S. (2015). Real-Time Head Orientation from a Monocular Camera Using Deep Neural Network. In: Cremers, D., Reid, I., Saito, H., Yang, MH. (eds) Computer Vision -- ACCV 2014. ACCV 2014. Lecture Notes in Computer Science, vol 9005. Springer, Cham. https://doi.org/10.1007/978-3-319-16811-1_6

  • DOI: https://doi.org/10.1007/978-3-319-16811-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16810-4

  • Online ISBN: 978-3-319-16811-1

  • eBook Packages: Computer Science (R0)