Deep Head Pose Estimation from Depth Data for In-Car Automotive Applications

  • Marco Venturelli
  • Guido Borghi
  • Roberto Vezzani
  • Rita Cucchiara
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10188)

Abstract

Recently, deep learning approaches have achieved promising results in various fields of computer vision. In this paper, we tackle the problem of head pose estimation with a Convolutional Neural Network (CNN). Unlike other proposals in the literature, the described system works directly on raw depth data, with no other input. Moreover, head pose estimation is solved as a regression problem and does not rely on visual facial features such as facial landmarks. We tested our system on a well-known public dataset, Biwi Kinect Head Pose, showing that our approach achieves state-of-the-art results and meets real-time performance requirements.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Marco Venturelli (1)
  • Guido Borghi (1)
  • Roberto Vezzani (1)
  • Rita Cucchiara (1)
  1. DIEF, University of Modena and Reggio Emilia, Modena, Italy
