Leveraging deep learning with symbolic sequences for robust head poses estimation

Abstract

Head pose estimation is a challenging topic in computer vision with a wide range of applications. Many methods for pose estimation have been proposed in the literature. Although their performance is acceptable, their sensitivity to external conditions remains a major challenge. In this paper, we propose a new model for head pose estimation. First, the face images are converted into one-dimensional vectors, treated as time series, using the Peano–Hilbert space-filling curve. Then, these numerical series are converted into symbolic sequences using suitable dimensionality reduction approaches. These sequences are then used as the input of an encoder–decoder neural network that learns and generates labels for the face orientations. We have evaluated our model on several databases, and the experimental results show that the proposed method is very competitive with other well-known approaches.
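The first two stages of the pipeline in the abstract can be sketched in plain Python. The first stage flattens a face image along a Hilbert curve, and the second turns the resulting series into a symbolic sequence. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes a square grayscale image whose side is a power of two. It uses the classic iterative Hilbert index-to-coordinate conversion, and the curve orientation convention is our own choice. For the symbolization step, it uses the standard SAX (symbolic aggregate approximation) recipe: z-normalization, piecewise aggregate approximation (PAA), and Gaussian breakpoints. The function names and the `segments` and `alphabet_size` parameters are hypothetical. The image size, segment count, and alphabet size used in the paper are not stated here.

```python
import bisect
import statistics
import string


def hilbert_d2xy(order, d):
    """Map index d along a Hilbert curve on a 2**order x 2**order grid to (x, y).

    Iterative form of the classic d2xy conversion; the orientation of the
    curve is a convention chosen for this sketch.
    """
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/reflect the current sub-square
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_flatten(image):
    """Read a square 2**k x 2**k grayscale image (list of rows) along a Hilbert curve."""
    n = len(image)
    order = n.bit_length() - 1
    if n == 0 or (1 << order) != n or any(len(row) != n for row in image):
        raise ValueError("image must be square with a power-of-two side")
    return [image[y][x] for x, y in (hilbert_d2xy(order, d) for d in range(n * n))]


def z_normalize(series, eps=1e-8):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std < eps:  # near-constant series: map every value to 0
        return [0.0] * len(series)
    return [(v - mean) / std for v in series]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-length frame."""
    if len(series) % segments:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    size = len(series) // segments
    return [statistics.fmean(series[i:i + size]) for i in range(0, len(series), size)]


def sax(series, segments, alphabet_size):
    """Convert a numeric series into a SAX word (hypothetical helper)."""
    if not 2 <= alphabet_size <= 26:
        raise ValueError("alphabet_size must be in [2, 26]")
    # Breakpoints split the standard normal into equiprobable regions.
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(i / alphabet_size) for i in range(1, alphabet_size)]
    reduced = paa(z_normalize(series), segments)
    # Symbol index = number of breakpoints <= value ('a' is the lowest region).
    return "".join(
        string.ascii_lowercase[bisect.bisect_right(breakpoints, v)] for v in reduced
    )
```

For example, `sax(hilbert_flatten(img), segments=16, alphabet_size=5)` would turn a hypothetical 32×32 face crop into a 16-letter word. Words of this kind are what the encoder–decoder network would then consume as input sequences. A Hilbert curve is preferred over row-by-row scanning because consecutive indices always map to neighbouring pixels. As a result, samples that are close in the series are also close in the image.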

Figs. 1–10

Author information

Correspondence to Hayet Mekami.

About this article

Cite this article

Mekami, H., Bounoua, A. & Benabderrahmane, S. Leveraging deep learning with symbolic sequences for robust head poses estimation. Pattern Anal Applic 23, 1391–1406 (2020). https://doi.org/10.1007/s10044-019-00857-5

Keywords

  • Head pose estimation
  • Time series
  • Encoder–decoder recurrent network
  • Symbolic aggregate approximation
  • Sequence to sequence