Deep Directional Statistics: Pose Estimation with Uncertainty Quantification

  • Sergey Prokudin
  • Peter Gehler
  • Sebastian Nowozin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11213)

Abstract

Modern deep learning systems successfully solve many perception tasks, such as object pose estimation, when the input image is of high quality. However, in challenging imaging conditions, such as low-resolution images or images corrupted by imaging artifacts, current systems degrade considerably in accuracy. While some loss in performance is unavoidable, we would like our models to quantify their uncertainty in order to achieve robustness against images of varying quality. Probabilistic deep learning models combine the expressive power of deep learning with uncertainty quantification. In this paper we propose a novel probabilistic deep learning model for the task of angular regression. Our model uses von Mises distributions to predict a distribution over object pose angle. Because a single von Mises distribution makes strong assumptions about the shape of the distribution, we extend the basic model to predict a mixture of von Mises distributions. We show how to learn a mixture model using either a finite or an infinite number of mixture components. Our model allows for likelihood-based training and efficient inference at test time. We demonstrate on a number of challenging pose estimation datasets that our model produces calibrated probability predictions and competitive or superior point estimates compared to the current state of the art.
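
To make the likelihood mentioned above concrete, the following is a minimal sketch of the log-density of a von Mises distribution, log p(φ | μ, κ) = κ cos(φ − μ) − log(2π I₀(κ)), and of a finite mixture of such distributions. It is an illustration only, not the authors' implementation: the function names are hypothetical, the parameters are passed in directly rather than predicted from an image, and the Bessel function I₀ is evaluated with a simple power series that is adequate only for moderate κ.

```python
import math


def bessel_i0(kappa, tol=1e-12):
    """Modified Bessel function of the first kind, order 0, via its power series.

    Assumption: kappa is moderate (the series overflows for very large values).
    """
    term, total, m = 1.0, 1.0, 0
    half_sq = (kappa / 2.0) ** 2
    while term > tol * total:
        m += 1
        term *= half_sq / (m * m)  # ratio of consecutive series terms
        total += term
    return total


def von_mises_log_pdf(phi, mu, kappa):
    """Log-density of a von Mises distribution at angle phi (radians)."""
    return kappa * math.cos(phi - mu) - math.log(2.0 * math.pi * bessel_i0(kappa))


def mixture_log_pdf(phi, weights, mus, kappas):
    """Log-density of a finite von Mises mixture, computed with log-sum-exp."""
    logs = [
        math.log(w) + von_mises_log_pdf(phi, m, k)
        for w, m, k in zip(weights, mus, kappas)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def negative_log_likelihood(angles, weights, mus, kappas):
    """Average negative log-likelihood of observed angles under the mixture."""
    return -sum(mixture_log_pdf(a, weights, mus, kappas) for a in angles) / len(angles)
```

Likelihood-based training would minimise a quantity such as `negative_log_likelihood` with respect to the parameters; with κ = 0 the density reduces to the uniform distribution on the circle, which is a convenient sanity check.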

Keywords

Pose estimation · Deep probabilistic models · Uncertainty quantification · Directional statistics

Notes

Acknowledgments

This work was supported by Microsoft Research through its PhD Scholarship Programme.

Supplementary material

474192_1_En_33_MOESM1_ESM.pdf (1.8 mb)
Supplementary material 1 (pdf 1816 KB)

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Max Planck Institute for Intelligent Systems, Tübingen, Germany
  2. Amazon, Tübingen, Germany
  3. Microsoft Research, Cambridge, UK
