Can 3D Pose Be Learned from 2D Projections Alone?

  • Dylan Drover
  • Rohith M. V
  • Ching-Hang Chen
  • Amit Agrawal
  • Ambrish Tyagi
  • Cong Phuoc Huynh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

3D pose estimation from a single image is a challenging task in computer vision. We present a weakly supervised approach to estimate 3D pose points, given only 2D pose landmarks. Our method does not require correspondences between 2D and 3D points to build explicit 3D priors. We utilize an adversarial framework to impose a prior on the 3D structure, learned solely from its random 2D projections. Given a set of 2D pose landmarks, the generator network hypothesizes their depths to obtain a 3D skeleton. We propose a novel Random Projection layer, which randomly projects the generated 3D skeleton and sends the resulting 2D pose to the discriminator. The discriminator improves by discriminating between the generated poses and pose samples from a real distribution of 2D poses. Training does not require correspondence between the 2D inputs to either the generator or the discriminator. We apply our approach to the task of 3D human pose estimation. Results on the Human3.6M dataset demonstrate that our approach outperforms many previous supervised and weakly supervised approaches.
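
The lifting and random projection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `lift_to_3d` and `random_projection` are hypothetical, the depths are hard-coded stand-ins for a generator's output, and the projection is assumed to be a random rotation about the vertical axis followed by an orthographic projection, which the abstract does not specify.

```python
import math
import random


def lift_to_3d(pose_2d, depths):
    """Attach one hypothesized depth to each 2D landmark, giving 3D points."""
    return [(x, y, z) for (x, y), z in zip(pose_2d, depths)]


def random_projection(pose_3d, rng):
    """Rotate the skeleton by a random angle about the vertical (y) axis,
    then drop the depth coordinate (orthographic projection, an assumption
    made for this sketch)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    # Rotation about y: x' = c*x + s*z, y' = y; the rotated depth is discarded.
    return [(c * x + s * z, y) for x, y, z in pose_3d]


rng = random.Random(0)
pose_2d = [(0.0, 1.0), (0.5, 0.0), (-0.5, 0.0)]  # toy 2D landmarks
depths = [0.1, -0.2, 0.3]  # stand-in for the generator's depth hypotheses
pose_3d = lift_to_3d(pose_2d, depths)
fake_2d = random_projection(pose_3d, rng)  # the pose a discriminator would see
```

In an adversarial setup of this kind, `fake_2d` would be scored by the discriminator against real 2D poses, so the generator is penalized whenever its depth hypotheses produce implausible poses from other viewpoints.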

Keywords

Weakly supervised learning, Generative Adversarial Networks, 3D pose estimation, Projective geometry

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Amazon Lab126 Inc., Sunnyvale, USA