A Sequential Approach to 3D Human Pose Estimation: Separation of Localization and Identification of Body Joints

  • Ho Yub Jung
  • Yumin Suh
  • Gyeongsik Moon
  • Kyoung Mu Lee
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9909)


In this paper, we propose a new approach to 3D human pose estimation from a single depth image. Conventionally, 3D human pose estimation is formulated as the problem of detecting a predefined list of body joints. Most previous methods attempt to localize and identify body joints simultaneously, expecting that solving one task will help solve the other. However, we believe that identification hampers localization; therefore, the two tasks should be solved separately to improve pose estimation performance. We propose a two-stage framework that first estimates the locations of all joints and then identifies the estimated joints for a specific pose. The joint locations are estimated by regressing the K closest joints from every pixel using a random tree. The identification of joints is realized by transferring labels from a retrieved nearest exemplar model. Once the 3D configuration of all the joints is derived, identification becomes much easier than when it is done simultaneously with localization, because the solution space is reduced. The proposed method achieves a significant gain in pose estimation accuracy, improving both localization and identification. Experimental results show that its accuracy is significantly higher than that of previous approaches that localize and identify the body parts simultaneously.
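The second stage described above (labeling already-localized joints by transferring labels from a retrieved nearest exemplar) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Chamfer-style set distance used for retrieval, the greedy one-to-one matching used for label transfer, the function names `chamfer` and `transfer_labels`, and the toy three-joint exemplars. The localization stage (per-pixel regression with a random tree) is not shown; the `estimated` list simply stands in for its output.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def chamfer(points, exemplar_points):
    """Symmetric sum of nearest-neighbour distances between two point sets."""
    forward = sum(min(dist(p, q) for q in exemplar_points) for p in points)
    backward = sum(min(dist(q, p) for p in points) for q in exemplar_points)
    return forward + backward


def transfer_labels(points, exemplars):
    """Label unlabeled 3D joints using the closest exemplar pose.

    points: list of (x, y, z) joint locations with unknown identity.
    exemplars: list of dicts mapping joint name -> (x, y, z).
    Returns a dict mapping joint name -> one of the input points.
    """
    # Retrieval (assumed metric): pick the exemplar whose joint set is closest.
    best = min(exemplars, key=lambda ex: chamfer(points, list(ex.values())))

    # Label transfer (assumed strategy): greedy one-to-one matching,
    # accepting the shortest point/exemplar-joint pairs first.
    pairs = sorted(
        (dist(p, q), i, name)
        for i, p in enumerate(points)
        for name, q in best.items()
    )
    labeled, used = {}, set()
    for _, i, name in pairs:
        if i not in used and name not in labeled:
            labeled[name] = points[i]
            used.add(i)
    return labeled


# Hypothetical toy data: two exemplar poses and three unlabeled joint estimates.
exemplars = [
    {"head": (0.0, 1.7, 2.0), "l_hand": (-0.6, 1.0, 2.0), "r_hand": (0.6, 1.0, 2.0)},
    {"head": (0.0, 1.7, 2.0), "l_hand": (-0.3, 1.9, 2.0), "r_hand": (0.3, 1.9, 2.0)},
]
estimated = [(0.58, 1.02, 2.01), (0.01, 1.69, 1.98), (-0.61, 0.97, 2.0)]
labels = transfer_labels(estimated, exemplars)
```

Because the joint positions are already known at this point, identification only has to choose among assignments of labels to a fixed set of points, which is the reduced solution space the abstract refers to. A greedy match is used here only for brevity; an optimal assignment method could be substituted.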


Keywords: Depth camera, Human pose, Regression forest



This work was supported by Hankuk University of Foreign Studies Research Fund of 2016.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Ho Yub Jung (2)
  • Yumin Suh (1)
  • Gyeongsik Moon (1)
  • Kyoung Mu Lee (1)
  1. Department of ECE, ASRI, Seoul National University, Seoul, Korea
  2. Division of CESE, Hankuk University of Foreign Studies, Yongin-si, Korea
