Human Pose Estimation and Tracking

Time-of-Flight and Structured Light Depth Cameras

Abstract

Human pose estimation and tracking is one of the most intriguing yet challenging applications of consumer depth cameras. After an overview of common human hand and body models, we introduce approaches for pose recovery from a single frame, starting from the popular method based on Random Decision Forests proposed by Shotton et al. and used for the Microsoft Kinect. Various pose tracking approaches are then presented to recover the human pose configuration over time from a sequence of frames. We discuss some of the main solutions available today, including algorithms based on numerical optimization methods, filtering approaches, and recent advances based on Markov Random Fields.

References

  1. A. Agarwal, B. Triggs, Recovering 3D human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 44–58 (2006)

  2. I. Albrecht, J. Haber, H.P. Seidel, Construction and animation of anatomically based human hand models, in Proceedings of ACM SIGGRAPH (Aire-la-Ville, 2003), pp. 98–109

  3. M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)

  4. C. Barrón, I.A. Kakadiaris, Estimating anthropometry and pose from a single uncalibrated image. Comput. Vis. Image Underst. 81(3), 269–284 (2001)

  5. A. Bottino, A. Laurentini, A silhouette based technique for the reconstruction of human movement. Comput. Vis. Image Underst. 83(1), 79–95 (2001)

  6. M. Bray, E. Koller-Meier, P. Muller, L. Van Gool, N.N. Schraudolph, 3D hand tracking by rapid stochastic gradient descent using a skinning model, in Proceedings of IEEE European Conference on Visual Media Production (2004), pp. 59–68

  7. M. Bray, E. Koller-Meier, L. Van Gool, Smart particle filtering for 3D hand tracking, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (Washington, 2004), pp. 675–680

  8. P. Breuer, C. Eckes, S. Muller, Hand gesture recognition with a novel IR time-of-flight range camera: a pilot study, in Proceedings of International Conference on Computer Vision/Computer Graphics Collaboration Techniques (Springer, Berlin, 2007), pp. 247–260

  9. N.G. Cho, A.L. Yuille, S.W. Lee, Adaptive occlusion state estimation for human pose tracking under self-occlusions. Pattern Recogn. 46(3), 649–661 (2013)

  10. D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)

  11. Q. Delamarre, O. Faugeras, 3D articulated models and multiview tracking with physical forces. Comput. Vis. Image Underst. 81(3), 328–357 (2001)

  12. J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2000), pp. 126–133

  13. G. Dewaele, F. Devernay, R. Horaud, Hand motion from 3D point trajectories and a smooth surface model, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2004), pp. 495–507

  14. M.A. Fischler, R.A. Elschlager, The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)

  15. A. Fossati, J. Gall, H. Grabner, X. Ren, K. Konolige, Consumer Depth Cameras for Computer Vision: Research Topics and Applications (Springer, London, 2012)

  16. V. Frati, D. Prattichizzo, Using Kinect for hand tracking and rendering in wearable haptics, in Proceedings of IEEE World Haptics Conference (2011), pp. 317–321

  17. V. Ganapathi, C. Plagemann, D. Koller, S. Thrun, Real-time human pose tracking from range data, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2012), pp. 738–751

  18. D.M. Gavrila, L.S. Davis, 3-D model-based tracking of humans in action: a multi-view approach, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1996), pp. 73–80

  19. R. Girshick, J. Shotton, P. Kohli, A. Criminisi, A. Fitzgibbon, Efficient regression of general-activity human poses from depth images, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2011), pp. 415–422

  20. D. Grest, J. Woetzel, R. Koch, Nonlinear body pose estimation from depth images, in Proceedings of DAGM Conference on Pattern Recognition (Springer, Berlin/Heidelberg, 2005), pp. 285–292

  21. P. Guan, A. Weiss, A.O. Balan, M.J. Black, Estimating human shape and pose from a single image, in Proceedings of IEEE International Conference on Computer Vision (2009), pp. 1381–1388

  22. H. Hamer, K. Schindler, E. Koller-Meier, L.J. Van Gool, Tracking a hand manipulating an object, in Proceedings of IEEE International Conference on Computer Vision (Kyoto, 2009), pp. 1475–1482

  23. T.K. Ho, Random decision forests, in Proceedings of International Conference on Document Analysis and Recognition (1995), pp. 278–282

  24. D. Hogg, Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)

  25. S.X. Ju, M.J. Black, Y. Yacoob, Cardboard people: a parameterized model of articulated image motion, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (1996), pp. 38–44

  26. I.A. Kakadiaris, D. Metaxas, Three-dimensional human body model acquisition from multiple views. Int. J. Comput. Vis. 30(3), 191–218 (1998)

  27. R. Kehl, L. Van Gool, Markerless tracking of complex human motions from multiple views. Comput. Vis. Image Underst. 104(2), 190–209 (2006)

  28. C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Real time hand pose estimation using depth sensors, in Proceedings of IEEE International Conference on Computer Vision Workshops (2011), pp. 1228–1234

  29. C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Hand pose estimation and hand shape classification using multi-layered randomized decision forests, in Proceedings of IEEE European Conference on Computer Vision (2012)

  30. S. Knoop, S. Vacek, R. Dillmann, Sensor fusion for 3D human body tracking with an articulated 3D body model, in Proceedings of IEEE International Conference on Robotics and Automation (Orlando, 2006), pp. 1686–1691

  31. J.J. Kuch, T.S. Huang, Human computer interaction via the human hand: a hand model, in Proceedings of Asilomar Conference on Signals, Systems and Computers (1994), pp. 1252–1256

  32. J. Lee, T.L. Kunii, Constraint-based hand animation, in Models and Techniques in Computer Animation, ed. by N.M. Thalmann, D. Thalmann. Computer Animation Series (Springer, Tokyo, 1993), pp. 110–127

  33. J.P. Lewis, M. Cordner, N. Fong, Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation, in Proceedings of ACM SIGGRAPH (New York, 2000), pp. 165–172

  34. S.Z. Li, Markov Random Field Modeling in Image Analysis, 3rd edn. (Springer, New York, 2009)

  35. T. Liu, W. Liang, X. Wu, L. Chen, Tracking articulated hand underlying graphical model with depth cue, in Proceedings of IEEE International Congress on Image and Signal Processing (Washington, 2008), pp. 249–253

  36. Z. Liu, J. Zhu, J. Bu, C. Chen, A survey of human pose estimation: the body parts parsing based methods. J. Vis. Commun. Image Represent. 32, 10–19 (2015)

  37. T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture. Comput. Vis. Image Underst. 81(3), 231–268 (2001)

  38. A. Mohr, M. Gleicher, Building efficient, accurate character skins from examples, in Proceedings of ACM SIGGRAPH (New York, 2003), pp. 562–568

  39. D.D. Morris, J. Rehg, Singularity analysis for articulated object tracking, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1998), pp. 289–296

  40. I. Oikonomidis, N. Kyriazis, A. Argyros, Efficient model-based 3D tracking of hand articulations using Kinect, in Proceedings of British Machine Vision Conference (BMVA, Dundee, 2011), pp. 101.1–101.11

  41. J. O’Rourke, N.I. Badler, Model-based image analysis of human motion using constraint propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2(6), 522–536 (1980)

  42. S. Park, S. Yu, J. Kim, S. Kim, S. Lee, 3D hand tracking using Kalman filter in depth space. EURASIP J. Adv. Signal Process. 2012(1) (2012)

  43. C. Plagemann, V. Ganapathi, D. Koller, S. Thrun, Real-time identification and localization of body parts from depth images, in Proceedings of IEEE International Conference on Robotics and Automation (2010), pp. 3108–3113

  44. J. Rehg, Visual Analysis of High DOF Articulated Objects with Application to Hand Tracking. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh (1995)

  45. J.M. Rehg, T. Kanade, DigitEyes: vision-based hand tracking for human-computer interaction, in Proceedings of IEEE Workshop on Motion of Non-rigid and Articulated Objects (1994), pp. 16–22

  46. K. Rohr, Towards model-based recognition of human movements in image sequences. CVGIP Image Underst. 59(1), 94–115 (1994)

  47. G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parameter-sensitive hashing, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2003), p. 750

  48. M. Sun, P. Kohli, J. Shotton, Conditional regression forests for human pose estimation, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2012), pp. 3394–3401

  49. J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2011), pp. 1297–1304

  50. J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, Efficient human pose estimation from single depth images. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2821–2840 (2013)

  51. H. Sidenbladh, M.J. Black, D.J. Fleet, Stochastic tracking of 3D human figures using 2D image motion, in Proceedings of IEEE European Conference on Computer Vision (Springer, London, 2000), pp. 702–718

  52. L. Sigal, Human pose estimation, in Computer Vision, ed. by K. Ikeuchi (Springer, New York, 2014), pp. 362–370

  53. L. Sigal, M. Isard, B.H. Sigelman, M.J. Black, Attractive people: assembling loose-limbed models using non-parametric belief propagation, in Proceedings of Conference on Neural Information Processing Systems (Cambridge, 2003), pp. 1539–1546

  54. B. Stenger, P.R.S. Mendonça, R. Cipolla, Model-based 3D tracking of an articulated hand, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2001)

  55. D. Tang, T.H. Yu, T.K. Kim, Real-time articulated hand pose estimation using semi-supervised transductive regression forests, in Proceedings of IEEE International Conference on Computer Vision (2013), pp. 3224–3231

  56. M. Ye, X. Wang, R. Yang, L. Ren, M. Pollefeys, Accurate 3D pose estimation from a single depth image, in Proceedings of IEEE International Conference on Computer Vision (2011), pp. 731–738

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zanuttigh, P., Marin, G., Dal Mutto, C., Dominio, F., Minto, L., Cortelazzo, G.M. (2016). Human Pose Estimation and Tracking. In: Time-of-Flight and Structured Light Depth Cameras. Springer, Cham. https://doi.org/10.1007/978-3-319-30973-6_8

  • DOI: https://doi.org/10.1007/978-3-319-30973-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30971-2

  • Online ISBN: 978-3-319-30973-6

  • eBook Packages: Computer Science (R0)
