Abstract
Human pose estimation and tracking is one of the most intriguing yet challenging applications of consumer depth cameras. After an overview of common human hand and body models, we introduce approaches for pose recovery from a single frame, starting from the popular method based on Random Decision Forests proposed by Shotton et al. and used for the Microsoft Kinect. Various pose tracking approaches are then presented to recover the human pose configuration over time from a sequence of frames. We discuss some of the main solutions available today, including algorithms based on numerical optimization methods, filtering approaches, and recent advances based on Markov Random Fields.
References
A. Agarwal, B. Triggs, Recovering 3D human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 44–58 (2006)
I. Albrecht, J. Haber, H.P. Seidel, Construction and animation of anatomically based human hand models, in Proceedings of ACM SIGGRAPH (Aire-la-Ville, 2003), pp. 98–109
M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
C. Barrón, I.A. Kakadiaris, Estimating anthropometry and pose from a single uncalibrated image. Comput. Vis. Image Underst. 81(3), 269–284 (2001)
A. Bottino, A. Laurentini, A silhouette based technique for the reconstruction of human movement. Comput. Vis. Image Underst. 83(1), 79–95 (2001)
M. Bray, E. Koller-Meier, P. Müller, L. Van Gool, N.N. Schraudolph, 3D hand tracking by rapid stochastic gradient descent using a skinning model, in Proceedings of IEEE European Conference on Visual Media Production (2004), pp. 59–68
M. Bray, E. Koller-Meier, L. Van Gool, Smart particle filtering for 3D hand tracking, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (Washington, 2004), pp. 675–680
P. Breuer, C. Eckes, S. Müller, Hand gesture recognition with a novel IR time-of-flight range camera: a pilot study, in Proceedings of International Conference on Computer Vision/Computer Graphics Collaboration Techniques (Springer, Berlin, 2007), pp. 247–260
N.G. Cho, A.L. Yuille, S.W. Lee, Adaptive occlusion state estimation for human pose tracking under self-occlusions. Pattern Recogn. 46(3), 649–661 (2013)
D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Q. Delamarre, O. Faugeras, 3D articulated models and multiview tracking with physical forces. Comput. Vis. Image Underst. 81(3), 328–357 (2001)
J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2000), pp. 126–133
G. Dewaele, F. Devernay, R. Horaud, Hand motion from 3D point trajectories and a smooth surface model, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2004), pp. 495–507
M.A. Fischler, R.A. Elschlager, The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
A. Fossati, J. Gall, H. Grabner, X. Ren, K. Konolige, Consumer Depth Cameras for Computer Vision: Research Topics and Applications (Springer, London, 2012)
V. Frati, D. Prattichizzo, Using Kinect for hand tracking and rendering in wearable haptics, in Proceedings of IEEE World Haptics Conference (2011), pp. 317–321
V. Ganapathi, C. Plagemann, D. Koller, S. Thrun, Real-time human pose tracking from range data, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2012), pp. 738–751
D.M. Gavrila, L.S. Davis, 3-D model-based tracking of humans in action: a multi-view approach, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1996), pp. 73–80
R. Girshick, J. Shotton, P. Kohli, A. Criminisi, A. Fitzgibbon, Efficient regression of general-activity human poses from depth images, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2011), pp. 415–422
D. Grest, J. Woetzel, R. Koch, Nonlinear body pose estimation from depth images, in Proceedings of DAGM Conference on Pattern Recognition (Springer, Berlin/Heidelberg, 2005), pp. 285–292
P. Guan, A. Weiss, A.O. Balan, M.J. Black, Estimating human shape and pose from a single image, in Proceedings of IEEE International Conference on Computer Vision (2009), pp. 1381–1388
H. Hamer, K. Schindler, E. Koller-Meier, L.J. Van Gool, Tracking a hand manipulating an object, in Proceedings of IEEE International Conference on Computer Vision (Kyoto, 2009), pp. 1475–1482
T.K. Ho, Random decision forests, in Proceedings of International Conference on Document Analysis and Recognition (1995), pp. 278–282
D. Hogg, Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)
S.X. Ju, M.J. Black, Y. Yacoob, Cardboard people: a parameterized model of articulated image motion, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (1996), pp. 38–44
I.A. Kakadiaris, D. Metaxas, Three-dimensional human body model acquisition from multiple views. Int. J. Comput. Vis. 30(3), 191–218 (1998)
R. Kehl, L. Van Gool, Markerless tracking of complex human motions from multiple views. Comput. Vis. Image Underst. 104(2), 190–209 (2006)
C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Real time hand pose estimation using depth sensors, in Proceedings of IEEE International Conference on Computer Vision Workshops (2011), pp. 1228–1234
C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Hand pose estimation and hand shape classification using multi-layered randomized decision forests, in Proceedings of IEEE European Conference on Computer Vision (2012)
S. Knoop, S. Vacek, R. Dillmann, Sensor fusion for 3D human body tracking with an articulated 3D body model, in Proceedings of IEEE International Conference on Robotics and Automation (Orlando, 2006), pp. 1686–1691
J.J. Kuch, T.S. Huang, Human computer interaction via the human hand: a hand model, in Proceedings of Asilomar Conference on Signals, Systems and Computers (1994), pp. 1252–1256
J. Lee, T.L. Kunii, Constraint-based hand animation, in Models and Techniques in Computer Animation, ed. by N.M. Thalmann, D. Thalmann. Computer Animation Series (Springer, Tokyo, 1993), pp. 110–127
J.P. Lewis, M. Cordner, N. Fong, Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation, in Proceedings of ACM SIGGRAPH (New York, 2000), pp. 165–172
S.Z. Li, Markov Random Field Modeling in Image Analysis, 3rd edn. (Springer, New York, 2009)
T. Liu, W. Liang, X. Wu, L. Chen, Tracking articulated hand underlying graphical model with depth cue, in Proceedings of IEEE International Congress on Image and Signal Processing (Washington, 2008), pp. 249–253
Z. Liu, J. Zhu, J. Bu, C. Chen, A survey of human pose estimation: the body parts parsing based methods. J. Vis. Commun. Image Represent. 32, 10–19 (2015)
T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture. Comput. Vis. Image Underst. 81(3), 231–268 (2001)
A. Mohr, M. Gleicher, Building efficient, accurate character skins from examples, in Proceedings of ACM SIGGRAPH (New York, 2003), pp. 562–568
D.D. Morris, J. Rehg, Singularity analysis for articulated object tracking, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1998), pp. 289–296
I. Oikonomidis, N. Kyriazis, A. Argyros, Efficient model-based 3D tracking of hand articulations using Kinect, in Proceedings of British Machine Vision Conference (BMVA, Dundee, 2011), pp. 101.1–101.11
J. O’Rourke, N.I. Badler, Model-based image analysis of human motion using constraint propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2(6), 522–536 (1980)
S. Park, S. Yu, J. Kim, S. Kim, S. Lee, 3D hand tracking using Kalman filter in depth space. EURASIP J. Adv. Signal Process. 2012(1) (2012)
C. Plagemann, V. Ganapathi, D. Koller, S. Thrun, Real-time identification and localization of body parts from depth images, in Proceedings of IEEE International Conference on Robotics and Automation (2010), pp. 3108–3113
J. Rehg, Visual Analysis of High DOF Articulated Objects with Application to Hand Tracking. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh (1995)
J.M. Rehg, T. Kanade, DigitEyes: vision-based hand tracking for human-computer interaction, in Proceedings of IEEE Workshop on Motion of Non-rigid and Articulated Objects (1994), pp. 16–22
K. Rohr, Towards model-based recognition of human movements in image sequences. CVGIP Image Underst. 59(1), 94–115 (1994)
G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parameter-sensitive hashing, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2003), p. 750
M. Sun, P. Kohli, J. Shotton, Conditional regression forests for human pose estimation, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2012), pp. 3394–3401
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2011), pp. 1297–1304
J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, Efficient human pose estimation from single depth images. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2821–2840 (2013)
H. Sidenbladh, M.J. Black, D.J. Fleet, Stochastic tracking of 3D human figures using 2D image motion, in Proceedings of IEEE European Conference on Computer Vision (Springer, London, 2000), pp. 702–718
L. Sigal, Human pose estimation, in Computer Vision, ed. by K. Ikeuchi (Springer, New York, 2014), pp. 362–370
L. Sigal, M. Isard, B.H. Sigelman, M.J. Black, Attractive people: assembling loose-limbed models using non-parametric belief propagation, in Proceedings of Conference on Neural Information Processing Systems (Cambridge, 2003), pp. 1539–1546
B. Stenger, P.R.S. Mendonça, R. Cipolla, Model-based 3D tracking of an articulated hand, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2001)
D. Tang, T.H. Yu, T.K. Kim, Real-time articulated hand pose estimation using semi-supervised transductive regression forests, in Proceedings of IEEE International Conference on Computer Vision (2013), pp. 3224–3231
M. Ye, X. Wang, R. Yang, L. Ren, M. Pollefeys, Accurate 3D pose estimation from a single depth image, in Proceedings of IEEE International Conference on Computer Vision (2011), pp. 731–738
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Zanuttigh, P., Marin, G., Dal Mutto, C., Dominio, F., Minto, L., Cortelazzo, G.M. (2016). Human Pose Estimation and Tracking. In: Time-of-Flight and Structured Light Depth Cameras. Springer, Cham. https://doi.org/10.1007/978-3-319-30973-6_8
DOI: https://doi.org/10.1007/978-3-319-30973-6_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30971-2
Online ISBN: 978-3-319-30973-6
eBook Packages: Computer Science, Computer Science (R0)