Human Pose Estimation and Tracking

Zanuttigh, Pietro; Marin, Giulio; Dal Mutto, Carlo; Dominio, Fabio; Minto, Ludovico; Cortelazzo, Guido Maria

doi:10.1007/978-3-319-30973-6_8

Abstract

Human pose estimation and tracking is one of the most intriguing yet challenging applications of consumer depth cameras. After an overview of common human hand and body models, we introduce approaches for pose recovery from a single frame, starting from the popular method based on Random Decision Forests proposed by Shotton et al. and used for the Microsoft Kinect. Various pose tracking approaches are then presented to recover the human pose configuration over time from a sequence of frames. We discuss some of the main solutions available today, including algorithms based on numerical optimization methods, filtering approaches, and recent advances based on Markov Random Fields.

References

A. Agarwal, B. Triggs, Recovering 3D human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28(1), 44–58 (2006)
I. Albrecht, J. Haber, H.P. Seidel, Construction and animation of anatomically based human hand models, in Proceedings of ACM SIGGRAPH (Aire-la-Ville, 2003), pp. 98–109
M.S. Arulampalam, S. Maskell, N. Gordon, T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
C. Barrón, I.A. Kakadiaris, Estimating anthropometry and pose from a single uncalibrated image. Comput. Vis. Image Underst. 81(3), 269–284 (2001)
A. Bottino, A. Laurentini, A silhouette based technique for the reconstruction of human movement. Comput. Vis. Image Underst. 83(1), 79–95 (2001)
M. Bray, E. Koller-Meier, P. Muller, L. Van Gool, N.N. Schraudolph, 3D hand tracking by rapid stochastic gradient descent using a skinning model, in Proceedings of IEEE European Conference on Visual Media Production (2004), pp. 59–68
M. Bray, E. Koller-Meier, L. Van Gool, Smart particle filtering for 3D hand tracking, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (Washington, 2004), pp. 675–680
P. Breuer, C. Eckes, S. Muller, Hand gesture recognition with a novel IR time-of-flight range camera: a pilot study, in Proceedings of International Conference on Computer Vision/Computer Graphics Collaboration Techniques (Springer, Berlin, 2007), pp. 247–260
N.G. Cho, A.L. Yuille, S.W. Lee, Adaptive occlusion state estimation for human pose tracking under self-occlusions. Pattern Recogn. 46(3), 649–661 (2013)
D. Comaniciu, P. Meer, Mean shift: a robust approach toward feature space analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24(5), 603–619 (2002)
Q. Delamarre, O. Faugeras, 3D articulated models and multiview tracking with physical forces. Comput. Vis. Image Underst. 81(3), 328–357 (2001)
J. Deutscher, A. Blake, I. Reid, Articulated body motion capture by annealed particle filtering, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2000), pp. 126–133
G. Dewaele, F. Devernay, R. Horaud, Hand motion from 3D point trajectories and a smooth surface model, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2004), pp. 495–507
M.A. Fischler, R.A. Elschlager, The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
A. Fossati, J. Gall, H. Grabner, X. Ren, K. Konolige, Consumer Depth Cameras for Computer Vision: Research Topics and Applications (Springer, London, 2012)
V. Frati, D. Prattichizzo, Using Kinect for hand tracking and rendering in wearable haptics, in Proceedings of IEEE World Haptics Conference (2011), pp. 317–321
V. Ganapathi, C. Plagemann, D. Koller, S. Thrun, Real-time human pose tracking from range data, in Proceedings of IEEE European Conference on Computer Vision (Springer, Berlin/Heidelberg, 2012), pp. 738–751
D.M. Gavrila, L.S. Davis, 3-D model-based tracking of humans in action: a multi-view approach, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1996), pp. 73–80
R. Girshick, J. Shotton, P. Kohli, A. Criminisi, A. Fitzgibbon, Efficient regression of general-activity human poses from depth images, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2011), pp. 415–422
D. Grest, J. Woetzel, R. Koch, Nonlinear body pose estimation from depth images, in Proceedings of DAGM Conference on Pattern Recognition (Springer, Berlin/Heidelberg, 2005), pp. 285–292
P. Guan, A. Weiss, A.O. Balan, M.J. Black, Estimating human shape and pose from a single image, in Proceedings of IEEE International Conference on Computer Vision (2009), pp. 1381–1388
H. Hamer, K. Schindler, E. Koller-Meier, L.J. Van Gool, Tracking a hand manipulating an object, in Proceedings of IEEE International Conference on Computer Vision (Kyoto, 2009), pp. 1475–1482
T.K. Ho, Random decision forests, in Proceedings of International Conference on Document Analysis and Recognition (1995), pp. 278–282
D. Hogg, Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)
S.X. Ju, M.J. Black, Y. Yacoob, Cardboard people: a parameterized model of articulated image motion, in Proceedings of IEEE International Conference on Automatic Face and Gesture Recognition (1996), pp. 38–44
I.A. Kakadiaris, D. Metaxas, Three-dimensional human body model acquisition from multiple views. Int. J. Comput. Vis. 30(3), 191–218 (1998)
R. Kehl, L. Van Gool, Markerless tracking of complex human motions from multiple views. Comput. Vis. Image Underst. 104(2), 190–209 (2006)
C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Real time hand pose estimation using depth sensors, in Proceedings of IEEE International Conference on Computer Vision Workshops (2011), pp. 1228–1234
C. Keskin, F. Kıraç, Y.E. Kara, L. Akarun, Hand pose estimation and hand shape classification using multi-layered randomized decision forests, in Proceedings of IEEE European Conference on Computer Vision (2012)
S. Knoop, S. Vacek, R. Dillmann, Sensor fusion for 3D human body tracking with an articulated 3D body model, in Proceedings of IEEE International Conference on Robotics and Automation (Orlando, 2006), pp. 1686–1691
J.J. Kuch, T.S. Huang, Human computer interaction via the human hand: a hand model, in Proceedings of Asilomar Conference on Signals, Systems and Computers (1994), pp. 1252–1256
J. Lee, T.L. Kunii, Constraint-based hand animation, in Models and Techniques in Computer Animation, ed. by N.M. Thalmann, D. Thalmann. Computer Animation Series (Springer, Tokyo, 1993), pp. 110–127
J.P. Lewis, M. Cordner, N. Fong, Pose space deformation: a unified approach to shape interpolation and skeleton-driven deformation, in Proceedings of ACM SIGGRAPH (New York, 2000), pp. 165–172
S.Z. Li, Markov Random Field Modeling in Image Analysis, 3rd edn. (Springer, New York, 2009)
T. Liu, W. Liang, X. Wu, L. Chen, Tracking articulated hand underlying graphical model with depth cue, in Proceedings of IEEE International Congress on Image and Signal Processing (Washington, 2008), pp. 249–253
Z. Liu, J. Zhu, J. Bu, C. Chen, A survey of human pose estimation: the body parts parsing based methods. J. Vis. Commun. Image Represent. 32, 10–19 (2015)
T.B. Moeslund, E. Granum, A survey of computer vision-based human motion capture. Comput. Vis. Image Underst. 81(3), 231–268 (2001)
A. Mohr, M. Gleicher, Building efficient, accurate character skins from examples, in Proceedings of ACM SIGGRAPH (New York, 2003), pp. 562–568
D.D. Morris, J. Rehg, Singularity analysis for articulated object tracking, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (1998), pp. 289–296
I. Oikonomidis, N. Kyriazis, A. Argyros, Efficient model-based 3D tracking of hand articulations using Kinect, in Proceedings of British Machine Vision Conference (BMVA, Dundee, 2011), pp. 101.1–101.11
J. O’Rourke, N.I. Badler, Model-based image analysis of human motion using constraint propagation. IEEE Trans. Pattern Anal. Mach. Intell. 2(6), 522–536 (1980)
S. Park, S. Yu, J. Kim, S. Kim, S. Lee, 3D hand tracking using Kalman filter in depth space. EURASIP J. Adv. Signal Process. 2012(1) (2012)
C. Plagemann, V. Ganapathi, D. Koller, S. Thrun, Real-time identification and localization of body parts from depth images, in Proceedings of IEEE International Conference on Robotics and Automation (2010), pp. 3108–3113
J. Rehg, Visual Analysis of High DOF Articulated Objects with Application to Hand Tracking. PhD thesis, Robotics Institute, Carnegie Mellon University, Pittsburgh (1995)
J.M. Rehg, T. Kanade, Digiteyes: vision-based hand tracking for human-computer interaction, in Proceedings of IEEE Workshop on Motion of Non-rigid and Articulated Objects (1994), pp. 16–22
K. Rohr, Towards model-based recognition of human movements in image sequences. CVGIP Image Underst. 59(1), 94–115 (1994)
G. Shakhnarovich, P. Viola, T. Darrell, Fast pose estimation with parameter-sensitive hashing, in Proceedings of IEEE International Conference on Computer Vision (Washington, 2003), p. 750
M. Sun, P. Kohli, J. Shotton, Conditional regression forests for human pose estimation, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2012), pp. 3394–3401
J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake, Real-time human pose recognition in parts from single depth images, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (Washington, 2011), pp. 1297–1304
J. Shotton, R. Girshick, A. Fitzgibbon, T. Sharp, M. Cook, M. Finocchio, R. Moore, P. Kohli, A. Criminisi, A. Kipman, Efficient human pose estimation from single depth images. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2821–2840 (2013)
H. Sidenbladh, M.J. Black, D.J. Fleet, Stochastic tracking of 3D human figures using 2D image motion, in Proceedings of IEEE European Conference on Computer Vision (Springer, London, 2000), pp. 702–718
L. Sigal, Human pose estimation, in Computer Vision, ed. by K. Ikeuchi (Springer, New York, 2014), pp. 362–370
L. Sigal, M. Isard, B.H. Sigelman, M.J. Black, Attractive people: assembling loose-limbed models using non-parametric belief propagation, in Proceedings of Conference on Neural Information Processing Systems (Cambridge, 2003), pp. 1539–1546
B. Stenger, P.R.S. Mendonça, R. Cipolla, Model-based 3D tracking of an articulated hand, in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (2001)
D. Tang, T.H. Yu, T.K. Kim, Real-time articulated hand pose estimation using semi-supervised transductive regression forests, in Proceedings of IEEE International Conference on Computer Vision (2013), pp. 3224–3231
M. Ye, X. Wang, R. Yang, L. Ren, M. Pollefeys, Accurate 3D pose estimation from a single depth image, in Proceedings of IEEE International Conference on Computer Vision (2011), pp. 731–738

Author information

Authors and Affiliations

Department of Information Engineering, University of Padova, Padova, Italy
Pietro Zanuttigh, Giulio Marin, Fabio Dominio & Ludovico Minto
Aquifi Inc., Palo Alto, CA, USA
Carlo Dal Mutto
3D Everywhere s.r.l., Padova, Italy
Guido Maria Cortelazzo

About this chapter

Cite this chapter

Zanuttigh, P., Marin, G., Dal Mutto, C., Dominio, F., Minto, L., Cortelazzo, G.M. (2016). Human Pose Estimation and Tracking. In: Time-of-Flight and Structured Light Depth Cameras. Springer, Cham. https://doi.org/10.1007/978-3-319-30973-6_8

DOI: https://doi.org/10.1007/978-3-319-30973-6_8
Published: 25 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30971-2
Online ISBN: 978-3-319-30973-6
eBook Packages: Computer Science, Computer Science (R0)