Abstract
We address the problem of 2D and 3D human pose estimation using monocular camera information only. Generative approaches usually consist of two computationally demanding steps. First, different configurations of a complex 3D body model are projected into the image plane. Second, the projected synthetic person images are compared with images of real persons on the basis of features such as silhouettes or edges. To lower the computational cost of generative models, we propose to use vote distributions for anatomical landmarks, generated by one Implicit Shape Model per landmark. These vote distributions represent the image evidence in a more compact form and make it possible to use a simple 3D stick-figure body model, since the projected 3D marker points of the stick figure can be compared directly with vote locations at negligible computational cost. This allows nearly half a million different 3D poses to be evaluated per second on standard hardware, and thus a huge set of 3D pose and configuration hypotheses to be considered in each frame. The approach is evaluated on the new Utrecht Multi-Person Motion (UMPM) benchmark, yielding an average joint angle reconstruction error of 8.0°.
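The comparison of projected stick-figure markers with vote locations can be illustrated with a minimal sketch. It is not the authors' implementation: the pinhole camera parameters (`focal`, `cx`, `cy`), the Gaussian weighting with width `sigma`, and all function names are assumptions chosen for illustration, and the abstract does not specify the actual scoring function.

```python
import math
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]   # camera-frame coordinates, z > 0
Point2D = Tuple[float, float]          # pixel coordinates
Vote = Tuple[float, float, float]      # (x, y, weight) of one landmark vote


def project(point: Point3D, focal: float, cx: float, cy: float) -> Point2D:
    """Pinhole projection of a 3D marker point into the image plane."""
    x, y, z = point
    return (focal * x / z + cx, focal * y / z + cy)


def landmark_score(p: Point2D, votes: List[Vote], sigma: float) -> float:
    """Sum of vote weights, attenuated by a Gaussian of the distance to p.

    The Gaussian kernel is an assumption of this sketch.
    """
    total = 0.0
    for vx, vy, w in votes:
        d2 = (vx - p[0]) ** 2 + (vy - p[1]) ** 2
        total += w * math.exp(-d2 / (2.0 * sigma ** 2))
    return total


def pose_score(pose: Dict[str, Point3D], votes: Dict[str, List[Vote]],
               focal: float = 500.0, cx: float = 320.0, cy: float = 240.0,
               sigma: float = 5.0) -> float:
    """Score one 3D pose hypothesis against the per-landmark vote sets."""
    return sum(
        landmark_score(project(pt, focal, cx, cy), votes.get(name, []), sigma)
        for name, pt in pose.items()
    )


def best_pose(hypotheses: List[Dict[str, Point3D]],
              votes: Dict[str, List[Vote]]) -> int:
    """Index of the hypothesis whose projected markers best match the votes."""
    return max(range(len(hypotheses)),
               key=lambda i: pose_score(hypotheses[i], votes))
```

Because each hypothesis costs only one projection and a few distance evaluations per landmark, no synthetic image has to be rendered, which is the property the abstract credits for the high hypothesis throughput.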
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brauer, J., Hübner, W., Arens, M. (2012). Generative 2D and 3D Human Pose Estimation with Vote Distributions. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2012. Lecture Notes in Computer Science, vol 7431. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33179-4_45
DOI: https://doi.org/10.1007/978-3-642-33179-4_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33178-7
Online ISBN: 978-3-642-33179-4
eBook Packages: Computer Science (R0)