Human Motion, pp 185–211

3D Human Motion Analysis in Monocular Video: Techniques and Challenges

  • Chapter

Part of the book series: Computational Imaging and Vision (CIVI, volume 36)

Extracting meaningful 3D human motion information from video sequences is of interest for applications such as intelligent human–computer interfaces, biometrics, video browsing and indexing, virtual reality and video surveillance. Analyzing videos of humans in unconstrained environments is an open and active research problem that poses outstanding scientific and computational challenges. The proportions of the human body vary widely across individuals, due to gender, age, weight or race. Aside from this variability, any single human body has many degrees of freedom due to articulation, and the individual limbs are deformable due to moving muscle and clothing. Finally, real-world events involve multiple interacting humans occluded by each other or by other objects, and the scene conditions may also vary due to camera motion or lighting changes. All these factors make appropriate models of human structure, motion and action difficult to construct and difficult to estimate from images. In this chapter we give an overview of the problem of reconstructing 3D human motion from sequences of images acquired with a single video camera. We explain the difficulties involved, discuss ways to address them using generative and discriminative models, and speculate on open problems and future research directions.

Copyright information

© 2008 Springer

About this chapter

Cite this chapter

Sminchisescu, C. (2008). 3D Human Motion Analysis in Monocular Video: Techniques and Challenges. In: Rosenhahn, B., Klette, R., Metaxas, D. (eds) Human Motion. Computational Imaging and Vision, vol 36. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6693-1_8

  • DOI: https://doi.org/10.1007/978-1-4020-6693-1_8

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-6692-4

  • Online ISBN: 978-1-4020-6693-1

  • eBook Packages: Computer Science, Computer Science (R0)
