Video-Based People Tracking

  • Marcus A. Brubaker
  • Leonid Sigal
  • David J. Fleet

Abstract

Vision-based human pose tracking promises to be a key enabling technology for myriad applications, including the analysis of human activities for perceptive environments and novel man-machine interfaces. While progress toward that goal has been exciting, and limited applications have been demonstrated, the recovery of human pose from video in unconstrained settings remains challenging. One of the key challenges stems from the complexity of the human kinematic structure itself. The sheer number and variety of joints in the human body (the nature of which is an active area of biomechanics research) entails the estimation of many parameters. The estimation problem is also challenging because muscles and other body tissues obscure the skeletal structure, making it impossible to directly observe the pose of the skeleton.

Keywords

Computer Vision, Markov Chain Monte Carlo, Prior Model, Proposal Distribution, Joint Limit

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, University of Toronto, Toronto, Canada
